RREME: Regulatory RNA Element Motif Explorer

Project Summary

RREME (Regulatory RNA Element Motif Explorer) is a CGI-based web application for detecting post-transcriptional regulatory elements in RNA sequences. Users may input transcript names (with autocomplete) or raw RNA sequences. The backend scans for matches to AU-rich elements (AREs) and RNA-binding protein (RBP) motifs from the ATtRACT database using regular expressions and position weight matrices (PWMs).

The results are stored in a MySQL database using a Chado-inspired schema and displayed via Jinja2-rendered HTML. The system is designed for extensibility, with planned additions including microRNA motif support from miRBase, m6A data from REPIC, and additional annotations from AREsite2.

Features

PWM and regex motif scanning using curated ATtRACT matrices
Schema-aware use of controlled vocabulary via Chado-style tables (cv, cvterm)
CGI interface with jQuery-based autocomplete and AJAX
Jinja2 templates for dynamic result rendering
SQLAlchemy ORM for database access and integrity
Modular loading architecture for transcripts, motifs, and vocabularies

Technologies Used

Python 3 (CGI scripts, ORM, utilities)
JavaScript (jQuery, autocomplete)
MySQL (InnoDB, Chado-style schema)
Apache2 with CGI enabled
Jinja2 for template rendering

Requirements

Python ≥ 3.8
MySQL
Apache2 (with CGI enabled)
Python packages in requirements.txt:
- sqlalchemy
- mysql-connector-python
- jinja2

Directory Structure

.
├── README.md
├── cgi-bin
│   ├── autofill_suggestions.cgi
│   ├── results.cgi
│   ├── rreme_search.cgi
│   └── test.cgi
├── create_orm_tables.py
├── data
├── db
│   ├── abook3_chado_dump.sql
│   ├── abook3_chado_dump_innodb.sql
│   ├── rreme_db_protein_transcript_backup.sql
│   └── rreme_full_backup.sql  # This is the main backup
├── docs
├── requirements.txt
├── rreme
│   ├── __init__.py
│   ├── loaders
│   │   ├── __init__.py
│   │   ├── attract_loader.py
│   │   ├── chromosome_loader.py
│   │   ├── clear_ncbi_features.py
│   │   ├── feature_loader.py
│   │   ├── organism_loader.py
│   │   ├── pwm_loader.py
│   │   ├── sequence_loader.py
│   │   ├── vocab_loader.py
│   │   └── xref_loader.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── attract_motif.py
│   │   ├── base.py
│   │   ├── cv.py
│   │   ├── cvterm.py
│   │   ├── feature.py
│   │   ├── feature_loc.py
│   │   ├── feature_prop.py
│   │   ├── feature_relationship.py
│   │   ├── gene_id_mapping.py
│   │   ├── motif_mapping.py
│   │   ├── organism.py
│   │   ├── rbp_pwm.py
│   │   ├── scan_job.py
│   │   └── xref_models.py
│   └── utils
│       ├── __init__.py
│       ├── db_utils.py
│       └── motif_utils.py
├── scripts
│   ├── __init__.py
│   ├── additional_scripts
│   │   ├── annotate_features_with_motifs.py
│   │   ├── load_gff3_features.py
│   │   └── prepopulate_cvterms.py
│   ├── create_test_scanjob.py
│   ├── load_chr_to_chado.sh
│   ├── prepare_chromosome_gbk.py
│   └── rreme_load_gbk.pl
├── static
│   ├── css
│   │   └── search.css
│   └── js
│       └── search.js
└── templates
    ├── results.html
    └── search.html

ScanJob Architecture

The RREME motif scanning pipeline is driven by the ScanJob and MotifMapping ORM models and executed through the scanner.py dispatcher script. Each user-submitted query creates a new ScanJob entry that stores the input sequence or transcript ID, motif types to scan (e.g., "rbp", "are"), and job status (PENDING, RUNNING, DONE, or ERROR).
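As a rough illustration of the job record described above, the sketch below models the four statuses and the data a job carries. It uses a plain dataclass as a stand-in for the SQLAlchemy model in rreme/models/scan_job.py. The class name ScanJobSketch, the error_message field, and the transition table are assumptions made for illustration; the README only lists the states themselves.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    ERROR = "ERROR"


# Assumed transition rules: a job starts PENDING, is picked up (RUNNING),
# and then finishes as either DONE or ERROR.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.DONE, JobStatus.ERROR},
    JobStatus.DONE: set(),
    JobStatus.ERROR: set(),
}


@dataclass
class ScanJobSketch:
    """Hypothetical in-memory stand-in for the ScanJob ORM model."""
    input_value: str                      # raw RNA sequence or transcript ID
    motif_types: List[str]                # e.g. ["rbp", "are"]
    status: JobStatus = JobStatus.PENDING
    error_message: Optional[str] = None   # hypothetical field

    def advance(self, new_status: JobStatus, error_message: Optional[str] = None) -> None:
        """Move to new_status, rejecting transitions the table does not allow."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot go from {self.status.name} to {new_status.name}"
            )
        self.status = new_status
        self.error_message = error_message


job = ScanJobSketch(input_value="AUUUAUUUAUUUA", motif_types=["are"])
job.advance(JobStatus.RUNNING)
job.advance(JobStatus.DONE)
```

Keeping the allowed transitions in one table makes it harder for a dispatcher to accidentally re-run a finished job.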

Scan Workflow

1. Job Creation
   A user submits a query via the CGI form. This creates a new ScanJob in the database and marks it as PENDING.

2. Job Dispatcher
   The scanner.py script polls for pending jobs and executes them one at a time. It:
   - Retrieves the sequence from the feature ID or directly from the input_value
   - Dispatches motif scans based on motif_types
   - Records motif hits as rows in the motif_mapping table
   - Updates job status to DONE or ERROR

3. Motif Mapping
   Each detected motif is stored as a MotifMapping instance, which includes:
   - Coordinates on the sequence (start, end)
   - Associated matrix_id (PWM or motif label)
   - Strand orientation and optional score
   - Source tag (e.g., "ATtRACT", "ARE", "miRNA")
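The workflow above can be sketched as a small polling loop. This is not the project's scanner.py. It is a minimal, self-contained approximation that uses an in-memory SQLite database in place of MySQL and SQLAlchemy so it runs anywhere. The table layouts, the column names (start_pos, end_pos), the comma-separated motif_types encoding, the "ARE_nonamer" label, and the 0-based half-open coordinates are all assumptions for illustration.

```python
import re
import sqlite3

# Hypothetical, simplified tables; the real schema lives in rreme/models.
SCHEMA = """
CREATE TABLE scan_job (
    job_id INTEGER PRIMARY KEY,
    input_value TEXT NOT NULL,
    motif_types TEXT NOT NULL,          -- assumed comma-separated, e.g. 'rbp,are'
    status TEXT NOT NULL DEFAULT 'PENDING'
);
CREATE TABLE motif_mapping (
    mapping_id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL REFERENCES scan_job(job_id),
    matrix_id TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos INTEGER NOT NULL,
    strand TEXT NOT NULL,
    score REAL,
    source TEXT NOT NULL
);
"""


def scan_are(sequence):
    """Yield (matrix_id, start, end, score, source) for each ARE nonamer."""
    for match in re.finditer(r"(?=UUAUUUAUU)", sequence):
        yield ("ARE_nonamer", match.start(), match.start() + 9, None, "ARE")


# Only 'are' is wired up here; other motif types raise KeyError below.
SCANNERS = {"are": scan_are}


def run_pending_jobs(conn):
    """Process PENDING jobs one at a time, oldest first; return the count."""
    processed = 0
    while True:
        row = conn.execute(
            "SELECT job_id, input_value, motif_types FROM scan_job "
            "WHERE status = 'PENDING' ORDER BY job_id LIMIT 1"
        ).fetchone()
        if row is None:
            return processed
        job_id, sequence, motif_types = row
        conn.execute(
            "UPDATE scan_job SET status = 'RUNNING' WHERE job_id = ?", (job_id,)
        )
        conn.commit()
        try:
            for motif_type in motif_types.split(","):
                scanner = SCANNERS[motif_type]
                for matrix_id, start, end, score, source in scanner(sequence):
                    conn.execute(
                        "INSERT INTO motif_mapping "
                        "(job_id, matrix_id, start_pos, end_pos, strand, score, source) "
                        "VALUES (?, ?, ?, ?, '+', ?, ?)",
                        (job_id, matrix_id, start, end, score, source),
                    )
            status = "DONE"
        except Exception:
            conn.rollback()  # discard partial hits from the failed job
            status = "ERROR"
        conn.execute(
            "UPDATE scan_job SET status = ? WHERE job_id = ?", (status, job_id)
        )
        conn.commit()
        processed += 1


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO scan_job (input_value, motif_types) VALUES (?, ?)",
    ("GGUUAUUUAUUCC", "are"),
)
conn.execute(
    "INSERT INTO scan_job (input_value, motif_types) VALUES (?, ?)",
    ("ACGU", "mirna"),  # unsupported in this sketch, so the job ends in ERROR
)
conn.commit()
processed = run_pending_jobs(conn)
```

In this sketch a failing job rolls back its partial hits before being marked ERROR, so the motif_mapping table only ever contains rows from jobs that completed.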

Supported Motif Types

rbp: Uses PWM matrices from rbp_pwm, linked via attract_motif
are: Regex search for canonical AU-rich sequences (e.g., UUAUUUAUU)
mirna, m6a: Reserved for future extension
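To make the two implemented scan types concrete, here is a minimal sketch of each. The ARE scan uses the nonamer quoted above. The PWM scan assumes each matrix column is a dictionary of base probabilities and scores windows by log-odds against a uniform background. The README does not state the scoring scheme, pseudocount, threshold, or coordinate convention, so all of these are assumptions, as is the conversion of T to U for DNA-style input. TOY_PWM is invented for the example and is not an ATtRACT matrix.

```python
import math
import re

# Canonical ARE nonamer from the README; the lookahead finds overlapping hits.
ARE_PATTERN = re.compile(r"(?=(UUAUUUAUU))")


def to_rna(sequence):
    """Upper-case the input and convert T to U (assumed input normalisation)."""
    return sequence.upper().replace("T", "U")


def find_are_hits(sequence):
    """Return 0-based, half-open (start, end) spans of ARE nonamer matches."""
    return [(m.start(1), m.end(1)) for m in ARE_PATTERN.finditer(to_rna(sequence))]


def pwm_log_odds(pwm, window, background=0.25, pseudocount=1e-3):
    """Sum log2((p + pseudocount) / background), one PWM column per base."""
    return sum(
        math.log2((column.get(base, 0.0) + pseudocount) / background)
        for column, base in zip(pwm, window)
    )


def scan_pwm(sequence, pwm, threshold):
    """Return (start, end, score) for every window scoring >= threshold."""
    rna = to_rna(sequence)
    width = len(pwm)
    hits = []
    for start in range(len(rna) - width + 1):
        score = pwm_log_odds(pwm, rna[start:start + width])
        if score >= threshold:
            hits.append((start, start + width, score))
    return hits


# Hypothetical 3-column matrix favouring "UGU"; not taken from ATtRACT.
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
]

are_hits = find_are_hits("GGUUAUUUAUUCC")
pwm_hits = scan_pwm("AAUGUAA", TOY_PWM, threshold=3.0)
```

The zero-width lookahead matters for AREs because AU-rich stretches often contain overlapping copies of the nonamer, which a plain pattern would report only once.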

Relevant ORM Models

ScanJob: Captures each user query and tracks processing status
MotifMapping: Records each motif hit and links it to the relevant ScanJob and feature
scanner.py: Job runner (a script rather than an ORM model) that executes scans, handles errors, and updates the database

This modular scanning framework supports concurrent and queued execution and allows extension to additional motif classes with minimal architectural changes.

Getting Started

Clone the repository:

git clone https://github.com/SecondBook5/RREME_Project.git
cd RREME_Project

Set up the Python environment and install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Create the database and load the schema:

mysql -u your_user -p
CREATE DATABASE rreme_db;
USE rreme_db;
SOURCE db/abook3_chado_dump.sql;

Populate the database with motifs and transcript features:

python rreme/loaders/attract_loader.py
python rreme/loaders/feature_loader.py
python rreme/loaders/pwm_loader.py

Launch the web interface (Apache2 must serve files from /var/www/html/abook3/rreme_project/ and execute the CGI scripts in cgi-bin).

Current Status

Web interface functional
ARE and ATtRACT RBP motif scanning implemented
Transcript autocomplete via indexed MySQL query
Chado-style schema mapped via SQLAlchemy
miRNA and m6A motif support under development

References

Giudice G. et al. (2016). ATtRACT—a database of RNA-binding proteins and associated motifs. Database, 2016, baw035. https://doi.org/10.1093/database/baw035
Gebauer F. et al. (2021). RNA-binding proteins in human genetic disease. Nat Rev Genet 22, 185–198. https://doi.org/10.1038/s41576-020-00302-y
GMOD Chado Schema: https://gmod.org/wiki/Chado

Author

Developed by AJ Book for the final project in 410.712 Advanced Practical Computer Concepts for Bioinformatics at Johns Hopkins University.
