RREME (Regulatory RNA Element Motif Explorer) is a CGI-based web application for detecting post-transcriptional regulatory elements in RNA sequences. Users may input transcript names (with autocomplete) or raw RNA sequences. The backend scans for matches to AU-rich elements (AREs) and RNA-binding protein (RBP) motifs from the ATtRACT database using regular expressions and position weight matrices (PWMs).
The results are stored in a MySQL database using a Chado-inspired schema and displayed via Jinja2-rendered HTML. The system is designed for extensibility, with planned additions including microRNA motif support from miRBase, m6A data from REPIC, and additional annotations from AREsite2.
- PWM and regex motif scanning using curated ATtRACT matrices
- Schema-aware use of controlled vocabulary via Chado-style tables (`cv`, `cvterm`)
- CGI interface with jQuery-based autocomplete and AJAX
- Jinja2 templates for dynamic result rendering
- SQLAlchemy ORM for database access and integrity
- Modular loading architecture for transcripts, motifs, and vocabularies
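To illustrate how Chado-style `cv` and `cvterm` tables organize a controlled vocabulary, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The tables keep only a subset of Chado's columns, and the vocabulary name `rreme_motif_type` and its terms are hypothetical examples, not taken from RREME's loaders.

```python
import sqlite3

# Simplified Chado-style controlled-vocabulary tables (a subset of Chado's
# columns). SQLite stands in for MySQL so the example runs anywhere.
SCHEMA = """
CREATE TABLE cv (
    cv_id      INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    definition TEXT
);
CREATE TABLE cvterm (
    cvterm_id  INTEGER PRIMARY KEY,
    cv_id      INTEGER NOT NULL REFERENCES cv (cv_id),
    name       TEXT NOT NULL,
    definition TEXT,
    UNIQUE (cv_id, name)
);
"""


def get_or_create_cvterm(conn: sqlite3.Connection, cv_name: str, term_name: str) -> int:
    """Return the cvterm_id for term_name in vocabulary cv_name, creating rows if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    (cv_id,) = conn.execute("SELECT cv_id FROM cv WHERE name = ?", (cv_name,)).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)", (cv_id, term_name)
    )
    (cvterm_id,) = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()
    return cvterm_id


# Usage: the vocabulary and term names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
are_id = get_or_create_cvterm(conn, "rreme_motif_type", "ARE")
rbp_id = get_or_create_cvterm(conn, "rreme_motif_type", "RBP_motif")
```

Storing motif categories as `cvterm` rows rather than hard-coded strings means a new category only needs a new term, not a schema change.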
- Python 3 (CGI scripts, ORM, utilities)
- JavaScript (jQuery, autocomplete)
- MySQL (InnoDB, Chado-style schema)
- Apache2 with CGI enabled
- Jinja2 for template rendering
- Python ≥ 3.8
- MySQL
- Apache2 (with CGI enabled)
- Python packages in `requirements.txt`:
  - `sqlalchemy`
  - `mysql-connector-python`
  - `jinja2`
```
.
├── README.md
├── cgi-bin
│   ├── autofill_suggestions.cgi
│   ├── results.cgi
│   ├── rreme_search.cgi
│   └── test.cgi
├── create_orm_tables.py
├── data
├── db
│   ├── abook3_chado_dump.sql
│   ├── abook3_chado_dump_innodb.sql
│   ├── rreme_db_protein_transcript_backup.sql
│   └── rreme_full_backup.sql   # This is the main backup
├── docs
├── requirements.txt
├── rreme
│   ├── __init__.py
│   ├── loaders
│   │   ├── __init__.py
│   │   ├── attract_loader.py
│   │   ├── chromosome_loader.py
│   │   ├── clear_ncbi_features.py
│   │   ├── feature_loader.py
│   │   ├── organism_loader.py
│   │   ├── pwm_loader.py
│   │   ├── sequence_loader.py
│   │   ├── vocab_loader.py
│   │   └── xref_loader.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── attract_motif.py
│   │   ├── base.py
│   │   ├── cv.py
│   │   ├── cvterm.py
│   │   ├── feature.py
│   │   ├── feature_loc.py
│   │   ├── feature_prop.py
│   │   ├── feature_relationship.py
│   │   ├── gene_id_mapping.py
│   │   ├── motif_mapping.py
│   │   ├── organism.py
│   │   ├── rbp_pwm.py
│   │   ├── scan_job.py
│   │   └── xref_models.py
│   └── utils
│       ├── __init__.py
│       ├── db_utils.py
│       └── motif_utils.py
├── scripts
│   ├── __init__.py
│   ├── additional_scripts
│   │   ├── annotate_features_with_motifs.py
│   │   ├── load_gff3_features.py
│   │   └── prepopulate_cvterms.py
│   ├── create_test_scanjob.py
│   ├── load_chr_to_chado.sh
│   ├── prepare_chromosome_gbk.py
│   └── rreme_load_gbk.pl
├── static
│   ├── css
│   │   └── search.css
│   └── js
│       └── search.js
└── templates
    ├── results.html
    └── search.html
```
The RREME motif scanning pipeline is driven by the `ScanJob` and `MotifMapping` ORM models and executed through the `scanner.py` dispatcher script. Each user-submitted query creates a new `ScanJob` entry that stores the input sequence or transcript ID, the motif types to scan (e.g., `"rbp"`, `"are"`), and the job status (`PENDING`, `RUNNING`, `DONE`, or `ERROR`).
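The four job states can be modeled as a small state machine. This sketch assumes (it is not stated above) that a job moves from `PENDING` to `RUNNING` once and then ends in `DONE` or `ERROR`; the `JobStatus` enum and `advance` helper are hypothetical names, not RREME's code.

```python
from enum import Enum


class JobStatus(str, Enum):
    """ScanJob states as listed in this README."""

    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    ERROR = "ERROR"


# Assumed transition rules: a job is claimed once, then finishes or fails.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.DONE, JobStatus.ERROR},
    JobStatus.DONE: set(),
    JobStatus.ERROR: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

If more than one dispatcher were ever run, the `PENDING` to `RUNNING` claim would also need to be atomic in the database (for example an `UPDATE ... WHERE status = 'PENDING'` whose affected-row count is checked) so that two workers cannot pick up the same job.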
- Job Creation: A user submits a query via the CGI form. This creates a new `ScanJob` in the database and marks it as `PENDING`.
- Job Dispatcher: The `scanner.py` script polls for pending jobs and executes them one at a time. It:
  - Retrieves the sequence from the feature ID or directly from the `input_value`
  - Dispatches motif scans based on `motif_types`
  - Records motif hits as rows in the `motif_mapping` table
  - Updates the job status to `DONE` or `ERROR`
- Motif Mapping: Each detected motif is stored as a `MotifMapping` instance, which includes:
  - Coordinates on the sequence (`start`, `end`)
  - The associated `matrix_id` (PWM or motif label)
  - Strand orientation and an optional score
  - A source tag (e.g., `"ATtRACT"`, `"ARE"`, `"miRNA"`)
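The dispatcher steps above can be sketched with in-memory stand-ins for the ORM models. `ScanJob` keeps the `input_value` and `motif_types` fields named in this README; `feature_id`, `error`, the `sequences` lookup, the 5-tuple scanner return format, and 0-based end-exclusive coordinates are assumptions made for illustration. Hits are buffered per job so a failed job leaves no partial rows, mirroring what a database transaction rollback would do.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (start, end, matrix_id, score, source) as returned by a scanner; assumed format.
RawHit = Tuple[int, int, str, Optional[float], str]
Scanner = Callable[[str], List[RawHit]]


@dataclass
class ScanJob:
    """In-memory stand-in for the ScanJob ORM model (extra fields are hypothetical)."""

    job_id: int
    input_value: str
    motif_types: List[str]
    feature_id: Optional[int] = None
    status: str = "PENDING"
    error: Optional[str] = None


@dataclass
class MotifMapping:
    """In-memory stand-in for one motif_mapping row (0-based, end-exclusive)."""

    job_id: int
    start: int
    end: int
    matrix_id: str
    score: Optional[float]
    source: str
    strand: str = "+"


def run_pending(jobs: List[ScanJob], scanners: Dict[str, Scanner],
                sequences: Dict[int, str], table: List[MotifMapping]) -> None:
    """Execute each PENDING job once, in order, recording hits in `table`."""
    for job in jobs:
        if job.status != "PENDING":
            continue
        job.status = "RUNNING"
        pending_rows: List[MotifMapping] = []
        try:
            # Use the stored transcript sequence when a feature ID is given,
            # otherwise treat input_value as a raw RNA sequence.
            if job.feature_id is not None:
                seq = sequences[job.feature_id]
            else:
                seq = job.input_value
            for motif_type in job.motif_types:
                scan = scanners[motif_type]  # KeyError if the type has no scanner
                for start, end, matrix_id, score, source in scan(seq):
                    pending_rows.append(
                        MotifMapping(job.job_id, start, end, matrix_id, score, source)
                    )
        except Exception as exc:  # a failure marks the job, not the dispatcher
            job.status = "ERROR"
            job.error = repr(exc)
            continue
        table.extend(pending_rows)  # keep hits only when the whole job succeeded
        job.status = "DONE"


def toy_are_scan(seq: str) -> List[RawHit]:
    """Toy scanner: every (possibly overlapping) AUUUA pentamer."""
    return [(m.start(), m.start() + 5, "AUUUA", None, "ARE")
            for m in re.finditer("(?=AUUUA)", seq)]


# Usage: job 2 asks for a motif type with no scanner, so it ends in ERROR.
jobs = [ScanJob(1, "GGAUUUAUUUAGG", ["are"]), ScanJob(2, "ACGU", ["mirna"])]
table: List[MotifMapping] = []
run_pending(jobs, {"are": toy_are_scan}, {}, table)
```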
- `rbp`: Uses PWM matrices from `rbp_pwm`, linked via `attract_motif`
- `are`: Regex search for canonical AU-rich sequences (e.g., `UUAUUUAUU`)
- `mirna`, `m6a`: Reserved for future extension
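The two implemented motif types can be sketched as follows. `scan_are` uses a regular expression with a zero-width lookahead so overlapping sites are all reported; including the AUUUA pentamer alongside the documented `UUAUUUAUU` nonamer is an assumption. `scan_pwm` converts per-position base probabilities into log2-odds scores against a uniform background and keeps windows whose score reaches a chosen fraction of the matrix's score range; the pseudocount, background, and 0.8 threshold are illustrative defaults, not RREME's actual settings.

```python
import math
import re
from typing import Dict, List, Tuple

BASES = "ACGU"

# Example ARE patterns: the UUAUUUAUU nonamer named in this README plus the
# AUUUA pentamer core (an assumption; RREME's actual pattern list may differ).
ARE_PATTERNS = {"ARE_nonamer": "UUAUUUAUU", "ARE_pentamer": "AUUUA"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def scan_are(seq: str) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, label) hits, 0-based and end-exclusive."""
    seq = normalize(seq)
    hits = []
    for label, motif in ARE_PATTERNS.items():
        # The zero-width lookahead lets finditer report overlapping matches.
        for m in re.finditer(f"(?={motif})", seq):
            hits.append((m.start(), m.start() + len(motif), label))
    return sorted(hits)


def log_odds(pwm: List[Dict[str, float]], pseudocount: float = 0.01,
             background: float = 0.25) -> List[Dict[str, float]]:
    """Turn per-position base probabilities into log2-odds vs a uniform background."""
    scored = []
    for column in pwm:
        total = sum(column.get(b, 0.0) + pseudocount for b in BASES)
        scored.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                       for b in BASES})
    return scored


def scan_pwm(seq: str, pwm: List[Dict[str, float]], matrix_id: str,
             min_rel_score: float = 0.8) -> List[Tuple[int, int, str, float]]:
    """Slide the matrix along the sequence and keep high-scoring windows.

    The relative score rescales each window to [0, 1] between the worst and
    best possible scores for this matrix.
    """
    seq = normalize(seq)
    lo = log_odds(pwm)
    width = len(lo)
    best = sum(max(col.values()) for col in lo)
    worst = sum(min(col.values()) for col in lo)
    if best == worst:  # an all-uniform matrix carries no information
        return []
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(lo, window))
        if (score - worst) / (best - worst) >= min_rel_score:
            hits.append((i, i + width, matrix_id, round(score, 3)))
    return hits


# Usage with a toy 5-position matrix that strongly prefers UAUUU.
toy = [{b: (0.97 if b == want else 0.01) for b in BASES} for want in "UAUUU"]
print(scan_are("GGUUAUUUAUUGG"))  # [(2, 11, 'ARE_nonamer'), (4, 9, 'ARE_pentamer')]
print(scan_pwm("GGGUAUUUGGG", toy, "toy_UAUUU"))
```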
- `ScanJob`: Captures each user query and tracks its processing status
- `MotifMapping`: Records each motif hit and links it to the relevant `ScanJob` and `feature`
- `scanner.py`: Job runner that executes scans, handles errors, and updates the database
This modular scanning framework accepts concurrent submissions, queues them for execution, and can be extended to additional motif classes with minimal architectural changes.
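One way to keep new motif classes cheap to add is a registry keyed by the `motif_types` strings, so the dispatcher only looks up a function by name. The decorator below is a hypothetical sketch, not RREME's code. The `mirna` scanner is a toy seed-match example (a 7-nt site complementary to nucleotides 2 to 8 of the mature miRNA) using let-7a-5p purely as an illustration of the planned miRBase support.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[int, int, str]  # (start, end, label), 0-based and end-exclusive
SCANNERS: Dict[str, Callable[[str], List[Hit]]] = {}


def register(motif_type: str) -> Callable:
    """Decorator that files a scanner under a motif_types key."""
    def wrap(fn: Callable[[str], List[Hit]]) -> Callable[[str], List[Hit]]:
        if motif_type in SCANNERS:
            raise ValueError(f"a scanner for {motif_type!r} is already registered")
        SCANNERS[motif_type] = fn
        return fn
    return wrap


def find_all(seq: str, site: str, label: str) -> List[Hit]:
    """Every (possibly overlapping) exact occurrence of `site` in `seq`."""
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append((start, start + len(site), label))
        start = seq.find(site, start + 1)
    return hits


@register("are")
def scan_are(seq: str) -> List[Hit]:
    return find_all(seq.upper(), "AUUUA", "ARE_pentamer")


# Hypothetical miRNA extension: a 7mer-m8 site is the reverse complement of
# nucleotides 2-8 of the mature miRNA. let-7a-5p is used only as an example.
MATURE_MIRNAS = {"let-7a-5p": "UGAGGUAGUAGGUUGUAUAGUU"}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Reverse complement of nucleotides 2-8 (1-based) of a mature miRNA."""
    return mirna[1:8].translate(COMPLEMENT)[::-1]


@register("mirna")
def scan_mirna(seq: str) -> List[Hit]:
    seq = seq.upper()
    hits: List[Hit] = []
    for name, mature in MATURE_MIRNAS.items():
        hits.extend(find_all(seq, seed_site(mature), name))
    return sorted(hits)
```

With a registry like this, the dispatcher's lookup code stays the same when `mirna` or `m6a` scanners are added; only the new function and its decorator are written.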
- Clone the repository:

  ```bash
  git clone https://github.com/SecondBook5/RREME_Project.git
  cd RREME_Project
  ```

- Set up the Python environment and install dependencies:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- Create the database and load the schema. Log in to MySQL:

  ```bash
  mysql -u your_user -p
  ```

  Then, at the `mysql>` prompt:

  ```sql
  CREATE DATABASE rreme_db;
  USE rreme_db;
  SOURCE db/abook3_chado_dump.sql;
  ```

- Populate the database with motifs and transcript features:

  ```bash
  python rreme/loaders/attract_loader.py
  python rreme/loaders/feature_loader.py
  python rreme/loaders/pwm_loader.py
  ```

- Launch the web interface (Apache2 must serve files from `/var/www/html/abook3/rreme_project/` and execute the CGI scripts in `cgi-bin`).
- Web interface functional
- ARE and ATtRACT RBP motif scanning implemented
- Transcript autocomplete via indexed MySQL query
- Chado-style schema mapped via SQLAlchemy
- miRNA and m6A motif support under development
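The transcript autocomplete can be served by a prefix query. In MySQL, a `LIKE` pattern with a constant prefix and a trailing `%` can use a B-tree index on the column, which keeps suggestions fast. This sketch uses `sqlite3` as a stand-in and assumes a `feature` table with a `name` column (Chado's `feature` table has one). If the front end uses jQuery UI's autocomplete widget (an assumption), it sends the typed text as a `term` query parameter and accepts a JSON array of strings, so a CGI script such as `autofill_suggestions.cgi` could wrap a function like this. The two-character minimum and the limit of 10 are arbitrary choices. Wildcards in user input are escaped with `!` rather than a backslash, because MySQL treats backslashes in string literals as escapes.

```python
import json
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards with '!' so user input is matched literally."""
    return term.replace("!", "!!").replace("%", "!%").replace("_", "!_")


def autocomplete(conn: sqlite3.Connection, term: str, limit: int = 10) -> str:
    """Return a JSON array of feature names that start with `term`."""
    term = term.strip()
    if len(term) < 2:
        return "[]"  # too short to give a useful, index-friendly prefix
    rows = conn.execute(
        "SELECT name FROM feature WHERE name LIKE ? ESCAPE '!' ORDER BY name LIMIT ?",
        (escape_like(term) + "%", limit),
    ).fetchall()
    return json.dumps([name for (name,) in rows])


# Usage with a throwaway in-memory table (names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX feature_name_idx ON feature (name)")
conn.executemany(
    "INSERT INTO feature (name) VALUES (?)",
    [("NM_000001",), ("NM_000002",), ("NM_100000",), ("NMXabc",)],
)
print(autocomplete(conn, "NM_0"))  # ["NM_000001", "NM_000002"]
```

Escaping matters here: without it, the `_` in `NM_` would act as a single-character wildcard and also match `NMXabc`.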
- Giudice G. et al. (2016). ATtRACT—a database of RNA-binding proteins and associated motifs. Database, 2016, baw035. https://doi.org/10.1093/database/baw035
- Gebauer F. et al. (2021). RNA-binding proteins in human genetic disease. Nat Rev Genet 22, 185–198. https://doi.org/10.1038/s41576-020-00302-y
- GMOD Chado Schema: https://gmod.org/wiki/Chado
Developed by AJ Book for the final project in 410.712 Advanced Practical Computer Concepts for Bioinformatics at Johns Hopkins University.