RREME: Regulatory RNA Element Motif Explorer | Final Project for AS.410.712 Advanced Practical Computer Concepts for Bioinformatics

SecondBook5/RREME_Project

RREME: Regulatory RNA Element Motif Explorer

Project Summary

RREME (Regulatory RNA Element Motif Explorer) is a CGI-based web application for detecting post-transcriptional regulatory elements in RNA sequences. Users may input transcript names (with autocomplete) or raw RNA sequences. The backend scans for matches to AU-rich elements (AREs) and RNA-binding protein (RBP) motifs from the ATtRACT database using regular expressions and position weight matrices (PWMs).
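
As a rough illustration of the regular-expression side of the scan, the sketch below searches an input for two well-known ARE patterns, the AUUUA pentamer and the UUAUUUAUU nonamer. It is a minimal sketch and not the project's code: the names `ARE_PATTERNS`, `normalize_rna`, and `scan_are` are hypothetical, and the 0-based, end-exclusive coordinates are an assumption made for the example.

```python
import re

# Hypothetical pattern table. ARE copies often overlap (e.g. AUUUAUUUA),
# so each pattern sits inside a lookahead to report every start position.
ARE_PATTERNS = {
    "ARE_pentamer": re.compile(r"(?=(AUUUA))"),
    "ARE_nonamer": re.compile(r"(?=(UUAUUUAUU))"),
}


def normalize_rna(seq: str) -> str:
    """Upper-case the input and convert DNA-style T to U."""
    return seq.strip().upper().replace("T", "U")


def scan_are(seq: str):
    """Return (label, start, end) tuples, 0-based and end-exclusive (assumed)."""
    rna = normalize_rna(seq)
    hits = []
    for label, pattern in ARE_PATTERNS.items():
        for match in pattern.finditer(rna):
            start = match.start(1)
            hits.append((label, start, start + len(match.group(1))))
    # Sort by position so hits from different patterns interleave naturally
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    for hit in scan_are("GGAUUUAUUUAUUCC"):
        print(hit)
```

A plain `finditer` on `AUUUA` would miss the second of two overlapping pentamers, which is why the lookahead form is used here.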

The results are stored in a MySQL database using a Chado-inspired schema and displayed via Jinja2-rendered HTML. The system is designed for extensibility, with planned additions including microRNA motif support from miRBase, m6A data from REPIC, and additional annotations from AREsite2.

Features

  • PWM and regex motif scanning using curated ATtRACT matrices
  • Schema-aware use of controlled vocabulary via Chado-style tables (cv, cvterm)
  • CGI interface with jQuery-based autocomplete and AJAX
  • Jinja2 templates for dynamic result rendering
  • SQLAlchemy ORM for database access and integrity
  • Modular loading architecture for transcripts, motifs, and vocabularies

Technologies Used

  • Python 3 (CGI scripts, ORM, utilities)
  • JavaScript (jQuery, autocomplete)
  • MySQL (InnoDB, Chado-style schema)
  • Apache2 with CGI enabled
  • Jinja2 for template rendering

Requirements

  • Python ≥ 3.8
  • MySQL
  • Apache2 (with CGI enabled)
  • Python packages in requirements.txt:
    • sqlalchemy
    • mysql-connector-python
    • jinja2

Directory Structure

.
├── README.md
├── cgi-bin
│   ├── autofill_suggestions.cgi
│   ├── results.cgi
│   ├── rreme_search.cgi
│   └── test.cgi
├── create_orm_tables.py
├── data
├── db
│   ├── abook3_chado_dump.sql
│   ├── abook3_chado_dump_innodb.sql
│   ├── rreme_db_protein_transcript_backup.sql
│   └── rreme_full_backup.sql  # This is the main backup
├── docs
├── requirements.txt
├── rreme
│   ├── __init__.py
│   ├── loaders
│   │   ├── __init__.py
│   │   ├── attract_loader.py
│   │   ├── chromosome_loader.py
│   │   ├── clear_ncbi_features.py
│   │   ├── feature_loader.py
│   │   ├── organism_loader.py
│   │   ├── pwm_loader.py
│   │   ├── sequence_loader.py
│   │   ├── vocab_loader.py
│   │   └── xref_loader.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── attract_motif.py
│   │   ├── base.py
│   │   ├── cv.py
│   │   ├── cvterm.py
│   │   ├── feature.py
│   │   ├── feature_loc.py
│   │   ├── feature_prop.py
│   │   ├── feature_relationship.py
│   │   ├── gene_id_mapping.py
│   │   ├── motif_mapping.py
│   │   ├── organism.py
│   │   ├── rbp_pwm.py
│   │   ├── scan_job.py
│   │   └── xref_models.py
│   └── utils
│       ├── __init__.py
│       ├── db_utils.py
│       └── motif_utils.py
├── scripts
│   ├── __init__.py
│   ├── additional_scripts
│   │   ├── annotate_features_with_motifs.py
│   │   ├── load_gff3_features.py
│   │   └── prepopulate_cvterms.py
│   ├── create_test_scanjob.py
│   ├── load_chr_to_chado.sh
│   ├── prepare_chromosome_gbk.py
│   └── rreme_load_gbk.pl
├── static
│   ├── css
│   │   └── search.css
│   └── js
│       └── search.js
└── templates
    ├── results.html
    └── search.html

ScanJob Architecture

The RREME motif scanning pipeline is driven by the ScanJob and MotifMapping ORM models and executed through the scanner.py dispatcher script. Each user-submitted query creates a new ScanJob entry that stores the input sequence or transcript ID, motif types to scan (e.g., "rbp", "are"), and job status (PENDING, RUNNING, DONE, or ERROR).

Scan Workflow

  1. Job Creation
    A user submits a query via the CGI form. This creates a new ScanJob in the database and marks it as PENDING.

  2. Job Dispatcher
    The scanner.py script polls for pending jobs and executes them one at a time. It:

    • Retrieves the sequence from the feature ID or directly from the input_value
    • Dispatches motif scans based on motif_types
    • Records motif hits as rows in the motif_mapping table
    • Updates job status to DONE or ERROR
  3. Motif Mapping
    Each detected motif is stored as a MotifMapping instance, which includes:

    • Coordinates on the sequence (start, end)
    • Associated matrix_id (PWM or motif label)
    • Strand orientation and optional score
    • Source tag (e.g., "ATtRACT", "ARE", "miRNA")
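
The workflow above can be sketched in memory, without a database, to show the status transitions. This is an illustration under stated assumptions rather than the actual scanner.py: `Job`, `MotifHit`, `run_pending`, and `toy_are_scanner` are hypothetical stand-ins for the ScanJob and MotifMapping models and the real scanners, and the field names simply mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    ERROR = "ERROR"


@dataclass
class MotifHit:
    """Stand-in for one motif_mapping row (hypothetical field names)."""
    start: int
    end: int
    matrix_id: str
    strand: str
    source: str
    score: Optional[float] = None


@dataclass
class Job:
    """Stand-in for one ScanJob row (hypothetical field names)."""
    job_id: int
    input_value: str
    motif_types: List[str]
    status: JobStatus = JobStatus.PENDING
    hits: List[MotifHit] = field(default_factory=list)
    error: Optional[str] = None


Scanner = Callable[[str], List[MotifHit]]


def run_pending(jobs: List[Job], scanners: Dict[str, Scanner]) -> None:
    """Process PENDING jobs one at a time, recording hits or the failure."""
    for job in jobs:
        if job.status is not JobStatus.PENDING:
            continue
        job.status = JobStatus.RUNNING
        try:
            for motif_type in job.motif_types:
                job.hits.extend(scanners[motif_type](job.input_value))
            job.status = JobStatus.DONE
        except Exception as exc:  # unknown motif type or scanner failure
            job.hits.clear()  # do not keep partial results for a failed job
            job.error = f"{type(exc).__name__}: {exc}"
            job.status = JobStatus.ERROR


def toy_are_scanner(seq: str) -> List[MotifHit]:
    """Toy scanner for the example: reports only the first AUUUA."""
    pos = seq.find("AUUUA")
    if pos < 0:
        return []
    return [MotifHit(pos, pos + 5, "ARE_pentamer", "+", "ARE")]


if __name__ == "__main__":
    queue = [Job(1, "GGAUUUAGG", ["are"]), Job(2, "GGAUUUAGG", ["mirna"])]
    run_pending(queue, {"are": toy_are_scanner})
    for job in queue:
        print(job.job_id, job.status.value, len(job.hits), job.error)
```

Keeping the scanners in a dictionary keyed by motif type is one way to leave room for the reserved types: a new class would need only a new entry, and a type with no scanner ends in ERROR rather than passing silently.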

Supported Motif Types

  • rbp: Uses PWM matrices from rbp_pwm, linked via attract_motif
  • are: Regex search for canonical AU-rich sequences (e.g., UUAUUUAUU)
  • mirna, m6a: Reserved for future extension
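
For the rbp type, a PWM scan generally slides the matrix along the sequence and scores each window. The sketch below shows one common variant, log2-odds scoring against a uniform background with a small pseudocount. The matrix layout, the function names `log_odds` and `scan_pwm`, the threshold, and the toy matrix are all assumptions for illustration and may differ from how RREME stores and scores ATtRACT matrices.

```python
import math

BASES = "ACGU"


def log_odds(pwm, background=0.25, pseudocount=0.01):
    """Convert per-position base probabilities to log2-odds scores."""
    matrix = []
    for column in pwm:
        # Renormalize after adding the pseudocount so each column sums to 1
        total = sum(column[base] + pseudocount for base in BASES)
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan_pwm(seq, pwm, threshold):
    """Return (start, end, score) for each window scoring >= threshold."""
    rna = seq.upper().replace("T", "U")
    matrix = log_odds(pwm)
    width = len(matrix)
    hits = []
    for start in range(len(rna) - width + 1):
        window = rna[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, start + width, round(score, 3)))
    return hits


# Toy 3-column matrix preferring "UGU"; the values are made up for the example
TOY_PWM = [
    {"A": 0.0, "C": 0.0, "G": 0.0, "U": 1.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "U": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "U": 1.0},
]

if __name__ == "__main__":
    print(scan_pwm("AAUGUAA", TOY_PWM, threshold=3.0))
```

The pseudocount keeps zero-probability cells from producing negative infinity, so a single mismatch lowers a window's score sharply without making it uncomputable.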

Relevant Models and Scripts

  • ScanJob: Captures each user query and tracks processing status
  • MotifMapping: Records each motif hit and links it to the relevant ScanJob and feature
  • scanner.py: Job runner that executes scans, handles errors, and updates the database

This modular scanning framework supports concurrent and queued execution and allows extension to additional motif classes with minimal architectural changes.

Getting Started

  1. Clone the repository:
    git clone https://github.com/SecondBook5/RREME_Project.git
    cd RREME_Project
  2. Set up the Python environment and install dependencies:
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
  3. Create the database and load the schema:
    mysql -u your_user -p
    CREATE DATABASE rreme_db;
    USE rreme_db;
    SOURCE db/abook3_chado_dump.sql;
  4. Populate the database with motifs and transcript features:
    python rreme/loaders/attract_loader.py
    python rreme/loaders/feature_loader.py
    python rreme/loaders/pwm_loader.py
  5. Launch the web interface (Apache2 must serve files from /var/www/html/abook3/rreme_project/ and execute the CGI scripts in cgi-bin).

Current Status

  • Web interface functional
  • ARE and ATtRACT RBP motif scanning implemented
  • Transcript autocomplete via indexed MySQL query
  • Chado-style schema mapped via SQLAlchemy
  • miRNA and m6A motif support under development
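
The transcript autocomplete listed above amounts to a prefix query whose results are returned to the browser as JSON. The sketch below shows that shape using Python's built-in sqlite3 as a stand-in for MySQL so that it runs on its own. The table layout, the index name, the sample transcript names, and the `suggest` function are assumptions for illustration, not the contents of autofill_suggestions.cgi.

```python
import json
import sqlite3


def suggest(conn, prefix, limit=10):
    """Return up to `limit` transcript names that start with `prefix`."""
    # Escape LIKE wildcards so that user input is matched literally
    escaped = prefix.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    rows = conn.execute(
        "SELECT name FROM feature WHERE name LIKE ? ESCAPE '!' "
        "ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in for the feature table (hypothetical columns and rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX feature_name_idx ON feature (name)")
conn.executemany(
    "INSERT INTO feature (name) VALUES (?)",
    [("TNF-201",), ("TNF-202",), ("TP53-201",), ("T_X",)],
)

if __name__ == "__main__":
    # A CGI script would print a Content-Type header before this JSON body
    print(json.dumps(suggest(conn, "TNF")))
```

Binding the prefix as a parameter instead of formatting it into the SQL string avoids injection, and a pattern with only a trailing wildcard is the kind of query an index on the name column can serve.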

Author

Developed by AJ Book for the final project in 410.712 Advanced Practical Computer Concepts for Bioinformatics at Johns Hopkins University.
