Open Microbial Community — git-native, AI-assisted scientific publishing for microbial ecology.
How it works: Point at an SRA accession → select analyses → pipelines run on HPC → AI drafts manuscript → peer review via GitHub PRs.
```bash
pip install -r portal/requirements.txt
cd portal
cp .env.example .env   # edit as needed
uvicorn app.main:app --reload --port 8002
```

Visit http://localhost:8002. In dev mode (`DEBUG=true`), GitHub OAuth is bypassed and you're auto-logged in as `dev-user`.
AI modules (manuscript generation, review agents, author interview) use any OpenAI-compatible API:
- Install LM Studio
- Load a model (we use `qwen3-coder-30b`)
- Start the server on port 1234

Configure `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` in `.env` for cloud providers.
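Every AI module speaks the same OpenAI-compatible request shape against that endpoint. A minimal sketch of assembling such a request from the environment variables above — the function name and defaults here are illustrative, not `ai/llm_client.py`'s actual API:

```python
import os

def chat_payload(prompt: str) -> tuple[str, dict]:
    """Build a chat-completion request for any OpenAI-compatible server.

    Reads the same variables the portal uses (LLM_BASE_URL, LLM_MODEL);
    the fallback defaults mirror the LM Studio setup described above.
    """
    base = os.environ.get("LLM_BASE_URL", "http://localhost:1234/v1")
    model = os.environ.get("LLM_MODEL", "qwen3-coder-30b")
    url = f"{base}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return url, payload
```

Because LM Studio, OpenAI, and Anthropic-compatible proxies all accept this shape, only the three environment variables change between local and cloud deployments.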
```
portal/                        # FastAPI web application
├── app/
│   ├── main.py                # App entry, route registration
│   ├── config.py              # Pydantic settings from .env
│   ├── database.py            # SQLAlchemy async models (User, Submission)
│   ├── auth.py                # GitHub OAuth + dev-mode bypass
│   ├── submissions.py         # Submission CRUD + accession lookup
│   ├── reviews.py             # AI review + manuscript generation endpoints
│   ├── interviews.py          # Author interview flow
│   ├── metadata.py            # AI metadata assistant endpoints
│   ├── sra_metadata.py        # NCBI Entrez: accession resolution + metadata
│   ├── slurm.py               # SLURM job submission via OpenSSH ControlMaster
│   ├── pipeline_processing.py # Post-pipeline: AI draft → GitHub repo
│   ├── github_integration.py  # Repo creation, file commits, review PRs
│   └── github_app_auth.py     # GitHub App JWT auth with PAT fallback
├── templates/                 # Jinja2 HTML templates
└── static/                    # CSS

ai/                            # AI modules (OpenAI-compatible API)
├── llm_client.py              # Shared client — LM Studio or cloud
├── manuscript_generator.py    # Section generation + citation resolution
├── review_agents.py           # Statistical, methodological, clarity review
├── author_interview.py        # Conversational interview
├── metadata_assistant.py      # SRA metadata preparation
├── citation_resolver.py       # [CITE] → PubMed → inline citations + BibTeX
├── pubmed_search.py           # Direct NCBI E-utilities search
├── figure_generator.py        # Plotly JSON from pipeline outputs
└── pipeline_parser.py         # Parse outputs (nanopore_mag, illumina_mag, rnaseq, isolate_genome)

templates/paper-repo/          # Quarto manuscript template for generated papers
tests/                         # pytest test suite
```
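The `[CITE] → PubMed` flow in `citation_resolver.py` can be sketched as: scan the draft for citation markers, resolve each query, and splice in an inline citation key. The marker syntax and helper below are illustrative, not the module's actual API:

```python
import re

# Illustrative marker format: [CITE: free-text search query]
CITE_RE = re.compile(r"\[CITE:\s*([^\]]+)\]")

def resolve_citations(text: str, lookup) -> tuple[str, list[str]]:
    """Replace [CITE: ...] markers with inline keys and collect BibTeX keys.

    `lookup` maps a free-text query to a citation key — in the real module
    that lookup goes through PubMed E-utilities.
    """
    keys: list[str] = []

    def _sub(match):
        key = lookup(match.group(1).strip())
        keys.append(key)
        return f"[@{key}]"

    return CITE_RE.sub(_sub, text), keys
```

For example, `resolve_citations("Microbes matter [CITE: 16S rRNA survey].", lambda q: "smith2020")` returns `("Microbes matter [@smith2020].", ["smith2020"])`, leaving BibTeX entry generation to a second pass.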
- Enter any NCBI accession (PRJNA, SRR, SAMN, etc.) → resolves to parent BioProject
- Select data types from breakdown table (platform/strategy/source/layout combos)
- Pipeline auto-selected from library tags (AMPLICON → microscape, WGS → illumina_mag, etc.)
- Author interview → AI gathers research context via conversation
- Submit to HPC → Nextflow pipeline via SLURM on Alliance Canada
- AI manuscript draft → generated from pipeline outputs + interview data
- Citations resolved via PubMed E-utilities
- Figures generated as Plotly JSON
- GitHub paper repo → `open-community-science/micro-NNNN`
  - Quarto manuscript auto-rendered to HTML + PDF via GitHub Actions
  - `.omc/provenance` directory (interview transcripts, first drafts, metadata)
- AI peer review → statistical, methodological, clarity agents create review PRs
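The accession-resolution step above starts by classifying what kind of accession the user pasted before walking up to the parent BioProject. A sketch of that first step — the prefix table follows NCBI naming conventions, but the function is illustrative, not `sra_metadata.py`'s API:

```python
# Accession prefix → record type, per NCBI/SRA naming conventions
ACCESSION_TYPES = {
    "PRJ": "bioproject",   # PRJNA / PRJEB / PRJDB
    "SAM": "biosample",    # SAMN / SAME / SAMD
    "SRR": "run", "ERR": "run", "DRR": "run",
    "SRX": "experiment", "SRS": "sample", "SRP": "study",
}

def classify_accession(acc: str) -> str:
    """Best-effort classification before any Entrez round trip."""
    acc = acc.strip().upper()
    for prefix, kind in ACCESSION_TYPES.items():
        if acc.startswith(prefix):
            return kind
    raise ValueError(f"unrecognized accession: {acc}")
```

From there the real module uses Entrez `elink`/`esummary` calls to hop from run → experiment → study → BioProject.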
- Org: `open-community-science`
- GitHub App: "OMC Platform" — handles both bot operations (JWT → installation tokens) and user OAuth login
  - Falls back to PAT if App not configured
- Paper repos: `micro-NNNN` — org-owned, authors fork if desired
- Reviews: each review type (statistical, methodological, clarity) creates its own PR
- Training data: `.omc/` directory in each paper repo stores AI interaction history — the diff between `manuscript_v1.json` and the final manuscript is labeled training data
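The App-with-PAT-fallback order can be sketched as a small selector. Variable names for the App follow the configuration table below; `GITHUB_TOKEN` as the PAT variable is an assumption here — check `portal/app/config.py` for the real name:

```python
import os

def pick_github_auth() -> str:
    """Mirror the fallback order: GitHub App first, then PAT.

    The "app" branch would mint a short-lived JWT from
    GITHUB_APP_PRIVATE_KEY and exchange it for an installation token;
    the "pat" branch uses a personal access token directly.
    """
    if os.environ.get("GITHUB_APP_ID") and os.environ.get("GITHUB_APP_PRIVATE_KEY"):
        return "app"
    if os.environ.get("GITHUB_TOKEN"):  # assumed PAT variable name
        return "pat"
    raise RuntimeError("no GitHub credentials configured")
```

Keeping the decision in one place means every caller (repo creation, commits, review PRs) gets the same credential without re-checking the environment.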
| Pipeline | Strategy | Parser |
|---|---|---|
| `microscape` | Illumina amplicon (16S/ITS) | Planned |
| `nanopore_mag` | Long-read MAG assembly | Implemented |
| `illumina_mag` | Short-read MAG assembly | Implemented |
| `rnaseq` | RNA-Seq differential expression | Implemented |
| `isolate_genome` | Isolate genome assembly | Implemented |
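The auto-selection from library tags mentioned earlier amounts to a mapping over the SRA library strategy plus a platform check for long reads. A sketch of that shape — only AMPLICON → `microscape` and WGS → `illumina_mag`/`nanopore_mag` are stated in this README, so the RNA-Seq key and the exact rules are assumptions; the real logic lives in the portal code:

```python
# Library-strategy tag → pipeline (RNA-SEQ entry is an assumed mapping)
STRATEGY_TO_PIPELINE = {
    "AMPLICON": "microscape",
    "WGS": "illumina_mag",
    "RNA-SEQ": "rnaseq",
}

def select_pipeline(strategy: str, platform: str) -> str:
    """Pick a pipeline from SRA library tags.

    Long-read WGS is routed to nanopore_mag; unknown strategies raise
    rather than guess.
    """
    strategy = strategy.upper()
    if strategy == "WGS" and "NANOPORE" in platform.upper():
        return "nanopore_mag"
    try:
        return STRATEGY_TO_PIPELINE[strategy]
    except KeyError:
        raise ValueError(f"no pipeline mapped for strategy {strategy!r}")
```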
All settings via environment variables or `portal/.env`:

| Variable | Default | Description |
|---|---|---|
| `DEBUG` | `false` | Dev mode (auto-login, verbose logging) |
| `LLM_BASE_URL` | `http://10.151.49.182:1234/v1` | OpenAI-compatible API endpoint |
| `LLM_MODEL` | `qwen3-coder-30b-a3b-instruct` | Model for AI features |
| `GITHUB_APP_ID` | | GitHub App numeric ID |
| `GITHUB_APP_PRIVATE_KEY` | | PEM file path or content |
| `GITHUB_ORG` | `open-community-science` | GitHub org for paper repos |
| `SLURM_ENABLED` | `false` | Enable HPC job submission |
| `SLURM_HOST` | | HPC cluster hostname (e.g. `fir.alliancecan.ca`) |
| `SLURM_USER` | | HPC username |
| `SLURM_ACCOUNT` | | SLURM account (e.g. `def-rec3141_cpu`) |

See `portal/app/config.py` for the full list.
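A minimal `portal/.env` for local development might look like this (values illustrative):

```ini
DEBUG=true
LLM_BASE_URL=http://localhost:1234/v1
LLM_MODEL=qwen3-coder-30b
GITHUB_ORG=open-community-science
SLURM_ENABLED=false
```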
```bash
# Fast unit tests (~1s)
python -m pytest tests/test_figure_generator.py tests/test_review_pr.py -v

# Citation tests (~5s, hits NCBI)
python -m pytest tests/test_pubmed_citations.py -v

# Full E2E workflow (~2 min, needs LLM server)
python -m pytest tests/test_e2e_workflow.py -v --timeout=300
```

- Backend: FastAPI + SQLAlchemy async + SQLite (aiosqlite)
- Frontend: Jinja2 templates + htmx + vanilla JS
- AI: OpenAI-compatible API (LM Studio locally, Claude in production)
- Metadata: NCBI Entrez (Biopython) + PubMed E-utilities
- HPC: OpenSSH ControlMaster → SLURM → Nextflow on Fir (Alliance Canada)
- Hosting: Arbutus cloud VM (Alliance Canada OpenStack)
- Papers: Quarto → GitHub Pages
- GitHub: GitHub App (bot ops + OAuth login)
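The ControlMaster link used by `slurm.py` keeps one multiplexed SSH connection open so Duo MFA fires only on the first login. A sketch of the corresponding `~/.ssh/config` entry on the portal VM — the host alias and paths are illustrative, the options themselves are standard OpenSSH:

```
Host fir
    HostName fir.alliancecan.ca
    User <your-hpc-username>
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 8h
```

With `ControlPersist` set, subsequent `ssh fir` invocations (including the portal's job submissions) reuse the authenticated socket instead of re-prompting.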
Production: https://microbial.opencommunity.science (Arbutus cloud VM)

```bash
./deploy.sh   # deploy to Arbutus VM

# After DNS setup:
ssh omc2 sudo certbot --nginx -d microbial.opencommunity.science
```

SLURM: uses OpenSSH ControlMaster for Duo MFA bypass on Alliance clusters. Authenticate once:
```bash
ssh ubuntu@<arbutus-ip>
ssh fir   # triggers Duo MFA, opens a persistent connection
```

CC-BY 4.0