This document describes the new gene interaction network builder that complements the existing MeSH-based co-citation network analysis.
The build_gene_interaction_network.py script builds a gene co-citation network starting from a single gene (e.g., IL17A, TNF) instead of a MeSH term. It uses the same underlying logic and parameters as 4_build_network.py.
- Start with a gene: Provide a gene symbol (e.g., IL17A)
- Find associated papers: Retrieves all PubMed IDs associated with that gene
- Identify co-cited genes: Finds all other genes mentioned in those papers
- Build network: Creates a co-occurrence network based on shared PMIDs
- Apply filters: Uses the same filtering parameters as the MeSH-based network

python code/build_gene_interaction_network.py --gene IL17A --year-start 2014 --year-end 2024 --organism map_to_human

- --gene: Starting gene symbol (e.g., 'IL17A', 'TNF', 'TP53')
- --year-start: Start year for filtering papers (optional, uses all years if not specified)
- --year-end: End year for filtering papers (optional, uses all years if not specified)
- --organism: Organism filtering mode (default: map_to_human)
  - human_only: Only include human genes
  - map_to_human: Include non-human genes mapped to human orthologs
- --min-papers-per-gene: Minimum papers per gene to include in network (default: 5)
- --min-papers-per-edge: Minimum shared papers required for an edge (default: 3)
- --exclude-seed: Exclude the starting gene from the network (default: False, gene is included)

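
The options above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_parser, the help strings, and the choice to make --gene required are assumptions; only the flag names and defaults come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        description="Build a gene co-citation network from a starting gene."
    )
    # Assumption: the starting gene is mandatory.
    parser.add_argument("--gene", required=True, help="Starting gene symbol, e.g. IL17A")
    # Year filters are optional; None means "use all years".
    parser.add_argument("--year-start", type=int, default=None)
    parser.add_argument("--year-end", type=int, default=None)
    parser.add_argument(
        "--organism",
        choices=["human_only", "map_to_human"],
        default="map_to_human",
    )
    parser.add_argument("--min-papers-per-gene", type=int, default=5)
    parser.add_argument("--min-papers-per-edge", type=int, default=3)
    # Flag is off by default, so the starting gene stays in the network.
    parser.add_argument("--exclude-seed", action="store_true")
    return parser


args = build_parser().parse_args(["--gene", "IL17A", "--year-start", "2014"])
print(args.gene, args.year_start, args.year_end, args.organism)
```

Restricting --organism with choices makes a misspelled mode fail at the command line rather than silently falling back to a default.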
python code/build_gene_interaction_network.py \
--gene IL17A \
--year-start 2014 \
--year-end 2024 \
--organism map_to_human

Results:
- 929 seed papers associated with IL17A
- 29 genes in the network (including IL17A, after filtering)
- 38 edges (gene pairs with ≥3 shared papers)
Top genes:
- IL17A (929 papers - the starting gene)
- IL17F (100 papers)
- IL23A (40 papers)
- IFNG (31 papers)
- IL10, IL6, TNF (25 papers each)
Direct interactions with IL17A:
- IL23A (40 shared papers)
- IFNG (31 shared papers)
- IL10, IL6, TNF (25 shared papers each)
- IL17RA (22 shared papers)
- TGFB1 (18 shared papers)
python code/build_gene_interaction_network.py \
--gene TNF \
--year-start 2020 \
--year-end 2024 \
--organism human_only \
--min-papers-per-gene 10 \
--min-papers-per-edge 5

Results:
- 415 seed papers associated with TNF
- 7 genes in the network (after filtering)
- 3 edges (gene pairs with ≥5 shared papers)
Top interacting genes:
- IL6 (55 papers)
- IL1B (32 papers)
- IL10 (24 papers)
The script creates a directory in /results with the following structure:
/results/gene_<GENE>_<YEAR_START>_<YEAR_END>_<ORGANISM>/
├── network_data.json # Complete network data with metadata
├── nodes.csv # List of genes with paper counts
└── edges.csv # Network edges with shared paper counts
Contains complete network information:
{
"metadata": {
"starting_gene": "IL17A",
"starting_gene_id": 3605,
"num_seed_papers": 929,
"year_start": 2014,
"year_end": 2024,
"organism_mode": "map_to_human",
"num_genes": 28,
"num_edges": 10
},
"nodes": [
{
"gene_id": 112744,
"symbol": "IL17F",
"name": "interleukin 17F",
"total_pmids": [...],
"human_pmids": [...],
"ortholog_pmids": [...],
"total_count": 100,
"human_count": 100,
"ortholog_count": 0
},
...
],
"edges": [
{
"gene1": 23765,
"gene2": 112744,
"shared_pmids": [...],
"weight": 10
},
...
]
}

The gene-based networks work with the existing visualization tools:

python code/5_export_dot.py /results/gene_IL17A_2014_2024_map_to_human/network_data.json

This creates network.dot which can be visualized with Graphviz:

dot -Tpng network.dot -o network.png
# or
dot -Tsvg network.dot -o network.svg

python code/6_generate_html.py /results/gene_IL17A_2014_2024_map_to_human/network_data.json

This generates a single-file HTML (index.html) with:
- Network overview with statistics
- Genes table with clickable paper counts
- Edges table with clickable shared paper counts
- Direct PubMed links: Click any paper count to open all those papers in PubMed
  - Green numbers = all papers (total)
  - Blue numbers = human gene papers only
  - Orange numbers = ortholog gene papers only
Note: The old multi-file HTML generator (with separate pages per gene/edge) is available as 6_generate_html_multifile.py if needed.
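
For ad-hoc analysis outside the DOT and HTML exports, network_data.json can also be read directly. The sketch below lists the genes that share an edge with the starting gene, using only the metadata, nodes, and edges fields shown in the JSON layout earlier. The helper name partners_of_seed is hypothetical, and the sample network is made up for illustration (gene IDs 1 and 2 are placeholders, not real identifiers).

```python
import json


def partners_of_seed(network: dict) -> list[tuple[str, int]]:
    """Return (symbol, weight) for every edge touching the starting gene, strongest first."""
    seed_id = network["metadata"]["starting_gene_id"]
    symbols = {node["gene_id"]: node["symbol"] for node in network["nodes"]}
    partners = []
    for edge in network["edges"]:
        if seed_id in (edge["gene1"], edge["gene2"]):
            other = edge["gene2"] if edge["gene1"] == seed_id else edge["gene1"]
            # Fall back to the raw ID if the node list lacks this gene.
            partners.append((symbols.get(other, str(other)), edge["weight"]))
    return sorted(partners, key=lambda item: item[1], reverse=True)


# Made-up miniature network in the documented layout.
sample = json.loads("""
{
  "metadata": {"starting_gene": "IL17A", "starting_gene_id": 3605},
  "nodes": [
    {"gene_id": 3605, "symbol": "IL17A", "total_count": 9},
    {"gene_id": 1, "symbol": "GENE_X", "total_count": 4},
    {"gene_id": 2, "symbol": "GENE_Y", "total_count": 6}
  ],
  "edges": [
    {"gene1": 1, "gene2": 3605, "weight": 4},
    {"gene1": 2, "gene2": 3605, "weight": 6},
    {"gene1": 1, "gene2": 2, "weight": 3}
  ]
}
""")

print(partners_of_seed(sample))  # [('GENE_Y', 6), ('GENE_X', 4)]
```

For a real run, replace the inline sample with json.load on the network_data.json file in the results directory. Note that a network built with --exclude-seed contains no edges touching the starting gene, so the function would return an empty list.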
| Feature | MeSH-based (4_build_network.py) | Gene-based (build_gene_interaction_network.py) |
|---|---|---|
| Starting point | MeSH term (e.g., "Chordoma") | Gene symbol (e.g., "IL17A") |
| Seed papers | Papers tagged with MeSH term | Papers associated with gene |
| Network type | Co-citation within topic | Interaction partners of gene |
| Use case | Topic-focused research | Gene-centered analysis |
| Metadata field | mesh_term | starting_gene, starting_gene_id, num_seed_papers |
The script automatically:
- Looks up the gene ID from the symbol (case-insensitive)
- Validates that the gene exists in the database
- Suggests similar gene symbols if not found
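
The lookup behaviour described above might look like the following sketch. It is an illustration only: the function name resolve_gene, the in-memory symbol table, and the use of difflib for suggestions are assumptions, since this document does not say how the script finds similar symbols.

```python
import difflib


def resolve_gene(symbol: str, symbol_to_id: dict[str, int]) -> int:
    """Case-insensitive symbol lookup that suggests near matches on failure."""
    by_upper = {known.upper(): known for known in symbol_to_id}
    key = symbol.strip().upper()
    if key in by_upper:
        return symbol_to_id[by_upper[key]]
    # Assumption: string similarity is a reasonable proxy for "similar gene symbols".
    close = difflib.get_close_matches(key, list(by_upper), n=3, cutoff=0.6)
    hint = ", ".join(by_upper[match] for match in close) or "none"
    raise ValueError(f"Gene '{symbol}' not found; similar symbols: {hint}")


# Toy table standing in for the gene database.
genes = {"IL17A": 3605, "IL17F": 112744}

print(resolve_gene("il17a", genes))  # 3605
try:
    resolve_gene("IL17", genes)
except ValueError as err:
    print(err)
```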
Papers are retrieved in two steps:
- Get all PMIDs for the starting gene from the gene_pubmed table
- If year filters are specified, filter PMIDs using the pubmed_articles table
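
The two retrieval steps can be sketched with an in-memory SQLite database. Only the table names gene_pubmed and pubmed_articles come from this document; the column names (gene_id, pmid, year), the function name seed_pmids, the use of SQLite, and the sample rows are assumptions made so the example runs on its own.

```python
import sqlite3


def seed_pmids(conn, gene_id, year_start=None, year_end=None):
    """Return the set of PMIDs for a gene, optionally restricted to a year range."""
    # Step 1: all PMIDs linked to the starting gene.
    pmids = {
        row[0]
        for row in conn.execute(
            "SELECT pmid FROM gene_pubmed WHERE gene_id = ?", (gene_id,)
        )
    }
    if year_start is None and year_end is None:
        return pmids
    # Step 2: keep only PMIDs whose publication year falls inside the range.
    query = (
        "SELECT pmid FROM pubmed_articles "
        "WHERE (:lo IS NULL OR year >= :lo) AND (:hi IS NULL OR year <= :hi)"
    )
    in_range = {row[0] for row in conn.execute(query, {"lo": year_start, "hi": year_end})}
    return pmids & in_range


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene_pubmed (gene_id INTEGER, pmid INTEGER)")
conn.execute("CREATE TABLE pubmed_articles (pmid INTEGER, year INTEGER)")
# Made-up rows; gene ID 999 is a placeholder for some other gene.
conn.executemany("INSERT INTO gene_pubmed VALUES (?, ?)", [(3605, 1), (3605, 2), (3605, 3), (999, 3)])
conn.executemany("INSERT INTO pubmed_articles VALUES (?, ?)", [(1, 2010), (2, 2015), (3, 2022)])

print(sorted(seed_pmids(conn, 3605)))              # [1, 2, 3]
print(sorted(seed_pmids(conn, 3605, 2014, 2024)))  # [2, 3]
```

On a large database a single join between the two tables would likely be faster than intersecting sets in Python; the two-step form is kept here because it mirrors the description above.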
The network construction follows the same logic as MeSH-based networks:
- Find all genes co-cited in seed papers
- Support both human_only and map_to_human modes
- Filter genes by minimum paper count
- Build edges based on shared PMIDs
- Filter edges by minimum shared paper count
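
The construction steps above can be sketched as a small function over a mapping from PMID to the genes mentioned in that paper. This is a simplified illustration under stated assumptions: the function name build_cooccurrence and the input shape are hypothetical, the sample papers are made up, and organism handling (human_only versus map_to_human) is left out entirely.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(paper_genes, min_papers_per_gene=5, min_papers_per_edge=3):
    """Build a co-occurrence network from {pmid: set_of_gene_symbols}."""
    # Collect the papers that mention each gene.
    gene_pmids = defaultdict(set)
    for pmid, genes in paper_genes.items():
        for gene in genes:
            gene_pmids[gene].add(pmid)
    # Node filter: keep genes with enough papers.
    nodes = {g: p for g, p in gene_pmids.items() if len(p) >= min_papers_per_gene}
    # Edge filter: keep gene pairs with enough shared papers.
    edges = {}
    for gene_a, gene_b in combinations(sorted(nodes), 2):
        shared = nodes[gene_a] & nodes[gene_b]
        if len(shared) >= min_papers_per_edge:
            edges[(gene_a, gene_b)] = shared
    return nodes, edges


# Made-up seed papers for illustration.
papers = {
    1: {"IL17A", "IL17F"},
    2: {"IL17A", "IL17F", "IL23A"},
    3: {"IL17A", "IL23A"},
    4: {"IL17A"},
}
nodes, edges = build_cooccurrence(papers, min_papers_per_gene=2, min_papers_per_edge=2)
print(sorted(nodes))  # ['IL17A', 'IL17F', 'IL23A']
print(sorted(edges))  # [('IL17A', 'IL17F'), ('IL17A', 'IL23A')]
```

IL17F and IL23A share only one paper in this toy data, so that pair falls below the edge threshold even though both genes pass the node filter.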
By default, the starting gene IS INCLUDED in the network. This is important because:
- The starting gene acts as a hub connecting other genes
- Without it, many edges would be lost (only showing secondary connections)
- It provides context for understanding the network structure
- You can see how strongly other genes are co-cited with your gene of interest
Example with IL17A:
- With IL17A included (default): 38 edges
- With IL17A excluded (--exclude-seed): 10 edges
Use --exclude-seed only if you specifically want to see how the other genes relate to each other, independent of the starting gene.
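
The drop in edge count follows directly from the hub role of the starting gene, as the toy example below shows. The gene names, weights, and the helper drop_seed are invented for illustration; whether the script removes the seed before or after applying the paper-count filters is not stated here, so this sketch simply removes seed edges from an already filtered edge list.

```python
def drop_seed(edges, seed):
    """Return only the edges that do not touch the seed gene."""
    return {pair: weight for pair, weight in edges.items() if seed not in pair}


# Made-up edge weights (shared paper counts) around a hub gene called SEED.
edges = {
    ("SEED", "A"): 40,
    ("SEED", "B"): 31,
    ("SEED", "C"): 25,
    ("A", "B"): 4,
}

without_seed = drop_seed(edges, "SEED")
print(len(edges), len(without_seed))  # 4 1
```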
- Start broad, then narrow: Begin with default parameters, then increase thresholds to reduce network size
- Use year filters: Recent papers may show emerging interactions
- Compare organism modes: map_to_human provides more coverage, human_only is more specific
- Adjust thresholds: For highly studied genes, increase --min-papers-per-gene to focus on strongest interactions
The gene-based network builder integrates seamlessly with existing tools:
- ✓ Uses the same database schema
- ✓ Produces compatible JSON format
- ✓ Works with 5_export_dot.py for DOT export
- ✓ Works with 6_generate_html.py for HTML generation