Article: geneRNIB: a living benchmark for gene regulatory network inference
Documentation: geneRNIB-doc
Repository: openproblems-bio/task_grn_inference
If you use this framework, please cite:
@article{nourisa2025genernib,
title={geneRNIB: a living benchmark for gene regulatory network inference},
author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
journal={bioRxiv},
pages={2025--02},
year={2025},
publisher={Cold Spring Harbor Laboratory}
}
geneRNIB is a living benchmark platform for GRN inference. This platform provides curated datasets for GRN inference and evaluation, standardized evaluation protocols and metrics, computational infrastructure, and a dynamically updated leaderboard to track state-of-the-art methods. It evaluates newly submitted GRNs in the cloud, computes their competition scores, and stores them for future comparisons, so that the leaderboard reflects new developments over time.
The platform supports the integration of new inference methods, datasets and protocols. When a new feature is added, previously evaluated GRNs are re-assessed, and the leaderboard is updated accordingly. The aim is to evaluate both the accuracy and completeness of inferred GRNs. It is designed for both single-modality and multi-omics GRN inference.
| name | roles |
|---|---|
| Jalil Nourisa | author |
| Robrecht Cannoodt | author |
| Jérémie Kalfon | contributor |
| Antoine Passemiers | contributor |
| Marco Stock | contributor |
| Christian Arnold | contributor |
flowchart TB
file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-chromatin-accessibility-data'>chromatin accessibility data</a>")
comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
file_prediction_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-grn-prediction'>GRN prediction</a>")
comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
file_score_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-score'>score</a>")
file_evaluation_bulk_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--pseudo-bulk'>perturbation data (pseudo)bulk</a>")
file_evaluation_de_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data-differential-expression'>perturbation data differential expression</a>")
file_evaluation_sc_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--sc-'>perturbation data (sc)</a>")
file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-gene-expression-data'>gene expression data</a>")
comp_control_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-control-method'>Control Method</a>"/]
file_atac_h5ad-.-comp_method
comp_method-.->file_prediction_h5ad
file_prediction_h5ad---comp_metric
comp_metric-->file_score_h5ad
file_evaluation_bulk_h5ad-.-comp_metric
file_evaluation_de_h5ad-.-comp_metric
file_evaluation_sc_h5ad-.-comp_metric
file_rna_h5ad---comp_method
file_rna_h5ad---comp_control_method
comp_control_method-.->file_prediction_h5ad
Chromatin accessibility data
Example file: resources_test/grn_benchmark/inference_data/op_atac.h5ad
Format:
AnnData object
obs: 'cell_type', 'donor_id'
uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
Data structure:
| Slot | Type | Description |
|---|---|---|
| obs["cell_type"] | string | (Optional) The annotated cell type of each cell based on RNA expression. |
| obs["donor_id"] | string | (Optional) Donor id. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
| uns["normalization_id"] | string | Which normalization was used. |
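The table above separates required from optional fields. As a minimal sketch in plain Python, assuming the `uns` metadata is available as a dict (for an AnnData object, `adata.uns` behaves like one), the required `uns` keys can be checked as follows. The helper name `missing_required_uns` and the example values are hypothetical and not part of the platform.

```python
# Required uns keys, transcribed from the table above (dataset_organism is optional).
REQUIRED_UNS = ("dataset_id", "dataset_name", "dataset_summary", "normalization_id")


def missing_required_uns(uns: dict) -> list[str]:
    """Return the required uns keys that are absent or not stored as strings."""
    return [key for key in REQUIRED_UNS if not isinstance(uns.get(key), str)]


uns = {
    "dataset_id": "op",
    "dataset_name": "Toy dataset",
    "dataset_summary": "Illustrative metadata only.",
    "normalization_id": "toy_normalization",  # placeholder value
}
print(missing_required_uns(uns))                   # []
print(missing_required_uns({"dataset_id": "op"}))  # ['dataset_name', 'dataset_summary', 'normalization_id']
```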
A GRN inference method
Arguments:
| Name | Type | Description |
|---|---|---|
| --rna | file | RNA expression data. |
| --atac | file | (Optional) Chromatin accessibility data. |
| --prediction | file | (Optional, Output) File indicating the inferred GRN. |
| --tf_all | file | List of transcription factors. Default: resources_test/grn_benchmark/prior/tf_all.csv. |
| --max_n_links | integer | (Optional) Maximum number of links in the inferred GRN. Default: 50000. |
| --num_workers | integer | (Optional) Number of parallel workers. Default: 2. |
| --temp_dir | string | (Optional) Directory for temporary files. Default: output/temdir. |
| --layer | string | (Optional) Name of the expression layer to use. Default: lognorm. |
| --seed | integer | (Optional) Random seed. Default: 32. |
| --dataset_id | string | (Optional) Dataset identifier. Default: op. |
| --apply_tf_methods | boolean | (Optional) No description provided. Default: TRUE. |
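For readers wiring up a method by hand, the argument table can be mirrored with Python's standard `argparse` module. This is an illustrative sketch only: the platform's own components are configured inside the repository, and the parser, the helper `str_to_bool`, and the accepted boolean spellings are assumptions. Only the flag names and defaults come from the table.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse a boolean flag value such as 'true'/'false' (assumed convention)."""
    v = value.strip().lower()
    if v in {"true", "1", "yes"}:
        return True
    if v in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_method_parser() -> argparse.ArgumentParser:
    """Mirror the documented arguments and defaults of a GRN inference method."""
    p = argparse.ArgumentParser(description="GRN inference method (illustrative sketch)")
    p.add_argument("--rna", required=True, help="RNA expression data (.h5ad)")
    p.add_argument("--atac", default=None, help="Optional chromatin accessibility data (.h5ad)")
    p.add_argument("--prediction", default=None, help="Output file for the inferred GRN")
    p.add_argument("--tf_all", default="resources_test/grn_benchmark/prior/tf_all.csv")
    p.add_argument("--max_n_links", type=int, default=50000)
    p.add_argument("--num_workers", type=int, default=2)
    p.add_argument("--temp_dir", default="output/temdir")
    p.add_argument("--layer", default="lognorm")
    p.add_argument("--seed", type=int, default=32)
    p.add_argument("--dataset_id", default="op")
    # A non-string default is not passed through `type`, so it stays a bool.
    p.add_argument("--apply_tf_methods", type=str_to_bool, default=True)
    return p


args = build_method_parser().parse_args(
    ["--rna", "resources_test/grn_benchmark/inference_data/op_rna.h5ad",
     "--apply_tf_methods", "false"]
)
print(args.max_n_links, args.num_workers, args.apply_tf_methods)  # 50000 2 False
```

Keeping the defaults in one parser function makes it easy to compare a local run against the documented interface; the metric and control-method tables could be mirrored the same way with their own defaults (for example `--num_workers` 20).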
File indicating the inferred GRN.
Example file: resources_test/grn_models/op/collectri.h5ad
Format:
AnnData object
uns: 'dataset_id', 'method_id', 'prediction'
Data structure:
| Slot | Type | Description |
|---|---|---|
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["method_id"] | string | A unique identifier for the inference method. |
| uns["prediction"] | object | Inferred GRNs in the format of source, target, weight. |
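The prediction is described only as source, target, weight triples. The following is a hedged sketch of assembling such rows in plain Python. The assumptions that `--tf_all` restricts edge sources to the listed transcription factors and that `--max_n_links` keeps the edges with the largest absolute weight are for illustration only, as are the function name `prepare_prediction` and the toy gene names.

```python
from typing import Iterable, Optional

# (source, target, weight): the three fields named in the table above.
Edge = tuple[str, str, float]


def prepare_prediction(
    edges: Iterable[Edge],
    tf_all: Optional[set[str]] = None,
    max_n_links: int = 50000,
) -> list[dict]:
    """Build source/target/weight rows, optionally keep only TF sources,
    then keep the max_n_links edges with the largest absolute weight.
    The filtering and ranking rules are assumptions for illustration."""
    rows = []
    for source, target, weight in edges:
        if tf_all is not None and source not in tf_all:
            continue  # drop edges whose regulator is not a listed TF
        rows.append({"source": source, "target": target, "weight": float(weight)})
    # Python's sort is stable even with reverse=True, so ties keep input order.
    rows.sort(key=lambda r: abs(r["weight"]), reverse=True)
    return rows[:max_n_links]


edges = [  # toy edges, not real regulatory interactions
    ("TF1", "GENE_A", 0.9),
    ("GENE_A", "TF1", 0.5),
    ("TF2", "GENE_A", -1.2),
    ("TF1", "GENE_B", 0.1),
]
top = prepare_prediction(edges, tf_all={"TF1", "TF2"}, max_n_links=2)
print([(r["source"], r["target"]) for r in top])
# [('TF2', 'GENE_A'), ('TF1', 'GENE_A')]
```

A pandas DataFrame with `source`, `target` and `weight` columns would be one natural container for the `object`-typed `uns["prediction"]` slot, but the exact container is not specified here.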
A metric to evaluate the performance of the inferred GRN
Arguments:
| Name | Type | Description |
|---|---|---|
| --prediction | file | File indicating the inferred GRN. |
| --evaluation_data | file | (Optional) Perturbation dataset for benchmarking. |
| --evaluation_data_sc | file | (Optional) Perturbation dataset for benchmarking (single cell). |
| --evaluation_data_de | file | (Optional) Perturbation dataset for benchmarking (differential expression). |
| --score | file | (Output) File indicating the score of a metric. |
| --layer | string | (Optional) Name of the expression layer to use. Default: lognorm. |
| --max_n_links | integer | (Optional) Maximum number of links in the inferred GRN. Default: 50000. |
| --tf_all | file | (Optional) List of transcription factors. |
| --num_workers | integer | (Optional) Number of parallel workers. Default: 20. |
| --apply_tf | boolean | (Optional) No description provided. Default: TRUE. |
| --regulators_consensus | file | (Optional) No description provided. |
| --reg_type | string | (Optional) Type of regression model. Default: ridge. |
File indicating the score of a metric.
Example file: resources_test/scores/score.h5ad
Format:
AnnData object
uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'
Data structure:
| Slot | Type | Description |
|---|---|---|
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["method_id"] | string | A unique identifier for the method. |
| uns["metric_ids"] | string | One or more unique metric identifiers. |
| uns["metric_values"] | double | The metric values obtained for the given prediction. Must have the same length as 'metric_ids'. |
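The one hard constraint in the score format is that 'metric_values' has the same length as 'metric_ids'. Below is a minimal validator sketch, assuming the `uns` fields are available as a dict. The function name, the rule that a single metric may be stored as a bare scalar, and the metric names are hypothetical, and the values are toy numbers rather than real results.

```python
def check_score_uns(uns: dict) -> dict[str, float]:
    """Validate the score fields from the table above and return {metric_id: value}."""
    for key in ("dataset_id", "method_id", "metric_ids", "metric_values"):
        if key not in uns:
            raise KeyError(f"missing uns[{key!r}]")
    ids, values = uns["metric_ids"], uns["metric_values"]
    # Assumption: a single metric may be stored as a bare string / number.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]
    if len(ids) != len(values):
        raise ValueError(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return {metric_id: float(v) for metric_id, v in zip(ids, values)}


score = {
    "dataset_id": "op",
    "method_id": "collectri",
    "metric_ids": ["metric_a", "metric_b"],  # hypothetical metric names
    "metric_values": [0.1, 0.2],             # toy values, not real results
}
print(check_score_uns(score))  # {'metric_a': 0.1, 'metric_b': 0.2}
```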
Perturbation dataset for benchmarking
Example file: resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad
Format:
AnnData object
obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
layers: 'X_norm'
uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
Data structure:
| Slot | Type | Description |
|---|---|---|
| obs["cell_type"] | string | The annotated cell type of each cell based on RNA expression. |
| obs["perturbation"] | string | Name of the perturbation for each observation. |
| obs["donor_id"] | string | (Optional) Donor id. |
| obs["perturbation_type"] | string | (Optional) Type of perturbation for each observation. |
| layers["X_norm"] | double | Normalized values. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
| uns["normalization_id"] | string | Which normalization was used. |
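The flowchart labels this dataset as (pseudo)bulk. One common way to build pseudo-bulk profiles is to sum single-cell counts within each (perturbation, cell type, donor) group, and the sketch below shows that generic technique in plain Python, with nested lists standing in for a count matrix. Whether geneRNIB sums or averages, and how 'X_norm' is then derived, is not stated here, so the grouping keys, the sum, and the function name `pseudobulk_sum` are assumptions.

```python
# Grouping keys taken from the obs columns above; using all three is an assumption.
GROUP_KEYS = ("perturbation", "cell_type", "donor_id")


def pseudobulk_sum(obs: list[dict], counts: list[list[float]]) -> dict[tuple, list[float]]:
    """Sum per-cell counts (cells x genes) within each group defined by GROUP_KEYS."""
    if len(obs) != len(counts):
        raise ValueError("obs and counts must have one entry per cell")
    bulk: dict[tuple, list[float]] = {}
    for meta, row in zip(obs, counts):
        # Missing optional columns (e.g. donor_id) become None in the key.
        key = tuple(meta.get(k) for k in GROUP_KEYS)
        acc = bulk.setdefault(key, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return bulk


obs = [  # toy metadata
    {"perturbation": "ctrl", "cell_type": "T cell", "donor_id": "d1"},
    {"perturbation": "ctrl", "cell_type": "T cell", "donor_id": "d1"},
    {"perturbation": "drug_A", "cell_type": "T cell", "donor_id": "d1"},
]
counts = [[1, 0, 2], [3, 1, 0], [0, 5, 1]]  # toy counts, 3 cells x 3 genes
for key, profile in pseudobulk_sum(obs, counts).items():
    print(key, profile)
# ('ctrl', 'T cell', 'd1') [4.0, 1.0, 2.0]
# ('drug_A', 'T cell', 'd1') [0.0, 5.0, 1.0]
```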
Perturbation dataset for benchmarking (differential expression)
Example file: resources_test/grn_benchmark/evaluation_data/replogle_de.h5ad
Format:
AnnData object
obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
Data structure:
| Slot | Type | Description |
|---|---|---|
| obs["cell_type"] | string | The annotated cell type of each cell based on RNA expression. |
| obs["perturbation"] | string | Name of the perturbation for each observation. |
| obs["donor_id"] | string | (Optional) Donor id. |
| obs["perturbation_type"] | string | (Optional) Type of perturbation for each observation. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
| uns["normalization_id"] | string | Which normalization was used. |
Perturbation dataset for benchmarking (single cell).
Example file: resources_test/grn_benchmark/evaluation_data/norman_sc.h5ad
Format:
AnnData object
obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
layers: 'X_norm'
uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
Data structure:
| Slot | Type | Description |
|---|---|---|
| obs["cell_type"] | string | The annotated cell type of each cell based on RNA expression. |
| obs["perturbation"] | string | Name of the perturbation for each observation. |
| obs["donor_id"] | string | (Optional) Donor id. |
| obs["perturbation_type"] | string | (Optional) Type of perturbation for each observation. |
| layers["X_norm"] | double | Normalized values. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
| uns["normalization_id"] | string | Which normalization was used. |
RNA expression data.
Example file: resources_test/grn_benchmark/inference_data/op_rna.h5ad
Format:
AnnData object
obs: 'cell_type', 'donor_id'
layers: 'counts', 'X_norm'
uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'
Data structure:
| Slot | Type | Description |
|---|---|---|
| obs["cell_type"] | string | (Optional) The annotated cell type of each cell based on RNA expression. |
| obs["donor_id"] | string | (Optional) Donor id. |
| layers["counts"] | double | (Optional) Counts matrix. |
| layers["X_norm"] | double | Normalized values. |
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["dataset_name"] | string | Nicely formatted name. |
| uns["dataset_summary"] | string | Short description of the dataset. |
| uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
| uns["normalization_id"] | string | Which normalization was used. |
Quality control methods for verifying the pipeline.
Arguments:
| Name | Type | Description |
|---|---|---|
| --rna | file | RNA expression data. |
| --rna_all | file | (Optional) RNA expression data that contains all variability. Only used for positive control. |
| --prediction | file | (Optional, Output) File indicating the inferred GRN. |
| --tf_all | file | List of transcription factors. Default: resources_test/grn_benchmark/prior/tf_all.csv. |
| --max_n_links | integer | (Optional) Maximum number of links in the inferred GRN. Default: 50000. |
| --num_workers | integer | (Optional) Number of parallel workers. Default: 20. |
| --temp_dir | string | (Optional) Directory for temporary files. Default: output/temdir. |
| --layer | string | (Optional) Name of the expression layer to use. Default: lognorm. |
| --seed | integer | (Optional) Random seed. Default: 32. |
| --dataset_id | string | (Optional) Dataset identifier. Default: op. |
| --apply_tf_methods | boolean | (Optional) No description provided. Default: TRUE. |