GRN Inference

Benchmarking GRN inference methods

Article: geneRNIB: a living benchmark for gene regulatory network inference

Repository: openproblems-bio/task_grn_inference

If you use this framework, please cite it as

  @article{nourisa2025genernib,
    title={geneRNIB: a living benchmark for gene regulatory network inference},
    author={Nourisa, Jalil and Passemiers, Antoine and Stock, Marco and Zeller-Plumhoff, Berit and Cannoodt, Robrecht and Arnold, Christian and Tong, Alexander and Hartford, Jason and Scialdone, Antonio and Moreau, Yves and others},
    journal={bioRxiv},
    pages={2025--02},
    year={2025},
    publisher={Cold Spring Harbor Laboratory}
  }

Description

geneRNIB is a living benchmark platform for GRN inference. It provides curated datasets for GRN inference and evaluation, standardized evaluation protocols and metrics, computational infrastructure, and a dynamically updated leaderboard that tracks state-of-the-art methods. It runs new GRN inference methods in the cloud, computes their competition scores, and stores the results for future comparison, so the benchmark reflects new developments over time.

The platform supports the integration of new inference methods, datasets and protocols. When a new feature is added, previously evaluated GRNs are re-assessed, and the leaderboard is updated accordingly. The aim is to evaluate both the accuracy and completeness of inferred GRNs. It is designed for both single-modality and multi-omics GRN inference.

In the current version, geneRNIB contains 10 inference methods (both single-omics and multi-omics), 8 evaluation metrics, and 5 datasets.

See our publication for details of the methods.

Installation

You need to have Docker, Java, and Viash installed. Follow these instructions to install the required dependencies.

Download resources

git clone --recursive git@github.com:openproblems-bio/task_grn_inference.git

cd task_grn_inference

To interact with the framework, you first need to download the resources containing the necessary inference and evaluation datasets. Here, we download the test resources, which are used solely to check whether the framework is installed correctly.

scripts/download_resources.sh

Refer to the Documentation for instructions on downloading the actual datasets. To reproduce the results, run scripts/run_benchmark_all.sh; note that this run is very resource-intensive.

Run a GRN inference method

To infer a GRN for a given dataset (e.g. op) using simple Pearson correlation:

viash run src/control_methods/pearson_corr/config.vsh.yaml -- \
            --rna resources_test/grn_benchmark/inference_data/op_rna.h5ad \
            --prediction output/net.h5ad \
            --tf_all resources_test/grn_benchmark/prior/tf_all.csv

Note that we are using the resources_test datasets, which are small versions of the actual datasets, for computational speed. The resulting predictions are therefore not realistic. To obtain a realistic prediction, download the actual data and replace resources_test with resources in the paths above.
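The pearson_corr control method scores candidate regulatory links by co-expression. The sketch below illustrates that idea in plain Python on toy data; it is not the component's source code. It assumes that sources are restricted to the TFs listed in tf_all.csv and that the max_n_links strongest links by absolute correlation are kept; the helper names `pearson` and `pearson_grn` are hypothetical.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant gene has undefined correlation; treat it as "no link"
    return cov / (sx * sy)


def pearson_grn(expr, tfs, max_n_links=50000):
    """Hypothetical helper: score every TF -> gene pair by Pearson correlation.

    expr: dict mapping gene name -> expression values across cells
    tfs:  names of transcription factors (e.g. read from tf_all.csv)
    Returns (source, target, weight) edges, keeping the max_n_links edges with
    the largest absolute weight (an assumed cut-off rule).
    """
    edges = []
    for tf, target in product(tfs, expr):
        if tf == target or tf not in expr:
            continue  # skip self-loops and TFs without expression data
        edges.append((tf, target, pearson(expr[tf], expr[target])))
    edges.sort(key=lambda edge: abs(edge[2]), reverse=True)
    return edges[:max_n_links]


# Toy data: four cells, one TF, three candidate targets.
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with TF1
    "G2": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with TF1
    "G3": [1.0, 1.0, 1.0, 1.0],  # constant, so no signal
}
net = pearson_grn(expr, tfs=["TF1"], max_n_links=2)
for edge in net:
    print(edge)
```

Ranking by absolute weight keeps both activating (positive) and repressing (negative) candidate links; this is one reasonable reading of the `--max_n_links` cut-off, not a documented rule.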

Evaluate a GRN prediction

Once you have the prediction for a given dataset (e.g. op), use the following command to obtain evaluation scores:

scripts/single_grn_evaluation.sh output/net.h5ad op --test_run

This writes the scores to output/test_run/scores.yaml. Note that passing --test_run runs the evaluation on the test data; to use the actual data (the resources folder), omit this flag.

Add a GRN inference method, evaluation metric, or dataset

To add a new component to the repository, follow the Documentation.

Authors & contributors

Name	Roles
Jalil Nourisa	author
Robrecht Cannoodt	author
Antoine Passemiers	contributor
Marco Stock	contributor
Christian Arnold	contributor

API

flowchart TB
  file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-chromatin-accessibility-data'>chromatin accessibility data</a>")
  comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
  file_prediction_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-grn-prediction'>GRN prediction</a>")
  comp_metric_regression[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-feature-based-metrics'>feature-based metrics</a>"/]
  comp_metric_ws[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-wasserstein-distance-metrics'>Wasserstein distance metrics</a>"/]
  comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
  file_score_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-score'>score</a>")
  file_evaluation_bulk_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--pseudo-bulk'>perturbation data (pseudo)bulk</a>")
  file_evaluation_sc_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--sc-'>perturbation data (sc)</a>")
  file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-gene-expression-data'>gene expression data</a>")
  file_atac_h5ad-.-comp_method
  comp_method-.->file_prediction_h5ad
  file_prediction_h5ad---comp_metric_regression
  file_prediction_h5ad---comp_metric_ws
  file_prediction_h5ad---comp_metric
  comp_metric_regression-->file_score_h5ad
  comp_metric_ws-->file_score_h5ad
  comp_metric-->file_score_h5ad
  file_evaluation_bulk_h5ad---comp_metric_regression
  file_evaluation_sc_h5ad-.-comp_metric_ws
  file_rna_h5ad---comp_method

File format: chromatin accessibility data

Chromatin accessibility data

Example file: resources_test/grn_benchmark/inference_data/op_atac.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'
 uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	(Optional) The annotated cell type of each cell based on RNA expression.
`obs["donor_id"]`	`string`	(Optional) Donor id.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
`uns["normalization_id"]`	`string`	Which normalization was used.
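Because most fields in these file formats are required `uns` entries, a quick structural check can catch malformed inputs before a component is run. The following is a minimal sketch, assuming the file has already been loaded into a dict-like stand-in for an AnnData object; `ATAC_REQUIRED` and `missing_slots` are hypothetical names, and the same pattern could be applied to the other file formats in this document.

```python
# Required (non-optional) slots of the chromatin accessibility format, transcribed
# from the table above. obs["cell_type"], obs["donor_id"] and
# uns["dataset_organism"] are optional and therefore not listed.
ATAC_REQUIRED = {
    "obs": [],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "normalization_id"],
}


def missing_slots(adata_like, required):
    """Return the required slots that are absent, formatted like the table.

    adata_like is a plain dict such as {"obs": {...}, "uns": {...}} standing in
    for an AnnData object; any container that supports `in` works per slot.
    """
    missing = []
    for slot, keys in required.items():
        present = adata_like.get(slot, {})
        for key in keys:
            if key not in present:
                missing.append(f'{slot}["{key}"]')
    return missing


# Toy stand-in with made-up values; normalization_id is deliberately missing.
example = {
    "obs": {"cell_type": ["T cell", "B cell"]},
    "uns": {"dataset_id": "op", "dataset_name": "OP", "dataset_summary": "Toy example"},
}
print(missing_slots(example, ATAC_REQUIRED))  # ['uns["normalization_id"]']
```

With the anndata package, one could pass `{"obs": adata.obs.columns, "uns": adata.uns}` instead of the toy dict, since both support membership tests with `in`.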

Component type: method

A GRN inference method

Arguments:

Name	Type	Description
`--rna`	`file`	RNA expression data.
`--atac`	`file`	(Optional) Chromatin accessibility data.
`--prediction`	`file`	(Optional, Output) File indicating the inferred GRN.
`--tf_all`	`file`	List of all transcription factors (TFs). Default: `resources_test/grn_benchmark/prior/tf_all.csv`.
`--max_n_links`	`integer`	(Optional) Maximum number of regulatory links in the inferred GRN. Default: `50000`.
`--num_workers`	`integer`	(Optional) Number of parallel workers. Default: `20`.
`--temp_dir`	`string`	(Optional) Directory for temporary files. Default: `output/temdir`.
`--layer`	`string`	(Optional) Layer of the AnnData object to use. Default: `X_norm`.
`--seed`	`integer`	(Optional) Random seed. Default: `32`.
`--dataset_id`	`string`	(Optional) Dataset identifier. Default: `op`.
`--is_test`	`boolean`	(Optional) Whether this is a test run. Default: `FALSE`.
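The argument table maps directly onto the command line used in the Pearson correlation example. As an illustration, the hypothetical Python helper below assembles such a `viash run` call and rejects option names that are not in the table. Passing booleans as the strings `true`/`false` is an assumption about how viash parses boolean arguments.

```python
import shlex

# Optional arguments listed in the method table above.
OPTIONAL_ARGS = {"max_n_links", "num_workers", "temp_dir", "layer", "seed",
                 "dataset_id", "is_test"}


def viash_method_command(config, rna, prediction, tf_all, atac=None, **options):
    """Hypothetical helper that assembles `viash run <config> -- --flag value ...`.

    Unknown option names raise ValueError so typos fail early instead of
    being forwarded to the component.
    """
    unknown = set(options) - OPTIONAL_ARGS
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    cmd = ["viash", "run", config, "--",
           "--rna", rna, "--prediction", prediction, "--tf_all", tf_all]
    if atac is not None:
        cmd += ["--atac", atac]  # chromatin data is only needed by multi-omics methods
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed spelling for boolean values
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = viash_method_command(
    "src/control_methods/pearson_corr/config.vsh.yaml",
    rna="resources_test/grn_benchmark/inference_data/op_rna.h5ad",
    prediction="output/net.h5ad",
    tf_all="resources_test/grn_benchmark/prior/tf_all.csv",
    max_n_links=10000,
)
print(shlex.join(cmd))
# To execute it (requires viash on PATH): subprocess.run(cmd, check=True)
```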

File format: GRN prediction

File indicating the inferred GRN.

Example file: resources_test/grn_models/op/collectri.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'prediction'

Data structure:

Slot	Type	Description
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["method_id"]`	`string`	A unique identifier for the inference method.
`uns["prediction"]`	`object`	Inferred GRN as an edge list with columns source, target, and weight.
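In a prediction file, `uns["prediction"]` holds the inferred network as one row per regulatory link. The sketch below builds a plain-dict stand-in for the three `uns` entries; in an actual .h5ad file the edge table would presumably be stored as a data frame with columns source, target, and weight. The helper name, the duplicate-edge check, and the rule of keeping the strongest links by absolute weight are assumptions for illustration.

```python
def make_prediction(edges, dataset_id, method_id, max_n_links=50000):
    """Hypothetical helper mirroring the uns entries of the GRN prediction format.

    edges: iterable of (source, target, weight) tuples.
    Duplicate (source, target) pairs are rejected, and only the max_n_links
    edges with the largest absolute weight are kept.
    """
    seen = set()
    rows = []
    for source, target, weight in edges:
        if (source, target) in seen:
            raise ValueError(f"duplicate edge {source} -> {target}")
        seen.add((source, target))
        rows.append({"source": str(source), "target": str(target), "weight": float(weight)})
    rows.sort(key=lambda row: abs(row["weight"]), reverse=True)
    return {"dataset_id": dataset_id, "method_id": method_id, "prediction": rows[:max_n_links]}


pred = make_prediction(
    [("TF1", "G1", 0.9), ("TF1", "G2", -0.4), ("TF2", "G1", 0.1)],
    dataset_id="op", method_id="my_method", max_n_links=2,  # "my_method" is made up
)
for row in pred["prediction"]:
    print(row["source"], row["target"], row["weight"])
# TF1 G1 0.9
# TF1 G2 -0.4
```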

Component type: feature-based metrics

A regression metric to evaluate the performance of the inferred GRN

Arguments:

Name	Type	Description
`--prediction`	`file`	File indicating the inferred GRN.
`--score`	`file`	(Output) File indicating the score of a metric.
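`--layer`	`string`	(Optional) Layer of the AnnData object to use. Default: `X_norm`.
`--max_n_links`	`integer`	(Optional) Maximum number of regulatory links to evaluate. Default: `50000`.
`--verbose`	`integer`	(Optional) Verbosity level. Default: `2`.
`--num_workers`	`integer`	(Optional) Number of parallel workers. Default: `20`.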
`--apply_tf`	`boolean`	(Optional) NA. Default: `TRUE`.
`--apply_skeleton`	`boolean`	(Optional) NA. Default: `FALSE`.
`--skeleton`	`file`	(Optional) NA.
`--evaluation_data`	`file`	Perturbation dataset for benchmarking.
`--tf_all`	`file`	List of all transcription factors (TFs).
`--reg_type`	`string`	(Optional) Type of regression model. Default: `ridge`.
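Feature-based metrics use regression (ridge by default, per `--reg_type`) to ask how well a gene's predicted regulators explain its expression across perturbation samples. The sketch below is a deliberately simplified version of that idea for a single target gene: fit ridge regression on the regulators' expression and report leave-one-out R². The actual metric's cross-validation scheme, sample grouping, and aggregation across genes are described in the publication and are not reproduced here; all function names are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, alpha):
    """Ridge regression with an unpenalized intercept; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    # Normal equations: (Xc^T Xc + alpha * I) w = Xc^T yc
    a = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    w = solve(a, b)
    return my - sum(wj * mj for wj, mj in zip(w, mx)), w


def loo_r2(X, y, alpha=1.0):
    """Leave-one-out R^2 of ridge predictions for one target gene.

    X: one row per perturbation sample, one column per predicted regulator.
    y: the target gene's expression in the same samples.
    """
    preds = []
    for i in range(len(y)):
        intercept, w = ridge_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], alpha)
        preds.append(intercept + sum(wj * xj for wj, xj in zip(w, X[i])))
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Toy example: the target is driven by TF1, while TF2 is unrelated noise.
tf1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
tf2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
target = [2.0 * v + 1.0 for v in tf1]
good = loo_r2([[v] for v in tf1], target, alpha=0.1)
bad = loo_r2([[v] for v in tf2], target, alpha=0.1)
print(good > bad)  # True
```

Scoring on held-out samples rather than in-sample fit matters here: in-sample R² can only grow as regulators are added, so it would reward denser networks regardless of correctness.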

Component type: Wasserstein distance metrics

A Wasserstein-distance-based metric to evaluate the performance of the inferred GRN

Arguments:

Name	Type	Description
`--prediction`	`file`	File indicating the inferred GRN.
`--score`	`file`	(Output) File indicating the score of a metric.
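`--layer`	`string`	(Optional) Layer of the AnnData object to use. Default: `X_norm`.
`--max_n_links`	`integer`	(Optional) Maximum number of regulatory links to evaluate. Default: `50000`.
`--verbose`	`integer`	(Optional) Verbosity level. Default: `2`.
`--num_workers`	`integer`	(Optional) Number of parallel workers. Default: `20`.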
`--apply_tf`	`boolean`	(Optional) NA. Default: `TRUE`.
`--apply_skeleton`	`boolean`	(Optional) NA. Default: `FALSE`.
`--skeleton`	`file`	(Optional) NA.
`--evaluation_data_sc`	`file`	(Optional) Perturbation dataset for benchmarking (single-cell).
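The Wasserstein metrics compare expression distributions in single-cell perturbation data. As we understand the idea, if a TF truly regulates a target gene, perturbing that TF should shift the distribution of the target's expression relative to control cells, and the 1-D Wasserstein distance measures the size of that shift. The sketch below computes that distance as the area between two empirical CDFs; `scipy.stats.wasserstein_distance` computes the same quantity. How the benchmark selects cells, edges, and background distributions is not shown, the function name is hypothetical, and the toy values are made up.

```python
def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical distributions,
    computed as the area between their empirical CDFs."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    i = j = 0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        # Count how many values of each sample are <= left (CDF height on [left, right)).
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        dist += abs(i / len(u) - j / len(v)) * (right - left)
    return dist


# Toy expression of one target gene in control cells vs. cells with its TF knocked down.
control = [1.0, 1.2, 0.9, 1.1]
tf_knockdown = [0.2, 0.4, 0.3, 0.1]
print(round(wasserstein_1d(control, tf_knockdown), 3))  # 0.8
```

For samples of equal size this reduces to the mean absolute difference between the sorted values; the CDF form also handles groups with different numbers of cells.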

Component type: metrics

A metric to evaluate the performance of the inferred GRN

Arguments:

Name	Type	Description
`--prediction`	`file`	File indicating the inferred GRN.
`--score`	`file`	(Output) File indicating the score of a metric.
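`--layer`	`string`	(Optional) Layer of the AnnData object to use. Default: `X_norm`.
`--max_n_links`	`integer`	(Optional) Maximum number of regulatory links to evaluate. Default: `50000`.
`--verbose`	`integer`	(Optional) Verbosity level. Default: `2`.
`--num_workers`	`integer`	(Optional) Number of parallel workers. Default: `20`.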
`--apply_tf`	`boolean`	(Optional) NA. Default: `TRUE`.
`--apply_skeleton`	`boolean`	(Optional) NA. Default: `FALSE`.
`--skeleton`	`file`	(Optional) NA.

File format: score

File indicating the score of a metric.

Example file: resources_test/scores/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

Slot	Type	Description
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["method_id"]`	`string`	A unique identifier for the method.
`uns["metric_ids"]`	`string`	One or more unique metric identifiers.
`uns["metric_values"]`	`double`	The metric values obtained for the given prediction. Must have the same length as `metric_ids`.
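A score file pairs each metric identifier with one value. The sketch below assembles the four `uns` entries as a plain dict and enforces the constraints stated in the table: at least one identifier, identifiers are unique, and `metric_values` has the same length as `metric_ids`. The helper and the metric names are hypothetical.

```python
def make_score(dataset_id, method_id, metric_ids, metric_values):
    """Hypothetical helper mirroring the uns entries of the score format."""
    metric_ids = list(metric_ids)
    metric_values = [float(v) for v in metric_values]
    if not metric_ids:
        raise ValueError("at least one metric id is required")
    if len(set(metric_ids)) != len(metric_ids):
        raise ValueError("metric_ids must be unique")
    if len(metric_ids) != len(metric_values):
        raise ValueError("metric_values must have the same length as metric_ids")
    return {
        "dataset_id": dataset_id,
        "method_id": method_id,
        "metric_ids": metric_ids,
        "metric_values": metric_values,
    }


score = make_score("op", "pearson_corr", ["metric_a", "metric_b"], [0.12, 0.34])
print(dict(zip(score["metric_ids"], score["metric_values"])))
# {'metric_a': 0.12, 'metric_b': 0.34}
```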

File format: perturbation data (pseudo)bulk

Perturbation dataset for benchmarking

Example file: resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad

Format:

AnnData object
 obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
 layers: 'X_norm'
 uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	The annotated cell type of each cell based on RNA expression.
`obs["perturbation"]`	`string`	Name of the perturbation applied to each sample.
`obs["donor_id"]`	`string`	(Optional) Donor id.
`obs["perturbation_type"]`	`string`	(Optional) Type of perturbation.
`layers["X_norm"]`	`double`	Normalized values.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
`uns["normalization_id"]`	`string`	Which normalization was used.
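The (pseudo)bulk evaluation data summarizes single-cell measurements per group of observations. One simple way to form pseudo-bulk profiles is to average expression within each (cell type, perturbation, donor) combination, as sketched below. This is an assumed illustration: the benchmark's actual preprocessing (for example, summing raw counts before normalizing) may differ, `pseudobulk` is a hypothetical name, and the toy cells are made up.

```python
from collections import defaultdict


def pseudobulk(cells, keys=("cell_type", "perturbation", "donor_id")):
    """Average per-cell expression within groups defined by obs values.

    cells: list of (obs_dict, expression_dict) pairs, one per cell.
    Genes absent from a cell's expression dict count as zero (sparse convention).
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for obs, expr in cells:
        group = tuple(obs[k] for k in keys)
        counts[group] += 1
        for gene, value in expr.items():
            sums[group][gene] += value
    return {
        group: {gene: total / counts[group] for gene, total in genes.items()}
        for group, genes in sums.items()
    }


cells = [
    ({"cell_type": "T", "perturbation": "drugA", "donor_id": "d1"}, {"G1": 1.0, "G2": 3.0}),
    ({"cell_type": "T", "perturbation": "drugA", "donor_id": "d1"}, {"G1": 3.0}),
    ({"cell_type": "T", "perturbation": "control", "donor_id": "d1"}, {"G1": 0.5, "G2": 0.5}),
]
bulk = pseudobulk(cells)
print(bulk[("T", "drugA", "d1")])  # {'G1': 2.0, 'G2': 1.5}
```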

File format: perturbation data (sc)

Perturbation dataset for benchmarking (single-cell).

Example file: resources_test/grn_benchmark/evaluation_data/norman_sc.h5ad

Format:

AnnData object
 obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
 layers: 'X_norm'
 uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	The annotated cell type of each cell based on RNA expression.
`obs["perturbation"]`	`string`	Name of the perturbation applied to each cell.
`obs["donor_id"]`	`string`	(Optional) Donor id.
`obs["perturbation_type"]`	`string`	(Optional) Type of perturbation.
`layers["X_norm"]`	`double`	Normalized values.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
`uns["normalization_id"]`	`string`	Which normalization was used.

File format: gene expression data

RNA expression data.

Example file: resources_test/grn_benchmark/inference_data/op_rna.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'
 layers: 'counts', 'X_norm'
 uns: 'dataset_id', 'dataset_name', 'dataset_summary', 'dataset_organism', 'normalization_id'

Data structure:

Slot	Type	Description
`obs["cell_type"]`	`string`	(Optional) The annotated cell type of each cell based on RNA expression.
`obs["donor_id"]`	`string`	(Optional) Donor id.
`layers["counts"]`	`double`	(Optional) Counts matrix.
`layers["X_norm"]`	`double`	Normalized values.
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["dataset_name"]`	`string`	Nicely formatted name.
`uns["dataset_summary"]`	`string`	Short description of the dataset.
`uns["dataset_organism"]`	`string`	(Optional) The organism of the sample in the dataset.
`uns["normalization_id"]`	`string`	Which normalization was used.
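The gene expression data carries both raw `counts` and normalized `X_norm` layers, with the method recorded in `uns["normalization_id"]`. A common normalization, sketched below under that assumption, scales each cell to a fixed total and applies log(1 + x), similar to `scanpy.pp.normalize_total` followed by `scanpy.pp.log1p`. The normalization actually used for each dataset may differ (check `normalization_id`); the helper name and the default `target_sum` are assumptions.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell's counts to target_sum total, then apply log(1 + x).

    counts: list of per-cell lists (cells x genes). Cells with zero total
    counts stay all-zero instead of raising a division error.
    """
    normed = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        normed.append([math.log1p(v * scale) for v in row])
    return normed


counts = [[1, 3, 0], [10, 0, 10]]
x_norm = normalize_log1p(counts, target_sum=4)
print([[round(v, 3) for v in row] for row in x_norm])
# [[0.693, 1.386, 0.0], [1.099, 0.0, 1.099]]
```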
