marvinm2/AOPWikiRDF

AOP-Wiki XML to RDF Conversion Tool

A tool for converting AOP-Wiki XML data exports into RDF (Resource Description Framework) format. This repository contains the conversion notebook that transforms Adverse Outcome Pathway data into semantic web formats, along with automated workflows for data generation and quality control.

Features

  • Automated XML-to-RDF conversion from AOP-Wiki XML exports
  • Gene mapping integration using HGNC data and BridgeDb services
  • Weekly automated updates via GitHub Actions
  • Quality control validation of generated RDF files
  • URI resolvability monitoring with automated badge generation
  • SPARQL endpoint setup instructions for local data exploration

Generated Data

The conversion process produces three main RDF files:

  • AOPWikiRDF.ttl: Main dataset containing AOP-Wiki data (AOPs, Key Events, Key Event Relationships, Chemical Stressors)
  • AOPWikiRDF-Genes.ttl: Gene mapping extensions with approved HGNC symbols and database identifiers
  • AOPWikiRDF-Void.ttl: VoID metadata describing the datasets

Quick Start

Prerequisites

pip install -r requirements.txt

Run Conversion

jupyter execute AOP-Wiki_XML_to_RDF_conversion.ipynb

Generate Statistics

jupyter execute AOP-Wiki_stats.ipynb

Set up a Virtuoso SPARQL endpoint with AOP-Wiki RDF (on Linux):

Step 1 - Create folder to mount

Open a terminal and create a local folder to map into the Docker container. Note the path to the folder, because it is needed in Step 3. In this example, a folder named "aopwikirdf" is created in the current working directory by using:

mkdir -p aopwikirdf

Step 2 - Move the RDF (.ttl) files into the newly created folder
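
Step 2 can also be scripted. The sketch below is a minimal Python illustration, assuming the generated Turtle files sit together in one source folder. The function name stage_ttl_files and the folder names in the usage note are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def stage_ttl_files(source_dir, target_dir):
    """Copy every .ttl file from source_dir into target_dir.

    Returns the sorted names of the copied files. Both folders are
    placeholders chosen for this illustration.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    copied = []
    for ttl in sorted(source.glob("*.ttl")):
        shutil.copy2(ttl, target / ttl.name)  # copy2 also keeps timestamps
        copied.append(ttl.name)
    return copied
```

For example, stage_ttl_files("data", "aopwikirdf") would copy the Turtle files from a local "data" folder into the folder created in Step 1, assuming that is where your copies are stored.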

Step 3 - Run the Docker image

Be sure to publish ports 8890:8890 and 1111:1111. In this example, the container is named "AOPWiki", and the --volume option maps the local "aopwikirdf" folder from Step 1 to /database/data/ inside the container. The Docker image used is openlink/virtuoso-opensource-7. Run the Docker image from the directory that contains the folder by entering:

sudo docker run -d --env DBA_PASSWORD=dba -p 8890:8890 -p 1111:1111 --name AOPWiki --volume `pwd`/aopwikirdf/:/database/data/  openlink/virtuoso-opensource-7
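
If the container is started from a script, the same command can be assembled as an argument list. This is a minimal sketch that only builds and prints the command; the function name build_docker_run_command is hypothetical, and the defaults simply mirror the values used in this step.

```python
import shlex
from pathlib import Path


def build_docker_run_command(data_dir, name="AOPWiki", password="dba"):
    """Return the `docker run` argument list from Step 3 for a given data folder."""
    host_path = Path(data_dir).resolve()  # absolute path, like `pwd`/aopwikirdf
    return [
        "docker", "run", "-d",
        "--env", f"DBA_PASSWORD={password}",
        "-p", "8890:8890",  # HTTP server, including the SPARQL endpoint
        "-p", "1111:1111",  # SQL port used by isql
        "--name", name,
        "--volume", f"{host_path}:/database/data/",
        "openlink/virtuoso-opensource-7",
    ]


command = build_docker_run_command("aopwikirdf")
print(shlex.join(command))
```

The list can be passed to subprocess.run, prefixed with "sudo" where Docker requires it, as in the command above.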

Step 4 - Enter the running container

The SPARQL endpoint should already be accessible through localhost:8890/sparql/. However, although the container is running, the data is not yet loaded. To load it, first enter the container by using:

sudo docker exec -it AOPWiki  bash

Step 5 - Move the .ttl files

Move the Turtle file(s) from the "data" folder to the folder above it (the current directory), then leave the container:

mv data/AOPWikiRDF.ttl .
mv data/AOPWikiRDF-Void.ttl .
mv data/AOPWikiRDF-Genes.ttl .
exit

Step 6 - Enter the container SQL and reset

Open the SQL prompt (isql) of the running Docker container by using:

sudo docker exec -i AOPWiki isql 1111

In case the service is already active and contains older RDF, be sure to perform a global reset and delete the old RDF files from the load_list, using the following commands:

RDF_GLOBAL_RESET();
DELETE FROM load_list WHERE ll_graph = 'http://aopwiki.org/';

The presence of files in the load_list can be viewed using the following command:

select * from DB.DBA.load_list;
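
The reset statements can also be sent to isql non-interactively. The following is a minimal sketch, assuming Docker is callable without sudo and the container is named "AOPWiki" as above; the helper names isql_command and run_statements are hypothetical.

```python
import subprocess

# The statements from this step, in the order they are entered at the prompt.
RESET_STATEMENTS = [
    "RDF_GLOBAL_RESET();",
    "DELETE FROM load_list WHERE ll_graph = 'http://aopwiki.org/';",
    "select * from DB.DBA.load_list;",
]


def isql_command(container="AOPWiki", port=1111):
    """Argument list that opens isql inside the running container."""
    return ["docker", "exec", "-i", container, "isql", str(port)]


def run_statements(statements, container="AOPWiki", port=1111):
    """Pipe SQL statements to isql over stdin and return its output."""
    script = "\n".join(statements) + "\n"
    completed = subprocess.run(
        isql_command(container, port),
        input=script,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker or isql fails
    )
    return completed.stdout
```

Calling run_statements(RESET_STATEMENTS) would then perform the reset and return the remaining contents of the load_list as text.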

Step 7 - Load the RDF

Use the following commands to load the RDF. If errors occur, retry after a few seconds (which often works), or look at http://docs.openlinksw.com/virtuoso/errorcodes/ to find out what they mean.

log_enable(2);
DB.DBA.XML_SET_NS_DECL ('dc', 'http://purl.org/dc/elements/1.1/',2);
DB.DBA.XML_SET_NS_DECL ('dcterms', 'http://purl.org/dc/terms/',2);
DB.DBA.XML_SET_NS_DECL ('rdfs', 'http://www.w3.org/2000/01/rdf-schema#',2);
DB.DBA.XML_SET_NS_DECL ('foaf', 'http://xmlns.com/foaf/0.1/',2);
DB.DBA.XML_SET_NS_DECL ('aop', 'https://identifiers.org/aop/',2);
DB.DBA.XML_SET_NS_DECL ('aop.events', 'https://identifiers.org/aop.events/',2);
DB.DBA.XML_SET_NS_DECL ('aop.relationships', 'https://identifiers.org/aop.relationships/',2);
DB.DBA.XML_SET_NS_DECL ('aop.stressor', 'https://identifiers.org/aop.stressor/',2);
DB.DBA.XML_SET_NS_DECL ('aopo', 'http://aopkb.org/aop_ontology#',2);
DB.DBA.XML_SET_NS_DECL ('cas', 'https://identifiers.org/cas/',2);
DB.DBA.XML_SET_NS_DECL ('inchikey', 'https://identifiers.org/inchikey/',2);
DB.DBA.XML_SET_NS_DECL ('pato', 'http://purl.obolibrary.org/obo/PATO_',2);
DB.DBA.XML_SET_NS_DECL ('ncbitaxon', 'http://purl.bioontology.org/ontology/NCBITAXON/',2);
DB.DBA.XML_SET_NS_DECL ('cl', 'http://purl.obolibrary.org/obo/CL_',2);
DB.DBA.XML_SET_NS_DECL ('uberon', 'http://purl.obolibrary.org/obo/UBERON_',2);
DB.DBA.XML_SET_NS_DECL ('go', 'http://purl.org/obo/owl/GO#',2);
DB.DBA.XML_SET_NS_DECL ('mi', 'http://purl.obolibrary.org/obo/MI_',2);
DB.DBA.XML_SET_NS_DECL ('mp', 'http://purl.obolibrary.org/obo/MP_',2);
DB.DBA.XML_SET_NS_DECL ('hp', 'http://purl.obolibrary.org/obo/HP_',2);
DB.DBA.XML_SET_NS_DECL ('pco', 'http://purl.obolibrary.org/obo/PCO_',2);
DB.DBA.XML_SET_NS_DECL ('nbo', 'http://purl.obolibrary.org/obo/NBO_',2);
DB.DBA.XML_SET_NS_DECL ('vt', 'http://purl.obolibrary.org/obo/VT_',2);
DB.DBA.XML_SET_NS_DECL ('pr', 'http://purl.obolibrary.org/obo/PR_',2);
DB.DBA.XML_SET_NS_DECL ('chebio', 'http://purl.obolibrary.org/obo/CHEBI_',2);
DB.DBA.XML_SET_NS_DECL ('fma', 'http://purl.org/sig/ont/fma/fma',2);
DB.DBA.XML_SET_NS_DECL ('cheminf', 'http://semanticscience.org/resource/CHEMINF_',2);
DB.DBA.XML_SET_NS_DECL ('ncit', 'http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#',2);
DB.DBA.XML_SET_NS_DECL ('comptox', 'https://comptox.epa.gov/dashboard/',2);
DB.DBA.XML_SET_NS_DECL ('mmo', 'http://purl.obolibrary.org/obo/MMO_',2);
DB.DBA.XML_SET_NS_DECL ('chebi', 'https://identifiers.org/chebi/',2);
DB.DBA.XML_SET_NS_DECL ('chemspider', 'https://identifiers.org/chemspider/',2);
DB.DBA.XML_SET_NS_DECL ('wikidata', 'https://identifiers.org/wikidata/',2);
DB.DBA.XML_SET_NS_DECL ('chembl.compound', 'https://identifiers.org/chembl.compound/',2);
DB.DBA.XML_SET_NS_DECL ('pubchem.compound', 'https://identifiers.org/pubchem.compound/',2);
DB.DBA.XML_SET_NS_DECL ('drugbank', 'https://identifiers.org/drugbank/',2);
DB.DBA.XML_SET_NS_DECL ('kegg.compound', 'https://identifiers.org/kegg.compound/',2);
DB.DBA.XML_SET_NS_DECL ('lipidmaps', 'https://identifiers.org/lipidmaps/',2);
DB.DBA.XML_SET_NS_DECL ('hmdb', 'https://identifiers.org/hmdb/',2);
DB.DBA.XML_SET_NS_DECL ('ensembl', 'https://identifiers.org/ensembl/',2);
DB.DBA.XML_SET_NS_DECL ('edam', 'http://edamontology.org/',2);
DB.DBA.XML_SET_NS_DECL ('hgnc', 'https://identifiers.org/hgnc/',2);
DB.DBA.XML_SET_NS_DECL ('ncbigene', 'https://identifiers.org/ncbigene/',2);
DB.DBA.XML_SET_NS_DECL ('uniprot', 'https://identifiers.org/uniprot/',2);
DB.DBA.XML_SET_NS_DECL ('void', 'http://rdfs.org/ns/void#',2);
DB.DBA.XML_SET_NS_DECL ('pav', 'http://purl.org/pav/',2);
DB.DBA.XML_SET_NS_DECL ('dcat', 'http://www.w3.org/ns/dcat#',2);
log_enable(1);
grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
ld_dir('data', 'AOPWikiRDF.ttl', 'http://aopwiki.org/');
ld_dir('data', 'AOPWikiRDF-Void.ttl', 'http://aopwiki.org/');
ld_dir('data', 'AOPWikiRDF-Genes.ttl', 'http://aopwiki.org/');
ld_dir('data', 'ServiceDescription.ttl', 'servicedescription');
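
Because the statements above follow two fixed patterns, they can be generated from a prefix table and a file list. The sketch below is a minimal illustration using a small subset of the prefixes; the function names are hypothetical, and values are inserted without escaping, so it assumes prefixes, URIs, and file names contain no single quotes.

```python
def ns_decl_statement(prefix, uri):
    """Format one namespace declaration with the same third argument (2) as above."""
    return f"DB.DBA.XML_SET_NS_DECL ('{prefix}', '{uri}',2);"


def ld_dir_statement(folder, filename, graph):
    """Format one ld_dir call that registers a file for loading into a graph."""
    return f"ld_dir('{folder}', '{filename}', '{graph}');"


# Subset of the prefixes listed above, for illustration only.
PREFIXES = {
    "aop": "https://identifiers.org/aop/",
    "aop.events": "https://identifiers.org/aop.events/",
    "void": "http://rdfs.org/ns/void#",
}

RDF_FILES = ["AOPWikiRDF.ttl", "AOPWikiRDF-Void.ttl", "AOPWikiRDF-Genes.ttl"]

statements = [ns_decl_statement(p, u) for p, u in PREFIXES.items()]
statements += [ld_dir_statement("data", f, "http://aopwiki.org/") for f in RDF_FILES]
print("\n".join(statements))
```

Keeping the prefixes in one table makes it easier to keep them in sync with the prefixes declared in the Turtle files.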

To finalize the loading of data, use:

rdf_loader_run();

Check the status and verify that the .ttl files are loaded by entering:

select * from DB.DBA.load_list;

If "ll_state" = 2 for a file, its loading is complete. If issues occurred in this step, have a look at http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader. Quit the SQL prompt by entering:

quit;
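
The status check can be expressed as a small helper that interprets rows from DB.DBA.load_list. This is a sketch under stated assumptions: rows are supplied as (ll_file, ll_state, ll_error) tuples, state 2 means loaded as described above, and the meanings given for states 0 and 1 follow my reading of the Virtuoso bulk loader documentation. The function names are hypothetical.

```python
# Assumed meanings of ll_state; only state 2 is confirmed by the text above.
STATE_LABELS = {0: "not yet loaded", 1: "loading in progress", 2: "loaded"}


def summarize_load_list(rows):
    """Map each file to a readable status.

    rows: iterable of (ll_file, ll_state, ll_error) tuples.
    """
    summary = {}
    for ll_file, ll_state, ll_error in rows:
        if ll_error:
            summary[ll_file] = f"error: {ll_error}"
        else:
            summary[ll_file] = STATE_LABELS.get(ll_state, f"unknown state {ll_state}")
    return summary


def all_loaded(rows):
    """True only if every file reached state 2 without a recorded error."""
    return all(state == 2 and not error for _, state, error in rows)
```

A row with a non-empty ll_error is reported as an error regardless of its state, since a file may be marked as processed even though loading failed.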

Step 8 - Enter the Virtuoso service with loaded AOP-Wiki RDF

The container is now running with the RDF loaded. The Virtuoso service is available through http://localhost:8890, and the SPARQL endpoint can be queried directly through http://localhost:8890/sparql/.
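
To confirm that the endpoint answers queries, a short script can count the triples in the loaded graph. The following is a minimal sketch using only the Python standard library, assuming the endpoint URL and the graph name http://aopwiki.org/ from the steps above; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql/"
GRAPH = "http://aopwiki.org/"


def build_count_query(graph=GRAPH):
    """SPARQL query that counts all triples in one named graph."""
    return f"SELECT (COUNT(*) AS ?triples) WHERE {{ GRAPH <{graph}> {{ ?s ?p ?o }} }}"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def count_triples(endpoint=ENDPOINT, graph=GRAPH):
    """Send the count query and return the number of triples as an int."""
    request = build_request(build_count_query(graph), endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    return int(results["results"]["bindings"][0]["triples"]["value"])
```

Calling count_triples() against the running container should return a positive number once loading has completed; a result of 0 suggests that the graph name differs or the load did not finish.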

Maintainers

  • Lead maintainer: Marvin Martens — Department of Translational Genomics, Maastricht University — ORCID 0000-0003-2230-0840
  • Backup maintainer: Egon Willighagen — Department of Translational Genomics, Maastricht University — ORCID 0000-0001-7542-0286

For questions, bug reports, and feature requests please open a GitHub Issue.

Licensing

Two licences apply to material in this repository:

  • Conversion code (Python package, scripts, notebooks, Actions workflows, tests, documentation): MIT — see LICENSE.
  • Generated RDF dataset (data/*.ttl, including AOPWikiRDF.ttl, AOPWikiRDF-Genes.ttl, AOPWikiRDF-Enriched.ttl, AOPWikiRDF-Void.ttl, ServiceDescription.ttl): Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA 4.0) — see data/LICENSE-DATA.

The dataset is a derivative re-expression of the AOP-Wiki, which has been default-licensed under CC-BY-SA 4.0 since Release 2.6 (2023-04-29). The dataset adopts the same licence to honour the upstream Share-Alike obligation.

Citation

If you use this software or the RDF dataset, please cite the underlying paper and the dataset DOI. See CITATION.cff for the canonical citation.

  • Paper: Martens M., Evelo C.T., Willighagen E.L. (2022). Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 8(1):2–13. doi:10.1089/aivt.2021.0010
  • Dataset releases (concept DOI, all versions): 10.5281/zenodo.13353286
