sdg-ai/sdg_data_catalog

sdg_data_catalog

The SDG Data Catalogue is an open, extensible, global database of data sets, metadata, and research networks built automatically by mining millions of published open access academic works. Our system leverages advances in artificial intelligence and natural language processing to extract and organise deep knowledge about available data sets, knowledge that is otherwise hidden in plain sight in the continuous stream of research generated by the scientific community.

This repository contains the code used in the different stages of the pipeline:

  • open access academic paper scraping
  • paper processing and metadata extraction
  • data annotation
  • named entity recognition model to extract dataset-related entities (names, authors, descriptions, etc.)
  • entity linking model
  • knowledge graph model linking datasets and papers

Each sub-section contains instructions for running the code on your own corpus of papers.
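
To give a feel for how the stages listed above could fit together, here is a minimal Python sketch of a staged pipeline. It is not the code in this repository: `Paper`, `extract_metadata`, `recognise_entities` and `run_pipeline` are hypothetical names, and the two stage bodies are trivial placeholders standing in for the real metadata extraction and named entity recognition models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Paper:
    """One open access paper moving through the pipeline (hypothetical structure)."""
    paper_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    entities: List[Dict[str, str]] = field(default_factory=list)


# A stage takes a paper and returns it, enriched with new information.
Stage = Callable[[Paper], Paper]


def extract_metadata(paper: Paper) -> Paper:
    # Placeholder for metadata extraction: treat the first line as the title.
    lines = paper.text.strip().splitlines()
    paper.metadata["title"] = lines[0] if lines else ""
    return paper


def recognise_entities(paper: Paper) -> Paper:
    # Placeholder for the NER model: flag a capitalised token that is
    # immediately followed by the word "dataset".
    tokens = paper.text.split()
    for prev, cur in zip(tokens, tokens[1:]):
        if cur.strip(".,;").lower() == "dataset" and prev[:1].isupper():
            paper.entities.append({"type": "dataset_name", "text": prev})
    return paper


def run_pipeline(papers: List[Paper], stages: List[Stage]) -> List[Paper]:
    # Apply every stage, in order, to every paper.
    results = []
    for paper in papers:
        for stage in stages:
            paper = stage(paper)
        results.append(paper)
    return results


papers = [Paper("p1", "Land cover study\nWe use the Sentinel dataset for training.")]
processed = run_pipeline(papers, [extract_metadata, recognise_entities])
```

Keeping each stage as a function from `Paper` to `Paper` means a placeholder can be swapped for a real model, or further stages such as entity linking appended, without changing `run_pipeline`.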
