sdg_data_catalog

The SDG Data Catalogue is an open, extensible, global database of datasets, metadata, and research networks, built automatically by mining millions of published open access academic works. The system uses advances in artificial intelligence and natural language processing to extract and organise detailed knowledge about available datasets, knowledge that is otherwise hidden in plain sight in the continuous stream of research produced by the scientific community.

This repository contains the code used in the different stages of the pipeline:

scraping of open access academic papers
paper processing and metadata extraction
data annotation
named entity recognition model to extract entities related to the datasets (names, authors, descriptions, ...)
entity linking model
knowledge graph model linking datasets and papers

Each sub-section contains instructions on how to run the code on your own corpus of papers.
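
To give a feel for what the last stage of the pipeline produces, here is a minimal, illustrative sketch of a knowledge graph between datasets and papers. It is not the code in this repository: the function names (`normalise`, `build_graph`, `related_papers`), the paper identifiers, and the sample mentions are all hypothetical, and the string normalisation is only a toy stand-in for the entity linking model.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Toy stand-in for entity linking: collapse case and whitespace.

    Hypothetical helper; a real entity linking model would do far more.
    """
    return " ".join(name.lower().split())


def build_graph(mentions):
    """Build a bipartite paper <-> dataset graph.

    `mentions` is an iterable of (paper_id, dataset_mention) pairs, such as
    a named entity recognition step might emit (assumed input format).
    """
    paper_to_datasets = defaultdict(set)
    dataset_to_papers = defaultdict(set)
    for paper_id, mention in mentions:
        key = normalise(mention)
        paper_to_datasets[paper_id].add(key)
        dataset_to_papers[key].add(paper_id)
    return paper_to_datasets, dataset_to_papers


def related_papers(paper_id, paper_to_datasets, dataset_to_papers):
    """Return the papers that share at least one dataset with `paper_id`."""
    related = set()
    for dataset in paper_to_datasets.get(paper_id, ()):
        related |= dataset_to_papers[dataset]
    related.discard(paper_id)
    return related


# Made-up sample data, for illustration only.
mentions = [
    ("paper-1", "Demographic and Health Surveys"),
    ("paper-2", "demographic and  health surveys"),
    ("paper-2", "World Development Indicators"),
    ("paper-3", "World Development Indicators"),
]

p2d, d2p = build_graph(mentions)
print(sorted(related_papers("paper-2", p2d, d2p)))  # ['paper-1', 'paper-3']
```

Storing both directions of the graph keeps lookups cheap either way: from a paper to the datasets it uses, and from a dataset to every paper that mentions it.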
