qiushipeng/Data_Reusability

Global landscape of primary omics data generation and its secondary analysis across 193 countries and territories

Download data

Download the most recent open access subset of PubMed Central (PMC) publications. Download metadata reference tables for every public SRA and GEO dataset.

Note: this data is large. Create a directory outside this repository to store the data, and point each script to that directory where appropriate.

cd scripts
./download_publications.sh
./download_refs.py
cd ../

Select papers mentioning SRA or GEO

Search the text of every publication with regular expressions that match SRA and GEO accession IDs.

cd scripts
./extract_GEO_SRA.sh
cd ../
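As an illustration of the matching step, here is a minimal Python sketch. The patterns are assumptions based on the commonly documented accession formats (GEO: `GSE`, `GSM`, `GPL`, `GDS` plus digits; SRA and its mirrors: `S`/`E`/`D` + `R` + a type letter plus digits). They are not copied from `extract_GEO_SRA.sh`, which may use different expressions. The function name `find_accessions` and the sample sentence are hypothetical.

```python
import re

# Assumed accession formats (not taken from extract_GEO_SRA.sh):
#   GEO: GSE / GSM / GPL / GDS followed by digits
#   SRA and mirrors: S, E, or D, then R, then one of A/P/R/S/X/Z, then 6+ digits
GEO_PATTERN = re.compile(r"\b(?:GSE|GSM|GPL|GDS)\d+\b")
SRA_PATTERN = re.compile(r"\b[SED]R[APRSXZ]\d{6,}\b")


def find_accessions(text):
    """Return the sorted unique GEO and SRA accession IDs mentioned in text."""
    return {
        "geo": sorted(set(GEO_PATTERN.findall(text))),
        "sra": sorted(set(SRA_PATTERN.findall(text))),
    }


# Made-up sentence for demonstration only.
sample = "Reads are available under SRP012345 and GSE12345 (samples GSM1000001, GSM1000002)."
print(find_accessions(sample))
```

The word boundaries (`\b`) keep the patterns from matching inside longer tokens; a paper with no accession mentions yields two empty lists and would not be selected.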

Extract the publication date and countries from every selected paper

Parse the XML files to find the earliest listed publication date and the countries.

cd scripts
./extract_date.sh
./extract_country.sh
cd ../
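The sketch below shows one way this parsing could look in Python. It assumes JATS-style markup, in which dates appear as `<pub-date>` elements with `<year>`, `<month>`, and `<day>` children and countries appear as `<country>` elements inside `<aff>`. Real files do not always tag countries this way, and `extract_date.sh` and `extract_country.sh` may work differently. The function names and the sample document are hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET


def earliest_pub_date(root):
    """Return the earliest <pub-date> in the article as a date, or None."""
    dates = []
    for node in root.iter("pub-date"):
        year = node.findtext("year")
        if year is None or not year.strip().isdigit():
            continue
        # A missing month or day defaults to 1 (an assumption of this sketch).
        month = node.findtext("month") or "1"
        day = node.findtext("day") or "1"
        try:
            dates.append(datetime.date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip non-numeric or impossible dates
    return min(dates) if dates else None


def affiliation_countries(root):
    """Return the sorted unique <country> values found inside <aff> elements."""
    found = set()
    for aff in root.iter("aff"):
        for country in aff.iter("country"):
            if country.text and country.text.strip():
                found.add(country.text.strip())
    return sorted(found)


# Made-up article fragment for demonstration only.
SAMPLE = """<article><front><article-meta>
<aff id="a1">Dept. of Biology, <country>China</country></aff>
<aff id="a2">Institute X, <country>Germany</country></aff>
<pub-date pub-type="ppub"><month>6</month><year>2015</year></pub-date>
<pub-date pub-type="epub"><day>14</day><month>3</month><year>2015</year></pub-date>
</article-meta></front></article>"""

root = ET.fromstring(SAMPLE)
print(earliest_pub_date(root), affiliation_countries(root))
```

Taking the minimum over all `<pub-date>` elements means an electronic publication date that precedes the print date is the one recorded.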

Create a master table containing all the data

Launch Jupyter Notebook:

cd jupyter_notebooks
jupyter notebook

Merge data scraped from the PMC publications onto reference data from SRA and GEO.

  • Run jupyter_notebooks/Create_Metadata_Table.ipynb
  • Run jupyter_notebooks/Analyze_Metadata_Table.ipynb
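To make the merge concrete, here is a plain-Python sketch of a left join keyed on accession ID. It assumes each scraped mention and each reference record carries an `accession` field; the column names, the `merge_mentions` helper, and the sample rows are all hypothetical, and the notebooks may well use a dataframe library and a different join direction.

```python
def merge_mentions(mentions, reference):
    """Attach reference columns to each paper mention, matched by accession."""
    by_accession = {row["accession"]: row for row in reference}
    merged = []
    for mention in mentions:
        ref = by_accession.get(mention["accession"], {})
        # Mention fields win on a name clash; in_reference flags unmatched rows.
        merged.append({**ref, **mention, "in_reference": bool(ref)})
    return merged


# Made-up rows for demonstration only.
mentions = [
    {"pmc_id": "PMC0000001", "accession": "GSE12345", "pub_date": "2015-03-14"},
    {"pmc_id": "PMC0000002", "accession": "SRP012345", "pub_date": "2016-07-01"},
]
reference = [
    {"accession": "GSE12345", "release_date": "2014-01-10"},
]

for row in merge_mentions(mentions, reference):
    print(row)
```

Keeping unmatched mentions with an `in_reference` flag, rather than dropping them, makes it easy to check how many scraped accessions are missing from the reference tables.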

Create figures

Visualize the findings using everything generated in the previous steps.

  • Run jupyter_notebooks/Visulization.ipynb
