This repository will host a set of tidyverse code for exploratory analysis and predictive modelling of sequence citations in the literature. The sequences originate from the European Nucleotide Archive (ENA). The literature comes from Europe PMC.
The release files total 208G compressed and 3.5T uncompressed.
The release contains 263,421,789 sequence entries comprising 408,005,271,872 nucleotides.
ENA Release Breakdown by Taxonomy
| Division | Entries |
|---|---:|
| ENV: Environmental Samples | 16,765,544 |
| FUN: Fungi | 7,511,473 |
| HUM: Human | 27,520,827 |
| INV: Invertebrates | 40,534,979 |
| MAM: Other Mammals | 16,578,137 |
| MUS: Mus musculus | 10,479,013 |
| PHG: Bacteriophage | 17,393 |
| PLN: Plants | 85,618,575 |
| PRO: Prokaryotes | 3,589,696 |
| ROD: Rodents | 3,263,952 |
| SYN: Synthetic | 10,049,087 |
| TGN: Transgenic | 286,472 |
| UNC: Unclassified | 15,943,630 |
| VRL: Viruses | 3,198,057 |
| VRT: Other Vertebrates | 22,064,954 |
| Total | 263,421,789 |
ENA Release Extraction Condition
A sequence entry must have the /country qualifier, which records the locality of isolation of the sequenced organism. The locality is given as the political name of a nation, ocean or sea, followed by regions and localities.
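As a rough illustration of this extraction condition, the sketch below scans ENA flat-file (EMBL format) text and keeps only the entries whose feature table carries a /country qualifier. It is written in Python for brevity rather than in the tidyverse code this repository is meant to host. It is not the repository's actual pipeline. The function names `entries_with_country` and `split_country`, the sample records and the `DEMO…` identifiers are all hypothetical. It assumes each entry starts with an `ID` line, ends with `//`, and that the qualifier value has the form `country[:region, locality]`.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches a /country qualifier and captures its quoted value.
COUNTRY_RE = re.compile(r'/country="([^"]*)"')


def entries_with_country(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_id, country_value) for entries that have /country."""
    entry_id = None
    feature_text = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("ID   "):
            # The identifier is the first ';'-separated field of the ID line.
            entry_id = line[5:].split(";")[0].strip()
            feature_text = []
        elif line.startswith("FT"):
            # Collect feature-table text so wrapped qualifier values rejoin.
            feature_text.append(line[2:].strip())
        elif line.startswith("//"):
            # End of entry: keep it only if a /country qualifier was seen.
            match = COUNTRY_RE.search(" ".join(feature_text))
            if entry_id is not None and match:
                yield entry_id, match.group(1)
            entry_id = None
            feature_text = []


def split_country(value: str) -> Tuple[str, str]:
    """Split 'country:region, locality' into (country, remainder)."""
    country, _, rest = value.partition(":")
    return country.strip(), rest.strip()


# Hypothetical sample records, invented for illustration only.
SAMPLE = """\
ID   DEMO0001; SV 1; linear; genomic DNA; STD; PLN; 120 BP.
FT   source          1..120
FT                   /organism="Example plant"
FT                   /country="France:Cote d'Azur, Antibes"
//
ID   DEMO0002; SV 1; linear; genomic DNA; STD; HUM; 80 BP.
FT   source          1..80
FT                   /organism="Homo sapiens"
//
""".splitlines()

kept = list(entries_with_country(SAMPLE))
for entry_id, value in kept:
    print(entry_id, split_country(value))
```

Only the first sample record is kept, because the second has no /country qualifier. On the real release, the same filter would need to stream the compressed files entry by entry instead of loading them into memory.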