This repository contains code and data relative to paper "High throughput genomic feature extraction reveals prokaryotic adaptations to the abiotic environment".

MGXlab/abiotic_environment

Code and Data for Abiotic Environment Paper

This repository contains the code and data for the paper "High throughput genomic feature extraction reveals prokaryotic adaptations to the abiotic environment" by Maria Beatriz Walter Costa, Rose Brouns, Aristeidis Litos, Heyde França, Maria Schreiber, Francesco Bisiach, and Bas E. Dutilh.

Scripts are located in the scripts/ folder.

The files in the data/ folder are described below. To uncompress a .tar.gz file from the terminal, run: tar -xzvf FILENAME.tar.gz
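If the tar command is not available (for example on some Windows setups), the same extraction can be sketched in Python with the standard library. This is a minimal sketch, not part of the repository's scripts: the function name extract_archive and the destination folder are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive and return its member names.

    Hypothetical helper, roughly equivalent to `tar -xzvf FILENAME.tar.gz`.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Newer Python versions support extraction filters that reject unsafe
    # members (absolute paths, links pointing outside dest); use it if present.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest, **kwargs)
    return names
```

For example, extract_archive("data/df_bacdive.tar.gz", "data/") would unpack the BacDive table next to the archive, assuming the repository was cloned into the current directory.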

  • df_bacdive.tar.gz: contains 91,228 rows of prokaryotic isolates from BacDive and 13 columns with the taxonomy, genome assembly ID, and metadata on abiotic growth factors. Metadata values are given as the minimum and maximum reported values, separated by a minus sign. The filters described in the paper (see Material and Methods) have been applied.
  • The sub-folders oxygen, pH, salt and temperature contain files in pickle.zst format for developing machine learning classification and regression models on the following feature types: amino acid frequencies, eggNOG COGs, k-mer profiles (k = 9) and ncRNA families. To predict the abiotic growth factors of a new genome or MAG, you can use these files as a training set. Note that all classification models (including oxygen) were built on two contrasting classes, in order to investigate general underlying biological mechanisms. To predict phenotypes of new isolates, use the regression files instead. For oxygen, develop a new classification model from the data in df_bacdive.tar.gz, which contains the aerobic and anaerobic classes as well as intermediate classes.
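The "minimum-maximum" metadata format described for df_bacdive.tar.gz can be split into numeric bounds before modelling. The sketch below is an illustration only: the function name parse_range is hypothetical, and it assumes that a cell may also hold a single value and that bounds may be negative (for example sub-zero temperatures). Neither assumption is confirmed by this README, so check the actual column contents first.

```python
import re

# A number with an optional leading minus sign and an optional decimal part.
_NUMBER = r"-?\d+(?:\.\d+)?"

# "<min>-<max>", or a single value (assumption: single values can occur).
_RANGE_PATTERN = re.compile(
    rf"^\s*({_NUMBER})\s*(?:-\s*({_NUMBER}))?\s*$"
)


def parse_range(text):
    """Return (minimum, maximum) as floats from a 'min-max' string.

    Hypothetical helper. A single value yields identical bounds.
    Raises ValueError if the text does not match the expected format.
    """
    match = _RANGE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Not a recognised range: {text!r}")
    low = float(match.group(1))
    # If no upper bound was captured, reuse the lower bound.
    high = float(match.group(2)) if match.group(2) is not None else low
    return low, high
```

With this pattern, parse_range("20-37") gives (20.0, 37.0) and parse_range("-5-10") gives (-5.0, 10.0). The first minus sign that follows a number is treated as the separator, so a leading minus is read as a negative sign.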
