melanierb/minhash-benchmark

MASH vs OMH

Singularity container

Build the Singularity container:

```shell
singularity build container.img container.def
```

GOS FTP

```shell
# Get the root file listing
wget --no-remove-listing ftp://ftp.imicrobe.us/projects/26/

# Get the assembly file
wget ftp://ftp.imicrobe.us:21/projects/26/CAM_PROJ_GOS.asm.fa.gz

# Download all sample reads (quote the pattern so the shell does not expand it)
wget -r --no-parent -A '*.fa.gz' ftp://ftp.imicrobe.us/projects/26/samples
```
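The Steps below call for sampling about 1000 reads from each dataset. The following is a minimal pure-Python sketch of one way to do that, assuming the downloaded sample files are (optionally gzipped) FASTA. It streams a file once and keeps a uniform random subset using reservoir sampling (Algorithm R). `read_fasta` and `reservoir_sample` are hypothetical helpers, not code from this repository.

```python
import gzip
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(path: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # multi-line sequences are concatenated
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], n: int = 1000, seed: int = 0) -> List[Record]:
    """Keep a uniform random sample of n records in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive bounds: record i is kept with prob. n/(i+1)
            if j < n:
                reservoir[j] = rec
    return reservoir
```

For example, `reads = [seq for _, seq in reservoir_sample(read_fasta("sample.fa.gz"), n=1000)]`, where `sample.fa.gz` stands in for any downloaded sample file. A reservoir avoids loading a whole sample into memory, which helps with large samples.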

Some notes:

  • OMH performs better on long reads; this depends on the number of mismatches in the data
  • Select the 2 biggest samples from each data source
  • Estimate the runtime
  • Get the directories where the data is already downloaded
  • OMH parameters: -k 10 -l 1,2,3 -m 10
  • Average the minimum distances
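The OMH parameters in the notes can be illustrated with a minimal pure-Python sketch of Order Min Hash. Assumptions: `-k` is the k-mer length, `-l` the number of k-mers kept per hash function (so `-l 1,2,3` tries three orders), and `-m` the number of hash functions. The BLAKE2b-based hashing and the function names are illustrative choices, not the OMH tool's implementation.

```python
import hashlib
import heapq
from collections import defaultdict
from typing import List, Tuple


def _hash(item: str, fn: int) -> int:
    """64-bit hash of item under the fn-th hash function (BLAKE2b keyed by fn)."""
    key = fn.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8, key=key).digest(), "little")


def omh_sketch(seq: str, k: int = 10, l: int = 2, m: int = 10) -> List[Tuple[str, ...]]:
    """Order Min Hash: for each of m hash functions keep the l k-mers with the
    smallest hashes and record them in the order they occur in seq."""
    counts = defaultdict(int)
    tagged = []  # (position, kmer, occurrence index) so repeated k-mers stay distinct
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        tagged.append((pos, kmer, counts[kmer]))
        counts[kmer] += 1
    if len(tagged) < l:
        raise ValueError("sequence has fewer than l k-mers")
    sketch = []
    for fn in range(m):
        picked = heapq.nsmallest(l, tagged, key=lambda t: _hash(f"{t[1]}#{t[2]}", fn))
        picked.sort(key=lambda t: t[0])  # restore the order of appearance in seq
        sketch.append(tuple(kmer for _, kmer, _ in picked))
    return sketch


def omh_similarity(a: List[Tuple[str, ...]], b: List[Tuple[str, ...]]) -> float:
    """Fraction of hash functions whose ordered l-tuples match exactly."""
    if len(a) != len(b):
        raise ValueError("both sketches must use the same m")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

With `l = 1` each hash function keeps a single minimum, so the sketch behaves like a MinHash over occurrence-tagged k-mers. Larger `l` also requires the selected k-mers to appear in the same relative order, which is what lets OMH reflect sequence order and not only k-mer content.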

Steps:

  • Download 3 datasets (2 similar and 1 distant) and sample ~1000 reads
  • Make the sketches
  • Compute the distance matrix
  • Make a histogram
  • Run a hypothesis test?
  • Runtime: the OMH distance computation took 8 min 20 s for 3 pair sets
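The "Make sketches" and "Compute distance matrix" steps, together with the note on averaging minimum distances, might look like this on the MASH side. This is a simplified, hypothetical re-implementation rather than a call to the `mash` tool: it hashes forward-strand k-mers with BLAKE2b (Mash itself uses canonical k-mers and MurmurHash3), keeps a bottom-s sketch, and converts the estimated Jaccard index j into the Mash distance D = -ln(2j / (1 + j)) / k. The defaults k = 21 and s = 1000 mirror Mash's command-line defaults. We assume "average of minimum distances" means: for each read in set A, take its smallest distance to any read in set B, then average over A.

```python
import hashlib
import math
from typing import List, Sequence


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "little")


def bottom_sketch(seq: str, k: int = 21, s: int = 1000) -> List[int]:
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes, sorted.
    Simplification: forward-strand k-mers only (Mash uses canonical k-mers)."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def mash_distance(a: List[int], b: List[int], k: int = 21, s: int = 1000) -> float:
    """Estimate the Jaccard index j from two sketches, then D = -ln(2j / (1 + j)) / k."""
    shared = set(a) & set(b)
    union = sorted(set(a) | set(b))[:s]  # bottom-s sketch of the union
    if not union:
        return 1.0
    j = sum(h in shared for h in union) / len(union)
    if j == 0:
        return 1.0  # no shared hashes: cap the distance at 1
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def distance_matrix(sketches_a, sketches_b, k: int = 21, s: int = 1000) -> List[List[float]]:
    """Pairwise distances: rows are reads of set A, columns are reads of set B."""
    return [[mash_distance(x, y, k, s) for y in sketches_b] for x in sketches_a]


def mean_min_distance(matrix: Sequence[Sequence[float]]) -> float:
    """For every read in A take its closest read in B, then average those minima."""
    return sum(min(row) for row in matrix) / len(matrix)
```

For two sampled read sets (hypothetical `reads_a`, `reads_b`), `mean_min_distance(distance_matrix([bottom_sketch(r) for r in reads_a], [bottom_sketch(r) for r in reads_b]))` gives one summary number per dataset pair, and the flattened matrix can be passed to `matplotlib.pyplot.hist` for the histogram step.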
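The notes leave the hypothesis test open. One possible choice, swapped in here purely as an illustration, is a two-sided permutation test on the difference in mean distance, for example between the per-read minimum distances (`[min(row) for row in matrix]`) of the similar pair and those of the distant pair. `permutation_test` is a hypothetical helper, not part of this repository.

```python
import random
from typing import Sequence


def permutation_test(x: Sequence[float], y: Sequence[float],
                     n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel values at random under the null hypothesis
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    # the +1 terms count the observed labelling and keep the p-value above 0
    return (hits + 1) / (n_perm + 1)
```

A permutation test makes no normality assumption about the distance distributions, which are bounded and often skewed; with 10,000 permutations the smallest attainable p-value is about 1e-4.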

About

Code for the project "A Comparative Analysis of MASH and OMH in Metagenomic Datasets" for the bioinformatics course at the University of Padova, Spring 2023.

Languages

  • Jupyter Notebook 99.7%
  • Other 0.3%