Commit

fix format of metaphlan. #34
shenwei356 committed Jun 26, 2023
1 parent a61bf97 commit f0a35f4
Showing 4 changed files with 147 additions and 135 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

- `kmcp compute/split-genomes`:
    - fix a bug in chunk computation when splitting circular genomes (`--circular`).
- `kmcp profile`:
    - fix format of metaphlan. [#34](https://github.com/shenwei356/kmcp/issues/34)

### v0.9.2 - 2023-05-16

184 changes: 92 additions & 92 deletions demo-profiling/mock.kmcp.gz.kmcp.profile.log
@@ -1,92 +1,92 @@
11:19:41.340 [INFO] using a lot of threads does not always accelerate processing, 4-threads is fast enough
11:19:41.340 [INFO] kmcp v0.9.0
11:19:41.340 [INFO] https://github.com/shenwei356/kmcp
11:19:41.340 [INFO]
11:19:41.340 [INFO] checking input files ...
11:19:41.340 [INFO] 1 input file(s) given
11:19:41.340 [INFO] loading TaxId mapping file ...
11:19:41.341 [INFO] 15 pairs of TaxId mapping values from 1 file(s) loaded
11:19:41.341 [INFO] loading Taxonomy from: taxdump-custom/
11:19:41.341 [INFO] 44 nodes in 8 ranks loaded
11:19:41.341 [INFO] 0 merged nodes loaded
11:19:41.341 [INFO] 0 deleted nodes loaded
11:19:41.341 [INFO] 44 names loaded
11:19:41.341 [INFO]
11:19:41.341 [INFO] -------------------- [main parameters] --------------------
11:19:41.341 [INFO] match filtration:
11:19:41.341 [INFO] maximal false positive rate: 0.010000
11:19:41.341 [INFO] minimal query coverage: 0.550000
11:19:41.341 [INFO] keep matches with the top N scores: N=0
11:19:41.341 [INFO] only keep the full matches: false
11:19:41.341 [INFO] only keep main matches: false, maximal score gap: 0.400000
11:19:41.341 [INFO]
11:19:41.341 [INFO] deciding the existence of a reference:
11:19:41.341 [INFO] preset profiling mode: 1
11:19:41.341 [INFO] minimal number of reads per reference chunk: 5
11:19:41.341 [INFO] minimal number of uniquely matched reads: 2
11:19:41.341 [INFO] minimal proportion of matched reference chunks: 0.600000
11:19:41.341 [INFO] maximal standard deviation of relative depths of all chunks: 2.000000
11:19:41.341 [INFO]
11:19:41.341 [INFO] minimal number of high-confidence uniquely matched reads: 1
11:19:41.341 [INFO] minimal query coverage of high-confidence uniquely matched reads: 0.700000
11:19:41.341 [INFO] minimal proportion of high-confidence uniquely matched reads: 0.100000
11:19:41.341 [INFO]
11:19:41.341 [INFO] taxonomy data:
11:19:41.341 [INFO] taxdump directory: taxdump-custom/
11:19:41.341 [INFO] mapping reference IDs to TaxIds: [taxdump-custom/taxid.map]
11:19:41.341 [INFO]
11:19:41.341 [INFO] reporting:
11:19:41.341 [INFO] default format : mock.kmcp.gz.kmcp.profile
11:19:41.341 [INFO] CAMI format : mock.kmcp.gz.cami.profile
11:19:41.341 [INFO] Sample ID : 0
11:19:41.341 [INFO] MetaPhlAn3 format: mock.kmcp.gz.metaphlan.profile
11:19:41.342 [INFO] Sample ID : 0
11:19:41.342 [INFO] Taxonomy ID :
11:19:41.342 [INFO] Binning result : mock.kmcp.gz.binning.gz
11:19:41.342 [INFO] -------------------- [main parameters] --------------------
11:19:41.342 [INFO]
11:19:41.342 [INFO] stage 1/4: counting matches and unique matches for filtering out low-confidence references
11:19:41.342 [INFO] parsing file: mock.kmcp.gz
11:19:41.595 [INFO] number of references in search result: 15
11:19:41.595 [INFO] number of estimated references: 15
11:19:41.595 [INFO] elapsed time: 253.367592ms
11:19:41.595 [INFO]
11:19:41.595 [INFO] stage 2/4: counting ambiguous matches for correcting matches
11:19:41.595 [INFO] parsing file: mock.kmcp.gz
11:19:41.790 [INFO] elapsed time: 195.122745ms
11:19:41.790 [INFO]
11:19:41.790 [INFO] stage 3/4: recounting matches and unique matches
11:19:41.790 [INFO] parsing file: mock.kmcp.gz
11:19:42.065 [INFO] number of estimated references: 15
11:19:42.065 [INFO] elapsed time: 274.730518ms
11:19:42.065 [INFO]
11:19:42.065 [INFO] stage 4/4: estimating abundance using EM algorithm
11:19:42.065 [INFO] initialization step
11:19:42.065 [INFO] parsing file: mock.kmcp.gz
11:19:42.303 [INFO] number of estimated references: 15
11:19:42.303 [INFO] elapsed time: 237.825485ms
11:19:42.303 [INFO] iteration #1
11:19:42.303 [INFO] parsing file: mock.kmcp.gz
11:19:42.547 [INFO] elapsed time: 243.984881ms
11:19:42.547 [INFO] iteration #2
11:19:42.547 [INFO] parsing file: mock.kmcp.gz
11:19:42.786 [INFO] elapsed time: 239.530777ms
11:19:42.786 [INFO] iteration #3
11:19:42.787 [INFO] parsing file: mock.kmcp.gz
11:19:43.004 [INFO] elapsed time: 217.528222ms
11:19:43.004 [INFO] iteration #4
11:19:43.004 [INFO] parsing file: mock.kmcp.gz
11:19:43.239 [INFO] elapsed time: 235.044219ms
11:19:43.239 [INFO] stop iterating after abundances being converged
11:19:43.239 [INFO] number of estimated references: 15
11:19:43.239 [INFO] elapsed time: 1.174247359s
11:19:43.239 [INFO]
11:19:43.239 [INFO] #input matched reads: 308839, #reads belonging to references in profile: 308839, proportion: 100.000000%
11:19:43.239 [INFO]
11:19:43.239 [INFO] writting binning result...
11:19:43.239 [INFO] parsing file: mock.kmcp.gz
11:19:43.614 [INFO]
11:19:43.614 [INFO] 308839 binning results are save to mock.kmcp.gz.binning.gz
11:19:43.616 [INFO]
11:19:43.616 [INFO] elapsed time: 2.27608748s
11:19:43.616 [INFO]
12:57:51.163 [INFO] using a lot of threads does not always accelerate processing, 4-threads is fast enough
12:57:51.163 [INFO] kmcp v0.9.3
12:57:51.163 [INFO] https://github.com/shenwei356/kmcp
12:57:51.163 [INFO]
12:57:51.163 [INFO] checking input files ...
12:57:51.163 [INFO] 1 input file(s) given
12:57:51.163 [INFO] loading TaxId mapping file ...
12:57:51.169 [INFO] 15 pairs of TaxId mapping values from 1 file(s) loaded
12:57:51.169 [INFO] loading Taxonomy from: taxdump-custom/
12:57:51.169 [INFO] 44 nodes in 8 ranks loaded
12:57:51.170 [INFO] 0 merged nodes loaded
12:57:51.170 [INFO] 0 deleted nodes loaded
12:57:51.170 [INFO] 44 names loaded
12:57:51.170 [INFO]
12:57:51.170 [INFO] -------------------- [main parameters] --------------------
12:57:51.170 [INFO] match filtration:
12:57:51.170 [INFO] maximum false positive rate: 0.010000
12:57:51.170 [INFO] minimum query coverage: 0.550000
12:57:51.170 [INFO] keep matches with the top N scores: N=0
12:57:51.170 [INFO] only keep the full matches: false
12:57:51.170 [INFO] only keep main matches: false, maximum score gap: 0.400000
12:57:51.170 [INFO]
12:57:51.170 [INFO] deciding the existence of a reference:
12:57:51.170 [INFO] preset profiling mode: 1
12:57:51.170 [INFO] minimum number of reads per reference chunk: 5
12:57:51.170 [INFO] minimum number of uniquely matched reads: 2
12:57:51.170 [INFO] minimum proportion of matched reference chunks: 0.600000
12:57:51.170 [INFO] maximum standard deviation of relative depths of all chunks: 2.000000
12:57:51.170 [INFO]
12:57:51.170 [INFO] minimum number of high-confidence uniquely matched reads: 1
12:57:51.170 [INFO] minimum query coverage of high-confidence uniquely matched reads: 0.700000
12:57:51.170 [INFO] minimum proportion of high-confidence uniquely matched reads: 0.100000
12:57:51.170 [INFO]
12:57:51.170 [INFO] taxonomy data:
12:57:51.170 [INFO] taxdump directory: taxdump-custom/
12:57:51.170 [INFO] mapping reference IDs to TaxIds: [taxdump-custom/taxid.map]
12:57:51.171 [INFO]
12:57:51.171 [INFO] reporting:
12:57:51.171 [INFO] default format : mock.kmcp.gz.kmcp.profile
12:57:51.171 [INFO] CAMI format : mock.kmcp.gz.cami.profile
12:57:51.171 [INFO] Sample ID : 0
12:57:51.171 [INFO] MetaPhlAn3 format: mock.kmcp.gz.metaphlan.profile
12:57:51.171 [INFO] Sample ID : 0
12:57:51.171 [INFO] Taxonomy ID :
12:57:51.171 [INFO] Binning result : mock.kmcp.gz.binning.gz
12:57:51.171 [INFO] -------------------- [main parameters] --------------------
12:57:51.171 [INFO]
12:57:51.171 [INFO] stage 1/4: counting matches and unique matches for filtering out low-confidence references
12:57:51.171 [INFO] parsing file: mock.kmcp.gz
12:57:51.443 [INFO] number of references in search result: 15
12:57:51.443 [INFO] number of estimated references: 15
12:57:51.443 [INFO] elapsed time: 272.145891ms
12:57:51.443 [INFO]
12:57:51.443 [INFO] stage 2/4: counting ambiguous matches for correcting matches
12:57:51.443 [INFO] parsing file: mock.kmcp.gz
12:57:51.629 [INFO] elapsed time: 185.568805ms
12:57:51.629 [INFO]
12:57:51.629 [INFO] stage 3/4: recounting matches and unique matches
12:57:51.629 [INFO] parsing file: mock.kmcp.gz
12:57:51.901 [INFO] number of estimated references: 15
12:57:51.901 [INFO] elapsed time: 272.64602ms
12:57:51.901 [INFO]
12:57:51.901 [INFO] stage 4/4: estimating abundance using EM algorithm
12:57:51.901 [INFO] initialization step
12:57:51.901 [INFO] parsing file: mock.kmcp.gz
12:57:52.173 [INFO] number of estimated references: 15
12:57:52.173 [INFO] elapsed time: 272.13733ms
12:57:52.174 [INFO] iteration #1
12:57:52.174 [INFO] parsing file: mock.kmcp.gz
12:57:52.411 [INFO] elapsed time: 237.547446ms
12:57:52.411 [INFO] iteration #2
12:57:52.411 [INFO] parsing file: mock.kmcp.gz
12:57:52.636 [INFO] elapsed time: 225.34593ms
12:57:52.637 [INFO] iteration #3
12:57:52.637 [INFO] parsing file: mock.kmcp.gz
12:57:52.871 [INFO] elapsed time: 234.649731ms
12:57:52.871 [INFO] iteration #4
12:57:52.871 [INFO] parsing file: mock.kmcp.gz
12:57:53.093 [INFO] elapsed time: 221.874667ms
12:57:53.093 [INFO] stop iterating after abundances being converged
12:57:53.093 [INFO] number of estimated references: 15
12:57:53.093 [INFO] elapsed time: 1.191925834s
12:57:53.093 [INFO]
12:57:53.093 [INFO] #input matched reads: 308839, #reads belonging to references in profile: 308839, proportion: 100.000000%
12:57:53.093 [INFO]
12:57:53.093 [INFO] writting binning result...
12:57:53.093 [INFO] parsing file: mock.kmcp.gz
12:57:53.359 [INFO]
12:57:53.359 [INFO] 308839 binning results are save to mock.kmcp.gz.binning.gz
12:57:53.362 [INFO]
12:57:53.362 [INFO] elapsed time: 2.198684473s
12:57:53.362 [INFO]
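
The `elapsed time` lines in the old and new logs above can be compared programmatically rather than by eye. The following is a minimal sketch, assuming the line layout shown in this log (`HH:MM:SS.mmm [INFO] elapsed time: <duration>`) and Go-style duration strings such as `272.145891ms` or `2.198684473s`. The helper names `parse_duration` and `elapsed_times` are hypothetical and are not part of kmcp.

```python
import re

# One component of a Go-style duration, e.g. "272.145891ms" or the "1m" in "1m2.5s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
# A whole duration is one or more such components with nothing else around them.
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ns|us|µs|ms|s|m|h))+")
_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# Assumed log layout: "<timestamp> [INFO] elapsed time: <duration>".
_ELAPSED = re.compile(r"\[INFO\]\s+elapsed time:\s*(\S+)")


def parse_duration(text):
    """Convert a Go-style duration string to seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(value) * _SECONDS[unit] for value, unit in _PART.findall(text))


def elapsed_times(lines):
    """Return every 'elapsed time' value found in the log, in order, as seconds."""
    times = []
    for line in lines:
        match = _ELAPSED.search(line)
        if match:
            times.append(parse_duration(match.group(1)))
    return times


if __name__ == "__main__":
    # A few lines copied from the new log in this commit.
    sample = [
        "12:57:51.443 [INFO] elapsed time: 272.145891ms",
        "12:57:51.629 [INFO] elapsed time: 185.568805ms",
        "12:57:53.093 [INFO] elapsed time: 1.191925834s",
        "12:57:53.362 [INFO] elapsed time: 2.198684473s",
    ]
    times = elapsed_times(sample)
    # In this log the last 'elapsed time' line is the total run time.
    print(f"{len(times)} timings found, total run time {times[-1]:.3f}s")
```

Note that the log nests timings: stage 4 reports one value for the initialization step, one per EM iteration, and then one for the whole stage, so the values should not simply be summed.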