simplifying code and adding resources.yml #10
base: main
Conversation
lucaspatel left a comment
Some minor concerns about testing but otherwise looks good to me.
cd ../step-2/${sample_name}_split
# making a copy of the small_LCG before they are removed
mkdir -p {{output}}/step-2/${sample_name}_small_LCG
find . -maxdepth 1 -type f -size -512k -print0 | xargs -0 -r cp -t ../${sample_name}_small_LCG
So small_LCG is defined as files smaller than 512 KB? Probably important to note this in the documentation for the PacBio workflow.
I was expecting small_LCG to be defined by total genome size.
@jianshu93, can you comment?
By file size for now. This can be optimized; file sizes are proportional to total genome size.
Approximately 515,000 bases (about half a million), because one character takes roughly one byte.
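As a side note on this thread: if the cutoff were ever switched from raw file size to total genome size, a minimal sketch could look like the following. The file layout, the *.fa glob, and the 512,000-base threshold are illustrative assumptions, not part of this PR.

from pathlib import Path
import shutil

SMALL_THRESHOLD_BASES = 512_000  # roughly mirrors the 512 KB file-size cutoff above

def genome_size(fasta: Path) -> int:
    # sum sequence lengths in a FASTA file, skipping header lines
    total = 0
    with fasta.open() as fh:
        for line in fh:
            if not line.startswith(">"):
                total += len(line.strip())
    return total

def copy_small_genomes(src: Path, dst: Path) -> None:
    # copy genomes whose total base count falls below the threshold
    dst.mkdir(parents=True, exist_ok=True)
    for fasta in src.glob("*.fa"):
        if genome_size(fasta) < SMALL_THRESHOLD_BASES:
            shutil.copy(fasta, dst / fasta.name)

Since one base is stored as roughly one character, the two definitions should agree closely for uncompressed FASTA.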
else:
    full = full.concat(loaded)

with h5py.File(f"{base}/{rank}", "w") as out:
Not sure if I'm reading this right, but wouldn't these lines indicate that full is written to out on every iteration of the loop, rewriting the same path over and over?
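For what it's worth, the pattern being suggested here is to accumulate inside the loop and write once after it. A hedged sketch, reusing the names from the snippet above and assuming the biom-format API (load_table, Table.concat, Table.to_hdf5); the glob pattern for the per-sample tables is a placeholder:

import glob
import biom
import h5py

def merge_rank(base: str, rank: str) -> None:
    full = None
    for path in glob.glob(f"{base}/*/{rank}"):  # placeholder layout for per-sample tables
        loaded = biom.load_table(path)
        if full is None:
            full = loaded
        else:
            full = full.concat([loaded])
    # write once, after the loop, so the output path is not rewritten per iteration
    if full is not None:
        with h5py.File(f"{base}/{rank}", "w") as out:
            full.to_hdf5(out, generated_by="biom_merge sketch")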
@click.command()
@click.option("--base", type=click.Path(exists=True), required=True)
def biom_merge(base):
This seems like a lot of logic just to merge some BIOM tables. Is this borrowed from qp-woltka? Do you have tests covering this function in particular?
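If it helps, a minimal test along these lines could exercise the merge without touching HDF5. The merge_tables helper below is hypothetical and only stands in for the PR's merge logic; the biom Table API is assumed from biom-format.

import numpy as np
from biom import Table

def merge_tables(tables):
    # stand-in for the PR's merge logic: concatenate along the sample axis
    full = tables[0]
    for t in tables[1:]:
        full = full.concat([t])
    return full

def test_merge_keeps_all_samples():
    t1 = Table(np.array([[1, 2], [3, 4]]), ["o1", "o2"], ["s1", "s2"])
    t2 = Table(np.array([[5], [6]]), ["o1", "o2"], ["s3"])
    merged = merge_tables([t1, t2])
    assert set(merged.ids("sample")) == {"s1", "s2", "s3"}
    assert set(merged.ids("observation")) == {"o1", "o2"}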
| "qiita-files@https://github.com/qiita-spots/qiita-files/archive/master.zip", | ||
| "qiita_client@https://github.com/qiita-spots/qiita_client/archive/master.zip", | ||
| "woltka@git+https://github.com/qiyunzhu/woltka.git#egg=woltka", | ||
| "micov@git+https://github.com/biocore/micov.git#egg=micov", |
Minor, but you can probably get micov from pip now.
It would be nice if we could install it from bioconda.
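If micov is indeed published on PyPI (not verified here), the dependency list quoted above could drop the git pin, e.g.:

install_requires = [
    "qiita-files@https://github.com/qiita-spots/qiita-files/archive/master.zip",
    "qiita_client@https://github.com/qiita-spots/qiita_client/archive/master.zip",
    "woltka@git+https://github.com/qiyunzhu/woltka.git#egg=woltka",
    "micov",  # from PyPI instead of the git URL
]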
jianshu93 left a comment
I like the database option. We will probably update it very soon, to wol3 or even GTDB.