
Does GFFcleaner remove all features based on transcript ids from gff3? #38

Open
Xiaofei-git opened this issue Aug 19, 2020 · 5 comments


Xiaofei-git commented Aug 19, 2020

Hi there,

Does GFFcleaner remove all features based on transcript ids from gff3?

Here is my question in detail. I want to remove all of the lines below from the original gff3 file, but my list contains only the transcript id novel_model_471_5f349842 (without looking at the original gff3, I can't tell that its parent is novel_gene_467_5f349842). So I am looking for a tool that removes all the feature lines based on transcript ids: as long as one of a gene's transcript ids is in the transcript list, I'd like to remove all lines for that gene from the gff3.

I checked the manual page of GFFcleaner, but I couldn't find the answer.

Thanks so much!

chrUn	.	gene	30387395	30387595	.	+	.	ID=novel_gene_467_5f349842;Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A
chrUn	.	mRNA	30387395	30387595	.	+	.	ID=novel_model_471_5f349842;Parent=novel_gene_467_5f349842;Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A
chrUn	.	exon	30387395	30387595	.	+	.	ID=novel_model_471_5f349842.exon1;Parent=novel_model_471_5f349842
chrUn	.	CDS	30387395	30387595	.	+	0	ID=cds.novel_model_471_5f349842;Parent=novel_model_471_5f349842
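
For readers following along, the ID and Parent relationships in the four lines above live in the ninth, semicolon-separated column. The following is a minimal Python sketch of reading that column, assuming standard GFF3 percent-encoding; `parse_attributes` is a name made up for this illustration and is not part of GFFcleaner or GFFUtils.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attributes column into a dict of lists of decoded values.

    Hypothetical helper for illustration only.
    """
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # An attribute such as Parent may hold several comma-separated values;
        # decode each one separately.
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


# The mRNA line from the example above.
line = (
    "chrUn\t.\tmRNA\t30387395\t30387595\t.\t+\t.\t"
    "ID=novel_model_471_5f349842;Parent=novel_gene_467_5f349842;"
    "Name=%2A%2A%20NO%20NAME%20ASSIGNED%20%2A%2A"
)
attrs = parse_attributes(line.split("\t")[8])
print(attrs["ID"], attrs["Parent"], attrs["Name"])
```

With this, the transcript's parent gene can be looked up from `attrs["Parent"]` instead of by reading the file by eye.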
@pjbriggs
Member

@Xiaofei-git thanks for your question. I believe you're correct that the GFFcleaner utility currently has no functionality that does what you want.

However, it should be possible to add a new option to remove records based on their attributes. For example, it could be something like:

--remove-where Parent=novel_gene_467_5f349842

or

--remove-where ID="novel_model_471_5f349842*"

Would something like this do what you want?
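
To make the proposal concrete, here is one way such a condition could be evaluated against a single record. This is only a sketch of possible semantics, not the GFFUtils implementation: the `--remove-where` option did not exist at the time of this comment, `matches_where` is a hypothetical name, and treating `*` as a shell-style wildcard (via Python's `fnmatch`) is an assumption.

```python
from fnmatch import fnmatchcase


def matches_where(column9, condition):
    """Return True if any value of the named attribute matches the pattern.

    `condition` looks like 'Parent=some_id' or 'ID="prefix*"'.
    Hypothetical helper; the wildcard semantics are an assumption.
    """
    key, _, pattern = condition.partition("=")
    pattern = pattern.strip('"')
    for field in column9.strip().split(";"):
        name, _, value = field.partition("=")
        if name == key and any(fnmatchcase(v, pattern) for v in value.split(",")):
            return True
    return False


# Attribute columns of the four example records (Name attributes omitted).
records = {
    "gene": "ID=novel_gene_467_5f349842",
    "mRNA": "ID=novel_model_471_5f349842;Parent=novel_gene_467_5f349842",
    "exon": "ID=novel_model_471_5f349842.exon1;Parent=novel_model_471_5f349842",
    "CDS": "ID=cds.novel_model_471_5f349842;Parent=novel_model_471_5f349842",
}

for condition in ("Parent=novel_gene_467_5f349842", 'ID="novel_model_471_5f349842*"'):
    hits = [ftype for ftype, col9 in records.items() if matches_where(col9, condition)]
    print(condition, "->", hits)
```

Under these assumed semantics, the Parent condition matches only the mRNA record, and the ID pattern matches the mRNA and exon records but not the gene or the CDS (whose ID begins with `cds.`). Neither condition alone would remove all four lines, so removing a whole gene model would also need the parent/child links to be followed.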


Xiaofei-git commented Aug 20, 2020


Yes, that is what I want. However, it is not just a single transcript id but a list of transcript ids from a file. So, would it be OK for the argument to "Parent" or "ID" to be a file?
One more thing to double-check: it will remove all features/lines related to each transcript id, right? I'd like to remove all the lines related to the transcript (e.g. all 4 lines in my example above), and also have another output for the non-matching features (as with -v, --invert-match).

Thank you so much!
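
The behaviour requested here (a list of transcript ids in, every line of the affected genes out, plus a second output for the rest) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not how GFFcleaner or the utility in PR #39 works: the function names are made up, a single-level gene to mRNA to exon/CDS hierarchy linked by ID and Parent is assumed, and the whole file is held in memory.

```python
def attributes(column9):
    """Parse a GFF3 attributes column into {key: [values]} (hypothetical helper)."""
    attrs = {}
    for field in column9.strip().split(";"):
        key, sep, value = field.partition("=")
        if sep:
            attrs[key] = value.split(",")
    return attrs


def split_by_transcripts(gff_lines, transcript_ids):
    """Return (kept, removed) lists of lines.

    A feature is removed if it is a listed transcript, that transcript's
    parent gene, or any descendant of that gene.
    """
    records = []
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        is_feature = not line.startswith("#") and len(cols) == 9
        records.append((line, attributes(cols[8]) if is_feature else None))

    # Pass 1: mark the listed transcripts and their parent genes.
    doomed = set()
    for _, attrs in records:
        if attrs and set(attrs.get("ID", [])) & transcript_ids:
            doomed.update(attrs.get("ID", []))
            doomed.update(attrs.get("Parent", []))

    # Pass 2: repeatedly add the IDs of features whose Parent is already marked,
    # so sibling transcripts and their exons/CDS are picked up as well.
    changed = True
    while changed:
        changed = False
        for _, attrs in records:
            if attrs and set(attrs.get("Parent", [])) & doomed:
                new_ids = set(attrs.get("ID", [])) - doomed
                if new_ids:
                    doomed |= new_ids
                    changed = True

    kept, removed = [], []
    for line, attrs in records:
        ids = set(attrs.get("ID", []) + attrs.get("Parent", [])) if attrs else set()
        (removed if ids & doomed else kept).append(line)
    return kept, removed


# The four lines from the issue (Name attributes omitted) plus one made-up
# gene that should survive.
example = [
    "chrUn\t.\tgene\t30387395\t30387595\t.\t+\t.\tID=novel_gene_467_5f349842\n",
    "chrUn\t.\tmRNA\t30387395\t30387595\t.\t+\t.\tID=novel_model_471_5f349842;Parent=novel_gene_467_5f349842\n",
    "chrUn\t.\texon\t30387395\t30387595\t.\t+\t.\tID=novel_model_471_5f349842.exon1;Parent=novel_model_471_5f349842\n",
    "chrUn\t.\tCDS\t30387395\t30387595\t.\t+\t0\tID=cds.novel_model_471_5f349842;Parent=novel_model_471_5f349842\n",
    "chrUn\t.\tgene\t100\t200\t.\t+\t.\tID=hypothetical_other_gene\n",
]
kept, removed = split_by_transcripts(example, {"novel_model_471_5f349842"})
print(len(kept), "kept;", len(removed), "removed")
```

In real use the transcript ids would be read from a file (one id per line) into a set, and `kept` and `removed` would be written to two separate output files, the second playing the role of the inverted match.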

@Xiaofei-git
Author

What is the difference between GFFUtils and the Python module gffutils (https://pythonhosted.org/gffutils/)?
If it is not easy to add this to the tool, I can also write Python code for this purpose myself using the gffutils module, although a single command line would be much easier. Thanks a lot!

@pjbriggs
Member

@Xiaofei-git:

For your first question: yes, I can make a --remove-where option that works with a file. However, I've just tried implementing this functionality in a new experimental utility instead, which seemed a bit quicker (for me).

The code is in PR #39; if you want to try it, you will need to install from GitHub, for example using virtualenv:

virtualenv venv
source venv/bin/activate
pip install -r https://raw.githubusercontent.com/fls-bioinformatics-core/GFFUtils/master/requirements.txt
pip install git+https://github.com/fls-bioinformatics-core/GFFUtils.git@remove-features-by-attribute-values

which will provide the new gff_remove_features utility; the usage is described in PR #39. Please let me know if you decide to try this out.

For your question about gffutils: it is a separate package and I've never tried to use it, so I can't say what the differences are. If you wanted to take a look at that instead then that's okay too.

@Xiaofei-git
Author


Thanks so much!
Yes, I'd like to try it and will let you know.

As for gffutils: yes, I have tried it and made it work by writing my own code. But having this functionality in a single command line would be much easier, so I'd really like to check out PR #39. Thanks so much again.
