Please report "XXX not found in the TE_SO database" here! #542

Open
oushujun opened this issue Feb 18, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@oushujun
Owner

Well, TE classifications are MESSY! All sorts of naming conventions and aliases are floating around.

I created the TE_Sequence_Ontology.txt file in EDTA/bin/ to collect all these names, using the Sequence Ontology as the standardized naming system. I have been collecting these names FOR YEARS, BUT I still encounter new ones!

So, if you see warning messages like the ones below, please copy the unique warning lines and paste them here.

PLE/Chlamys not found in the TE_SO database, it will not be used to rename sequences in the final annotation.

Warning: DNA/MarinerTc1 not found in the TE_SO database, will use the generic term 'repeat_fragment SO:0001050' to replace it.

Please try not to paste the whole report...

Here's a one-liner that can help pull out the problematic names. Please paste the result in this thread:

cat YOUR_EDTA_REPORT.txt | grep TE_SO | sed 's/Warning: //' | awk '{print $1}' | sort | uniq -c | sort -k1,1 -rn
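
If it helps, here is a small sketch of the same pipeline wrapped in a shell function and run on the two example warnings quoted above. The function name `te_so_missing` is made up for illustration and is not part of EDTA. It also matches the full phrase "not found in the TE_SO database" instead of just "TE_SO", which is my own tweak to avoid picking up unrelated lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EDTA): count each TE name that was
# reported as missing from the TE_SO database.
# Usage: te_so_missing REPORT.txt [MORE_REPORTS...]   (reads stdin if no file is given)
te_so_missing() {
    # -h suppresses file name prefixes when several reports are given
    grep -h 'not found in the TE_SO database' "$@" \
        | sed 's/^Warning: //' \
        | awk '{print $1}' \
        | sort | uniq -c | sort -k1,1 -rn
}

# Demo on the two warning formats shown above (one of them repeated).
te_so_missing <<'EOF'
PLE/Chlamys not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
Warning: DNA/MarinerTc1 not found in the TE_SO database, will use the generic term 'repeat_fragment SO:0001050' to replace it.
Warning: DNA/MarinerTc1 not found in the TE_SO database, will use the generic term 'repeat_fragment SO:0001050' to replace it.
EOF
```

The demo lists DNA/MarinerTc1 first with a count of 2, then PLE/Chlamys with a count of 1. Those count-plus-name lines are all that needs to be pasted here.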

Thank you for contributing!

@oushujun oushujun added the enhancement New feature or request label Feb 18, 2025
@oushujun oushujun pinned this issue Feb 18, 2025