Using a combination of DeepSomatic and DeepVariant to call variants in cancer cell lines. #42

Closed
sayangsep opened this issue Jan 31, 2025 · 4 comments

Comments

@sayangsep

sayangsep commented Jan 31, 2025

Hi All,
We are currently trying to use both DeepVariant and DeepSomatic to genotype germline and somatic variants in the cancer cell lines K562, Caki2, and HepG2. However, from the ENCODE description it looks like no paired normal samples are available. So our current pipeline is the following:

  1. Use DeepSomatic to get the GERMLINE and PASS (somatic) labels.
  2. Pull the variants from the DeepVariant callset that correspond to the DeepSomatic GERMLINE calls.
  3. Extract the somatic variants from DeepSomatic and combine them with the VCF created in step 2.

Could you please suggest whether we need to alter our pipeline? Or is there a better way to get the germline and somatic calls?

Many thanks!

@kishwarshafin
Collaborator

> Use DeepSomatic to get the GERMLINE and PASS (somatic) labels.

DeepSomatic inherently filters out many "germline-looking" variants in its pre-processing step, so you can't reliably use it for germline variant calling. Please use DeepVariant for all germline calls. Using PASS for the somatic calls makes sense.
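As a rough illustration of keeping only the PASS records from DeepSomatic, here is a minimal sketch in plain Python. It assumes the GERMLINE and PASS labels sit in the VCF FILTER column (the 7th tab-separated field). The function name `split_by_filter` and the toy records are hypothetical and not taken from either tool.

```python
from collections import defaultdict


def split_by_filter(vcf_lines):
    """Group VCF data lines by the value in their FILTER column (7th field)."""
    groups = defaultdict(list)
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        groups[fields[6]].append(line)
    return groups


# Toy records for illustration only, not real DeepSomatic output.
toy_deepsomatic = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t40\tPASS\t.",
    "chr1\t200\t.\tC\tT\t35\tGERMLINE\t.",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.",
]

groups = split_by_filter(toy_deepsomatic)
somatic_records = groups["PASS"]  # assumption: PASS marks the somatic calls
print(len(somatic_records), "somatic;", len(groups["GERMLINE"]), "germline-labeled")
```

For real callsets a dedicated VCF tool is probably a better fit than line splitting, since compressed files and multi-valued FILTER fields need extra handling.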

> Pull the variants from the DeepVariant callset that correspond to the DeepSomatic GERMLINE calls.

This is OK; however, DeepVariant is not trained to recognize somatic variants, so its callset will also contain many high-frequency somatic variants. Be careful when merging the two callsets.

> Extract the somatic variants from DeepSomatic and combine them with the VCF created in step 2.

Yes, I believe this would give you the best result, provided you come up with good logic for combining the callsets. For example, if DeepSomatic calls a variant somatic and DeepVariant also calls it, treat the call as somatic and not germline.
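The precedence rule in that example could be sketched roughly as follows. This is only one possible way to combine the callsets, not an official recipe. It assumes both VCFs use the FILTER column, that PASS marks accepted calls in each, and that variants can be matched on an exact (chrom, pos, ref, alt) key, which ignores differences in normalization or multi-allelic representation. `parse_record`, `merge_callsets`, and the toy records are hypothetical.

```python
def parse_record(line):
    """Return ((chrom, pos, ref, alt), filter_value) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
    return (chrom, int(pos), ref, alt), filt


def merge_callsets(deepvariant_lines, deepsomatic_lines):
    """Label each variant 'germline' or 'somatic'; somatic takes precedence."""
    labels = {}
    for line in deepvariant_lines:
        if not line.strip() or line.startswith("#"):
            continue
        key, filt = parse_record(line)
        if filt == "PASS":
            labels[key] = "germline"  # provisional label from DeepVariant
    for line in deepsomatic_lines:
        if not line.strip() or line.startswith("#"):
            continue
        key, filt = parse_record(line)
        if filt == "PASS":
            labels[key] = "somatic"  # overrides a germline label on overlap
    return labels


# Toy records for illustration only.
toy_deepvariant = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t45\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.",
    "chr1\t250\t.\tT\tC\t2\tRefCall\t.",
]
toy_deepsomatic = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t40\tPASS\t.",
    "chr1\t200\t.\tC\tT\t35\tGERMLINE\t.",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.",
]

merged = merge_callsets(toy_deepvariant, toy_deepsomatic)
for key in sorted(merged):
    print(key, merged[key])
```

Here chr1:100 is called by both tools and ends up somatic, chr1:200 stays germline because DeepSomatic did not PASS it, and chr2:300 is somatic only.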

@sayangsep
Author

Thanks @kishwarshafin!
Following your recommendation, we got the results below. Do you spot anything odd in the number of calls?

HepG2 (Hepatocellular Carcinoma)

DeepVariant PASS calls: 4,534,909
DeepSomatic findings:
Somatic variants: 5,140,669
Germline-labeled: 4,564,997
Final combined set: 9,436,006 variants

K562 (Chronic Myeloid Leukemia)

DeepVariant PASS calls: 4,438,294
DeepSomatic findings:
Somatic variants: 1,384,990
Germline-labeled: 4,063,101
Final combined set: 5,477,903 variants

Caki2 (Clear Cell Renal Cell Carcinoma)

DeepVariant PASS calls: 4,050,603
DeepSomatic findings:
Somatic variants: 241,410
Germline-labeled: 3,957,083
Final combined set: 4,185,757 variants

@kishwarshafin
Collaborator

It's very hard to tell from the counts alone, but they do look reasonable, I think, especially the germline numbers.
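One rough arithmetic check on the reported counts is sketched below. It rests on an assumption the thread does not confirm: that the final combined set is the union of the DeepVariant PASS calls and the DeepSomatic somatic calls, with overlapping sites counted once. Under that assumption, the number of sites called by both tools would be the sum of the two counts minus the combined total. The dictionary keys and `implied_overlap` are hypothetical names.

```python
# Counts copied from the comment above.
counts = {
    "HepG2": {"deepvariant_pass": 4_534_909, "somatic": 5_140_669, "combined": 9_436_006},
    "K562": {"deepvariant_pass": 4_438_294, "somatic": 1_384_990, "combined": 5_477_903},
    "Caki2": {"deepvariant_pass": 4_050_603, "somatic": 241_410, "combined": 4_185_757},
}


def implied_overlap(c):
    """Sites present in both callsets, ASSUMING combined = |DV PASS ∪ somatic|."""
    return c["deepvariant_pass"] + c["somatic"] - c["combined"]


for name, c in counts.items():
    print(f"{name}: implied overlap = {implied_overlap(c):,}")
# HepG2: implied overlap = 239,572
# K562: implied overlap = 345,381
# Caki2: implied overlap = 106,256
```

If the combined set was built differently, for example from the DeepVariant calls restricted to DeepSomatic GERMLINE sites, these figures do not apply.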

@sayangsep
Author

Thanks @kishwarshafin! I am closing this issue now.
