Single BND entry ensemble VCF bug #17

@sci-kai

Hey, I am using minda to first build an SV ensemble VCF from four long-read variant callers and then compare that ensemble VCF against several short-read VCFs.
I found that minda fails to match many BNDs, especially translocations, even though they are present in most of the input VCFs.

For example, in the DELLY VCF file:

chr10   119993510       BND00042682     G       G]chr5:125749610]       2100    PASS    PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.5.0;END=119993511;CHR2=chr5;POS2=125749610;PE=0;MAPQ=0;CT=3to3;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=0;HOMLEN=1;SR=35;SRQ=0.997899;CONSENSUS=<BASES>;CE=1.92333;CONSBP=952;RDRATIO=1;SOMATIC      GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-11.4391,-228:114:PASS:19277:39275:19998:2:0:0:38:0    0/1:-105.468,0,-585.468:10000:PASS:92076:166547:74471:2:0:0:104:24

In the severus VCF file:

chr5	125749610	severus_BND905_2	N	N]chr10:119993510]	60.0	PASS	PRECISE;SVTYPE=BND;MATE_ID=severus_BND905_1;STRANDS=++;DETAILED_TYPE=inv_tra;MAPQ=60.0;CLUSTERID=severus_0	GT:VAF:hVAF:DR:DV	0/1:0.20:0.20,0.00,0.00:139:35
chr10	119993510	severus_BND905_1	N	N]chr5:125749610]	60.0	PASS	PRECISE;SVTYPE=BND;MATE_ID=severus_BND905_2;STRANDS=++;DETAILED_TYPE=inv_tra;MAPQ=60.0;CLUSTERID=severus_0	GT:VAF:hVAF:DR:DV	0/1:0.20:0.20,0.00,0.00:139:35

After running minda in ensemble mode, the resulting ensemble VCF contains this entry:

chr5	125749610	Minda_8	G	G]chr5:125749610]	.	PASS	SVLEN=-1;SVTYPE=BND;CHR2=chr10;END=119993510;STRANDS=++;SUPP_VEC=delly_BND00042682,severus_BND905_1

and the corresponding sample_support.tsv:

#CHROM_x        POS_x   locus_group_x   ID_list_x       #CHROM_y        POS_y   locus_group_y   ID_list_y       REF_x   REF_y   ALT_x   ALT_y   ensemble      STRANDS SVTYPE  SVLEN   VAF     Minda_IDs       delly   severus condition_A
chr5    125749610       8       ['delly_BND00042682', 'severus_BND905_2']     chr10   119993510       8_1  ['delly_BND00042682', 'severus_BND905_1']      G       G       G]chr5:125749610]       G]chr5:125749610]    True     ++      BND     -1              ['delly_2', 'severus_2']     True    True    2

The ALT field in the ensemble VCF has lost the other end of the breakpoint: the record at chr5:125749610 points back to chr5:125749610 (itself) instead of to chr10:119993510.
The issue seems to be in how single-notation breakends, like the one from DELLY, are decomposed into two records.
After decomposition, the merged "_x" and "_y" entries in sample_support.tsv carry identical ALT_x and ALT_y values.
I think this should be changed so that each of the two dataframes always carries the ALT that points to the opposite breakend.
I also have a proposed fix that I will open as a PR.
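
To illustrate what "the opposite breakend" means here, below is a minimal sketch of deriving the mate record from a single-notation BND, following the four ALT forms from the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`). This is not minda's actual code and not necessarily what the PR will do; `mate_breakend` and its `mate_ref` parameter are hypothetical names. The mate's REF base is not known without the reference genome, so the sketch assumes `N` as a placeholder.

```python
import re

# Matches the four VCF breakend ALT forms: t[p[, t]p], ]p]t, [p[t
_BND_ALT = re.compile(
    r"^(?P<pre>[^\[\]]*)(?P<b1>[\[\]])(?P<chrom>.+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<post>[^\[\]]*)$"
)


def mate_breakend(chrom, pos, alt, mate_ref="N"):
    """Return (mate_chrom, mate_pos, mate_alt) for a single BND record.

    chrom/pos/alt describe the given record; mate_ref is a placeholder
    for the unknown reference base at the mate position (assumption).
    """
    m = _BND_ALT.match(alt)
    if m is None or m["b1"] != m["b2"] or bool(m["pre"]) == bool(m["post"]):
        raise ValueError(f"not a breakend ALT: {alt!r}")

    bracket = m["b1"]
    base_first = bool(m["pre"])  # True for t[p[ and t]p]

    if base_first and bracket == "]":    # t]p]  <->  t]p]
        mate_alt = f"{mate_ref}]{chrom}:{pos}]"
    elif base_first:                     # t[p[  <->  ]p]t
        mate_alt = f"]{chrom}:{pos}]{mate_ref}"
    elif bracket == "]":                 # ]p]t  <->  t[p[
        mate_alt = f"{mate_ref}[{chrom}:{pos}["
    else:                                # [p[t  <->  [p[t
        mate_alt = f"[{chrom}:{pos}[{mate_ref}"

    return m["chrom"], int(m["pos"]), mate_alt
```

For the DELLY record above, `mate_breakend("chr10", 119993510, "G]chr5:125749610]")` gives `("chr5", 125749610, "N]chr10:119993510]")`, which agrees with the severus_BND905_2 record.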