Hey, I am using minda to first create an SV ensemble VCF from 4 long-read variant callers and then compare this ensemble VCF with different short-read VCFs.
I found that minda fails to match many BNDs, especially translocations, even though they are present in most of the input VCFs.
For example, in the DELLY VCF file:
chr10 119993510 BND00042682 G G]chr5:125749610] 2100 PASS PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.5.0;END=119993511;CHR2=chr5;POS2=125749610;PE=0;MAPQ=0;CT=3to3;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=0;HOMLEN=1;SR=35;SRQ=0.997899;CONSENSUS=<BASES>;CE=1.92333;CONSBP=952;RDRATIO=1;SOMATIC GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-11.4391,-228:114:PASS:19277:39275:19998:2:0:0:38:0 0/1:-105.468,0,-585.468:10000:PASS:92076:166547:74471:2:0:0:104:24
In the Severus VCF file:
chr5 125749610 severus_BND905_2 N N]chr10:119993510] 60.0 PASS PRECISE;SVTYPE=BND;MATE_ID=severus_BND905_1;STRANDS=++;DETAILED_TYPE=inv_tra;MAPQ=60.0;CLUSTERID=severus_0 GT:VAF:hVAF:DR:DV 0/1:0.20:0.20,0.00,0.00:139:35
chr10 119993510 severus_BND905_1 N N]chr5:125749610] 60.0 PASS PRECISE;SVTYPE=BND;MATE_ID=severus_BND905_2;STRANDS=++;DETAILED_TYPE=inv_tra;MAPQ=60.0;CLUSTERID=severus_0 GT:VAF:hVAF:DR:DV 0/1:0.20:0.20,0.00,0.00:139:35
After running minda in ensemble mode, the resulting minda ensemble VCF entry is:
chr5 125749610 Minda_8 G G]chr5:125749610] . PASS SVLEN=-1;SVTYPE=BND;CHR2=chr10;END=119993510;STRANDS=++;SUPP_VEC=delly_BND00042682,severus_BND905_1
And the corresponding row in sample_support.tsv:
#CHROM_x POS_x locus_group_x ID_list_x #CHROM_y POS_y locus_group_y ID_list_y REF_x REF_y ALT_x ALT_y ensemble STRANDS SVTYPE SVLEN VAF Minda_IDs delly severus condition_A
chr5 125749610 8 ['delly_BND00042682', 'severus_BND905_2'] chr10 119993510 8_1 ['delly_BND00042682', 'severus_BND905_1'] G G G]chr5:125749610] G]chr5:125749610] True ++ BND -1 ['delly_2', 'severus_2'] True True 2
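In the ensemble VCF, the ALT field no longer points to the other end of the breakpoint: the record sits at chr5:125749610, but its ALT is G]chr5:125749610], so it points to itself instead of to chr10:119993510.
The issue seems to be in how minda decomposes breakends that are written as a single record without a mate record, like the one from DELLY.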
After decomposition, the merged "_x" and "_y" entries in sample_support.tsv have identical ALT_x and ALT_y values.
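To find other affected rows, something like the following could be used. It is only a rough sketch, not minda code: the function name `inconsistent_rows` is hypothetical, and it assumes that sample_support.tsv is tab-separated and has the column names shown in the header above. A row is flagged when ALT_x does not point at the "_y" position or ALT_y does not point at the "_x" position.

```python
import csv
import io
import re

# Pulls the "chrom:pos" target out of a BND ALT such as G]chr5:125749610]
_TARGET_RE = re.compile(r"[\[\]]([^\[\]]+):(\d+)[\[\]]")


def inconsistent_rows(tsv_text):
    """Yield rows whose ALT_x / ALT_y do not point at the opposite end.

    Assumes tab-separated text with the columns #CHROM_x, POS_x,
    #CHROM_y, POS_y, ALT_x and ALT_y (as in the header shown above).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        end_x = (row["#CHROM_x"], int(row["POS_x"]))
        end_y = (row["#CHROM_y"], int(row["POS_y"]))
        # ALT_x should name the _y end, ALT_y should name the _x end
        for alt, expected in ((row["ALT_x"], end_y), (row["ALT_y"], end_x)):
            m = _TARGET_RE.search(alt)
            # Non-breakend ALTs (no bracket notation) are skipped
            if m and (m[1], int(m[2])) != expected:
                yield row
                break
```

Calling `list(inconsistent_rows(open("sample_support.tsv").read()))` would then list the rows to look at.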
I think this should be adjusted so that each of the two dataframes always carries the ALT that points to the opposite breakend.
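To make clear what I mean by the opposite breakend, here is a minimal sketch of how the mate's ALT can be derived from a single-notated BND record, following the four bracket forms in the VCF specification (t[p[, t]p], ]p]t, [p[t). This is not minda's code; the function name `mate_alt` and the `mate_ref` parameter are hypothetical, and since the reference base at the mate position is not known from a single record, it defaults to N.

```python
import re

# The four BND ALT forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
_BND_RE = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)(?P<b1>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[ACGTNacgtn.]*)$"
)


def mate_alt(chrom, pos, alt, mate_ref="N"):
    """Return (mate_chrom, mate_pos, mate_alt) for one BND record.

    chrom/pos/alt describe the record as written in the VCF; mate_ref is
    the base to print in the mate's ALT (an assumption, defaults to "N").
    """
    m = _BND_RE.match(alt)
    if m is None or m["b1"] != m["b2"] or bool(m["lead"]) == bool(m["trail"]):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    base_first = bool(m["lead"])
    # Base before the brackets: the local piece extends left, so the mate
    # uses ']' brackets; base after the brackets: the mate uses '['.
    mate_bracket = "]" if base_first else "["
    # This record's ']' means the remote piece extends left, so the mate's
    # base comes first; '[' means the mate's base comes last.
    mate_base_first = m["b1"] == "]"
    target = f"{mate_bracket}{chrom}:{pos}{mate_bracket}"
    mate = mate_ref + target if mate_base_first else target + mate_ref
    return m["chrom"], int(m["pos"]), mate


# The DELLY record above: chr10 119993510, ALT G]chr5:125749610]
print(mate_alt("chr10", 119993510, "G]chr5:125749610]"))
# ('chr5', 125749610, 'N]chr10:119993510]')
```

For the DELLY record this gives the same mate that Severus reports explicitly at chr5:125749610, which is what I would expect ALT_x to contain for that row.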
I also have a proposed fix that I will open as a PR.