Hi there! Once again, thank you for building this tool, it's hugely useful.
I ran into difficulty identifying variable regions in close succession within the same chain, for fairly typical-looking VHs and VLs, with scfv mode turned on. For one example VH and VL pair, I tried to find the minimum linker length required for both variable regions to be identified in the same prediction. Linkers were built from the repeating G4S pattern (GGGGS), truncated to the desired length. For each domain order, the sequences and the reported query start and end are listed below for the minimum working linker length and for the linker one residue shorter (N-1), where only one domain is found.
Is this behavior something that can be handled in the post-processing or is this fully due to model training?
- VH + linker + VL: min linker length = 2
Only found 1 domain for Sequence 1.
VH + G + VL --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 115
VH + GG + VL --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGGDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 115
Sequence 1-2: start 118, end 224
- VL + linker + VH: min linker length = 6
Only found 1 domain for Sequence 1.
VL + GGGGS + VH --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 106
VL + GGGGSG + VH --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSGEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 106
Sequence 1-2: start 113, end 228
- VH + linker + VH: min linker length = 7
Only found 1 domain for Sequence 1.
VH + GGGGSG + VH --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGGGGSGEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 115
VH + GGGGSGG + VH --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGGGGSGGEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 115
Sequence 1-2: start 123, end 238
- VL + linker + VL: min linker length = 15
Only found 1 domain for Sequence 1.
VL + GGGGSGGGGSGGGG + VL --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSGGGGSGGGGDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 106
VL + GGGGSGGGGSGGGGS + VL --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSGGGGSGGGGSDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 106
Sequence 1-2: start 122, end 228
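
For anyone who wants to reproduce these constructs, here is a minimal sketch of how they can be generated. The helper names (`g4s_linker`, `build_construct`) are my own and not part of the tool, and the sketch does not call the tool itself. It only builds each sequence and computes where the two domains should sit, assuming the reported coordinates are 0-based with an inclusive end, which is consistent with the numbers above.

```python
from itertools import cycle, islice

# Example VH and VL from this issue.
VH = (
    "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTI"
    "SRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS"
)
VL = (
    "DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTD"
    "FTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK"
)


def g4s_linker(length: int) -> str:
    """Return the first `length` residues of the repeating GGGGS pattern."""
    return "".join(islice(cycle("GGGGS"), length))


def build_construct(first: str, second: str, linker_len: int):
    """Join two domains with a truncated G4S linker.

    Returns the full sequence and the expected (start, end) span of each
    domain, 0-based with an inclusive end (an assumption about the tool's
    coordinate convention).
    """
    linker = g4s_linker(linker_len)
    seq = first + linker + second
    second_start = len(first) + len(linker)
    spans = [
        (0, len(first) - 1),
        (second_start, second_start + len(second) - 1),
    ]
    return seq, spans


if __name__ == "__main__":
    # Minimum working linker length per domain order, as reported above.
    cases = [
        ("VH", VH, "VL", VL, 2),
        ("VL", VL, "VH", VH, 6),
        ("VH", VH, "VH", VH, 7),
        ("VL", VL, "VL", VL, 15),
    ]
    for name1, dom1, name2, dom2, n in cases:
        for linker_len in (n - 1, n):
            seq, spans = build_construct(dom1, dom2, linker_len)
            print(f"{name1} + {g4s_linker(linker_len)} + {name2}: expected spans {spans}")
```

Comparing these expected spans with the tool's output makes it easy to see which domain is dropped in each N-1 case.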