Difficulty identifying variable regions in close succession #116

@eliottpark

Hi there! Once again, thank you for building this tool, it's hugely useful.

I ran into difficulty identifying variable regions that sit in close succession within the same chain, using fairly typical-looking VHs and VLs. This is with scfv mode turned on. For one example VH and VL pair, I tried to find the minimum linker length required for both variable regions to be identified in the same prediction. The linker is a repeating G4S pattern truncated to the given length and extended one residue at a time. For each domain order below, I list the sequence and the identified query start and end for the N-1 case (one residue shorter than the minimum, only one domain found), followed by the case at the minimum length (both domains found).

Is this behavior something that can be handled in the post-processing or is this fully due to model training?
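For anyone who wants to reproduce the scan, here is a minimal sketch of how the constructs can be generated. The linker helper follows the truncated G4S pattern used in the examples in this issue. `count_domains` is a hypothetical callable, not part of the tool's API: it stands in for whatever code runs the prediction with scfv mode on and returns the number of domains found.

```python
from itertools import cycle, islice
from typing import Callable, Optional

# Example domains from this issue.
VH = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS"
VL = "DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK"


def g4s_linker(length: int) -> str:
    """Return the first `length` residues of the repeating GGGGS pattern."""
    return "".join(islice(cycle("GGGGS"), length))


def build_construct(first: str, second: str, linker_length: int) -> str:
    """Join two domains with a truncated G4S linker of the given length."""
    return first + g4s_linker(linker_length) + second


def min_linker_length(
    first: str,
    second: str,
    count_domains: Callable[[str], int],  # hypothetical wrapper around the tool
    max_length: int = 30,
) -> Optional[int]:
    """Return the shortest linker length at which two domains are reported.

    Returns None if no length up to `max_length` yields two domains.
    """
    for n in range(max_length + 1):
        if count_domains(build_construct(first, second, n)) >= 2:
            return n
    return None
```

Calling `min_linker_length(VH, VL, count_domains)` for each of the four domain orders should reproduce the scan described above, assuming `count_domains` wraps the same model and settings.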

  • VH + linker + VL: min linker length = 2

Only found 1 domain for Sequence 1.

VH + G + VL --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 115
VH + GG + VL --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGGDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 115
Sequence 1-2: start 118, end 224

  • VL + linker + VH: min linker length = 6

Only found 1 domain for Sequence 1.

VL + GGGGS + VH --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 106
VL + GGGGSG + VH --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSGEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 106
Sequence 1-2: start 113, end 228

  • VH + linker + VH: min linker length = 7

Only found 1 domain for Sequence 1.

VH + GGGGSG + VH --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGGGGSGEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 115
VH + GGGGSGG + VH --------------
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSSGGGGSGGEVQLVESGGGLVQPGGSLRLSCAASGFTFSSYMMSWVRQAPGKGLEWVATISGTGANTYYPDSVKGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCARQLYYFDYWGQGTTVTVSS
Sequence 1-1: start 0, end 115
Sequence 1-2: start 123, end 238

  • VL + linker + VL: min linker length = 15

Only found 1 domain for Sequence 1.

VL + GGGGSGGGGSGGGG + VL --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSGGGGSGGGGDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 106
VL + GGGGSGGGGSGGGGS + VL --------------
DIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIKGGGGSGGGGSGGGGSDIQMTQSPSSLSASVGDRVTITCLASQTIGTWLTWYQQKPGKAPKLLIYTATSLADGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQVYSIPWTFGGGTKVEIK
Sequence 1-1: start 0, end 106
Sequence 1-2: start 122, end 228
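As a sanity check on the numbers above, the reported coordinates look like 0-based positions with an inclusive end. That convention is an assumption inferred from the output, not something confirmed by the tool's documentation. Under it, the VH is 116 residues (0 to 115) and the VL is 107 residues (0 to 106), and the second domain starts right after the linker. A small sketch of the expected spans:

```python
from typing import Tuple

Span = Tuple[int, int]

VH_LEN = 116  # implied by "start 0, end 115"
VL_LEN = 107  # implied by "start 0, end 106"


def expected_spans(first_len: int, linker_len: int, second_len: int) -> Tuple[Span, Span]:
    """Expected (start, end) of each domain, 0-based with inclusive end."""
    first = (0, first_len - 1)
    second_start = first_len + linker_len
    second = (second_start, second_start + second_len - 1)
    return first, second


# The four minimum-length cases reported in this issue.
print(expected_spans(VH_LEN, 2, VL_LEN))   # VH + GG + VL
print(expected_spans(VL_LEN, 6, VH_LEN))   # VL + GGGGSG + VH
print(expected_spans(VH_LEN, 7, VH_LEN))   # VH + GGGGSGG + VH
print(expected_spans(VL_LEN, 15, VL_LEN))  # VL + GGGGSGGGGSGGGGS + VL
```

All four results match the reported start and end values, so when both domains are found the boundaries themselves look right. The problem seems limited to the second domain being dropped when the linker is short.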
