Model arguments and methods
When the model is instantiated it can be done minimally:

```python
from anarcii import Anarcii

model = Anarcii()
seqs = [
    "EIVMTQSPDTLSVSPGERATLSCRASESISSNLAWYQQKPGQVPRLLIYGASTRATGVPARFTGSGSGTEFTLTISSLQSEDFAVYYCQQYNNRLPYTFGQGTKLEIKRTVAAP",
    "DIVMTQSRDTLSVSPGERATLSCRSSESISSNLEWYQQKPGQVPRLLIYGISTRATGVPQRFTGSGSGTQFTLTISSLQSQDFQVYYCQQYNNRLPYTFGQGTKLEIKRTVQQP",
]
results = model.number(seqs)
```

Or the arguments can be used to modify the model being run, the inference speed and the output data formats.
```python
from anarcii import Anarcii

model = Anarcii(
    seq_type="tcr",
    mode="speed",
    batch_size=128,
    cpu=True,
    ncpu=32,
    verbose=True,
    max_seqs_len=10_000_000,
)
seqs = "lots_of_seqs.fasta"
results = model.number(seqs)
```
- `seq_type` (`str`, default `"antibody"`):
  Defines the type of sequence being numbered. One of `"antibody"`, `"tcr"`, `"unknown"` or `"shark"`. In combination with `mode` (below), this determines the model being run.
- `mode` (`str`, default `"accuracy"`):
  In combination with `seq_type`, this determines the model. One of `"accuracy"` or `"speed"`.
- `batch_size` (`int`, default `8`):
  Number of sequences processed at a time. See the section on recommended batch sizes.
- `cpu` (`bool`, default `False`):
  If set to `True`, explicitly forces CPU computation instead of GPU, regardless of availability.
- `ncpu` (`int`, default `-1`):
  Number of CPU cores to use. This affects speed even when running on a GPU (queuing of the dataloader), so it is recommended to increase this as much as possible to maximise speed when running on large numbers of sequences.
- `verbose` (`bool`, default `False`):
  Provides more information on progress during model processing. Useful for monitoring, especially when running on many sequences.
- `max_seqs_len` (`int`, default `102400`):
  Defines the threshold for entering batch mode. If the input FASTA file or list contains more than 102,400 sequences, the model enters batch mode to save RAM, writing outputs to a `.msgpack` file which can be accessed in a memory-efficient way. Outputs must then be accessed by running `model.to_csv("myseqs.csv")` etc. or by interacting with the `.msgpack` file directly. This threshold can be increased if you are not limited by RAM and want the model to return an object in Python. See MessagePack.
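In batch mode, the `.msgpack` outputs can be read back incrementally rather than loaded into RAM at once. The sketch below illustrates memory-efficient streaming with the `msgpack` package; the in-memory buffer and the record layout shown here are illustrative stand-ins, not Anarcii's actual output schema.

```python
# Illustrative sketch only: streams records one at a time, the way a large
# .msgpack results file can be read without loading it all into memory.
# The record layout below is hypothetical, not Anarcii's actual schema.
import io

import msgpack

# Simulate a small .msgpack file in memory. For a real results file you
# would instead open it directly, e.g. stream = open("results.msgpack", "rb")
buffer = io.BytesIO()
packer = msgpack.Packer()
for record in [{"id": "seq1", "numbered": True}, {"id": "seq2", "numbered": False}]:
    buffer.write(packer.pack(record))
buffer.seek(0)

# Unpacker yields one decoded record at a time, keeping memory use flat.
records = []
for record in msgpack.Unpacker(buffer, raw=False):
    records.append(record)

print(records)
```

Streaming with `msgpack.Unpacker` is what makes a `.msgpack` file "memory efficient" compared with a single in-memory Python object: only one record is decoded at a time.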