Working with MessagePack
MessagePack is an excellent way of working with big data (see https://msgpack.org/). It allows memory-efficient serialisation of large files, which means you can read and write results without blowing up RAM usage.
In our tests, we find that the model uses ~7-10 GB of RAM (peak usage during inference) for every 100,000 sequences being processed.
ANARCII writes its output to a msgpack file when the number of sequences being processed exceeds max_seqs_len (int, default=102400). max_seqs_len is set on model instantiation.
# Let's enter batch mode and create a msgpack file by setting max_seqs_len to 2.
from anarcii import Anarcii

model = Anarcii(
    seq_type="antibody",
    mode="accuracy",
    max_seqs_len=2,
)
You can reduce max_seqs_len so that the model does not run out of RAM during processing and simply writes to a .msgpack file. Alternatively, increase max_seqs_len when working with lots of RAM (1M sequences will use ~100 GB of RAM) so that outputs are returned as a Python dictionary held in memory, which can be processed downstream.
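As a rough guide to choosing max_seqs_len, the RAM figures quoted above can be turned into a back-of-the-envelope calculation. The sketch below is not part of ANARCII: the helper names estimate_peak_ram_gb and suggest_max_seqs_len are hypothetical, and it assumes that peak RAM scales linearly with the number of sequences and that max_seqs_len approximates the number of sequences held in memory at once.

```python
# Figures quoted above: roughly 7-10 GB of peak RAM per 100,000 sequences.
GB_PER_100K_LOW = 7.0
GB_PER_100K_HIGH = 10.0


def estimate_peak_ram_gb(n_sequences: int) -> tuple[float, float]:
    """Return a (low, high) estimate of peak RAM in GB, assuming linear scaling."""
    scale = n_sequences / 100_000
    return GB_PER_100K_LOW * scale, GB_PER_100K_HIGH * scale


def suggest_max_seqs_len(ram_budget_gb: float) -> int:
    """Largest sequence count whose pessimistic (high) estimate fits the budget."""
    return int(ram_budget_gb * 100_000 / GB_PER_100K_HIGH)


low, high = estimate_peak_ram_gb(1_000_000)
print(f"1M sequences: ~{low:.0f}-{high:.0f} GB")
print("Suggested max_seqs_len for a 16 GB budget:", suggest_max_seqs_len(16))
```

Treat the result as a starting point only; actual usage will depend on sequence length, hardware and model settings.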
seq = "./example_data/monoclonals_clean.fasta"
# Results is the path to the msgpack file.
results = model.number(seq)
The model will print the path to the serialised msgpack file.
Serialising output to anarcii-7a357575-3f3b-45d7-82c9-abe13ceb4c98-imgt.msgpack as the number of sequences exceeds the serialisation limit of 2.
The returned object is also the path.
print(results)
Returns
anarcii-7a357575-3f3b-45d7-82c9-abe13ceb4c98-imgt.msgpack
Whether or not you are in batch mode, any ANARCII output can be written to a MessagePack file at a path of your choice.
model.to_msgpack("my_file_path.msgpack")
Last output saved to my_file_path.msgpack in scheme: None.
So you have written your ANARCII numbered sequences to a .msgpack file. How do you explore the contents?
Analogous to controlling the batch size when writing sequences to a msgpack file, you can control the chunk size being read from a msgpack file back into Python to avoid blowing up RAM usage.
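Conceptually, chunked reading means receiving a series of dictionaries, each holding at most chunk_size entries, rather than one large dictionary. The sketch below illustrates that idea on an ordinary in-memory dictionary. It is not ANARCII's implementation (which reads from the msgpack file), and iter_chunks is a hypothetical name used only for illustration.

```python
from itertools import islice
from typing import Iterator


def iter_chunks(records: dict, chunk_size: int) -> Iterator[dict]:
    """Yield successive dicts holding at most chunk_size entries each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    items = iter(records.items())
    while True:
        # islice pulls the next chunk_size items without copying the rest.
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            return
        yield chunk


# Five dummy records split into chunks of two.
demo = {f"seq{i}": {"chain_type": "K"} for i in range(5)}
sizes = [len(chunk) for chunk in iter_chunks(demo, chunk_size=2)]
print(sizes)
```

Only one chunk needs to be alive at a time, which is what keeps memory usage bounded.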
# import from_msgpack_map() from utils.
from anarcii.utils import from_msgpack_map
# create a generator object
gen_object = from_msgpack_map(results, chunk_size=102400)
# This can be iterated over in default chunk size of 102,400.
# To access more or less of the msgpack contents per iteration (by calling next() or in loop iterations),
# simply modify the chunk_size parameter in the from_msgpack_map() function.
dt = next(gen_object)
print(dt)
The contents of dt will not exceed 102,400 (chunk_size) sequences.
{'sp|P01629|KV2A4_MOUSE Ig kappa chain V-II region 2S1.3 OS=Mus musculus OX=10090 PE=1 SV=1': {'numbering': (((1, ' '), 'D'), ((2, ' '), 'I'), ((3, ' '), 'V'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'A'), ((8, ' '), 'A'), ((9, ' '), 'F'), ((10, ' '), 'S'), ((11, ' '), 'N'), ((12, ' '), 'P'), ((13, ' '), 'V'), ((14, ' '), 'T'), ((15, ' '), 'L'), ((16, ' '), 'G'), ((17, ' '), 'T'), ((18, ' '), 'S'), ((19, ' '), 'A'), ((20, ' '), 'S'), ((21, ' '), 'F'), ((22, ' '), 'S'), ((23, ' '), 'C'), ((24, ' '), 'R'), ((25, ' '), 'S'), ((26, ' '), 'S'), ((27, ' '), 'K'), ((28, ' '), 'S'), ((29, ' '), 'L'), ((30, ' '), 'Q'), ((31, ' '), 'Q'), ((32, ' '), 'S'), ((33, ' '), '-'), ((34, ' '), 'K'), ((35, ' '), 'G'), ((36, ' '), 'I'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'Y'), ((41, ' '), 'W'), ((42, ' '), 'Y'), ((43, ' '), 'L'), ((44, ' '), 'Q'), ((45, ' '), 'K'), ((46, ' '), 'P'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'S'), ((50, ' '), 'P'), ((51, ' '), 'Q'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'Q'), ((57, ' '), 'M'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'N'), ((67, ' '), 'L'), ((68, ' '), 'A'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'P'), ((73, ' '), '-'), ((74, ' '), 'D'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'S'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'D'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'R'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'R'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'A'), ((97, ' '), 'E'), ((98, ' '), 'D'), ((99, ' '), 'V'), ((100, ' '), 'G'), ((101, ' '), 'V'), ((102, ' '), 'Y'), ((103, ' '), 'Y'), ((104, ' '), 'C'), ((105, ' '), 'A'), 
((106, ' '), 'N'), ((107, ' '), 'L'), ((108, ' '), 'Q'), ((109, ' '), 'E'), ((110, ' '), '-'), ((111, ' '), '-'), ((112, ' '), '-'), ((113, ' '), '-'), ((114, ' '), 'L'), ((115, ' '), 'P'), ((116, ' '), 'Y'), ((117, ' '), 'T'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'K'), ((124, ' '), 'L'), ((125, ' '), 'E'), ((126, ' '), 'I'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 30.36202621459961, 'query_start': 0, 'query_end': 111, 'error': None, 'scheme': 'imgt'}, 'sp|P01630|KV2A6_MOUSE Ig kappa chain V-II region 7S34.1 OS=Mus musculus OX=10090 PE=1 SV=1': {'numbering': (((1, ' '), 'D'), ((2, ' '), 'I'), ((3, ' '), 'V'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'T'), ((8, ' '), 'A'), ((9, ' '), 'P'), ((10, ' '), 'S'), ((11, ' '), 'A'), ((12, ' '), 'L'), ((13, ' '), 'V'), ((14, ' '), 'T'), ((15, ' '), 'P'), ((16, ' '), 'G'), ((17, ' '), 'E'), ((18, ' '), 'S'), ((19, ' '), 'V'), ((20, ' '), 'S'), ((21, ' '), 'I'), ((22, ' '), 'S'), ((23, ' '), 'C'), ((24, ' '), 'R'), ((25, ' '), 'S'), ((26, ' '), 'S'), ((27, ' '), 'K'), ((28, ' '), 'S'), ((29, ' '), 'L'), ((30, ' '), 'L'), ((31, ' '), 'H'), ((32, ' '), 'S'), ((33, ' '), '-'), ((34, ' '), 'N'), ((35, ' '), 'G'), ((36, ' '), 'N'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'Y'), ((41, ' '), 'W'), ((42, ' '), 'F'), ((43, ' '), 'L'), ((44, ' '), 'Q'), ((45, ' '), 'R'), ((46, ' '), 'P'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'C'), ((50, ' '), 'P'), ((51, ' '), 'Q'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'R'), ((57, ' '), 'M'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'N'), ((67, ' '), 'L'), ((68, ' '), 'A'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'P'), ((73, ' '), '-'), ((74, ' '), 'D'), ((75, ' '), 
'R'), ((76, ' '), 'F'), ((77, ' '), 'S'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'A'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'R'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'R'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'A'), ((97, ' '), 'E'), ((98, ' '), 'D'), ((99, ' '), 'V'), ((100, ' '), 'G'), ((101, ' '), 'V'), ((102, ' '), 'Y'), ((103, ' '), 'Y'), ((104, ' '), 'C'), ((105, ' '), 'M'), ((106, ' '), 'Q'), ((107, ' '), 'Q'), ((108, ' '), 'R'), ((109, ' '), 'E'), ((110, ' '), '-'), ((111, ' '), '-'), ((112, ' '), '-'), ((113, ' '), '-'), ((114, ' '), 'Y'), ((115, ' '), 'P'), ((116, ' '), 'Y'), ((117, ' '), 'T'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'K'), ((124, ' '), 'L'), ((125, ' '), 'E'), ((126, ' '), 'I'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 30.411128997802734, 'query_start': 0, 'query_end': 111, 'error': None, 'scheme': 'imgt'}, 'sp|P01631|KV2A7_MOUSE Ig kappa chain V-II region 26-10 OS=Mus musculus OX=10090 PE=1 SV=1': {'numbering': (((1, ' '), 'D'), ((2, ' '), 'V'), ((3, ' '), 'V'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'T'), ((8, ' '), 'P'), ((9, ' '), 'L'), ((10, ' '), 'S'), ((11, ' '), 'L'), ((12, ' '), 'P'), ((13, ' '), 'V'), ((14, ' '), 'S'), ((15, ' '), 'L'), ((16, ' '), 'G'), ((17, ' '), 'D'), ((18, ' '), 'Q'), ((19, ' '), 'A'), ((20, ' '), 'S'), ((21, ' '), 'I'), ((22, ' '), 'S'), ((23, ' '), 'C'), ((24, ' '), 'R'), ((25, ' '), 'S'), ((26, ' '), 'S'), ((27, ' '), 'Q'), ((28, ' '), 'S'), ((29, ' '), 'L'), ((30, ' '), 'V'), ((31, ' '), 'H'), ((32, ' '), 'S'), ((33, ' '), '-'), ((34, ' '), 'N'), ((35, ' '), 'G'), ((36, ' '), 'N'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'N'), ((41, ' '), 'W'), ((42, ' '), 'Y'), ((43, ' '), 'L'), ((44, ' '), 'Q'), 
((45, ' '), 'K'), ((46, ' '), 'A'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'S'), ((50, ' '), 'P'), ((51, ' '), 'K'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'K'), ((57, ' '), 'V'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'N'), ((67, ' '), 'R'), ((68, ' '), 'F'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'P'), ((73, ' '), '-'), ((74, ' '), 'D'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'S'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'D'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'K'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'R'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'A'), ((97, ' '), 'E'), ((98, ' '), 'D'), ((99, ' '), 'L'), ((100, ' '), 'G'), ((101, ' '), 'I'), ((102, ' '), 'Y'), ((103, ' '), 'F'), ((104, ' '), 'C'), ((105, ' '), 'S'), ((106, ' '), 'Q'), ((107, ' '), 'T'), ((108, ' '), 'T'), ((109, ' '), 'H'), ((110, ' '), '-'), ((111, ' '), '-'), ((112, ' '), '-'), ((113, ' '), '-'), ((114, ' '), 'V'), ((115, ' '), 'P'), ((116, ' '), 'P'), ((117, ' '), 'T'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'K'), ((124, ' '), 'L'), ((125, ' '), 'E'), ((126, ' '), 'I'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 30.66534996032715, 'query_start': 0, 'query_end': 111, 'error': None, 'scheme': 'imgt'}, 'sp|P01691|KV10_RABIT Ig kappa chain V region 12F2 (Fragment) OS=Oryctolagus cuniculus OX=9986 PE=2 SV=1': {'numbering': (((1, ' '), 'A'), ((2, ' '), 'Y'), ((3, ' '), 'D'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'T'), ((8, ' '), 'P'), ((9, ' '), 'A'), ((10, ' '), 'S'), ((11, ' '), 'V'), ((12, ' '), 'E'), ((13, ' '), 
'V'), ((14, ' '), 'A'), ((15, ' '), 'V'), ((16, ' '), 'G'), ((17, ' '), 'G'), ((18, ' '), 'T'), ((19, ' '), 'V'), ((20, ' '), 'T'), ((21, ' '), 'I'), ((22, ' '), 'K'), ((23, ' '), 'C'), ((24, ' '), 'Q'), ((25, ' '), 'A'), ((26, ' '), 'S'), ((27, ' '), 'Q'), ((28, ' '), 'S'), ((29, ' '), 'I'), ((30, ' '), '-'), ((31, ' '), '-'), ((32, ' '), '-'), ((33, ' '), '-'), ((34, ' '), '-'), ((35, ' '), '-'), ((36, ' '), 'S'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'S'), ((41, ' '), 'W'), ((42, ' '), 'Y'), ((43, ' '), 'Q'), ((44, ' '), 'Q'), ((45, ' '), 'K'), ((46, ' '), 'P'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'R'), ((50, ' '), 'P'), ((51, ' '), 'K'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'R'), ((57, ' '), 'A'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'T'), ((67, ' '), 'L'), ((68, ' '), 'A'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'S'), ((73, ' '), '-'), ((74, ' '), 'S'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'K'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'E'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'T'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'G'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'C'), ((97, ' '), 'A'), ((98, ' '), 'D'), ((99, ' '), 'A'), ((100, ' '), 'A'), ((101, ' '), 'T'), ((102, ' '), 'Y'), ((103, ' '), 'Y'), ((104, ' '), 'C'), ((105, ' '), 'Q'), ((106, ' '), 'Q'), ((107, ' '), 'G'), ((108, ' '), 'W'), ((109, ' '), 'S'), ((110, ' '), 'S'), ((111, ' '), '-'), ((112, ' '), 'S'), ((113, ' '), 'N'), ((114, ' '), 'V'), ((115, ' '), 'E'), ((116, ' '), 'N'), ((117, ' '), 'V'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' 
'), 'E'), ((124, ' '), 'V'), ((125, ' '), 'V'), ((126, ' '), 'V'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 29.50858497619629, 'query_start': 6, 'query_end': 115, 'error': None, 'scheme': 'imgt'}}
Now set chunk_size=1 in from_msgpack_map(). To access each chunk in turn, simply iterate over the generator.
gen_object = from_msgpack_map(results, chunk_size=1)
for i in gen_object:
    print("Length of contents:", len(i))
    print(i)
Length of contents: 1
{'sp|P01629|KV2A4_MOUSE Ig kappa chain V-II region 2S1.3 OS=Mus musculus OX=10090 PE=1 SV=1': {'numbering': (((1, ' '), 'D'), ((2, ' '), 'I'), ((3, ' '), 'V'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'A'), ((8, ' '), 'A'), ((9, ' '), 'F'), ((10, ' '), 'S'), ((11, ' '), 'N'), ((12, ' '), 'P'), ((13, ' '), 'V'), ((14, ' '), 'T'), ((15, ' '), 'L'), ((16, ' '), 'G'), ((17, ' '), 'T'), ((18, ' '), 'S'), ((19, ' '), 'A'), ((20, ' '), 'S'), ((21, ' '), 'F'), ((22, ' '), 'S'), ((23, ' '), 'C'), ((24, ' '), 'R'), ((25, ' '), 'S'), ((26, ' '), 'S'), ((27, ' '), 'K'), ((28, ' '), 'S'), ((29, ' '), 'L'), ((30, ' '), 'Q'), ((31, ' '), 'Q'), ((32, ' '), 'S'), ((33, ' '), '-'), ((34, ' '), 'K'), ((35, ' '), 'G'), ((36, ' '), 'I'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'Y'), ((41, ' '), 'W'), ((42, ' '), 'Y'), ((43, ' '), 'L'), ((44, ' '), 'Q'), ((45, ' '), 'K'), ((46, ' '), 'P'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'S'), ((50, ' '), 'P'), ((51, ' '), 'Q'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'Q'), ((57, ' '), 'M'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'N'), ((67, ' '), 'L'), ((68, ' '), 'A'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'P'), ((73, ' '), '-'), ((74, ' '), 'D'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'S'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'D'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'R'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'R'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'A'), ((97, ' '), 'E'), ((98, ' '), 'D'), ((99, ' '), 'V'), ((100, ' '), 'G'), ((101, ' '), 'V'), ((102, ' '), 'Y'), ((103, ' '), 'Y'), ((104, ' '), 'C'), ((105, ' '), 'A'), 
((106, ' '), 'N'), ((107, ' '), 'L'), ((108, ' '), 'Q'), ((109, ' '), 'E'), ((110, ' '), '-'), ((111, ' '), '-'), ((112, ' '), '-'), ((113, ' '), '-'), ((114, ' '), 'L'), ((115, ' '), 'P'), ((116, ' '), 'Y'), ((117, ' '), 'T'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'K'), ((124, ' '), 'L'), ((125, ' '), 'E'), ((126, ' '), 'I'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 30.36202621459961, 'query_start': 0, 'query_end': 111, 'error': None, 'scheme': 'imgt'}}
Length of contents: 1
{'sp|P01630|KV2A6_MOUSE Ig kappa chain V-II region 7S34.1 OS=Mus musculus OX=10090 PE=1 SV=1': {'numbering': (((1, ' '), 'D'), ((2, ' '), 'I'), ((3, ' '), 'V'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'T'), ((8, ' '), 'A'), ((9, ' '), 'P'), ((10, ' '), 'S'), ((11, ' '), 'A'), ((12, ' '), 'L'), ((13, ' '), 'V'), ((14, ' '), 'T'), ((15, ' '), 'P'), ((16, ' '), 'G'), ((17, ' '), 'E'), ((18, ' '), 'S'), ((19, ' '), 'V'), ((20, ' '), 'S'), ((21, ' '), 'I'), ((22, ' '), 'S'), ((23, ' '), 'C'), ((24, ' '), 'R'), ((25, ' '), 'S'), ((26, ' '), 'S'), ((27, ' '), 'K'), ((28, ' '), 'S'), ((29, ' '), 'L'), ((30, ' '), 'L'), ((31, ' '), 'H'), ((32, ' '), 'S'), ((33, ' '), '-'), ((34, ' '), 'N'), ((35, ' '), 'G'), ((36, ' '), 'N'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'Y'), ((41, ' '), 'W'), ((42, ' '), 'F'), ((43, ' '), 'L'), ((44, ' '), 'Q'), ((45, ' '), 'R'), ((46, ' '), 'P'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'C'), ((50, ' '), 'P'), ((51, ' '), 'Q'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'R'), ((57, ' '), 'M'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'N'), ((67, ' '), 'L'), ((68, ' '), 'A'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'P'), ((73, ' '), '-'), ((74, ' '), 'D'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'S'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'A'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'R'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'R'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'A'), ((97, ' '), 'E'), ((98, ' '), 'D'), ((99, ' '), 'V'), ((100, ' '), 'G'), ((101, ' '), 'V'), ((102, ' '), 'Y'), ((103, ' '), 'Y'), ((104, ' '), 'C'), ((105, ' '), 'M'), 
((106, ' '), 'Q'), ((107, ' '), 'Q'), ((108, ' '), 'R'), ((109, ' '), 'E'), ((110, ' '), '-'), ((111, ' '), '-'), ((112, ' '), '-'), ((113, ' '), '-'), ((114, ' '), 'Y'), ((115, ' '), 'P'), ((116, ' '), 'Y'), ((117, ' '), 'T'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'K'), ((124, ' '), 'L'), ((125, ' '), 'E'), ((126, ' '), 'I'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 30.411128997802734, 'query_start': 0, 'query_end': 111, 'error': None, 'scheme': 'imgt'}}
Length of contents: 1
{'sp|P01631|KV2A7_MOUSE Ig kappa chain V-II region 26-10 OS=Mus musculus OX=10090 PE=1 SV=1': {'numbering': (((1, ' '), 'D'), ((2, ' '), 'V'), ((3, ' '), 'V'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'T'), ((8, ' '), 'P'), ((9, ' '), 'L'), ((10, ' '), 'S'), ((11, ' '), 'L'), ((12, ' '), 'P'), ((13, ' '), 'V'), ((14, ' '), 'S'), ((15, ' '), 'L'), ((16, ' '), 'G'), ((17, ' '), 'D'), ((18, ' '), 'Q'), ((19, ' '), 'A'), ((20, ' '), 'S'), ((21, ' '), 'I'), ((22, ' '), 'S'), ((23, ' '), 'C'), ((24, ' '), 'R'), ((25, ' '), 'S'), ((26, ' '), 'S'), ((27, ' '), 'Q'), ((28, ' '), 'S'), ((29, ' '), 'L'), ((30, ' '), 'V'), ((31, ' '), 'H'), ((32, ' '), 'S'), ((33, ' '), '-'), ((34, ' '), 'N'), ((35, ' '), 'G'), ((36, ' '), 'N'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'N'), ((41, ' '), 'W'), ((42, ' '), 'Y'), ((43, ' '), 'L'), ((44, ' '), 'Q'), ((45, ' '), 'K'), ((46, ' '), 'A'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'S'), ((50, ' '), 'P'), ((51, ' '), 'K'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'K'), ((57, ' '), 'V'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'N'), ((67, ' '), 'R'), ((68, ' '), 'F'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'P'), ((73, ' '), '-'), ((74, ' '), 'D'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'S'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'D'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'K'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'R'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'A'), ((97, ' '), 'E'), ((98, ' '), 'D'), ((99, ' '), 'L'), ((100, ' '), 'G'), ((101, ' '), 'I'), ((102, ' '), 'Y'), ((103, ' '), 'F'), ((104, ' '), 'C'), ((105, ' '), 'S'), 
((106, ' '), 'Q'), ((107, ' '), 'T'), ((108, ' '), 'T'), ((109, ' '), 'H'), ((110, ' '), '-'), ((111, ' '), '-'), ((112, ' '), '-'), ((113, ' '), '-'), ((114, ' '), 'V'), ((115, ' '), 'P'), ((116, ' '), 'P'), ((117, ' '), 'T'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'K'), ((124, ' '), 'L'), ((125, ' '), 'E'), ((126, ' '), 'I'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 30.66534996032715, 'query_start': 0, 'query_end': 111, 'error': None, 'scheme': 'imgt'}}
Length of contents: 1
{'sp|P01691|KV10_RABIT Ig kappa chain V region 12F2 (Fragment) OS=Oryctolagus cuniculus OX=9986 PE=2 SV=1': {'numbering': (((1, ' '), 'A'), ((2, ' '), 'Y'), ((3, ' '), 'D'), ((4, ' '), 'M'), ((5, ' '), 'T'), ((6, ' '), 'Q'), ((7, ' '), 'T'), ((8, ' '), 'P'), ((9, ' '), 'A'), ((10, ' '), 'S'), ((11, ' '), 'V'), ((12, ' '), 'E'), ((13, ' '), 'V'), ((14, ' '), 'A'), ((15, ' '), 'V'), ((16, ' '), 'G'), ((17, ' '), 'G'), ((18, ' '), 'T'), ((19, ' '), 'V'), ((20, ' '), 'T'), ((21, ' '), 'I'), ((22, ' '), 'K'), ((23, ' '), 'C'), ((24, ' '), 'Q'), ((25, ' '), 'A'), ((26, ' '), 'S'), ((27, ' '), 'Q'), ((28, ' '), 'S'), ((29, ' '), 'I'), ((30, ' '), '-'), ((31, ' '), '-'), ((32, ' '), '-'), ((33, ' '), '-'), ((34, ' '), '-'), ((35, ' '), '-'), ((36, ' '), 'S'), ((37, ' '), 'T'), ((38, ' '), 'Y'), ((39, ' '), 'L'), ((40, ' '), 'S'), ((41, ' '), 'W'), ((42, ' '), 'Y'), ((43, ' '), 'Q'), ((44, ' '), 'Q'), ((45, ' '), 'K'), ((46, ' '), 'P'), ((47, ' '), 'G'), ((48, ' '), 'Q'), ((49, ' '), 'R'), ((50, ' '), 'P'), ((51, ' '), 'K'), ((52, ' '), 'L'), ((53, ' '), 'L'), ((54, ' '), 'I'), ((55, ' '), 'Y'), ((56, ' '), 'R'), ((57, ' '), 'A'), ((58, ' '), '-'), ((59, ' '), '-'), ((60, ' '), '-'), ((61, ' '), '-'), ((62, ' '), '-'), ((63, ' '), '-'), ((64, ' '), '-'), ((65, ' '), 'S'), ((66, ' '), 'T'), ((67, ' '), 'L'), ((68, ' '), 'A'), ((69, ' '), 'S'), ((70, ' '), 'G'), ((71, ' '), 'V'), ((72, ' '), 'S'), ((73, ' '), '-'), ((74, ' '), 'S'), ((75, ' '), 'R'), ((76, ' '), 'F'), ((77, ' '), 'K'), ((78, ' '), 'G'), ((79, ' '), 'S'), ((80, ' '), 'G'), ((81, ' '), '-'), ((82, ' '), '-'), ((83, ' '), 'S'), ((84, ' '), 'G'), ((85, ' '), 'T'), ((86, ' '), 'E'), ((87, ' '), 'F'), ((88, ' '), 'T'), ((89, ' '), 'L'), ((90, ' '), 'T'), ((91, ' '), 'I'), ((92, ' '), 'S'), ((93, ' '), 'G'), ((94, ' '), 'V'), ((95, ' '), 'E'), ((96, ' '), 'C'), ((97, ' '), 'A'), ((98, ' '), 'D'), ((99, ' '), 'A'), ((100, ' '), 'A'), ((101, ' '), 'T'), ((102, ' '), 'Y'), ((103, ' '), 'Y'), ((104, ' '), 'C'), ((105, ' 
'), 'Q'), ((106, ' '), 'Q'), ((107, ' '), 'G'), ((108, ' '), 'W'), ((109, ' '), 'S'), ((110, ' '), 'S'), ((111, ' '), '-'), ((112, ' '), 'S'), ((113, ' '), 'N'), ((114, ' '), 'V'), ((115, ' '), 'E'), ((116, ' '), 'N'), ((117, ' '), 'V'), ((118, ' '), 'F'), ((119, ' '), 'G'), ((120, ' '), 'G'), ((121, ' '), 'G'), ((122, ' '), 'T'), ((123, ' '), 'E'), ((124, ' '), 'V'), ((125, ' '), 'V'), ((126, ' '), 'V'), ((127, ' '), 'K'), ((128, ' '), '-')), 'chain_type': 'K', 'score': 29.50858497619629, 'query_start': 6, 'query_end': 115, 'error': None, 'scheme': 'imgt'}}
Finally, if ANARCII has written your large set of numbered sequences to a .msgpack file, you can simply convert this to a .csv file with a path of your choice.
model.to_csv("my_csv.csv")
This will also work if you have converted to an alternate numbering scheme.
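If you would rather process the numbered sequences in Python than convert them to CSV, the chunks can be summarised one at a time. The sketch below is a minimal example that assumes each chunk has the dictionary shape printed earlier ('numbering', 'chain_type', 'error', and so on). The helper names numbering_to_string and summarise_chunks are hypothetical, and the shape of a failed entry (a non-None 'error' value) is an assumption, not something confirmed by the output above.

```python
from collections import Counter
from typing import Iterable


def numbering_to_string(numbering) -> str:
    """Join the residues of one numbering into an aligned string (gaps kept as '-')."""
    return "".join(residue for _position, residue in numbering)


def summarise_chunks(chunks: Iterable[dict]) -> dict:
    """Tally chain types and failed entries while holding one chunk at a time."""
    chain_types = Counter()
    n_total = 0
    n_errors = 0
    for chunk in chunks:
        for _name, entry in chunk.items():
            n_total += 1
            # Assumption: a failed entry carries a non-None 'error' value.
            if entry.get("error") is not None:
                n_errors += 1
                continue
            chain_types[entry["chain_type"]] += 1
    return {"total": n_total, "errors": n_errors, "chain_types": dict(chain_types)}


# Tiny hand-made chunks in the same shape as the output shown above.
fake_chunks = [
    {"a": {"numbering": (((1, " "), "D"), ((2, " "), "-"), ((3, " "), "V")),
           "chain_type": "K", "error": None}},
    {"b": {"numbering": None, "chain_type": None, "error": "example failure"}},
]
summary = summarise_chunks(fake_chunks)
aligned = numbering_to_string(fake_chunks[0]["a"]["numbering"])
print(summary)
print(aligned)
```

In practice, the generator returned by from_msgpack_map(results, chunk_size=...) would take the place of fake_chunks, so that only one chunk is held in memory at a time.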