How to Convert a DNA Sequence to a Time Series #527

Answered by seanlaw
seanlaw asked this question in Q&A

Page 32 of this tutorial by the original authors of the matrix profile claims that "it is possible to convert DNA
strings to real-valued time series, in a lossless fashion" by doing something like:

import numpy as np

last = 0
dna_ts = np.full(len(dna_seq), -1.0)
# Iterate over the characters of `dna_seq` (a string of nucleotides), not over its indices
for i, nucleotide in enumerate(dna_seq):
    if nucleotide == "A":
        dna_ts[i] = last + 2
    elif nucleotide == "G":
        dna_ts[i] = last + 1
    elif nucleotide == "C":
        dna_ts[i] = last - 1
    elif nucleotide == "T":
        dna_ts[i] = last - 2
    else:
        dna_ts[i] = last
        print(f'Warning: Unrecognized nucleotide "{nucleotide}" in index location "{i}"')

    las…
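
The snippet above is cut off, so the following is only a minimal sketch of the same idea, assuming the loop ends by carrying the current value forward in `last` (that is, the series is a running sum of per-nucleotide steps). The names `STEPS`, `dna_to_ts`, and `ts_to_dna` are hypothetical and not part of any library. It also illustrates the "lossless" claim: differencing the series recovers the original string, provided every symbol is one of A, G, C, or T.

```python
import numpy as np

# Step sizes taken from the snippet above; any other symbol contributes a step of 0
STEPS = {"A": 2.0, "G": 1.0, "C": -1.0, "T": -2.0}


def dna_to_ts(dna_seq):
    """Convert a DNA string to a time series as a running sum of steps (assumed behavior)."""
    steps = np.array([STEPS.get(n, 0.0) for n in dna_seq], dtype=np.float64)
    return np.cumsum(steps)


def ts_to_dna(dna_ts):
    """Invert dna_to_ts by differencing; a step of 0 (unrecognized symbol) maps to "N"."""
    inverse = {step: nucleotide for nucleotide, step in STEPS.items()}
    # prepend=0.0 recovers the first step, since the running sum starts from 0
    steps = np.diff(dna_ts, prepend=0.0)
    return "".join(inverse.get(float(s), "N") for s in steps)


dna_ts = dna_to_ts("GATTACA")
print(dna_ts)             # [ 1.  3.  1. -1.  1.  0.  2.]
print(ts_to_dna(dna_ts))  # GATTACA
```

Note that the round trip is only exact for recognized nucleotides: any other symbol produces a step of 0 and comes back as the placeholder "N" in this sketch.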

Replies: 1 comment

seanlaw
Jan 25, 2022
Maintainer Author

Answer selected by seanlaw