Commit 214dbf2

author
sizhenli
committed
v1
1 parent aabaa5e commit 214dbf2

File tree

310 files changed

+152019
-1
lines changed

.gitignore

+37
@@ -0,0 +1,37 @@
1+
bin/*
2+
tmp_rets/*
3+
*.pyc
4+
.vscode/*
5+
6+
# Prerequisites
7+
*.d
8+
9+
# Compiled Object files
10+
*.slo
11+
*.lo
12+
*.o
13+
*.obj
14+
15+
# Precompiled Headers
16+
*.gch
17+
*.pch
18+
19+
# Compiled Dynamic libraries
20+
*.so
21+
*.dylib
22+
*.dll
23+
24+
# Fortran module files
25+
*.mod
26+
*.smod
27+
28+
# Compiled Static libraries
29+
*.lai
30+
*.la
31+
*.a
32+
*.lib
33+
34+
# Executables
35+
*.exe
36+
*.out
37+
*.app

Makefile

+40
@@ -0,0 +1,40 @@
1+
################################
2+
# Makefile
3+
#
4+
# author: Sizhen
5+
# edited: 10/2020
6+
################################
7+
8+
CC=g++
9+
# DEPS=LinearPartitionDouble.h doublebp.cpp energy_parameter.h feature_weight.h intl11.h intl21.h intl22.h utility_v.h utility.h LinearFoldEval.cpp
10+
# DEPS=newbpLog.cpp LinearPartitionLog.h energy_parameter.h feature_weight.h intl11.h intl21.h intl22.h utility_v.h utility.h LinearFoldEval.cpp
11+
# DEPS=LinearPartitionNewprune-C.h energy_parameter.h feature_weight.h intl11.h intl21.h intl22.h utility_v.h utility.h LinearFoldEval.cpp newbp-C.cpp
12+
# DEPS=LinearPartitionNewprune.h energy_parameter.h feature_weight.h intl11.h intl21.h intl22.h utility_v.h utility.h LinearFoldEval.cpp
13+
# DEPS=newbpLog_pf_type.cpp LinearPartitionLog_pf_type.h energy_parameter.h feature_weight.h intl11.h intl21.h intl22.h utility_v.h utility.h LinearFoldEval.cpp
14+
# DEPS= LinearSampling.h energy_parameter.h feature_weight.h intl11.h intl21.h intl22.h utility_v.h utility.h
15+
DEPS=src/LinearTurboFold.h src/SeqFold.h src/LinearPartition/src/LinearPartition.h src/LinearAlignment/src/LinearAlign.h src/probknot.h \
16+
src/ConfigParser.h src/utils/common_utils.h src/utils/defines.h src/utils/structure_object.h \
17+
src/utils/ansi_string.h src/utils/utils.h src/utils/TProgressDialog.h src/utils/MultiSequence.h src/utils/SafeVector.h src/utils/Sequence.h \
18+
src/utils/structure.h src/utils/rna_library.h \
19+
src/utils/phmm_aln.h src/utils/phmm.h src/utils/p_alignment.h \
20+
src/utils/math/matrix.h src/utils/random.h \
21+
src/ProbabilisticModel.h src/Alignment.h src/utils/GuideTree.h
22+
23+
CFLAGS=-std=c++11 -O3
24+
.PHONY : clean linearturbofold
25+
objects=bin/linearturbofold
26+
27+
linearturbofold: src/LinearTurboFold.cpp $(DEPS)
28+
chmod +x linearturbofold
29+
mkdir -p bin
30+
$(CC) src/LinearTurboFold.cpp src/SeqFold.cpp src/LinearPartition/src/LinearPartition.cpp src/LinearAlignment/src/LinearAlign.cpp src/probknot.cpp \
31+
src/ConfigParser.cpp src/utils/common_utils.cpp src/utils/structure_object.cpp \
32+
src/utils/ansi_string.cpp src/utils/utils.cpp src/utils/TProgressDialog.cpp src/utils/MultiSequence.cpp src/utils/Sequence.cpp \
33+
src/utils/structure.cpp src/utils/rna_library.cpp \
34+
src/utils/phmm_aln.cpp src/utils/phmm.cpp src/utils/p_alignment.cpp \
35+
src/utils/math/matrix.cpp src/utils/random.cpp \
36+
src/ProbabilisticModel.cpp src/Alignment.cpp src/utils/GuideTree.cpp \
37+
$(CFLAGS) -Dlv -o bin/linearturbofold
38+
39+
clean:
40+
-rm $(objects)

README.md

+67-1
@@ -1 +1,67 @@
1-
# LinearTurboFold
1+
# LinearTurboFold
2+
3+
This repository contains the C++ source code for LinearTurboFold, an end-to-end linear-time algorithm for structural alignment and conserved structure prediction of RNA homologs. It is the first joint-fold-and-align algorithm to scale to full-length SARS-CoV-2 genomes without imposing any constraints on base-pairing distance.
4+
5+
[LinearTurboFold: Fast Folding and Alignment for RNA Homologs with Applications to Coronavirus](https://www.biorxiv.org/content/10.1101/2020.11.23.393488v2)
6+
7+
Sizhen Li, He Zhang, Liang Zhang, Kaibo Liu, Boxiang Liu, David Mathews*, Liang Huang*
8+
9+
\* corresponding author
10+
11+
# Dependency
12+
gcc 4.8.5 or above; <br>
13+
python2.7
14+
15+
# Compile
16+
```
17+
make
18+
```
19+
20+
# Run
21+
LinearTurboFold can be run with:
22+
```
23+
./linearturbofold -i input.fasta -o output_dir [OPTIONS]
24+
```
25+
The input file should be in the FASTA format. Please see [input.fasta](input.fasta) as an example. <br>
26+
LinearTurboFold writes a multiple sequence alignment and the predicted secondary structures to the output directory.
27+
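As a quick sanity check on the FASTA input described above, a reader like the following can list the record names and sequences before a run. This is a minimal sketch in Python 3, not part of LinearTurboFold; the function name `read_fasta` and the toy records are made up for illustration, and the only format assumption is the usual one: a `>` header line followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Toy input (hypothetical sequences, not taken from input.fasta).
example = """>seq1
GGGAAACCC
>seq2
GGGA
AACC
""".splitlines()

records = read_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```

For a real file, pass an open file handle instead of the toy list, e.g. `read_fasta(open("input.fasta"))`.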
28+
### OPTIONS
29+
`--it`
30+
The number of iterations (default 3). <br>
31+
`--b1`
32+
The beam size for LinearAlignment (default 100, set 0 for infinite beam). <br>
33+
`--b2`
34+
The beam size for LinearPartition (default 100, set 0 for infinite beam). <br>
35+
`--pf`
36+
Save partition functions for all the sequences after the last iteration (default False). <br>
37+
`--bpp`
38+
Save base pair probabilities for all the sequences after the last iteration (default False). <br>
39+
`-v`
40+
Print out alignment, folding and runtime information (default False). <br>
41+
`--th`
42+
Set the ThreshKnot threshold (default 0.3). <br>
43+
`--tkit`
44+
Set ThreshKnot iterations (default 1). <br>
45+
`--tkhl`
46+
Set ThreshKnot minimum helix length (default 3). <br>
47+
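When runs are scripted, the options above can be assembled into an argument list programmatically. The sketch below is a hypothetical Python 3 helper (`build_command` is not part of this repository). It uses only flags listed above and assumes that valued options are passed as `--flag value`, which the option list does not state explicitly.

```python
def build_command(input_fasta, output_dir, iterations=None, align_beam=None,
                  partition_beam=None, save_pf=False, save_bpp=False,
                  verbose=False):
    """Return an argument list for the linearturbofold wrapper script."""
    cmd = ["./linearturbofold", "-i", input_fasta, "-o", output_dir]

    # Assumption: valued options take the form "--flag value".
    valued = [("--it", iterations), ("--b1", align_beam), ("--b2", partition_beam)]
    for flag, value in valued:
        if value is not None:  # None means "keep the program default"
            cmd += [flag, str(value)]

    # Switches are documented as "default False", so add them only on request.
    switches = [("--pf", save_pf), ("--bpp", save_bpp), ("-v", verbose)]
    for flag, enabled in switches:
        if enabled:
            cmd.append(flag)
    return cmd


command = build_command("input.fasta", "rets/", save_pf=True, save_bpp=True)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the repository root.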
48+
### Example
49+
```
50+
./linearturbofold -i input.fasta -o rets/ --pf --bpp
51+
100% [==================================================]
52+
3 iterations Done!
53+
Outputing partition functions to files ...
54+
Outputing base pair probabilities to files ...
55+
Outputing multiple sequence alignment to rets/output.aln...
56+
Outputing structures to files ...
57+
```
58+
59+
# Evaluation Dataset
60+
We used the [RNAStralign](https://rna.urmc.rochester.edu/publications.html) dataset, which provides known alignments and structures, to evaluate LinearTurboFold and the benchmark methods.
61+
62+
# SARS-CoV-2 Dataset and Results
63+
The 25 SARS-CoV-2 and SARS-related genomes analyzed in the paper are listed in [samples25.fasta](data/sars-cov-2_data/samples25.fasta). <br>
64+
For further study by experts,
65+
we provide the whole multiple sequence alignment and predicted structures for all genomes from LinearTurboFold in [sars-cov-2_and_sars-related_25_genomes_msa_structures.txt](sars-cov-2_rets/sars-cov-2_and_sars-related_25_genomes_msa_structures.txt). <br>
66+
Each genome corresponds to three lines: sequence name, aligned sequence and aligned structure, respectively.
67+
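The three-lines-per-genome layout described above can be read with a few lines of Python 3. This is a minimal sketch, not code from this repository: the names `AlignedGenome` and `parse_msa_structures` and the toy records are invented for illustration. It also assumes that blank lines can be skipped and that each aligned sequence has the same length as its aligned structure.

```python
from collections import namedtuple

AlignedGenome = namedtuple("AlignedGenome", ["name", "sequence", "structure"])


def parse_msa_structures(lines):
    """Group non-blank lines into (name, aligned sequence, aligned structure)."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected three lines per genome, got %d lines" % len(rows))
    records = []
    for i in range(0, len(rows), 3):
        name, sequence, structure = rows[i:i + 3]
        # Assumption: sequence and structure rows share the alignment length.
        if len(sequence) != len(structure):
            raise ValueError("length mismatch in record %r" % name)
        records.append(AlignedGenome(name, sequence, structure))
    return records


# Toy data (hypothetical, not taken from the SARS-CoV-2 results file).
sample = [
    "genomeA", "GG-AAACC", "((-...))",
    "genomeB", "GGGAAACC", "((....))",
]

records = parse_msa_structures(sample)
for record in records:
    print(record.name, len(record.sequence))
```

To read the published results, pass an open handle on the `.txt` file linked above instead of the toy list.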
