-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME
More file actions
76 lines (53 loc) · 2.75 KB
/
README
File metadata and controls
76 lines (53 loc) · 2.75 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
# ALFRED : Alignment Free Distance Estimator
AlFRED is an Alignment Free Distance Estimator software. Currently, it
provides the following options :
1. Compute k-mismatch longest common substring using the prefix-chopping
algorithm for a pair of sequneces.
2. Compute k-mistmatch average common substring for every pair of
strings present in a database of strings.
# Installation
## Dependencies
### External Dependencies that need to be installed
* A modern, C++11 ready compiler such as `g++` version 4.8 or higher or `clang` version 3.2 or higher.
* The [cmake](www.cmake.org) build system (Version >= 2.8.11).
* A 64-bit operating system. Either Mac OS X or Linux are currently supported.
* Git version control system
### Dependices made available with the gz file
* [googletest](https://code.google.com/p/googletest/) is included as git submodule.
* [libdivsufsort](https://github.com/y-256/libdivsufsort) libraries are copied in the directory.
## Compilation
Download the alfred.tar.gz file available
[here](assets/docs/alfred.tar.gz). Extract the contents as follows:
tar xvzf alfred.tar.gz
Next, create a build directory. For example,
mkdir build
cd build
Finally, configure and build the executable 'alfred.x'. Continuing the example,
cmake ..
make
If successfully built, alfred.x should be available in the build directory.
# Usage
## Preparing Input
Place the input sequences into a single fasta files. Make sure that the
header of the fasta file has the appropriate sequence name. For example,
PHYLIP doesn't accept sequence names longer than 10 characters. Also,
make sure that the input file has one of 'fasta' or 'fa' or 'fas' file
extension. Note that the program accepts only sequences with valid DNA
or protien alphabets. An example is as shown below.
>SeqName
ACGTTAGAGTAAATGGAGTAGAAT
## Compute k-mismatch Average common substrings for a string database
Run the program by providing the pepared input file with -f option,
the output file with the -o option, and the number of mismatches to allow
with -k option. An example is shown below:
build/alfred.x -f primates.full.fas -o acs.primates.k3.out -k 3
Output is generated in a matrix format, which can be fed directly to PHYLIP
## Compute k-mismatch LCP between two sequences X and Y
Run the program in the same manner shown above, and add the -p option to
stop after generating the k-mismatch LCP values. An example is shown
below:
build/alfred.x -f primates.full.fas -o acs.primates.k3.out -k 3 -p
Output is generated in the json format. json ouput has two arrays, LX
and LY -- one corresponding to string. The array LX is of the length of
the string X, where i-th entry of LX has the position of Y and the
length of longest common substring for the suffix of X starting at i.