-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathhow-to-use-bioinformatics-libraries-in-jshell.html
154 lines (128 loc) · 9.06 KB
/
how-to-use-bioinformatics-libraries-in-jshell.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title> How-to: Use Bioinformatics Libraries in JShell | Computational Approaches to Biological Problems
</title>
<link rel="canonical" href="./how-to-use-bioinformatics-libraries-in-jshell.html">
<link rel="stylesheet" href="./theme/css/bootstrap.min.css">
<link rel="stylesheet" href="./theme/css/fontawesome.min.css">
<link rel="stylesheet" href="./theme/css/pygments/default.min.css">
<link rel="stylesheet" href="./theme/css/theme.css">
<meta name="description" content="I'm a big fan of Java's new read-evaluate-print-loop (REPL) environment, JShell. For beginners in Java, JShell provides a fantastic environment to troubleshoot issues and get comfortable with Java/new packages. As this is a bioinformatics blog, I will be demonstrating how to read FASTA files using the htsjdk (High-throughput sequencing …">
</head>
<body>
<header class="header">
<div class="container">
<div class="row">
<div class="col-sm-4">
<a href="./">
<img class="img-fluid rounded" src=./images/LucasBoatwrightHeadShot-161x215.jpeg alt="Computational Approaches to Biological Problems">
</a>
</div>
<div class="col-sm-8">
<h1 class="title"><a href="./">Computational Approaches to Biological Problems</a></h1>
<p class="text-muted">and other biological inklings</p>
<ul class="list-inline">
<li class="list-inline-item"><a href="./index.html">Home</a></li>
<li class="list-inline-item"><a href="./archives.html">posts</a></li>
<li class="list-inline-item"><a href="https://github.com/jlboat">GitHub</a></li>
<li class="list-inline-item"><a href="https://scholar.google.com/citations?hl=en&user=q5G2bJEAAAAJ&view_op=list_works&gmla=AJsN-F4gfKvkjBIT6Nv8Xy97IIUHuhFi9jF8x5ntZveyPGo_xO2ws-_5NjCw-_9z9CD2mWJnlbL1DeMd-0l-IUNYPsQ1NR4wu8s_l6YjZYlsb9mTQ1d8TX8">Pubs</a></li>
<li class="list-inline-item text-muted">|</li>
<li class="list-inline-item"><a href="./pages/about-me.html">About me</a></li>
</ul>
</div>
</div> </div>
</header>
<div class="main">
<div class="container">
<h1> How-to: Use Bioinformatics Libraries in JShell
</h1>
<hr>
<article class="article">
<header>
<ul class="list-inline">
<li class="list-inline-item text-muted" title="2018-11-13T00:00:00-05:00">
<i class="fas fa-clock"></i>
Tue 13 November 2018
</li>
<li class="list-inline-item">
<i class="fas fa-folder-open"></i>
<a href="./category/how-to.html">How-to</a>
</li>
<li class="list-inline-item">
<i class="fas fa-user"></i>
<a href="./author/j-lucas-boatwright.html">J. Lucas Boatwright</a> </li>
<li class="list-inline-item">
<i class="fas fa-tag"></i>
<a href="./tag/bioinformatics.html">#bioinformatics</a>, <a href="./tag/education.html">#education</a>, <a href="./tag/java.html">#java</a> </li>
</ul>
</header>
<div class="content">
<p>I'm a big fan of Java's new read-evaluate-print-loop (REPL) environment, JShell. For beginners in Java, JShell provides a fantastic environment to troubleshoot issues and get comfortable with Java/new packages. As this is a bioinformatics blog, I will be demonstrating how to read FASTA files using the htsjdk (High-throughput sequencing Java Development Kit) library.</p>
<p>First, we have to import all of the appropriate package information. Here, I've downloaded and compiled the library using my IntelliJ IDE (integrated development environment). So, I'm exporting the CLASSPATH to the Java bytecode and designating the CLASSPATH to JShell in bash.</p>
<div class="highlight"><pre><span></span><code><span class="nb">export</span> <span class="nv">CLASSPATH</span><span class="o">=</span><span class="s2">"/home/lboat/IdeaProjects/htsjdk/build/classes/java/main/"</span>
jshell -c /home/lboat/IdeaProjects/htsjdk/build/classes/java/main/
</code></pre></div>
<p>Now that JShell is open, we can import all of the necessary functions.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">htsjdk.samtools.reference.ReferenceSequenceFile</span>
<span class="kn">import</span> <span class="nn">htsjdk.samtools.reference.ReferenceSequenceFileFactory</span>
<span class="kn">import</span> <span class="nn">htsjdk.samtools.reference.FastaSequenceIndexCreator</span>
<span class="kn">import</span> <span class="nn">htsjdk.samtools.reference.FastaSequenceIndex</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Paths</span>
<span class="kn">import</span> <span class="nn">java.nio.file.Path</span>
</code></pre></div>
<p>Now we get the path to the FASTA file and generate FASTA object using a FactoryBuilder (A specialized Java class for generating objects).</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Get the path to our file</span>
<span class="n">Path</span> <span class="n">path</span> <span class="o">=</span> <span class="n">Paths</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"big.fasta"</span><span class="p">)</span>
<span class="c1">// Generate a FASTA object</span>
<span class="n">ReferenceSequenceFile</span> <span class="n">fasta</span> <span class="o">=</span> <span class="n">ReferenceSequenceFileFactory</span><span class="p">.</span><span class="na">getReferenceSequenceFile</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
<span class="c1">// Index our FASTA file</span>
<span class="n">FastaSequenceIndex</span> <span class="n">fsi</span> <span class="o">=</span> <span class="n">FastaSequenceIndexCreator</span><span class="p">.</span><span class="na">buildFromFasta</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
</code></pre></div>
<p>The FASTA file is now indexed and we can readily check some specific metrics for this FASTA.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Determine the number of sequences in our file</span>
<span class="n">fsi</span><span class="p">.</span><span class="na">size</span><span class="p">()</span>
<span class="c1">// Get sequence-specific information</span>
<span class="n">fsi</span><span class="p">.</span><span class="na">getIndexEntry</span><span class="p">(</span><span class="s">"sequence_5"</span><span class="p">)</span>
<span class="c1">// contig sequence_5; location 5152; size 999; basesPerLine 60; bytesPerLine 61</span>
<span class="c1">// Notice, our FASTA isn't recognized as indexed</span>
<span class="n">fasta</span><span class="p">.</span><span class="na">isIndexed</span><span class="p">()</span>
<span class="c1">// false</span>
<span class="c1">// Write our index </span>
<span class="n">fsi</span><span class="p">.</span><span class="na">write</span><span class="p">(</span><span class="n">Paths</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="s">"big.fa.fai"</span><span class="p">))</span>
<span class="c1">// Re-open the FASTA using this index</span>
<span class="n">fasta</span> <span class="o">=</span> <span class="n">ReferenceSequenceFileFactory</span><span class="p">.</span><span class="na">getReferenceSequenceFile</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
<span class="c1">// Notice, our FASTA is now indexed</span>
<span class="n">fasta</span><span class="p">.</span><span class="na">isIndexed</span><span class="p">()</span>
<span class="c1">// true</span>
<span class="c1">// You may want to check the first sequnce</span>
<span class="n">ReferenceSequence</span> <span class="n">rs</span> <span class="o">=</span> <span class="n">fasta</span><span class="p">.</span><span class="na">nextSequence</span><span class="p">()</span>
<span class="c1">// Look at its name</span>
<span class="n">rs</span><span class="p">.</span><span class="na">getName</span><span class="p">()</span>
<span class="c1">// Look at the sequence</span>
<span class="n">rs</span><span class="p">.</span><span class="na">getBaseString</span><span class="p">()</span>
</code></pre></div>
</div>
</article>
</div>
</div>
<footer class="footer">
<div class="container">
<div class="row">
<ul class="col-sm-6 list-inline">
<li class="list-inline-item"><a href="./authors.html">Authors</a></li>
<li class="list-inline-item"><a href="./archives.html">Archives</a></li>
<li class="list-inline-item"><a href="./categories.html">Categories</a></li>
<li class="list-inline-item"><a href="./tags.html">Tags</a></li>
</ul>
<p class="col-sm-6 text-sm-right text-muted">
Generated by <a href="https://github.com/getpelican/pelican" target="_blank">Pelican</a>
/ <a href="https://github.com/nairobilug/pelican-alchemy" target="_blank">✨</a>
</p>
</div> </div>
</footer>
</body>
</html>