Commit 4aa5a72

DTW integration and few other new features, bug fixes. See the recent changes list in the main README
1 parent 944495c commit 4aa5a72

35 files changed, 3315 insertions(+), 890 deletions(-)

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -74,3 +74,5 @@ test/*.idx
 src/*.o
 
 extern/pod5*
+
+test/evaluation/rawsamble/

README.md

Lines changed: 29 additions & 3 deletions
@@ -20,6 +20,12 @@ RawHash performs real-time mapping of nanopore raw signals. When the prefix of r
 
 # Recent changes
 
+* We have integrated the signal alignment functionality with DTW as proposed in RawAlign (see the citation below). The parameters may still not be highly optimized as this is still in experimental stage. Use it with caution.
+
+* Offline overlapping functionality is integrated.
+
+* rmap.c is now rmap.cpp (needs to be compiled with C++) due to the recent DTW integration. We are planning to make it a C-compatible implementation again.
+
 * We have released RawHash2, a more sensitive and faster raw signal mapping mechanism with substantial improvements over RawHash. RawHash2 is available within this repository. You can still use the earlier version, RawHash v1, from [this release](https://github.com/CMU-SAFARI/RawHash/releases/tag/v1.0).
 
 * It is now possible to disable compiling HDF5, SLOW5, and POD5. Please check the `Compiling with HDF5, SLOW5, and POD5` section below for details.
@@ -156,9 +162,9 @@ Please follow the instructions in the [README](test/README.md) file in [test](./
 * Ability to specify even/odd channels to eject the reads only from these specified channels.
 * Please create issues if you want to see more features that can make RawHash2 easily integratable with nanopore sequencers for any use case.
 
-# Citing RawHash and RawHash2
+# Citing RawHash, RawHash2, and RawAlign
 
-To cite RawHash and RawHash2, you can use the following BibTeX:
+If you use RawHash in your work, please consider citing the following papers:
 
 ```bibtex
 @article{firtina_rawhash_2023,
@@ -174,10 +180,30 @@ To cite RawHash and RawHash2, you can use the following BibTeX:
 issn = {1367-4811},
 url = {https://doi.org/10.1093/bioinformatics/btad272},
 }
+
+@article{firtina_rawhash2_2023,
+title = {{RawHash2}: Accurate and Fast Mapping of Raw Nanopore Signals using a Hash-based Seeding Mechanism},
+author = {Firtina, Can and Soysal, Melina and Lindegger, Joël and Mutlu, Onur},
+journal = {arXiv},
+year = {2023},
+month = sep,
+doi = {10.48550/arXiv.2309.05771},
+url = {https://doi.org/10.48550/arXiv.2309.05771},
+}
+
+@article{lindegger_rawalign_2023,
+title = {{RawAlign}: {Accurate, Fast, and Scalable Raw Nanopore Signal Mapping via Combining Seeding and Alignment}},
+author = {Lindegger, Joël and Firtina, Can and Ghiasi, Nika Mansouri and Sadrosadati, Mohammad and Alser, Mohammed and Mutlu, Onur},
+journal = {arXiv},
+year = {2023},
+month = oct,
+doi = {10.48550/arXiv.2310.05037},
+url = {https://doi.org/10.48550/arXiv.2310.05037},
+}
 ```
 
 # Acknowledgement
 
-RawHash2 uses [klib](https://github.com/attractivechaos/klib), some code snippets from [Minimap2](https://github.com/lh3/minimap2) (e.g., pipelining, hash table usage, DP and RMQ-based chaining) and the R9.4 segmentation parameters from [Sigmap](https://github.com/haowenz/sigmap).
+RawHash2 uses [klib](https://github.com/attractivechaos/klib), some code snippets from [Minimap2](https://github.com/lh3/minimap2) (e.g., pipelining, hash table usage, DP and RMQ-based chaining) and the R9.4 segmentation parameters from [Sigmap](https://github.com/haowenz/sigmap). RawHash2 uses the DTW integration as proposed in RawAlign (please see the citation details above).
 
 We thank [Melina Soysal](https://github.com/melina2200) and [Marie-Louise Dugua](https://github.com/MarieSarahLouise) for their feedback to improve the RawHash implementation and test scripts.
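
Note on the README change above: it refers to DTW (dynamic time warping) for signal alignment, but the `dtw.h` / `dtw.cpp` sources are not part of this excerpt. As background only, the following is a minimal sketch of the textbook DTW recurrence in C++. The function name `dtw_cost`, the absolute-difference distance, and the full (unbanded) matrix are assumptions made for illustration; they are not taken from RawHash2 or RawAlign, which may use a different formulation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Textbook O(n*m) dynamic time warping cost between two signals.
// Hypothetical helper for illustration; NOT the dtw.h API from this commit.
float dtw_cost(const std::vector<float>& a, const std::vector<float>& b) {
    const float inf = std::numeric_limits<float>::infinity();
    if (a.empty() || b.empty()) return inf;

    // Two rolling rows of the DP matrix; index 0 is the boundary column.
    std::vector<float> prev(b.size() + 1, inf), cur(b.size() + 1, inf);
    prev[0] = 0.0f;  // aligning two empty prefixes costs nothing

    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = inf;  // a non-empty prefix of a cannot align to an empty prefix of b
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const float d = std::fabs(a[i - 1] - b[j - 1]);
            // Extend the cheapest of: step in a only, step in b only, step in both.
            cur[j] = d + std::min({prev[j], cur[j - 1], prev[j - 1]});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```

A lower cost means the two signals can be warped onto each other more closely; identical shapes with repeated samples, such as `{1, 2, 3}` and `{1, 2, 2, 3}`, have cost zero under this sketch.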

src/Makefile

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ ifdef PROFILE
1313
CPPFLAGS+=-g -fno-omit-frame-pointer -march=native -DPROFILERH=1
1414
endif
1515

16-
OBJS= kthread.o kalloc.o bseq.o roptions.o sequence_until.o rutils.o rsig.o revent.o rsketch.o rindex.o lchain.o rseed.o rmap.o hit.o main.o
16+
OBJS= kthread.o kalloc.o bseq.o roptions.o sequence_until.o rutils.o rsig.o revent.o rsketch.o rindex.o lchain.o rseed.o rmap.o dtw.o hit.o main.o
1717

1818
CXX_COMPILER_VERSION ?= $(shell $(CXX) -dumpversion)
1919
SYSTEM_PROCESSOR ?= $(shell uname -m)
@@ -249,7 +249,7 @@ hit.o: chain.h kalloc.h khash.h
 rsig.o: hdf5_tools.hpp rh_kvec.h
 rseed.o: rsketch.h kalloc.h rutils.h rindex.h
 hit.o: rmap.h kalloc.h khash.h
-rmap.o: rindex.h rsig.h kthread.h rh_kvec.h rutils.h rsketch.h revent.h sequence_until.h
+rmap.o: rindex.h rsig.h kthread.h rh_kvec.h rutils.h rsketch.h revent.h sequence_until.h dtw.h
 revent.o: roptions.h kalloc.h
 rindex.o: roptions.h rutils.h rsketch.h rsig.h bseq.h khash.h rh_kvec.h kthread.h
 main:o rawhash.h ketopt.h rutils.h

src/chain.h

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,9 @@ typedef struct {
3939
uint32_t hash;
4040
float div;
4141
mm_extra_t *p;
42+
43+
//DTW related
44+
float alignment_score;
4245
} mm_reg1_t;
4346

4447
typedef struct {
@@ -169,7 +172,8 @@ ri_seg_t *mm_seg_gen(void *km, uint32_t hash, int n_segs, const int *qlens, int
 void mm_seg_free(void *km, int n_segs, ri_seg_t *segs);
 // void mm_set_mapq(void *km, int n_regs, mm_reg1_t *regs, int min_chain_sc, int match_sc, int rep_len, int is_sr);
 // void mm_set_mapq(void *km, int n_regs, mm_reg1_t *regs, int min_chain_sc, int match_sc, int rep_len);
-void mm_set_mapq(void *km, int n_regs, mm_reg1_t *regs, int min_chain_sc, int rep_len);
+// void mm_set_mapq(void *km, int n_regs, mm_reg1_t *regs, int min_chain_sc, int rep_len);
+void mm_set_mapq(void *km, int n_regs, mm_reg1_t *regs, int min_chain_sc, int rep_len, int is_dtw);
 
 #ifdef __cplusplus
 }
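
Note on the `chain.h` change above: the diff adds an `alignment_score` field to `mm_reg1_t` and an `is_dtw` argument to `mm_set_mapq`, but the body of `mm_set_mapq` is not shown here. The sketch below illustrates one plausible way such a flag could select which score feeds a mapping-quality estimate. Everything in it is an assumption for illustration: the `Region` struct, the `toy_mapq` name, and the best-versus-second-best formula are hypothetical and do not describe how RawHash2 actually computes mapping quality.

```cpp
#include <algorithm>

// Hypothetical stand-in for the two score fields discussed above.
// The real mm_reg1_t has many more members.
struct Region {
    int chain_score;        // score from seeding and chaining (assumed name)
    float alignment_score;  // DTW-derived score, mirroring the field added in this commit
};

// Toy mapping quality: scale the gap between the best and second-best
// candidate to [0, 60]. When is_dtw is set, compare alignment scores
// instead of chain scores. Illustrative formula only.
int toy_mapq(const Region& best, const Region& second, bool is_dtw) {
    const float s1 = is_dtw ? best.alignment_score
                            : static_cast<float>(best.chain_score);
    const float s2 = is_dtw ? second.alignment_score
                            : static_cast<float>(second.chain_score);
    if (s1 <= 0.0f) return 0;  // no usable score, so no confidence

    const float ratio = 1.0f - std::max(0.0f, s2) / s1;
    const int q = static_cast<int>(60.0f * ratio);
    return std::max(0, std::min(60, q));
}
```

With chain scores 100 and 50, this toy function yields 30; with alignment scores 40 and 10, it yields 45. The point is only that a single flag can switch the score source without changing the caller.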
