jchorl commented Oct 2, 2025

I was profiling some code and found that the majority of the time is spent in array_to_qualitystring. The cost is most noticeable on large files with many reads.

The culprit is the allocation, copying, and computation done in Python. This optimization should allow all of the logic to be compiled down to C.
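For readers who want a feel for the difference, here is a minimal pure-Python sketch of the general idea. It is not the code in this PR: the function names and the translation-table approach are assumptions chosen for illustration, and it assumes the input is an array of unsigned bytes whose values plus the offset stay in the printable ASCII range.

```python
from array import array
from functools import lru_cache


def qualities_to_string_loop(qualities, offset=33):
    # Hypothetical per-element version: one Python-level addition
    # and one chr() call for every quality value.
    return "".join(chr(q + offset) for q in qualities)


@lru_cache(maxsize=None)
def _offset_table(offset):
    # 256-entry table mapping each byte value q to q + offset.
    # Values past 255 wrap around; they fall outside the assumed input range.
    return bytes((q + offset) % 256 for q in range(256))


def qualities_to_string_bulk(qualities, offset=33):
    # Hypothetical bulk version: tobytes(), translate() and decode()
    # each run in C, so no per-element work happens at the Python level.
    return qualities.tobytes().translate(_offset_table(offset)).decode("ascii")


quals = array("B", [0, 1, 40, 93])
assert qualities_to_string_bulk(quals) == qualities_to_string_loop(quals)
```

Both functions return the same string; the second simply keeps the loop out of the interpreter, which is the same kind of saving the change here aims for by compiling the logic down to C.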

Bench results (mean time drops from about 78.9 us to about 1.33 us, roughly a 60x speedup):

Before:

---------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------
Name (time in us)                           Min       Max     Mean  StdDev   Median     IQR   Outliers  OPS (Kops/s)  Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------
test_fasta_iteration_long_sequences     75.7550  126.3460  78.9250  2.1202  78.4110  0.8720  1160;1541       12.6703   11453           1
----------------------------------------------------------------------------------------------------------------------------------------

After:

-------------------------------------------------------- benchmark: 1 tests -------------------------------------------------------
Name (time in us)                          Min      Max    Mean  StdDev  Median     IQR  Outliers  OPS (Kops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------
test_fasta_iteration_long_sequences     1.2620  14.7180  1.3264  0.1447  1.3130  0.0200  409;1397      753.9372   45268           1
-----------------------------------------------------------------------------------------------------------------------------------

@jmarshall
Member

Thanks, this looks like a good approach.

Eventually I want to add entry points to HTSlib so that we can just call HTSlib's SIMD-optimised versions of these conversions, but this is a big win in the meantime.

Author

jchorl commented Oct 14, 2025

@jmarshall what would the process be to get this merged and released?
