
@nh13 (Contributor) commented Jul 4, 2025

Add a new IteratorColumnRecords class that generates pileup columns from a
collection of AlignedSegment objects using htslib's push-based pileup API
(bam_plp_push/bam_plp64_next).

Key features:

  • Accepts any iterable of AlignedSegments (the records must be in coordinate-sorted order)
  • Supports an optional reference sequence (fastafile parameter)
  • Includes add_reference() and has_reference() methods and a seq_len property
  • Configurable min_base_quality parameter
  • Uses 64-bit position types (hts_pos_t) so that chromosomes longer than 2^31-1 bp are supported
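For context on the 64-bit position types: in HTSlib, hts_pos_t is a typedef for int64_t, whereas a signed 32-bit position field tops out at 2^31-1 (2,147,483,647). The illustration below only uses Python's ctypes to show what happens to a coordinate just past that limit; the helper fits_in_int32 is hypothetical and not part of pysam or HTSlib.

```python
import ctypes

# Largest coordinate a signed 32-bit position field can represent.
INT32_MAX = 2**31 - 1


def fits_in_int32(pos: int) -> bool:
    """Hypothetical helper: True if a 0-based position fits in a signed 32-bit int."""
    return -(2**31) <= pos <= INT32_MAX


# A position one past the 32-bit limit, e.g. on a very long chromosome.
long_pos = INT32_MAX + 1

# ctypes does no overflow checking, so a 32-bit field silently wraps around...
as_int32 = ctypes.c_int32(long_pos).value
# ...while a 64-bit field (the width of hts_pos_t) keeps the value intact.
as_int64 = ctypes.c_int64(long_pos).value

print(fits_in_int32(long_pos), as_int32, as_int64)
# → False -2147483648 2147483648
```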

Implementation notes:

  • Uses bam_plp_push/bam_plp64_next instead of a callback-based approach
  • Consumes all records during initialization and pushes them into the pileup iterator
  • Pushes NULL after the last record to signal end of input, as the push-based API requires
  • Leverages the 64-bit APIs (bam_plp64_next, faidx_fetch_seq64) from PR #1362 (Update pysam/libchtslib.pxd to HTSlib 1.21 htslib/*.h declarations)
  • Treats bam_plp_s as an opaque struct (no direct field access needed)
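To make the push / next / NULL-push sequence concrete, here is a pure-Python model of how a push-based pileup iterator behaves. It is not the PR's Cython code and does not call HTSlib: the class PushPileup, its push() and next_column() methods, and the simplified read tuples (0-based start, bases, base qualities; ungapped alignments only, no CIGAR) are all hypothetical stand-ins for bam_plp_push, bam_plp64_next, and AlignedSegment. Pushing None plays the role of the final NULL push.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical simplified read: (0-based start, bases, base qualities), ungapped.
Read = Tuple[int, str, Sequence[int]]
Column = Tuple[int, List[str]]


class PushPileup:
    """Toy model of a push-based pileup iterator (not HTSlib, not pysam)."""

    def __init__(self, min_base_quality: int = 0) -> None:
        self.min_base_quality = min_base_quality
        self._columns = defaultdict(list)  # position -> bases seen so far
        self._last_start = -1              # start of the most recently pushed read
        self._eof = False                  # set once None has been pushed

    def push(self, read: Optional[Read]) -> None:
        if read is None:                   # analogous to pushing NULL
            self._eof = True
            return
        start, bases, quals = read
        if start < self._last_start:
            raise ValueError("records must be coordinate-sorted")
        self._last_start = start
        for offset, (base, qual) in enumerate(zip(bases, quals)):
            if qual >= self.min_base_quality:
                self._columns[start + offset].append(base)

    def next_column(self) -> Optional[Column]:
        """Return the leftmost finished column, or None if none is ready yet."""
        if not self._columns:
            return None
        pos = min(self._columns)
        # A column is final once no future read can overlap it: either input
        # has ended, or a read starting to its right has already been pushed.
        if self._eof or pos < self._last_start:
            return pos, self._columns.pop(pos)
        return None


def pileup_columns(reads: Iterable[Read], min_base_quality: int = 0) -> Iterator[Column]:
    plp = PushPileup(min_base_quality)
    for read in reads:
        plp.push(read)
        while (col := plp.next_column()) is not None:
            yield col
    plp.push(None)  # signal end of input, or the last columns are never released
    while (col := plp.next_column()) is not None:
        yield col


reads = [(0, "ACG", [30, 30, 30]), (1, "CGT", [30, 10, 30])]
print(list(pileup_columns(reads, min_base_quality=20)))
# → [(0, ['A']), (1, ['C', 'C']), (2, ['G']), (3, ['T'])]
```

The design point the model captures is that a column can only be emitted once no later read can still overlap it, which is why the end-of-input push is needed. This sketch interleaves pushes and pulls for illustration; the PR instead consumes all records during initialization.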

Testing:

  • 12 new tests covering reference support, edge cases, and parameters
  • Documents a known limitation: depths can differ slightly from samtools mpileup
    because filtering is applied differently in the push-based and pull-based approaches
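When investigating that known limitation, it can help to list exactly which positions disagree. A minimal sketch, assuming a single reference sequence and plain samtools mpileup text output (tab-separated: chromosome, 1-based position, reference base, depth, bases, qualities); the helpers mpileup_depths and depth_differences are hypothetical. The other depth map could be built from the new iterator, for example from each PileupColumn's reference_pos (0-based) and nsegments attributes. The two input lines below are toy data.

```python
from typing import Dict, Iterable, Tuple


def mpileup_depths(lines: Iterable[str]) -> Dict[int, int]:
    """Parse single-contig `samtools mpileup` output into {0-based position: depth}."""
    depths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # fields[1] is the 1-based position and fields[3] the read depth.
        depths[int(fields[1]) - 1] = int(fields[3])
    return depths


def depth_differences(
    expected: Dict[int, int], observed: Dict[int, int]
) -> Dict[int, Tuple[int, int]]:
    """Return {position: (expected, observed)} wherever the depths disagree.

    Positions missing from one map are treated as depth 0.
    """
    positions = sorted(set(expected) | set(observed))
    return {
        pos: (expected.get(pos, 0), observed.get(pos, 0))
        for pos in positions
        if expected.get(pos, 0) != observed.get(pos, 0)
    }


mpileup_text = ["chr1\t1\tA\t2\tAA\tII\n", "chr1\t2\tC\t3\tCCC\tIII\n"]
print(depth_differences(mpileup_depths(mpileup_text), {0: 2, 1: 2, 2: 1}))
# → {1: (3, 2), 2: (0, 1)}
```

Converting mpileup's 1-based positions to 0-based up front avoids an off-by-one that would otherwise make every position look like a mismatch.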

Changes:

  • Add IteratorColumnRecords class in pysam/libcalignmentfile.pyx
  • Update to use 64-bit pileup APIs (bam_plp64_next, faidx_fetch_seq64)
  • Add type stub in pysam/libcalignmentfile.pyi
  • Add the parameterized package to the test dependencies for parameterized tests
  • Update CI workflows to install the parameterized package

Closes #1352

cdef class IteratorColumnAll(IteratorColumn):
    pass

cdef class IteratorColumnRecords:
Contributor Author

This cannot extend IteratorColumn, since IteratorColumn requires a SAM file.

@nh13 (Contributor Author) commented Jul 5, 2025

@jmarshall #1353

@jmarshall (Member)

I've now finally finished up and merged #1362. Can I interest you in rebasing this onto current master and ideally updating it to use bam_plp64_next() et al?

nh13 force-pushed the feat/pileup branch 5 times, most recently from 08461e0 to 4d6e755, on November 22, 2025 04:28
…ents

@nh13 (Contributor Author) commented Nov 22, 2025

@jmarshall done! Ready for re-review!

nh13 marked this pull request as ready for review on November 22, 2025 04:48
Development

Successfully merging this pull request may close these issues.

Create an iterator over PileupColumns from a list of AlignedSegments

2 participants