Fix blacklist BED file parsing in cooler balance CLI (Fixes #196) #462

ShigrafS · 2025-03-16T19:06:30Z

PR Description:

Fixes: #209
Original issue: #196

Overview

This PR fixes issue #196 by improving how blacklist BED files are parsed in the cooler balance CLI. Previously, single-line BED files, files with track= metadata headers, and empty blacklist files were not handled correctly, which caused parsing errors and crashes.

The key fixes include:

Switching blacklist parsing from custom logic to bioframe.read_table.
Skipping track= headers in blacklist files.
Handling empty blacklist files to prevent np.concatenate errors.
Fixing an HDF5 writing scope error by correcting indentation.
Explicitly setting dtype="float64" for HDF5 weights to handle NaN values.
Adjusting test parameters to improve robustness.

Changes Made

✅ Replaced custom blacklist parsing with bioframe.read_table

The previous method failed on single-line BED files because it relied on csv.Sniffer to infer the file format.
bioframe.read_table now parses the file as BED regardless of how many lines it has.
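The core idea is to read the blacklist with a fixed BED schema instead of guessing the dialect. The PR delegates this to bioframe.read_table. As a dependency-free sketch, assuming tab-delimited input and using a hypothetical helper named read_bed3 that returns a list of tuples, a fixed-schema read looks like this:

```python
import io


def read_bed3(handle):
    """Parse BED intervals with a fixed chrom/start/end schema.

    Hypothetical helper: no dialect sniffing is involved, so a one-line file
    is parsed exactly like a longer one. Columns beyond the third are ignored.
    """
    intervals = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {lineno}: expected at least 3 tab-separated fields, "
                f"got {len(fields)}"
            )
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


# A one-line blacklist: the case the Sniffer-based parser did not handle.
print(read_bed3(io.StringIO("chr1\t10000\t20000\n")))
# [('chr1', 10000, 20000)]
```

With bioframe itself, the analogous call would be something like `bioframe.read_table(path, schema="bed3")`; treat the schema name as an assumption to check against the bioframe documentation.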

✅ Added header skipping for track= metadata lines

If the first line starts with track=, it is skipped before parsing.
Prevents the header from being misinterpreted as a chromosome entry.
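A minimal sketch of the header check, assuming the blacklist is read from a text stream. The helper names skip_track_header and parse_intervals are hypothetical; only the first line is inspected, as the PR describes, and testing for the "track" prefix also covers UCSC-style "track name=..." lines:

```python
import io
from itertools import chain


def skip_track_header(handle):
    """Return an iterator over handle's lines, minus a leading track line.

    Hypothetical helper: only the first line is examined.
    """
    first = handle.readline()
    if not first:
        return iter(())  # empty file
    if first.startswith("track"):
        return iter(handle)  # resume right after the header
    return chain([first], handle)  # no header: put the first line back


def parse_intervals(lines):
    """Turn tab-separated BED lines into (chrom, start, end) tuples."""
    out = []
    for line in lines:
        if line.strip():
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            out.append((chrom, int(start), int(end)))
    return out


bed = io.StringIO('track name="blacklist"\nchr2\t0\t5000\n')
print(parse_intervals(skip_track_header(bed)))
# [('chr2', 0, 5000)]
```

Stripping the header before the parser sees it keeps the parsing step itself schema-only, so it never has to decide whether a line is metadata.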

✅ Handled empty blacklist files ("") gracefully

Avoided np.concatenate errors by checking for empty results.
Ensured that blacklist filtering works even if no regions are provided.
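The empty-blacklist guard matters because np.concatenate raises "ValueError: need at least one array to concatenate" when given an empty list. The sketch below shows the same check in plain Python while mapping intervals to bin IDs; the function blacklisted_bins, the fixed-size bins, and the chrom_offsets mapping (chromosome name to the ID of its first bin) are all assumptions for illustration:

```python
from itertools import chain


def blacklisted_bins(intervals, binsize, chrom_offsets):
    """Return sorted global bin IDs overlapped by blacklist intervals.

    Hypothetical helper. Assumes fixed-size bins, half-open [start, end)
    intervals, and chrom_offsets mapping each chromosome to the ID of its
    first bin. Intervals on unknown chromosomes or with end <= start are skipped.
    """
    per_interval = []
    for chrom, start, end in intervals:
        if chrom not in chrom_offsets or end <= start:
            continue
        first = chrom_offsets[chrom] + start // binsize
        last = chrom_offsets[chrom] + (end - 1) // binsize
        per_interval.append(range(first, last + 1))
    if not per_interval:
        # Empty blacklist (or nothing usable in it): nothing to mask.
        # NumPy code that skips this check and calls np.concatenate on an
        # empty list fails with a ValueError.
        return []
    return sorted(set(chain.from_iterable(per_interval)))


offsets = {"chr1": 0, "chr2": 10}
print(blacklisted_bins([("chr1", 1500, 3000), ("chr2", 0, 1000)], 1000, offsets))
# [1, 2, 10]
print(blacklisted_bins([], 1000, offsets))
# []
```

Checking the collected results, rather than only the raw input, also covers a non-empty file whose intervals all fall on chromosomes absent from the cooler.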

✅ Fixed indentation issue in HDF5 weight storage

Ensured create_dataset operations remain within the with h5py.File(...) block.
Prevented KeyError when writing to HDF5 files.

✅ Explicitly set dtype="float64" in HDF5 options

Prevented ValueError: cannot convert float NaN to integer.
Ensured robust handling of blacklist-masked bins.
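The dtype change follows from how masked bins are stored: blacklisted bins receive a NaN weight, and NaN exists only in floating-point types. A small stdlib sketch of that constraint, where the function name mask_weights is hypothetical and array("d") stands in for the float64 HDF5 dataset:

```python
import math
from array import array


def mask_weights(weights, bad_bins):
    """Copy weights into a float array and set blacklisted bins to NaN.

    array("d") holds C doubles (8 bytes, i.e. float64, on common platforms),
    a stdlib stand-in for the dtype="float64" used for the HDF5 weights.
    """
    out = array("d", weights)
    for i in bad_bins:
        out[i] = math.nan
    return out


w = mask_weights([1.0, 0.5, 2.0], bad_bins=[1])
print([math.isnan(x) for x in w])
# [False, True, False]

# An integer type has no NaN; converting one raises the error cited in the PR.
try:
    int(math.nan)
except ValueError as exc:
    print(exc)
# cannot convert float NaN to integer
```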

✅ Adjusted test parameters for better stability

Modified test_balancing_with_blacklist parameters (tol=0.1, max-iters=1000, nproc=1).
Improved test reliability across different platforms.

Impact

Fixes parsing of single-line BED files.
Supports blacklist files with metadata headers (track=) without errors.
Prevents crashes when the blacklist file is empty.
Delegates BED file parsing to bioframe instead of custom logic.
Ensures proper HDF5 weight storage, avoiding scope-related errors.
Tests now pass reliably, improving maintainability.

Closes:

Fixes #196

- Added validate_pairs_columns to check column indices against file content.
- Updated get_header to use readline() instead of peek() for test compatibility.
- Modified pairs to use the validate_pairs_columns stream, added kwargs = {}, and removed a duplicate call.

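The validate_pairs_columns change checks user-supplied column indices against the actual file before parsing. A hedged sketch of that kind of check follows; the name check_pairs_columns, its arguments, and its return value are assumptions and do not reproduce cooler's signature. It reads headers with readline(), which text streams such as io.StringIO and sys.stdin provide, whereas peek() is offered by buffered binary readers rather than text streams:

```python
import io


def check_pairs_columns(stream, columns, comment="#"):
    """Check 0-based column indices against the first data line of a pairs stream.

    Hypothetical helper. Header lines are consumed with readline().
    Returns (header_lines, first_data_line) so parsing can resume.
    """
    header = []
    line = stream.readline()
    while line.startswith(comment):
        header.append(line)
        line = stream.readline()
    if not line:
        raise ValueError("pairs input has no data lines")
    n_fields = len(line.rstrip("\n").split("\t"))
    bad = {name: idx for name, idx in columns.items() if not 0 <= idx < n_fields}
    if bad:
        raise ValueError(f"column index out of range for {n_fields} fields: {bad}")
    return header, line


pairs = io.StringIO(
    "## pairs format v1.0\n"
    "#columns: readID chr1 pos1 chr2 pos2\n"
    "r1\tchr1\t100\tchr2\t200\n"
)
header, first = check_pairs_columns(pairs, {"chrom1": 1, "pos1": 2, "chrom2": 3, "pos2": 4})
print(len(header), first.split("\t")[0])
# 2 r1
```

Returning the consumed header and first data line lets the caller keep reading the same stream, which matters for stdin where the consumed lines cannot be re-read.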
ShigrafS and others added 14 commits February 26, 2025 14:18

Improve chromsizes file validation to catch formatting errors early (o…

f093a77

…pen2c#209)

Added a test function for the chromsize check introduced in src/coole…

c8616a8

…r/util.py

Fixed pytest error in test_chromsize_check.py and made minor tweaks t…

a7bc883

…o rea-chromsize in util.py

[pre-commit.ci] auto fixes from pre-commit.com hooks

55986ad

for more information, see https://pre-commit.ci

Move tests to test_util and fix line lengths

34bb070

Remove carriage returns and fix line lengths

db29975

Update src/cooler/util.py

77cd6a1

Co-authored-by: Nezar Abdennur <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

7ea498d

for more information, see https://pre-commit.ci

Removed verbose and added pandas built in on_bad_lines in def chromsi…

ffa8363

…zes.

Add tests for cload pairs column validation (open2c#135)

8a50d24

- Created 7 pytest cases in test_cload.py to test valid/invalid indices, stdin, headers, empty files, and extra fields.
- Ensures validate_pairs_columns and pairs handle file and stdin inputs correctly.

Switched test_cload.py to LF.

35df58e

Switched cload.py to LF.

3679c60

Fix blacklist BED file parsing in balance.py (open2c#196)

e044a25
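Several commits in this list also tighten chromsizes validation in src/cooler/util.py, which the commit titles say relies on pandas' built-in on_bad_lines handling. As a rough, stdlib-only illustration of the fail-fast idea, the helper below rejects space-delimited rows, non-integer lengths, and non-positive lengths at the first bad line; its name and error messages are hypothetical, and it is not the util.py implementation:

```python
import io


def parse_chromsizes(handle):
    """Parse a chrom<TAB>length file, failing on the first malformed row.

    Hypothetical helper. Columns after the second are ignored; blank lines
    are skipped.
    """
    sizes = {}
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected 'chrom<TAB>length', got {line!r}")
        name, raw_length = fields[0], fields[1]
        try:
            length = int(raw_length)
        except ValueError:
            raise ValueError(
                f"line {lineno}: length {raw_length!r} is not an integer"
            ) from None
        if length <= 0:
            raise ValueError(f"line {lineno}: length must be positive, got {length}")
        sizes[name] = length
    return sizes


print(parse_chromsizes(io.StringIO("chr1\t1000\nchr2\t500\n")))
# {'chr1': 1000, 'chr2': 500}
```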

vedatonuryilmaz requested a review from nvictus March 17, 2025 18:24
