Fix blacklist BED file parsing in cooler balance CLI (Fixes #196) #462
PR Description:
Fixes: #209
Original Issues: #196
Overview
This PR fixes issue [#196](#196) by improving how blacklist BED files are parsed in the `cooler balance` CLI. Previously, single-line BED files, files with metadata headers (`track=`), and empty blacklist files were not handled correctly, leading to parsing errors and crashes.

The key fixes include:
- Parsing blacklist files with `bioframe.read_table`.
- Skipping `track=` headers in blacklist files.
- Handling empty blacklist files to avoid `np.concatenate` errors.
- Setting `dtype="float64"` for HDF5 weights to handle `NaN` values.

Changes Made
✅ Replaced custom blacklist parsing with `bioframe.read_table`
  - Drops the reliance on `csv.Sniffer` assumptions.
  - `bioframe.read_table` now ensures proper BED parsing.

✅ Added header skipping for `track=` metadata lines
  - If a header line begins with `track=`, it is skipped before parsing.

✅ Handled empty blacklist files (`""`) gracefully
  - Avoids `np.concatenate` errors by checking for empty results.

✅ Fixed indentation issue in HDF5 weight storage
  - Ensures `create_dataset` operations remain within the `with h5py.File(...)` block.
  - Avoids a `KeyError` when writing to HDF5 files.

✅ Explicitly set `dtype="float64"` in HDF5 options
  - Avoids `ValueError: cannot convert float NaN to integer`.

✅ Adjusted test parameters for better stability
  - Updated `test_balancing_with_blacklist` parameters (`tol=0.1`, `max-iters=1000`, `nproc=1`).

Impact
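- Blacklist BED files, including those with metadata headers (`track=`), are now handled without errors.
- The CLI now relies on `bioframe` for BED file handling.

Closes: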
Fixes #196
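
For reviewers who want to see the header-skipping and empty-file behavior in isolation, here is a minimal sketch in plain Python. It is a standalone illustration using only the standard library, not the code in this PR (which delegates parsing to `bioframe.read_table`). The helper name `parse_blacklist` and the choice to return a list of tuples are assumptions made for the example.

```python
import io


def parse_blacklist(text):
    """Parse BED-like blacklist text into (chrom, start, end) tuples.

    Hypothetical helper for illustration only; not part of cooler or bioframe.
    """
    regions = []
    for line in io.StringIO(text):
        line = line.strip()
        # Skip blank lines and metadata headers such as "track=...".
        # Matching on the "track" prefix is an assumption of this sketch.
        if not line or line.startswith("track"):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 BED columns, got: {line!r}")
        # BED columns: chromosome name, start, end.
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    # An empty input yields an empty list, so callers can check for
    # "no regions" before concatenating anything.
    return regions


print(parse_blacklist("track=blacklist\nchr1\t100\t200\n"))  # [('chr1', 100, 200)]
print(parse_blacklist("chr2\t5\t10"))                        # [('chr2', 5, 10)]
print(parse_blacklist(""))                                   # []
```

The three calls correspond to the cases named in the overview: a file with a `track=` header, a single-line file, and an empty file.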