Conditional merge_neighbors #130
Comments
What follows is a code sample of a conditional merge with data aggregation, using a pipeline example. Consider the pipeline "Route 100." It has two High Consequence Areas (HCAs) along its length and no Moderate Consequence Areas (MCAs). (HCAs and MCAs are those portions of a natural gas transmission pipeline where the consequences of a release could be unpleasant, due to the presence of people and the possibility of ignition.) There have also been eight leaks recorded over time along the line. Our task is to summarize the count of leaks that have occurred within each HCA range. The Interval Tree object is well suited to this type of analysis. Note that we are using Python dictionaries as the data objects for the intervals. This is fine for this simple example, but in actual practice, when dealing with thousands of miles of pipeline in a large system, Pandas dataframes would make better sense. The example shown here comes from a Jupyter notebook, which is attached (zipped) together with a simple pictorial representation of the data: (https://github.com/chaimleib/intervaltree/files/11923340/IntervalTree.Point.Dissolve.Test.zip)
Here's what the final print output looks like: [IntervalTree Point Dissolve Test.zip]
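For readers without the notebook at hand, the leak-counting step can be sketched in plain Python. This is not the notebook's code: the mileposts, names, and the helper `count_points_in_ranges` are invented for illustration, and ranges are treated as half-open `[begin, end)`, the convention intervaltree uses for its intervals.

```python
# Hypothetical data: the mileposts and names below are invented for
# illustration and are not taken from the attached notebook.
hcas = [
    (2.0, 5.0, {"name": "HCA-1", "leak_count": 0}),
    (8.0, 12.0, {"name": "HCA-2", "leak_count": 0}),
]
leak_mileposts = [0.5, 2.0, 3.1, 4.9, 5.0, 7.2, 9.4, 11.8]  # eight leaks


def count_points_in_ranges(ranges, points):
    """Set data["leak_count"] to the number of points inside [begin, end)."""
    for begin, end, data in ranges:
        # Half-open test: a point exactly at `end` is not counted.
        data["leak_count"] = sum(1 for p in points if begin <= p < end)
    return ranges


for begin, end, data in count_points_in_ranges(hcas, leak_mileposts):
    print(f'{data["name"]} [{begin}, {end}): {data["leak_count"]} leak(s)')
```

With an actual `IntervalTree`, the same point lookup is available through `tree.at(point)`, which returns the set of intervals containing that point.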
The current merge_neighbors is distance-based, which makes it useful only for the narrow use case of dealing with "sliver" intervals, essentially by merging them into adjacent, longer intervals. A more useful general case would be to perform a merge of adjacent intervals based on some condition in the adjacent intervals' data objects. For instance, consider a case where the data objects are dictionaries. Under a conditional merge_neighbors, one would only perform the merge of adjacent intervals where the item values for a given key in the respective data objects of two adjacent intervals are equal to each other, or to some specified value. This type of conditional merge is independent of interval length, and is dependent rather on the data content of adjacent intervals.
This type of conditional merge can currently be performed, but only with rather cumbersome and clunky code. It is necessary to iterate over the sorted intervals of a tree, comparing the data object of the current interval to that of the previous interval. If the condition is met, both intervals are deleted, a new interval is constructed that covers the span of the two deleted intervals, a new data object for the new "merged" interval is built from the deleted intervals' data objects, and the new "merged" interval is inserted back into the tree. It's tedious and not particularly performant. Adding the ability to specify a merge condition in the arguments to merge_neighbors would be very helpful.
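A minimal sketch of the manual workaround described above, written against plain `(begin, end, data)` tuples so it runs without any library. The function `merge_neighbors_if` and its `should_merge` and `combine` parameters are hypothetical names chosen for this illustration; they are not part of intervaltree, and the sample segments are made up.

```python
def merge_neighbors_if(intervals, should_merge, combine):
    """Merge touching or overlapping neighbors whose data satisfy should_merge.

    intervals    -- iterable of (begin, end, data) tuples
    should_merge -- callable(prev_data, data) -> bool, the merge condition
    combine      -- callable(prev_data, data) -> data for the merged interval
    """
    merged = []
    # Sort on the bounds only, so the data objects never need to be comparable.
    for begin, end, data in sorted(intervals, key=lambda iv: (iv[0], iv[1])):
        if merged:
            prev_begin, prev_end, prev_data = merged[-1]
            touches = begin <= prev_end  # adjacent (or overlapping) neighbor
            if touches and should_merge(prev_data, data):
                # Replace the previous interval with one spanning both.
                merged[-1] = (prev_begin, max(prev_end, end),
                              combine(prev_data, data))
                continue
        merged.append((begin, end, data))
    return merged


# Hypothetical segments: merge neighbors with equal "class", summing "leaks".
segments = [
    (0, 10, {"class": "HCA", "leaks": 1}),
    (10, 20, {"class": "HCA", "leaks": 2}),
    (20, 30, {"class": "none", "leaks": 0}),
    (30, 40, {"class": "HCA", "leaks": 3}),
]
result = merge_neighbors_if(
    segments,
    should_merge=lambda a, b: a["class"] == b["class"],
    combine=lambda a, b: {"class": a["class"], "leaks": a["leaks"] + b["leaks"]},
)
```

Because intervaltree's `Interval` is a named tuple of `(begin, end, data)`, the intervals of a tree can be passed in directly, and a new tree can be built from the result with `IntervalTree.from_tuples(result)`. Building a fresh list in one pass avoids the repeated delete-and-insert calls on the tree, though it still sits outside `merge_neighbors` itself, which is the gap this issue is about.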