-
Notifications
You must be signed in to change notification settings - Fork 94
Update: Add interval SweepLine Algorithm #1251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Benchmark Results (Julia v1)Time benchmarks
Memory benchmarks
|
Benchmark Results (Julia vlts)Time benchmarks
Memory benchmarks
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1251 +/- ##
==========================================
+ Coverage 87.86% 87.89% +0.02%
==========================================
Files 196 197 +1
Lines 6181 6235 +54
==========================================
+ Hits 5431 5480 +49
- Misses 750 755 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @souma4! First round of review attached. This one should be much easier to get ready for a merge.
…s namespace open for a 2D sweep (if needed) in the future. Also fixed overlaps logic to properly break early
src/utils/intersect.jl
Outdated
struct SweepLineInterval{T<:Number} | ||
start::T | ||
stop::T | ||
segment::Any | ||
index::Int | ||
end | ||
|
||
function overlaps(i₁::SweepLineInterval, i₂::SweepLineInterval) | ||
i₁.start ≤ i₂.start && i₁.stop ≥ i₂.start | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this data structure?
The new algorithm seems very simple:
For each segment, do
- Compute start and save in a vector
starts
- Compute stop and save in a vector
stops
Sort vectors segs
, starts
and stops
in terms of sortperm(starts)
For each pair of segments, do
- Fast check with
starts
andstops
- If check is true, call expensive intersection
- Otherwise, advance the loop
If we simply implement this algorithm with raw vectors segs
, starts
and stops
, it will be probably be more readable and type stable.
Also, the function sweep1d
can probably be merged into the main pairwiseintersect
function after that. The only part of the code that deserves encapsulation is the kernel of the loop that uses starts
, stops
for a fast check and only then computes intersections with segs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, if you feel that the readability with a single sort is superior, then use a simple vector of tuples preproc = [(start, stop, seg), ...]
and sort(preproc, by=first)
.
When you introduce a data structure like SweepLineInterval
and use its fields in a separate function (e.g., by=x->x.start
), you are accessing internals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really don't. The main idea was to have everything tied to the same object, but this way works well. It's actually significantly faster because it is effectively a struct of arrays rather than array of structs. It's now the fastest test I've run thus far.
… to vectors of indices because sets are expensive to initialized
src/utils/intersect.jl
Outdated
oldindices = collect(1:n)[sortedindices] | ||
|
||
# primary sweepline algorithm | ||
𝐺 = Dict{Point,Vector{Int}}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type is not concrete. Please use typeof
if necessary. Is Dict
the best data structure for this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh, you're right, Point
isn't concrete. Easy fix. that simple fix improved the speed 2x 😮
I argue that Dict is the best data structure largely because what we want is a set of unique points and a set of corresponding segment indices for each point. In the case of more than two lines intersecting at a point, the dictionary allows us to easily check if that point already exists, then we can just append new indices to the indices.
Vector
isfeasible, but they are a bit messier to handle with little to no speed improvement. That's largely because if we want unique keys in a Vector
, we still have to find which indices are duplicated (a large batch membership equality check) then append segment indices as needed.
Set
wouldn't work because the order of the set may be different from the segment indices, thus losing which indices belong to which intersection point.
Now, a true "graph" would probably be optimal, but adds the complexity of implementing said system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a check on if we could derive a graph from this output easily, and yes we can. We can build adjacency and segment-points (reverse of current point-segments) graphs in 50% of the time as creating the line segment intersections (in a dense case).
Upon reviewing the performance of comparative implementations, I decided to move from the very complicated and numerically unstable implementation in #1168 to a simpler interval sweepline (ie, we iterate over a sorted sequence by the furthest left x and check intersections if two segments have overlapping intervals).
This is equivalent to the Bentley-Ottmann algorithm in sparse cases and is O(N^2) in dense cases. However, in dense cases, Bentley-Ottmann computes many more intersections (to traverse the Binary search tree). These roughly balance out. In unofficial tests against a dense vector of segments, this implementation was the second fastest, only slower than crate::geo.
Profiling shows most time is spent exclusively in runtime and GC dispatch within the
Intersection(...) do
block, so improving efficiency there would be valuable.