-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallel KD-Tree construction #165
Comments
What is the MatrixType & coeff? |
Hey, I forked the project and made a commit on building |
I totally love your changes and how you approached it 👍 I checked it with valgrind and gcc sanitizer and both are happy, so I am. Well, actually Then, yes, feel free of opening a PR, after adding the corresponding line (and your name at the end?) to the CHANGELOG. Thanks! |
Thank you for your reply! I have submitted a PR with some explanation about helgrind. If you have any suggestions or concerns, please let me know. Thanks again! |
Closing, this feature was already merged and is released in the latest v1.5.0 |
@Mintpasokon Can you share a bit of details on the dataset/parameters you used to achieve a speedup of 3? I was trying for my application and the result were far off that (incl. being slower at very large thread counts on an HPC node). I would love to understand whether there is room for improvement - sequential kdtree construction has become a bottleneck in our the application... |
@dokempf I tested with Ryzen 7 5800X 8c/16t, DDR4 3200, with 1~20 millon 2D points to get 3x speedup. As you are using large amount of threads, I guess there might be some problem with memory access locality, or false sharing, in this line, since the datas in vAcc are random, in worst case, it would require much larger CPU cache for your dataset than actually accessed.
|
@dokempf I made some perf tests around number of points and thread count in kd tree building. It looks like this: |
Thanks for the benchmarking, @Mintpasokon !! |
The scale of my problems is much larger. I tested datasets of 100k-400M 3D double precision points. The smallest of these do still fit in L3 - still I only see speed-ups of up to |
@dokempf Probably worth investigating an alternative implementation for such larger datasets with
Overall, for
Of course, this is just the expected average, and the constants (obviated above) are key in real implementations, but in theory we could obtain some gain. PS: I would love to, but I don't have bandwidth at present to try to implement and benchmark it, tough... :-( |
PS: @dokempf if your data is not too sparse as to memory to become a game stopper, you could try a grid-based representation instead, e.g. here I have an implementation |
Hey thanks for the great project! I am wondering if there is any easy way (or forked project) to enable parallelism for KD-Tree construction with nanoflann? Or do you have plans to integrate this feature in the future?
The text was updated successfully, but these errors were encountered: