Moving this discussion over here for now, since it is not lazperf-specific.
I don't know much about LAX files and what they allow us to do in terms of queries, but if what we can get from LAX file queries is a set of point indices, then we might be able to get acceptable performance in pure Python for LAS files.
So far my solution is to take advantage of the memory mapping that laspy provides. Here is a small example. It is not how the package is structured yet, but it is where I think I am headed.
Something I will be able to get from laxpy is a set of point indices that are in some quadtree cell. We provide the cell index as an argument:
```python
import laxpy  # the package being developed here; `query` is the planned API

my_cell_index = 512
my_point_indices = laxpy.query(my_cell_index)
```
.lax files are small relative to their "parent" .las files, so parsing them in Python has not been a problem so far, although I have only tried fairly small .las files.
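As far as I understand the LAX format (not checked against the spec here), lasindex stores each cell as a list of inclusive (start, end) intervals of point indices rather than one entry per point, which would explain why .lax files stay small. If that holds, expanding a cell's intervals into the index array that `query` returns is cheap in NumPy. A minimal sketch; `intervals_to_indices` is a hypothetical helper, not part of laxpy:

```python
import numpy as np


def intervals_to_indices(intervals):
    """Expand inclusive (start, end) point-index intervals into one index array.

    Hypothetical helper: assumes each .lax cell yields such intervals.
    """
    pairs = np.asarray(intervals, dtype=np.int64).reshape(-1, 2)
    if pairs.size == 0:
        return np.empty(0, dtype=np.int64)
    # The Python loop runs once per interval, not once per point.
    return np.concatenate([np.arange(start, end + 1) for start, end in pairs])
```

For example, the intervals (0, 2) and (10, 11) expand to the indices [0, 1, 2, 10, 11].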
`query` returns a NumPy array of all the point indices associated with that quadtree cell (not all of those points necessarily fall inside the cell; see this video for an explanation). I can then instantiate the laspy memory map and pull out just those points, which seems reasonably efficient:
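```python
import laspy.file  # laspy 1.x API; laspy 2.x replaced laspy.file.File
my_las = laspy.file.File('my_las_file.las', mode='r')  # file is memory-mapped
my_points = my_las.points[my_point_indices]  # copies only the selected records
my_las.close()
```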
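Fancy indexing into a memory map should stay cheap because the operating system only pages in the parts of the file that are actually touched. The sketch below shows the same mechanism with plain `numpy.memmap`, independent of laspy. The three-field record layout and the fixed header size are simplified stand-ins (real LAS point formats have more fields, and the header records the offset to the point data), so `POINT_DTYPE`, `HEADER_BYTES` and `read_points` are all hypothetical:

```python
import numpy as np

# Simplified, hypothetical record layout; real LAS point formats have more fields.
POINT_DTYPE = np.dtype([("X", "<i4"), ("Y", "<i4"), ("Z", "<i4")])
# Size of a LAS 1.2 public header, used here only as a fixed byte offset.
HEADER_BYTES = 227


def read_points(path, indices, n_points):
    """Gather only the requested point records from a file via a memory map."""
    points = np.memmap(path, dtype=POINT_DTYPE, mode="r",
                       offset=HEADER_BYTES, shape=(n_points,))
    # Fancy indexing reads and copies just the selected records; np.array()
    # returns a plain ndarray instead of a memmap subclass.
    return np.array(points[indices])
```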
So What is the Problem?
This is all fine and good, but we have to rely on lasindex itself to construct the .lax file. It would be nice to do this ourselves in Python or Cython, so that this is a "complete package". If you refer to the video I linked above, Martin streams the points in one by one to construct the tree. As you have pointed out, doing this in pure Python is incredibly slow. For reference, lasindex can do this for a ~600 MB LAS tile in about 5-10 seconds; with the Python version I got tired of waiting!
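One way to avoid the per-point Python loop, sketched here as a different technique from lasindex's one-by-one streaming insertion, is to assign whole chunks of points to the cells of a fixed quadtree level with NumPy and then group the point indices by cell. The cell ids below are Morton (Z-order) codes; that is an assumption for illustration and not the cell numbering lasindex writes into .lax files, and the sketch skips the cell-merging and interval-coarsening steps a real index builder would need. `cell_ids` and `index_chunk` are hypothetical names:

```python
import numpy as np


def _spread_bits(v):
    """Move the low 16 bits of v to the even bit positions (Morton encoding)."""
    v = v.astype(np.uint32) & 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def cell_ids(x, y, bounds, level):
    """Z-order id of the cell containing each point on a 2**level x 2**level grid."""
    if not 0 <= level <= 16:
        raise ValueError("level must be in [0, 16] for 32-bit Morton codes")
    min_x, min_y, max_x, max_y = bounds
    n = 1 << level
    # Column/row of each point; points on the max edge go into the last cell.
    ix = np.clip(((x - min_x) / (max_x - min_x) * n).astype(np.int64), 0, n - 1)
    iy = np.clip(((y - min_y) / (max_y - min_y) * n).astype(np.int64), 0, n - 1)
    return _spread_bits(ix) | (_spread_bits(iy) << 1)


def index_chunk(x, y, bounds, level, start=0):
    """Group the point indices of one chunk by cell id: {cell_id: indices}.

    `start` is the index of the chunk's first point within the whole file.
    """
    cells = cell_ids(x, y, bounds, level)
    order = np.argsort(cells, kind="stable")  # keeps indices ascending per cell
    ids, first = np.unique(cells[order], return_index=True)
    groups = np.split(order + start, first[1:])
    return dict(zip(ids.tolist(), groups))
```

Because `index_chunk` takes a `start` offset, chunks can be sliced from the memory-mapped points one at a time and their per-cell groups merged, so the whole cloud never has to be loaded at once. `x` and `y` are assumed to be in the same units as `bounds`; with raw LAS integer coordinates, the bounds would need to be given in those integer units as well.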
The other option is to throw up our hands, load the entire point cloud into memory, construct the tree, and write it to file. But I think this is silly: the whole motivation for creating the index in the first place is querying large LAS files.
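Memory need not scale with the point count if the tree is built chunk by chunk: as long as each cell only keeps runs of consecutive point indices, the in-memory index grows with the number of runs rather than the number of points, and runs are (as far as I understand) also what a .lax file stores. A minimal sketch of that bookkeeping, with hypothetical helper names, assuming chunks are processed in increasing point order:

```python
import numpy as np


def indices_to_intervals(indices):
    """Collapse sorted point indices into inclusive (start, end) runs."""
    idx = np.asarray(indices, dtype=np.int64)
    if idx.size == 0:
        return []
    # A new run starts wherever the gap to the previous index is not 1.
    breaks = np.flatnonzero(np.diff(idx) != 1) + 1
    starts = idx[np.r_[0, breaks]]
    ends = idx[np.r_[breaks - 1, idx.size - 1]]
    return list(zip(starts.tolist(), ends.tolist()))


def append_intervals(existing, new):
    """Append runs from a later chunk, merging a run that crosses the chunk boundary."""
    if existing and new and new[0][0] == existing[-1][1] + 1:
        existing[-1] = (existing[-1][0], new[0][1])
        new = new[1:]
    existing.extend(new)
    return existing
```

For example, a cell that receives indices [0, 1, 2] from one chunk and [3, 4, 9] from the next ends up with the runs (0, 4) and (9, 9).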
Any ideas you might have would be great.