option to interrupt and resume for large numbers of points #61

Open
danbjoseph opened this issue Jul 24, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@danbjoseph
Member

danbjoseph commented Jul 24, 2024

We need a way to interrupt and then resume both assign_images and assign_gvi_to_points.


I am testing a somewhat large area in Indonesia, and the projected time to complete is more than 60 hours. Are the features stored in the GeoPackage (or other geo file) written/updated as the process runs, or does that happen once at the end? It would be fantastic to have a "resume" option so that we could interrupt the process and then restart it, skipping the points for which we have already downloaded/fetched an image.

Assigning Images to Points:   0%|                      | 1463/791462 [09:21<67:34:00,  3.25points/s]
jayqi added the enhancement (New feature or request) label Jul 25, 2024
@danbjoseph
Member Author

via @dragonejt "we only output the file at the end of the entire script run. To avoid this, we would have to catch the KeyboardInterrupt and output the file then, but catching the KeyboardInterrupt is usually not recommended I think."
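A minimal sketch of the pattern described in the quote, using `try`/`finally` rather than `except KeyboardInterrupt`: the `finally` clause writes whatever has been matched so far and then lets the interrupt keep propagating, so Ctrl+C still stops the script. `process_points`, `match_point`, and the JSON output file are hypothetical placeholders, not the project's actual functions or its GeoPackage writer.

```python
import json
from pathlib import Path


def process_points(points, match_point, out_path):
    """Match each point to an image, saving partial results if the run stops early.

    `points` is an iterable of (point_id, payload) pairs and `match_point` is a
    hypothetical per-point function; neither is the project's real API.
    """
    results = {}
    try:
        for point_id, payload in points:
            results[point_id] = match_point(payload)
    finally:
        # Runs on normal completion AND when KeyboardInterrupt (or any other
        # exception) propagates, so partial results reach disk without the
        # interrupt being swallowed.
        Path(out_path).write_text(json.dumps(results))
    return results
```

Because nothing catches the exception, the script still exits on Ctrl+C; the only change is that the partial output is written first.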

Are there other options? What about writing the file every 100 points or something? I guess this is even more complicated if there is a delay between the API call that finds an image and the actual download of the image (see the question in #67). On resume, would we need to check every image filename in the GeoPackage for a matching image file on disk, and not just pick up the rows that have no image filename value?
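A rough sketch of the "write every 100 points" idea combined with the on-disk check, assuming a hypothetical JSON checkpoint that maps point IDs to image filenames (standing in for the GeoPackage) and a hypothetical `fetch_image` callable. A point counts as done only if it has a filename and that file exists in `image_dir`. The checkpoint is rewritten through a temporary file and `os.replace`, so an interrupt during a save leaves the previous checkpoint intact.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path):
    """Return {point_id: image_filename or None} saved by a previous run, if any."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_checkpoint(path, results):
    """Write to a temp file, then atomically replace the old checkpoint."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(results))
    os.replace(tmp, path)


def needs_work(point_id, results, image_dir):
    """Done only if the point has a filename AND that file exists on disk."""
    name = results.get(str(point_id))  # JSON keys are always strings
    return name is None or not (Path(image_dir) / name).exists()


def run(point_ids, fetch_image, checkpoint_path, image_dir, every=100):
    """`fetch_image` is hypothetical: point_id -> saved filename, or None if no match."""
    results = load_checkpoint(checkpoint_path)
    pending = [pid for pid in point_ids if needs_work(pid, results, image_dir)]
    for i, pid in enumerate(pending, start=1):
        results[str(pid)] = fetch_image(pid)
        if i % every == 0:
            save_checkpoint(checkpoint_path, results)  # periodic checkpoint
    save_checkpoint(checkpoint_path, results)  # final save
    return results
```

One design choice to revisit: a point whose lookup returned no image (`None`) is retried on every resume. Storing a distinct "no match" marker would avoid repeating those lookups.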

@jayqi
Contributor

jayqi commented Jul 25, 2024

I agree that catching KeyboardInterrupt sounds like a weird thing to do.

There are certainly other options, but they would involve some bigger changes. For example, rather than storing the output data as a GeoPackage file, we could use a file-based database like SQLite or DuckDB with a geospatial extension and write the outputs there as we go. Or, we could write out the point-to-file mappings in a non-geospatial file format that supports streaming writes, like one JSON file per image metadata row, or JSONL for a single file.
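For the JSONL option, a minimal sketch (the row fields `point_id` and `image_id` are hypothetical): each result is appended as one line as soon as it is known, so an interrupted run loses at most the row being written. On resume, `drop_partial_tail` discards a line that was cut off mid-write so that new rows start on a clean line, and `completed_ids` gives the set of points to skip.

```python
import json
from pathlib import Path


def append_row(path, row):
    """Append one result as a single JSON line; closing the file flushes it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")


def read_rows(path):
    """Read back every complete row, skipping lines that fail to parse."""
    p = Path(path)
    if not p.exists():
        return []
    rows = []
    for line in p.read_text(encoding="utf-8").splitlines():
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # normally only the last line, cut off by an interrupt
    return rows


def drop_partial_tail(path):
    """Rewrite the file with only complete rows so new appends start cleanly."""
    rows = read_rows(path)
    Path(path).write_text("".join(json.dumps(r) + "\n" for r in rows), encoding="utf-8")
    return rows


def completed_ids(path):
    """IDs of points already recorded, to skip when resuming."""
    return {row["point_id"] for row in read_rows(path)}
```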

danbjoseph changed the title "option to interrupt and resume image fetch for large numners of points" to "option to interrupt and resume for large numbers of points" Aug 26, 2024
@jayqi
Contributor

jayqi commented Sep 11, 2024

Okay, here's an idea to consider that might not require changing our data structure or what we store on disk: we separate the image identification and the image download into two steps.

First, we match each point to an image (without downloading any images). This will still take some time, but I expect it to be much faster. Then, this mapping can be saved.

Then, we go through and download each image. If an image is already available locally, we can skip it.
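A sketch of the two-step split, with hypothetical `find_image_id` (a metadata-only lookup) and `download` callables and an assumed `<image_id>.jpg` naming scheme; none of these are the project's actual API. Step 1 saves the point-to-image mapping. Step 2 can be stopped and rerun because it skips any image file already on disk, and it downloads an image shared by several points only once.

```python
import json
from pathlib import Path


def phase1_match(point_ids, find_image_id, mapping_path):
    """Step 1: map each point to an image ID (no downloads) and save the mapping."""
    mapping = {str(pid): find_image_id(pid) for pid in point_ids}
    Path(mapping_path).write_text(json.dumps(mapping))
    return mapping


def phase2_download(mapping_path, download, image_dir):
    """Step 2: download each matched image, skipping files already on disk.

    `download(image_id, dest)` is hypothetical and writes one image to `dest`.
    Returns the image IDs downloaded in this run.
    """
    mapping = json.loads(Path(mapping_path).read_text())
    downloaded = []
    # De-duplicate: several points may be matched to the same image.
    for image_id in sorted({i for i in mapping.values() if i is not None}):
        dest = Path(image_dir) / f"{image_id}.jpg"  # assumed naming scheme
        if dest.exists():
            continue  # already available locally, so skip it on resume
        download(image_id, dest)
        downloaded.append(image_id)
    return downloaded
```

One caveat: a download interrupted mid-write can leave a truncated file that the existence check would then skip. Having `download` write to a temporary name and rename it on success avoids that.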

@danbjoseph
Member Author

That sounds like it would help with Mapillary. What about if we are processing a local folder of images?

@jayqi
Contributor

jayqi commented Sep 12, 2024

Is processing a local folder of images slow right now? I don't think we're moving around/copying the images right now, or anything like that.

@danbjoseph
Member Author

I guess it took less than a minute for 1,090 points, 1,286 images, and 268 matches:

Assigning Images to Points: 100%|███████| 1090/1090 [00:57<00:00, 19.10points/s]

If it stays that fast, then something like 80,000 points ÷ 20 points/s ÷ 60 s/min ≈ 67 minutes, just over an hour, which is not fast but is reasonable.
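The estimate above is just points ÷ rate ÷ 60; a tiny helper makes it easy to redo with the rate measured from a progress bar such as the one above (1,090 points in 57 s). It assumes throughput stays constant across the whole run, which may not hold for larger areas.

```python
def throughput(points_done, elapsed_seconds):
    """Average points per second observed so far."""
    return points_done / elapsed_seconds


def eta_minutes(n_points, points_per_second):
    """Run time in minutes for n_points at a constant throughput."""
    return n_points / points_per_second / 60


rate = throughput(1090, 57)  # about 19.1 points/s from the local-folder run
print(f"{eta_minutes(80_000, 20):.1f} min at 20 points/s")          # 66.7 min at 20 points/s
print(f"{eta_minutes(80_000, rate):.1f} min at {rate:.1f} points/s")  # 69.7 min at 19.1 points/s
```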

dragonejt added this to the Indonesia SLI Capture milestone Sep 26, 2024
jayqi self-assigned this Oct 8, 2024
Development

No branches or pull requests

3 participants