terra::rast() doesn't return a variable with workers > 1 #259

Open
twest820 opened this issue Jun 7, 2023 · 3 comments

Comments

twest820 commented Jun 7, 2023

A reprex isn't feasible because multiple gigabytes (actually multiple terabytes in the full use case) of data are involved, but I have the following scenario:

library(dplyr)
library(furrr)
library(progressr)  # provides with_progress() and progressor()
library(sf)
library(terra)

plan(multisession, workers = 16)

simpleFeatureCollection = st_read("simpleFeatureCollection.gpkg")

with_progress({
  progressBar = progressor(steps = nrow(simpleFeatureCollection))

  future_map(simpleFeatureCollection$ID, function(polygonID)
  {
    regionOfInterestPolygon = (simpleFeatureCollection %>% filter(ID == polygonID))[1]
    mediumSizeRaster = rast("twoGBraster.tif")
    rasterRegionOfInterest = crop(mediumSizeRaster, regionOfInterestPolygon)

    # <do computationally intensive things>

    progressBar()  # <update message>
  })
})

which fails with

Error in (function (.x, .f, ..., .progress = FALSE)  : ℹ In index: 1.
Caused by error in `h()`:
! error in evaluating the argument 'x' in selecting a method for function 'crop': object 'mediumSizeRaster' not found

The same code runs fine with workers = 1. This approach isn't ideal: it would likely waste 60+ GB of memory on duplicate copies of a raster that is thread safe, since it sees only read access. However, the preferred implementation, hoisting rast() out of the function body, fails with #258. Since I've got 128 GB of DDR and can afford to waste some, is there a way to get rast() to construct an object under parallel execution?

From what I can see at the moment, the least undesirable workaround appears to be to refactor the code for single-threaded execution, manually chunk and balance the polygons, and then kick off 16 background jobs in RStudio using Code -> Run selection as background job. But, insofar as I understand furrr, that's the sort of task future_map() exists to automate.
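For reference, the manual chunking step in that workaround could look something like the minimal base R sketch below. The helper name chunk_ids and the round-robin assignment are assumptions made for illustration, not anything furrr provides. It balances chunks by polygon count only, not by how expensive each polygon is to process.

```r
# Hypothetical helper: deal IDs out to nChunks groups in round-robin order,
# so group sizes differ by at most one.
chunk_ids <- function(ids, nChunks) {
  groupIndex <- rep_len(seq_len(nChunks), length(ids))
  split(ids, groupIndex)
}

# Stand-in for simpleFeatureCollection$ID
polygonIDs <- 1:10
chunks <- chunk_ids(polygonIDs, nChunks = 3)

# Each element of chunks would be handed to one background job
print(lengths(chunks))
```

If polygons vary widely in size, sorting the IDs by polygon area before dealing them out might give a more even workload, though that is untested here.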

DavisVaughan (Collaborator) commented

I'm not sure how big simpleFeatureCollection is, but one thing to keep in mind with your current approach is that you (probably) get 16 copies of it, one for each worker, and that could be expensive.

DavisVaughan commented Jun 7, 2023

Are you sure mediumSizeRaster = rast("twoGBraster.tif") is actually resulting in an object? It looks like it uses a relative path so the working directory on the worker may be different. You could try supplying an absolute path instead.
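One way to test the absolute path suggestion is sketched below in base R. The helper names resolve_raster_path and check_on_worker are hypothetical, and a temporary file stands in for the real raster. The idea is to resolve the path once on the main session and have the worker report its own working directory if the file is not visible from there.

```r
# Hypothetical helper: resolve the path on the main session so every worker
# receives the same absolute location, whatever its working directory is.
resolve_raster_path <- function(path) {
  # mustWork = TRUE makes a missing file fail here, before any worker starts
  normalizePath(path, winslash = "/", mustWork = TRUE)
}

# Hypothetical check to call at the top of the worker function
check_on_worker <- function(path) {
  if (!file.exists(path)) {
    stop("raster not found at '", path, "' (worker wd: ", getwd(), ")")
  }
  path
}

# Demonstration with a temporary stand-in file instead of twoGBraster.tif
demoFile <- tempfile(fileext = ".tif")
file.create(demoFile)
rasterPath <- resolve_raster_path(demoFile)
check_on_worker(rasterPath)
```

In the original scenario, rasterPath would be computed before future_map() and passed to rast() inside the mapped function in place of the relative file name.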

twest820 commented Jun 8, 2023

If the error message for workers > 1 is to be believed, it appears the call to rast() is somehow getting skipped. Even if the statement was executed and rast() had some silent error leading it to return NULL instead of failing properly, the assignment should still create mediumSizeRaster as a variable. So it seems like something might be going pretty badly wrong, though, given future's limitations for flowing diagnostics from workers back to their caller, we might be stuck. (I find myself often wishing for plan(multithread), but that's not on furrr.)

If it was a pathing issue, which it presumably isn't since there's no issue with workers = 1, I'd expect to see something like the usual

Error: [rast] file does not exist: twoGBraster.tif
In addition: Warning message:
twoGBraster.tif: No such file or directory (GDAL error 4) 

come back. But future may not be able to route that.
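If diagnostics are hard to see from the caller, one option is to have each step return its outcome as plain data. The sketch below is a minimal base R illustration; capture_step is a hypothetical helper, not part of future or furrr, and the error in the demonstration is simulated with stop() and is not real terra output.

```r
# Hypothetical wrapper: run one step and return either its value or the
# error message, so a failure comes back to the caller as ordinary data.
capture_step <- function(stepName, expr) {
  tryCatch(
    # expr is evaluated lazily here, inside tryCatch, so its errors are caught
    list(step = stepName, ok = TRUE, value = expr, message = NA_character_),
    error = function(e) {
      list(step = stepName, ok = FALSE, value = NULL, message = conditionMessage(e))
    }
  )
}

# Simulated failure standing in for a failing rast() call
loaded <- capture_step("rast", stop("file does not exist: twoGBraster.tif"))

# A step that succeeds, for comparison
summed <- capture_step("sum", sum(1:4))

print(loaded$message)
```

Wrapping the rast() and crop() calls separately in the worker function would show which of the two steps actually fails on a worker, assuming the returned lists make it back through future_map().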

I'm not sure how big simpleFeatureCollection is

Good question! It's only a couple of MB, so negligible in this context. 32 workers would be better, but even 8 GB per worker is maybe asking too much. (If this approach to the task had worked, I was prepared to kill the future_map() and try with eight workers to get 16 GB of DDR per worker if physical memory was going to be exceeded.)
