Skip to content

Losing data when add a raster to a dataset #5075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
MaxDragonheart opened this issue Mar 25, 2021 · 15 comments
Open

Losing data when add a raster to a dataset #5075

MaxDragonheart opened this issue Mar 25, 2021 · 15 comments

Comments

@MaxDragonheart
Copy link

As I mentioned here, when I add a raster inside a geocube are produced a lot of nodata.

After a test @snowman2 has indicates that this is a possible bug of xarray libraries.

I was able to reproduce the issue you demonstrated. I think it has to do with decimal precision of the coordinates when adding the DataArray to the Dataset. That being said, I checked that the x coordinate and y coordinates were the exact same for raster_in and out_grid. This is likely an issue with xarray and not geocube.

For a workaround, this got it working without the gaps for me:

import geopandas
import rioxarray
from geocube.api.core import make_geocube

vector_in = geopandas.read_file("vector_data.shp")
raster_in = rioxarray.open_rasterio("raster_data.tif", masked=True).sel(band=1).drop("band")

out_grid = make_geocube(
    vector_data=vector_in,
    measurements=["id"],
    like=raster_in,
)

out_grid["process_value"] = (raster_in.dims, raster_in.values, raster_in.attrs, raster_in.encoding)
@max-sixty
Copy link
Collaborator

Please could you create a minimally reproducible example, as per the issue template?

@MaxDragonheart
Copy link
Author

All do you need is on the first linked issue, so here. You can find QGIS project with data example.

@keewis
Copy link
Collaborator

keewis commented Mar 26, 2021

the issue is that the code example you gave both there and here does not directly use xarray so debugging is difficult. Could you post the repr of raster_in and out_grid? If you can, saving both raster_in and out_grid to netCDF and attaching them to a comment might also help.

Without looking at the data I can't confirm, but I suspect this is a alignment issue. If so, we do have a open PR (not merged, yet, unfortunately) which added a tolerance kwarg to align and could fix this.

@MaxDragonheart
Copy link
Author

Below my code

import geopandas
import rioxarray
from geocube.api.core import make_geocube

# Read input_vector and create a GeoDataFrame
vector_in = geopandas.read_file(input_vector)

# Read raster data
raster_in = rioxarray.open_rasterio(input_raster)

# Create GeoCube
out_grid = make_geocube(
    vector_data=vector_in,
    measurements=["id"],
    like=raster_in, 
)
out_grid.to_netcdf("rasterized_vector.nc")

# Add raster_in to geocube
out_grid["process_value"] = raster_in
out_grid.to_netcdf("out_grid_process_value.nc")

Here there are two netCDF with output data.

@keewis
Copy link
Collaborator

keewis commented Mar 27, 2021

could you also show raster_in by itself so we can fully reproduce? I was thinking that there were tiny differences in x and y, but

a = xr.open_dataset("out_grid_process_value.nc")
b = xr.open_dataset("rasterized_vector.nc")
xr.testing.assert_equal(a[["id"]], b)

shows that the coordinates are equal so that's not the issue.

@snowman2
Copy link
Contributor

Similar issue here: corteva/rioxarray#298

Reproducible example in the issue ^^

@snowman2
Copy link
Contributor

snowman2 commented Apr 15, 2021

For the original post for this issue ref:

np.abs(raster_in.x.values - out_grid.x.values).max(), np.abs(raster_in.y.values - out_grid.y.values).max()
(1.7763568394002505e-15, 7.105427357601002e-15)

Looks like there is a tiny difference in decimal precision.

@snowman2
Copy link
Contributor

Similarly at corteva/rioxarray#298 (comment)

np.abs(resampled.x.values - da2.x.values).max(), np.abs(resampled.y.values - da2.y.values).max()
(4.000014541816199e-09, 6.666663665555461e-09)

@snowman2
Copy link
Contributor

snowman2 commented May 5, 2021

Also related: corteva/rioxarray#332

@snowman2
Copy link
Contributor

snowman2 commented May 5, 2021

I think it would be nice for xarray to have a "close enough" option where a certain number of decimal places is enough to consider them equal. Is there an issue somewhere where this is under consideration or is there an option already in existence that I missed?

@keewis
Copy link
Collaborator

keewis commented May 5, 2021

if I understand the issue correctly this will be fixed by the index refactor (see #4489 (comment))

@snowman2
Copy link
Contributor

snowman2 commented May 5, 2021

Yes, the alignment stuff in there should do the trick.

@hamiddashti
Copy link

A "close enough" option can take care of issues such as in xarray.where in situations where the coordinates are very close but not exactly matched (here) (e.g. due to Geotif roudning, here ) .

@kmuehlbauer
Copy link
Contributor

It looks like this was fixed somewhere before xarray 2024.10.0. At least the example in #5075 (comment) works now as expected.

Can you confirm @snowman2? Then we could close this issue.

@snowman2
Copy link
Contributor

snowman2 commented Nov 4, 2024

rioxarray added workarounds for it to work. Not sure if we can verify it using rioxarray or geocube. Will need an example that doesn't depend on those libraries to verify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants