The rompy_binary_datasources package is an extension to the main rompy package. It provides specialized source classes for in-memory binary data objects such as pandas DataFrames and xarray Datasets. These classes are kept out of the main package because they cause issues with OpenAPI schema generation; shipping them separately keeps them fully available to users who need them.
This package integrates with the main rompy package through an import stub mechanism. When users try to use the binary data source classes from the main package without this package installed, they receive an error message directing them to install it.
Key features:
- Transparent integration: Classes behave as if they were part of the main package
- Helpful error messages: Clear guidance on how to install this package when it is missing
- Backward compatibility: Existing import paths keep working for code that expects these classes in the main package
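The stub mechanism itself is not shown in this README. As a rough sketch of how such a stub could work, the function below tries to import the extension package and, if that fails, raises an ImportError that carries the install command. The name `load_binary_source` and its parameters are hypothetical and are not part of rompy or rompy_binary_datasources.

```python
import importlib


def load_binary_source(class_name, package="rompy_binary_datasources"):
    """Return `class_name` from `package`, or raise a helpful ImportError.

    Hypothetical helper for illustration only; not the actual rompy stub.
    """
    try:
        module = importlib.import_module(package)
    except ImportError as err:
        # Replace the generic import failure with installation guidance.
        raise ImportError(
            f"{class_name} requires the optional '{package}' package. "
            f"Install it with: pip install {package}"
        ) from err
    return getattr(module, class_name)
```

With a pattern like this, the main package can keep its existing import paths and defer the real import until a class is actually requested.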
To install rompy_binary_datasources:
$ pip install rompy_binary_datasources
The package is designed to be used alongside the main rompy package:
$ pip install rompy rompy_binary_datasources
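To check that both packages are importable after installation, a minimal sketch using only the standard library could look like this. The helper name `missing_packages` is hypothetical; it only asks Python whether each top-level module can be found and does not import anything.

```python
from importlib.util import find_spec


def missing_packages(names=("rompy", "rompy_binary_datasources")):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Both packages are importable.")
```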
SourceDataset is a source class for wrapping existing xarray Dataset objects:
import xarray as xr
from rompy_binary_datasources import SourceDataset
# Create a dataset
ds = xr.Dataset(...)
# Wrap it in a SourceDataset
source = SourceDataset(obj=ds)
# Use it in rompy workflows
# ...
SourceTimeseriesDataFrame is a source class for wrapping pandas DataFrame timeseries objects:
import pandas as pd
from rompy_binary_datasources import SourceTimeseriesDataFrame
# Create a timeseries DataFrame
df = pd.DataFrame(...)
df.index = pd.DatetimeIndex(...)
df.index.name = "time"
# Wrap it in a SourceTimeseriesDataFrame
source = SourceTimeseriesDataFrame(obj=df)
# Use it in rompy workflows
# ...
A complete example that uses SourceDataset with a rompy Data object:
import numpy as np
import pandas as pd
import xarray as xr
from rompy_binary_datasources import SourceDataset
from rompy.core.data import Data
# Create an xarray Dataset
times = pd.date_range("2023-01-01", "2023-01-10", freq="1D")
lats = np.linspace(-90, 90, 19)
lons = np.linspace(-180, 180, 37)
ds = xr.Dataset(
    data_vars={
        "temperature": (
            ["time", "lat", "lon"],
            np.random.rand(len(times), len(lats), len(lons)),
        ),
    },
    coords={
        "time": times,
        "lat": lats,
        "lon": lons,
    },
)
# Wrap in SourceDataset
source = SourceDataset(obj=ds)
# Use in rompy Data object
data = Data(
    variables=["temperature"],
    source=source,
)
# Use the data in rompy workflows
# ...
A complete example that uses SourceTimeseriesDataFrame with a rompy Data object:
import pandas as pd
import numpy as np
from rompy_binary_datasources import SourceTimeseriesDataFrame
from rompy.core.data import Data
# Create a timeseries DataFrame
# Lowercase "h" is used because the uppercase "H" alias is deprecated since pandas 2.2
times = pd.date_range("2023-01-01", "2023-01-10", freq="1h")
df = pd.DataFrame(
    {
        "temperature": np.random.rand(len(times)),
        "humidity": np.random.rand(len(times)) * 100,
    },
    index=times,
)
df.index.name = "timestamp"  # Must have a name
# Wrap in SourceTimeseriesDataFrame
source = SourceTimeseriesDataFrame(obj=df)
# Use in rompy Data object
data = Data(
    variables=["temperature", "humidity"],
    source=source,
)
# Use the data in rompy workflows
# ...
Contributions are welcome! Please feel free to submit a Pull Request.
Free software: BSD license