sparse array experiments #315

Draft
wants to merge 7 commits into
base: main

Conversation

crdanielbusch
Collaborator

@crdanielbusch crdanielbusch commented Feb 26, 2025

Pull request

This is not meant to be merged. Some experiments with data structures for sparse arrays

codecov bot commented Feb 26, 2025

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 1 | 1 | 0 | 0 |

View the top 1 failed test(s) by shortest run time
primap2/tests::primap2.tests
Stack Traces | 0s run time
../../../..../uv/python/cpython-3.11.10-linux-x86_64-gnu/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1204: in _gcd_import
    ???
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1126: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1204: in _gcd_import
    ???
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:940: in exec_module
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
primap2/tests/__init__.py:3: in <module>
    from .examples import minimal_ds  # noqa: F401
primap2/tests/examples.py:5: in <module>
    import sparse
E   ModuleNotFoundError: No module named 'sparse'


@crdanielbusch
Collaborator Author

crdanielbusch commented Feb 26, 2025

@mikapfl @JGuetschow this is the current state of our experiments on sparse arrays in primap2. It is all a bit messy, but I guess you get the idea. If we consider "merging data sets with sparse arrays" a minimal use case, I'd say sparse arrays are a promising approach. I could add a test that mimics the merging of the primap-hist data set to see if it works. I still have to check if the array remains sparse throughout the merging process.

Notes:

  • There is an issue with sparse arrays and pr.set on xarray objects. I've experienced the same issue as described in pydata/xarray#9933 ("Duck array ops try to import transpose from sparse"). It works with the pinned versions sparse 15.4.0 and xarray 2024.11.0.
  • This line in _merge causes errors: da_error = da_comp.where(da_comp > tolerance, drop=False). The xarray function where does not seem to be compatible with sparse arrays.
  • I think it is possible to move this test to the level of numpy arrays. What we would like to find out is essentially: are there data points with errors higher than the specified tolerance?
  • The tests in test_merge.py are passing, except for the ones that compare the error message. The error message with the relevant coordinates may be hard to put together, e.g. something like pr.merge error: found discrepancies larger than tolerance (1.00%)
  • xarray.Dataset.combine_first (what we use for pr.merge) supports sparse arrays
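
To illustrate the idea of moving the tolerance check to the level of numpy arrays, here is a minimal sketch. It is not the code in _merge. It assumes the comparison result is available in a COO-style layout, i.e. an integer array of index tuples plus a flat array of stored values (a sparse.COO object exposes these as coords and data), and that the fill value itself never exceeds the tolerance. The function names, the dimension labels and the example values are hypothetical.

```python
import numpy as np


def find_discrepancies(coords, data, tolerance):
    """Return index tuples and values of stored entries above ``tolerance``.

    coords: int array of shape (ndim, nnz), COO-style index columns.
    data:   float array of shape (nnz,), the stored (non-fill) values.
    """
    # NaN compares as False, so missing values are never reported.
    mask = data > tolerance
    return coords[:, mask].T, data[mask]


def format_discrepancies(index_tuples, values, dim_labels, tolerance):
    """Build a readable message; dim_labels maps dimension name -> labels."""
    dims = list(dim_labels)
    lines = [f"found discrepancies larger than tolerance ({tolerance * 100:.2f}%):"]
    for idx, val in zip(index_tuples, values):
        where = ", ".join(f"{d}={dim_labels[d][i]}" for d, i in zip(dims, idx))
        lines.append(f"  {where}: relative difference {val * 100:.2f}%")
    return "\n".join(lines)


# Hypothetical comparison result with two stored values on an (area, time) grid.
coords = np.array([[0, 2], [1, 0]])  # column k is the index tuple of value k
data = np.array([0.005, 0.03])       # relative differences
labels = {"area": ["COL", "ARG", "BOL"], "time": [2000, 2001]}

idx, vals = find_discrepancies(coords, data, tolerance=0.01)
message = format_discrepancies(idx, vals, labels, tolerance=0.01)
print(message)
```

Because only the stored values are inspected, the check never densifies the array, and the index tuples of the offending entries are enough to look up the coordinate labels for the error message.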

@mikapfl
Member

mikapfl commented Feb 27, 2025

Quite promising. The advantage of the sparse array approach would of course be that conceptually, everything keeps working the same. The disadvantage is that the abstraction is leaky as we see with the where stuff. We'll have to keep "does this work with sparse arrays" always in mind. But maybe that is a fine trade-off, especially if we can get primap2 testing up to speed with sparse arrays and can guarantee that all primap2 functions work with sparse arrays.

Could you test if pr.set works at the array level, e.g. when setting data for a new country?
