(feat): `read_lazy` for whole `AnnData` lazy-loading + `xarray` reading + `read_elem_as_dask` -> `read_elem_lazy` #1247
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
- Coverage   86.11%   84.26%   -1.85%
==========================================
  Files          40       45       +5
  Lines        6242     6673     +431
==========================================
+ Hits         5375     5623     +248
- Misses        867     1050     +183
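For readers who want to check the numbers in the coverage diff, the percentages follow directly from the Hits and Lines rows. The sketch below is only arithmetic on the figures quoted in the report; the helper name `coverage_pct` is made up for illustration, and the "net new lines" ratio is a rough proxy, not necessarily the patch coverage figure Codecov itself reports.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: covered lines divided by tracked lines."""
    return 100 * hits / lines


# Figures copied from the Coverage Diff table in the report.
base = coverage_pct(hits=5375, lines=6242)  # main
head = coverage_pct(hits=5623, lines=6673)  # this PR
delta = head - base

# Rough proxy only: coverage of the *net* added lines (+248 hits / +431 lines).
# Codecov computes patch coverage from the actual changed lines, which may differ.
net_new = coverage_pct(hits=5623 - 5375, lines=6673 - 6242)

print(f"main: {base:.2f}%  PR: {head:.2f}%  change: {delta:+.2f}")
print(f"net new lines covered: {net_new:.2f}%")
```

The drop in overall coverage comes from the new lines being covered at a lower rate than the existing code base, not from previously covered lines losing coverage.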
Force-pushed from 68fcd2b to 6165f07
@ivirshup @flying-sheep Not really looking for a thorough code review at the moment, more of a look at the structure of what we are exporting. The big changes are
Do we want this way of doing things? Or is there some other route? Separately, are the changes made to the core acceptable? After that, I think we can look into the specifics of the code I added. Or you can review that now, but I'd rather get big changes out of the way first.
I will continue to make little changes to clean things up (this is still a draft!) but I think this structure is the way I would go. But maybe you have different ideas!
I really like the approach.
It would of course be better if that stuff got upstreamed, but since we only handle the categories and masks ourselves, I think this is feasible. That said, I don’t have a lot of xarray experience.
There’s one hack in there that I really don’t want us to leave in; otherwise it already looks quite clean.
I’ll take a deeper look once you’re done.
from anndata.experimental import read_lazy
from anndata.tests.helpers import assert_equal, gen_adata

from .conftest import ANNDATA_ELEMS, get_key_trackers_for_columns_on_axis
wait, I thought this doesn’t work. did they change that?
What didn't work? Importing from `conftest`?
yeah
]
dev-doc = ["towncrier>=24.8.0"] # release notes tool
test-full = ["anndata[test,lazy]"]
Maybe a different name? Not sure about test-full
in scanpy we have
- `test-min`, which is used in the minimum deps job,
- `test`, which is a healthy subset of functionality, and
- `test-full`, which is everything (except for external I think)
OK, so I went through all open conversations and new commits, and there’s almost nothing left:
Putting at the top-level of the
That's fair - maybe let's wait until zarr v3. The reason is simply that "using remote data with zarr requires these, and otherwise you will get a
I can redo this
Looks great! One last thing:
I will open follow-up PRs after this one to account for a few things, but, for now, I am going to leave this unmerged because the zarr v3 PR should go in first. I am very happy with the state of things :)
This PR is a lighter-weight version of #947 that involves using the original `AnnData` object as the class to hold the `obs` and `var` `xr.Dataset`s.
`.obs` and `.var` with `backed="r"` mode #981