This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

lazyarrays and zip #135

Open
nsmith- opened this issue May 29, 2019 · 5 comments
Labels
enhancement New feature or request

Comments

@nsmith-
Member

nsmith- commented May 29, 2019

I would expect the following transformation between arrays1 and arrays3 to be an identity:

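import awkward
import uproot

# http://scikit-hep.org/uproot/examples/HZZ.root
fin = uproot.open("HZZ.root")
tree = fin["events"]
# Lazily read every Electron_* branch
arrays1 = tree.lazyarrays("Electron_*")
# Strip the common prefix from the column names
arrays2 = {k.replace('Electron_', ''): arrays1[k] for k in arrays1.columns}
# Zip the columns back together; this should reproduce arrays1
arrays3 = awkward.JaggedArray.zip(**arrays2)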

but I suspect a more generalized awkward.zip is needed.

@jpivarski
Member

Yeah, this is an example of a more general problem: methods defined for one array class need to be defined for them all, either by putting the same method on every class or by making the method aware of every class. In this case, JaggedArray.zip either doesn't know what to do with a ChunkedArray or doesn't do the right thing.

Part of the issue is just organization—I need to make a big matrix of what's been implemented on which array classes, whether it deals with all classes in a sensible way, and whether it has an equivalent for all classes.

@jpivarski
Member

I was imagining that the NanoAOD profile would be explicitly ChunkedArray-aware, that it wouldn't use JaggedArray.zip without qualification.

@nsmith-
Member Author

nsmith- commented May 29, 2019

What is the interim recipe for such an operation, until the methods are aware of all awkward combinations?

@jpivarski
Member

Currently, you'll need to dig into the ChunkedArray, zip the JaggedArrays inside it chunk by chunk, and pack the results up as a new ChunkedArray.
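The shape of that recipe can be sketched without the library. The snippet below is only an illustration under loud assumptions: plain nested Python lists stand in for the chunks of a ChunkedArray and for JaggedArrays, and `zip_jagged` and `zip_chunked` are hypothetical helper names, not awkward functions. It also assumes every column shares the same chunk boundaries.

```python
# Stand-in sketch: nested lists play the role of chunks and jagged arrays.
# zip_jagged and zip_chunked are hypothetical names, not part of awkward.

def zip_jagged(**columns):
    """Zip equally shaped jagged columns into one jagged list of records."""
    names = list(columns)
    if not names:
        raise ValueError("need at least one column")
    n_events = len(columns[names[0]])
    if any(len(columns[name]) != n_events for name in names):
        raise ValueError("columns have different numbers of events")
    out = []
    for i in range(n_events):
        per_column = [columns[name][i] for name in names]
        if len({len(values) for values in per_column}) != 1:
            raise ValueError("event %d has mismatched counts" % i)
        # One record (dict) per particle in this event
        out.append([dict(zip(names, values)) for values in zip(*per_column)])
    return out


def zip_chunked(**chunked_columns):
    """Zip chunk by chunk and return a new list of chunks."""
    names = list(chunked_columns)
    n_chunks = {len(chunked_columns[name]) for name in names}
    if len(n_chunks) != 1:
        raise ValueError("columns must share the same chunk boundaries")
    return [
        zip_jagged(**{name: chunked_columns[name][c] for name in names})
        for c in range(n_chunks.pop())
    ]


# Two chunks per column: the first holds two events, the second holds one.
pt = [[[50.1, 20.3], []], [[33.0]]]
eta = [[[0.1, -1.2], []], [[2.0]]]
zipped = zip_chunked(pt=pt, eta=eta)
```

With real arrays, the inner step would be the existing JaggedArray.zip applied to each pair of chunks. Doing so materializes any lazily loaded chunk it touches.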

It requires some decisions to be made about delayed evaluation. If you don't care about that right now, let the VirtualArrays be materialized and work with eagerly evaluated arrays.

The plan for the cms.nanoaod profile is to delay the zipping in new generators for new VirtualArrays. The fact that there's a module for it, not just a function, is to have a place to put classes representing the delayed zipping, which would then be persistable. (For awkward persistence, generators need to have an "address" in the whitelist: uproot_methods.profiles.* is in the whitelist.)
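To make the idea of delayed zipping concrete, here is a rough sketch that does not use awkward at all. `LazyArray` is a hypothetical stand-in for a VirtualArray (a generator plus arguments, evaluated on first access), and `DelayedZip` is a hypothetical callable class standing in for the kind of object that could live in a profile module. None of these names come from awkward or uproot_methods.

```python
# Stand-in sketch of delayed zipping; all names here are hypothetical.

class LazyArray:
    """Calls generator(*args) on first access and caches the result."""

    def __init__(self, generator, args=()):
        self.generator = generator
        self.args = args
        self._array = None

    def materialize(self):
        if self._array is None:
            self._array = self.generator(*self.args)
        return self._array

    def __getitem__(self, where):
        return self.materialize()[where]


class DelayedZip:
    """Describes a zip to perform later.

    A class defined in a module can be referred to by its import path,
    which a lambda or closure cannot.
    """

    def __init__(self, names):
        self.names = tuple(names)

    def __call__(self, *columns):
        # Materialize lazy inputs only now, when the zip is really needed
        arrays = [c.materialize() if isinstance(c, LazyArray) else c
                  for c in columns]
        return [
            [dict(zip(self.names, values)) for values in zip(*event)]
            for event in zip(*arrays)
        ]


reads = []  # records which columns have actually been read


def read_pt():
    reads.append("pt")
    return [[50.1, 20.3], []]


def read_eta():
    reads.append("eta")
    return [[0.1, -1.2], []]


pt = LazyArray(read_pt)
eta = LazyArray(read_eta)
electrons = LazyArray(DelayedZip(["pt", "eta"]), (pt, eta))

reads_before = list(reads)  # nothing has been read yet
first_event = electrons[0]  # first access triggers both reads and the zip
```

Building `electrons` costs nothing; the columns are read and zipped only when an element is first requested, and the result is cached for later accesses.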

@jpivarski jpivarski added the enhancement New feature or request label Jul 9, 2019
@jpivarski
Member

Even though I homogenized a lot of methods, I haven't made JaggedArray.zip work for everything.
