Add interface for filtering on demtsh leaves to python utility #167
Comments
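Thanks, Richie. I think this is a general issue with vector< vector > branches. In the output of trkana.show(filter_name=['dem', 'dem.*', 'demfit', 'demtsh'], interpretation_width=100), dem is a vector<mu2e::TrkInfo> interpreted as an AsGroup with one AsJagged subbranch per struct member, whereas demfit and demtsh are interpreted as AsObjects(AsVector(True, AsVector(False, ...))) with no subbranches. The dem branch can have its individual leaves accessed because it is just a vector, and I guess ROOT has made subbranches for each member of the struct.
We can see the same thing in ROOT with trkana->Print("dem*"): dem has subbranches but demfit and demtsh do not.
I've had a quick play with the splitlevel of the branches and it doesn't seem to have helped.
I think we either live with this, or we flatten things down to one dimension and associate each hit/fit with a track via an id. See some discussion here about how you could use the id in uproot: scikit-hep/uproot5#229. This would be a significant change, but it may be worth it.
It also looks like having vector< vector > may be slower to read (scikit-hep/uproot5#327), although that may have since been solved with AwkwardForth (https://arxiv.org/pdf/2102.13516), and I haven't noticed things being particularly slow. |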
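The "flatten to one dimension and associate each hit with a track via an id" option can be sketched in plain Python as below. This is only an illustration of the data layout, not TrkAna or uproot code: the toy hit times, the function names flatten_with_track_id and hits_for_track, and the choice of the track's position within the event as its id are all assumptions made for the example.

```python
# Hypothetical toy data laid out as [event][track][hit], standing in for one
# leaf of a vector<vector<...>> branch such as demtsh.
nested = [
    [[1.0, 2.0], [3.0]],   # event 0: two tracks
    [],                    # event 1: no tracks
    [[4.0, 5.0, 6.0]],     # event 2: one track
]


def flatten_with_track_id(events):
    """Flatten [event][track][hit] to [event][hit] plus a parallel track-id list."""
    values, track_ids = [], []
    for tracks in events:
        event_values, event_ids = [], []
        for itrk, hits in enumerate(tracks):
            event_values.extend(hits)
            # Every hit remembers which track in the event it came from.
            event_ids.extend([itrk] * len(hits))
        values.append(event_values)
        track_ids.append(event_ids)
    return values, track_ids


def hits_for_track(values, track_ids, event, itrk):
    """Recover the hits of one track from the flattened layout."""
    return [v for v, i in zip(values[event], track_ids[event]) if i == itrk]


values, track_ids = flatten_with_track_id(nested)
print(values)
print(track_ids)
print(hits_for_track(values, track_ids, 0, 1))
```

Both outputs are singly jagged, which is the shape that gets split into individually readable leaves for dem. One caveat of this sketch: a track with zero hits leaves no trace in the flattened lists, so the number of tracks per event would have to be stored separately.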
A possible solution is to define the end times as a struct with named
values instead of an array.
This same problem presumably affects all the contents currently using
std::array.
On Tue, Jun 18, 2024 at 1:42 PM Andrew Edmonds ***@***.***> wrote:
Thanks, Richie. I think this is a general issue with vector< vector >
branches. Here is the output of trkana.show(filter_name=['dem', 'dem.*',
'demfit', 'demtsh'], interpretation_width=100)
name | typename | interpretation
---------------------+--------------------------+-----------------------------------------------------------------------------------------------------
dem | vector<mu2e::TrkInfo> | AsGroup(<TBranchElement 'dem' (29 subbranches) at 0x7f73ad37d400>, {'dem.status': AsJagged(AsDtyp...
dem/dem.status | int32_t[] | AsJagged(AsDtype('>i4'))
dem/dem.goodfit | int32_t[] | AsJagged(AsDtype('>i4'))
dem/dem.seedalg | int32_t[] | AsJagged(AsDtype('>i4'))
... snip ...
dem/dem.avgedep | float[] | AsJagged(AsDtype('>f4'))
demfit | std::vector<std::vect... | AsObjects(AsVector(True, AsVector(False, Model_mu2e_3a3a_TrkFitInfo)))
demtsh | std::vector<std::vect... | AsObjects(AsVector(True, AsVector(False, Model_mu2e_3a3a_TrkStrawHitInfo)))
The dem branch can have its individual leaves accessed because it is just
a vector and I guess ROOT has made subbranches for each member in the
struct. The demtsh and demfit branches don't have the same
interpretations.
We can see the same thing in ROOT with trkana->Print("dem*"): dem has
subbranches but demfit and demtsh do not
******************************************************************************
*Br 0 :dem : Int_t dem_ *
*Entries : 10 : Total Size= 17888 bytes File Size = 126 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.27 *
*............................................................................*
*Br 1 :dem.status : Int_t status[dem_] *
*Entries : 10 : Total Size= 744 bytes File Size = 129 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.26 *
*............................................................................*
*Br 2 :dem.goodfit : Int_t goodfit[dem_] *
*Entries : 10 : Total Size= 749 bytes File Size = 130 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.26 *
*............................................................................*
... snip ...
*............................................................................*
*Br 30 :demfit : vector<vector<mu2e::TrkFitInfo> > *
*Entries : 10 : Total Size= 3398 bytes File Size = 1239 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 2.34 *
*............................................................................*
*............................................................................*
*Br 31 :demlh : vector<vector<mu2e::LoopHelixInfo> > *
*Entries : 10 : Total Size= 2421 bytes File Size = 1578 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.22 *
*............................................................................*
... snip ...
*............................................................................*
*Br 57 :demtsh : vector<vector<mu2e::TrkStrawHitInfo> > *
*Entries : 10 : Total Size= 105745 bytes File Size = 68136 *
*Baskets : 5 : Basket Size= 32000 bytes Compression= 1.54 *
*............................................................................*
*Br 58 :demtsm : vector<vector<mu2e::TrkStrawMatInfo> > *
*Entries : 10 : Total Size= 27154 bytes File Size = 12775 *
*Baskets : 1 : Basket Size= 32000 bytes Compression= 2.09 *
*............................................................................*
I've had a quick play with the splitlevel of the branches and it doesn't
seem to have helped...
I think we either live with this, or we flatten things down to one
dimension and associate each hit/fit with a track via an id. See some
discussion here about how you could use the id in uproot:
scikit-hep/uproot5#229. This would be a significant change, but it may be
worth it.
It also looks like having vector< vector > may be slower to read
(scikit-hep/uproot5#327), although that may have since been solved with
AwkwardForth (https://arxiv.org/pdf/2102.13516), and I haven't noticed
things being particularly slow.
--
David Brown ***@***.***
Office Phone (510) 486-7261
Lawrence Berkeley National Lab
M/S 50R5008 (50-6026C) Berkeley, CA 94720
|
Hi. I'm working through issues, trying to establish if they're still a problem. This issue is fundamental to the way that "local" track fit variables are stored in [...]. Something like this?
I tested this quickly and it returns the same result as loading the entire branch and then printing the leaves one-by-one, like this:
Let me know what you think. |
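The code from this comment did not survive in the thread, so the following is only a plain-Python sketch of the general idea behind zipping separately held leaves into records, which is what ak.zip (mentioned in the replies) does for awkward arrays. It does not use awkward or uproot. The helper name zip_leaves and the leaf names "time" and "edep" are hypothetical, and the toy values are made up.

```python
def zip_leaves(leaves):
    """Combine {name: [event][track][hit]} columns into [event][track][hit] records.

    All columns are assumed to share exactly the same nesting structure.
    """
    names = list(leaves)
    columns = [leaves[name] for name in names]
    return [
        [
            # One dict per hit, with one entry per leaf.
            [dict(zip(names, hit)) for hit in zip(*per_track)]
            for per_track in zip(*per_event)
        ]
        for per_event in zip(*columns)
    ]


# Hypothetical leaves, each laid out as [event][track][hit].
leaves = {
    "time": [[[1.0, 2.0], [3.0]], [], [[4.0]]],
    "edep": [[[0.1, 0.2], [0.3]], [], [[0.4]]],
}

records = zip_leaves(leaves)
print(records[0][1][0])
```

The point of the pattern is that only the leaves named in the dictionary need to be held in memory, while the result can still be indexed like the full branch.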
I didn't know about ak.zip, that's definitely convenient. I had tried something similar using the uproot batching feature `a = {field : [] for field in fields} for field in fields: |
I like Richie's suggestion. @sam-grant, some of your code might overlap with the new util/mu2epyutil; if something is missing from there, please add it. We should add something like Richie's to that too. |
Thanks, everyone. I agree with Sophie: if there are little tricks that make working in uproot/awkward array easier, then let's add them to the new utility class. |
Hi everyone, let's keep this issue open until we have a working interface for it in the python utility. I will rename the issue with the new task. Anyone should feel free to assign themselves to this |
demtsh leaves no longer appear as keys in the uproot TTree object, so the whole branch must be converted to an awkward array (filter_name cannot be used to select a subset of leaves). Tested with uproot 5.3.8rc2.
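A small stand-alone illustration of why a name filter has nothing to select for demtsh: the branch names below are the ones listed in the show() output quoted in this thread (with the snipped entries left out), and glob-style matching via Python's fnmatch is used as a stand-in for how filter_name treats glob strings. That equivalence is an assumption of this sketch, and the helper name select is hypothetical.

```python
from fnmatch import fnmatchcase

# Branch names taken from the show() output quoted in this thread
# (the "... snip ..." entries are omitted).
branch_names = [
    "dem",
    "dem.status",
    "dem.goodfit",
    "dem.seedalg",
    "dem.avgedep",
    "demfit",
    "demtsh",
]


def select(names, patterns):
    """Return the names matching any of the glob patterns, preserving order."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


dem_leaves = select(branch_names, ["dem.*"])
demtsh_leaves = select(branch_names, ["demtsh.*"])
print(dem_leaves)
print(demtsh_leaves)
```

Because dem is split into one subbranch per struct member, "dem.*" has names to match. demtsh is stored as a single unsplit branch, so there are no "demtsh.*" names at all and the only selectable unit is the whole branch.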