Improvements on dependencies calculations #172
Codecov Report
@@ Coverage Diff @@
## beliy-derivate #172 +/- ##
==============================================
Coverage 0.00% 0.00%
==============================================
Files 25 29 +4
Lines 920 951 +31
==============================================
- Misses 920 951 +31
This should:
For qMRI some files in the
I suggest we finish this PR for now and then tackle #146, which concerns the indexing of nested derivative folders, before we come back to finish fixing #157 completely.
I suspect that having to load and read the metadata of each file takes a lot of time. This can probably be optimized, but I would worry about that later.
I pretty much took your code as is, fixed the tests that were not passing, and did a bit of refactoring and commenting. I also:
The code is pretty much the same though: thanks for providing all this. Let me know what you think.
Note: I feel that the current way of indexing all the dependencies creates a lot of "redundancies" in the layout (example: the phase image of a fieldmap will list all the associated images in
But I think that we can also optimize this later if needed, as this way of indexing things makes fewer assumptions about which file is / should be the "main" one for a given group.
It will still work if
Conserving full metadata for each file can explode memory usage. From inside we need only
Sometimes the main file is not well defined -- with a phasediff and 2 magnitude images, which one is main? In order to simplify we can exclude
Looks good to me, more abstract and generic.
As for optimisation, the list of candidate metadata files (considering only json) is the same for each call. It could probably be saved somewhere.
If (to be confirmed) data files are scanned in alphabetical order, looking for group dependencies can be limited to a file's neighbours.
Also, we probably need to move get_metadata from internal to util. This function can be useful outside the layout and query.
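The caching idea from this review could be sketched as follows. This is a minimal illustration in Python, not the project's code: the name json_candidates is hypothetical, and it assumes the candidate list depends only on the folder being scanned.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def json_candidates(folder):
    """List the JSON files of one folder (hypothetical helper).

    The result is cached, so each folder is scanned only once no matter
    how many data files ask for their metadata candidates.
    """
    # A tuple is returned so the cached value cannot be mutated by callers.
    return tuple(sorted(Path(folder).glob("*.json")))
```

The trade-off is staleness: if files are added while the layout is being built, json_candidates.cache_clear() has to be called before the folder is listed again.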
Will try to address some of your questions before merging. Totally agree with getting get_metadata out of
Was thinking something along the same lines for optimization: will open a separate issue to keep track of that.
Metadata treatment
Split get_metadata into two functions:
- metalist = get_meta_list(filename, pattern) returns the list of files satisfying pattern, following the inheritance principle.
- get_metadata(metafile) constructs the meta structure from the json files in the passed list of meta-files (from the function above).
Each data file in the dataset receives a new field, metafile, that contains the list of files from get_meta_list.
This will facilitate direct access to the metafiles. Individual values can be extracted by the user, if needed, using get_metadata.
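The proposed split could look roughly like the sketch below. It is written in Python purely for illustration and rests on assumptions that are not in the proposal: the root argument, the glob-style pattern default, the parse_entities helper, and the simplified matching rule (same suffix, and the sidecar's entities are a subset of the data file's).

```python
import json
from pathlib import Path


def parse_entities(filename):
    """Split 'sub-01_task-rest_bold.nii.gz' into ({'sub': '01', 'task': 'rest'}, 'bold')."""
    stem = Path(filename).name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    # Chunks without a '-' (e.g. 'dataset' in dataset_description) are ignored.
    return dict(p.split("-", 1) for p in pairs if "-" in p), suffix


def get_meta_list(filename, root, pattern="*.json"):
    """Return the applicable sidecar files, ordered from dataset root to the file's folder.

    Assumes `root` is an ancestor of `filename`; raises ValueError otherwise.
    """
    filename, root = Path(filename), Path(root)
    entities, suffix = parse_entities(filename)
    folders = [filename.parent, *filename.parent.parents]
    folders = folders[: folders.index(root) + 1]  # do not climb above the dataset root
    metalist = []
    for folder in reversed(folders):  # root first, most specific folder last
        for candidate in sorted(folder.glob(pattern)):
            cand_entities, cand_suffix = parse_entities(candidate)
            if cand_suffix == suffix and cand_entities.items() <= entities.items():
                metalist.append(candidate)
    return metalist


def get_metadata(metalist):
    """Merge the JSON content of the listed files; later (deeper) files override earlier ones."""
    meta = {}
    for metafile in metalist:
        with open(metafile, encoding="utf-8") as fh:
            meta.update(json.load(fh))
    return meta
```

Storing only the output of get_meta_list on each file keeps the layout small, and the JSON content is read only when get_metadata is actually called.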
Dependencies
Each file structure contains a dependencies sub-structure with guaranteed fields:
- explicit -- list of data files containing "IntendedFor" referencing the current file.
- data -- list of files with the same name but a different extension. This combines files that are split into header and data (like in Brainvision), and also takes care of bval/bvec files.
- group -- list of files that have the same name except for extension and suffix. This groups files that logically need each other, like a functional MRI file and its events tabular file. It also takes care of fmap magnitude1/2 and phasediff.
Additional custom subfields can be added for given modalities.
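The three guaranteed fields could be computed along these lines. This is a hedged Python sketch, not the PR's implementation: split_name, build_dependencies and the intended_for mapping are hypothetical, paths are compared as plain strings, and "group" is assumed here to exclude the files already listed under "data".

```python
from collections import defaultdict
from pathlib import PurePosixPath


def split_name(path):
    """Return (base, suffix, extension), e.g. ('sub-01_task-rest', 'bold', 'nii.gz')."""
    name = PurePosixPath(path).name
    stem, _, ext = name.partition(".")
    base, _, suffix = stem.rpartition("_")
    return base, suffix, ext


def build_dependencies(files, intended_for=None):
    """Map each path to {'explicit': [...], 'data': [...], 'group': [...]}.

    intended_for is a hypothetical mapping {source file: [files named in its IntendedFor]}.
    """
    intended_for = intended_for or {}
    by_stem, by_base = defaultdict(list), defaultdict(list)
    for f in files:
        base, suffix, _ = split_name(f)
        folder = PurePosixPath(f).parent
        by_stem[(folder, base, suffix)].append(f)  # same name, any extension
        by_base[(folder, base)].append(f)          # same name, any suffix and extension

    deps = {}
    for f in files:
        base, suffix, _ = split_name(f)
        folder = PurePosixPath(f).parent
        same_stem = by_stem[(folder, base, suffix)]
        deps[f] = {
            "explicit": sorted(src for src, targets in intended_for.items() if f in targets),
            "data": sorted(o for o in same_stem if o != f),
            "group": sorted(o for o in by_base[(folder, base)] if o not in same_stem),
        }
    return deps
```

Because no file is singled out as the "main" one, every member of a group lists all the others, which is the redundancy discussed earlier in this thread.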
This allows dwi and fmap to be treated with the generic parse_using_schema. perf likely also, but I can't certify it.
Issues