[WIP] layout redone #134

nbeliy · 2021-02-09T10:52:49Z

Main changes:

No assumptions on modalities are used
modalities are in its own modalities substructure
Additional fields in modality structure (corresponding to files):
- basename -- filename without suffix
- gz -- true if file is compressed (ends with .gz)
- tab -- true if file is tabular
- intended -- array of files that uses current file
- metafie -- full path to corresponding json file
- entity -- substructure containing parsed entities of file
tabular files are treated as data files

Things to do:

Adapt validator and integrate it in layout
Adapt querry
Create depends field, that contains paths to all files needed for given one

Remi-Gau · 2021-02-09T11:05:48Z

The "No assumptions on modalities are used" duplicates the "schema less" approach of this #124

Also I would STRONGLY avoid storing the name of the json sidecars in the layout because the metadata for a given file can be spread across several json files spread through the different levels of the BIDS folder hierarchy (inheritance principle).

nbeliy · 2021-02-09T11:36:09Z

The "No assumptions on modalities are used" duplicates the "schema less" approach of this #124

In shema-less approach, you need to separate modality and entity fields from predefined it. You can't forbid user to use modality filename and completely broke the layout. Same with modality.

I still don't see the point to run all parse_dwi, parse_func instead of just loading the layout and run validator on top.

Remi-Gau · 2021-02-09T12:00:07Z

The "No assumptions on modalities are used" duplicates the "schema less" approach of this #124

In shema-less approach, you need to separate modality and entity fields from predefined it. You can't forbid user to use modality filename and completely broke the layout. Same with modality.

Wait I am not sure what you mean.

This is what the behavior of the schema-less use of layout function

use_schema = false();

BIDS = bids.layout(fullfile(pth_bids_example, ...
                              'ds000001-fmriprep'), use_schema);

This can then return all modalities including those that do not exist in the BIDS specification (ex: figures)

bids.query(BIDS, 'modalities')

ans =

  1×3 cell array

    'anat'    'figures'    'func'

Entities that do not exist the BIDS schema at the moment are listed in the layout.

disp(BIDS.subjects(1).func(1))

ans = 

  struct with fields:

    filename: 'sub-10_task-balloonanalogrisktask_run-1_AROMAnoiseICs.csv'
      suffix: 'AROMAnoiseICs'
         ext: '.csv'
         sub: '10'
        task: 'balloonanalogrisktask'
         run: '1'
        desc: ''
     content: []
        meta: []
        from: ''
        mode: ''
          to: ''
         res: ''
       space: ''
        hemi: ''

I am trying to see what is the test case you mentioned that would break the layout but I think I don't see what you mean, could you give me a concrete example?

nbeliy · 2021-02-09T12:08:25Z

Imagine that someone will have file sub-10_task-balloonanalogrisktask_run-1_AROMAnoiseICs_ext-test.csv.
Then the test entity will override the ext field

Remi-Gau · 2021-02-09T13:14:15Z

Imagine that someone will have file sub-10_task-balloonanalogrisktask_run-1_AROMAnoiseICs_ext-test.csv.
Then the test entity will override the ext field

Ha yeah in that case the file would actually just be skipped.

But then that changes the basic assumption we've had from the beginning of this discussion that files should at least follow this pattern: sub-<>_(<key>-<value>)*_<suffix>.<extension>.

I am not so keen in going in that direction: I think this is the minimal amount of file formatting that there should be in the subject folders.

nbeliy · 2021-02-09T13:17:21Z

I meant:
sub-10_task-balloonanalogrisktask_run-1_ext-test_AROMAnoiseICs.csv

If one of entities (or modalities) coincides with field name -- it will pass the regexp, but overwrite the corresponding field, which will make it unsafe. I agree it will be a rare occasion.

A possible fix (without creating sub-structure), is to make fields starting wth '_' (which will never appear in entity/modality name).

Remi-Gau · 2021-02-09T13:43:13Z

I meant:
sub-10_task-balloonanalogrisktask_run-1_ext-test_AROMAnoiseICs.csv

If one of entities (or modalities) coincides with field name -- it will pass the regexp, but overwrite the corresponding field, which will make it unsafe. I agree it will be a rare occasion.

Aaaahhhh I see. Sorry. It took me a while.

A possible fix (without creating sub-structure), is to make fields starting wth '_' (which will never appear in entity/modality name).

If I understand you correctly, then I am afraid matlab won't be happy about this:

A = struct('_test', 1)
Error using struct
Invalid field name "_test"

nbeliy · 2021-02-09T13:46:23Z

Then with '_' at the end?:

p.filename_ = ...

Remi-Gau · 2021-02-09T14:15:19Z

Then with '_' at the end?:
p.filename_ = ...

Yuck. No.

Let's go with sub-structures for entities : I am always afraid of deeply nested sub-structures but sometimes you can't avoid it.

Just to make sure we are on the same page, you are suggesting something like this?

disp(BIDS.subjects(1).func(1))

ans = 

  struct with fields:

      filename: 'sub-10_task-balloonanalogrisktask_run-1_AROMAnoiseICs.tsv'   
      suffix: 'AROMAnoiseICs'
      ext: '.tsv'
      sub: '10'
      entities: [1×1 struct]
      basename: 'sub-10_task-balloonanalogrisktask_run-1' 
      gz: 0  
      tab: 1
      intended_for: []
      dependency : []
      metafile: 'sub-10_task-balloonanalogrisktask_run-1_AROMAnoiseICs.json'

Entity substructure

disp(BIDS.subjects(1).func(1).entities)     

ans = 

  struct with fields: 

     task: 'balloonanalogrisktask'
     run: '1'

Remi-Gau · 2021-02-09T15:13:30Z

+bids/layout.m

-                                            sprintf(['^%s.*' pattern '$'], ...
-                                                    subject.name));
-
+                                            pattern);


You are still going to need to have something like this here because some of the "files" for meg are actually folders.

if strcmp(modality, 'meg') && ~isempty(d) for i = 1:size(d, 1) file_list{end + 1, 1} = d(i, :); end end

nbeliy · 2021-02-10T08:42:27Z

Let's go with sub-structures for entities : I am always afraid of deeply nested sub-structures but sometimes you can't avoid it.

Another possibility, probably simpler, is to disallow entities and modalities with the same name as fields (with a warning).
Realistically entity namemed filename or suffix do not have much sense.

I agree to drop this pull request (in any case I can't prvide a quality matlab code), and move to your implementation for derivatives.

Remi-Gau · 2021-02-10T09:42:45Z

Let's go with sub-structures for entities : I am always afraid of deeply nested sub-structures but sometimes you can't avoid it.

Another possibility, probably simpler, is to disallow entities and modalities with the same name as fields (with a warning).
Realistically entity namemed filename or suffix do not have much sense.

Well you just never know: so I went ahead and adapted the other PR to align better with your suggestion. Also adapted the query along the way.

I was also thinking that to get the 2 steps process you were suggesting to give the same outcome as the other PR would imply going:

through all the files to add them to the layout
-and then going through the whole layout to remove from the layout the files that don't match the schema.

So I think I prefer only indexing stuff once even if it means carrying around the schema during parsing.

But you are correct that some of the code parse_X will most likely not be needed in the long run. Seeing your code made realize there is some further refactoring I can do actually.

Remi-Gau · 2021-02-10T10:07:45Z

Quick question.

What was the idea or the usecase for adding those 2 fields?

gz -- true if file is compressed (ends with .gz)
tab -- true if file is tabular

nbeliy · 2021-02-10T10:23:21Z

What was the idea or the usecase for adding those 2 fields?

These are just small 'usefull for later' switches, based on object-oriented approach. In my mind the layout is used as structure passed to several bids-related tools or even a pipeline. And for such tools it may be usefull to know if file is compressed, or it is tabular file.

Practice example: the Christop's prepare_derivatives function will decompress compressed data (for SPM). And it is easier and more robust to just check if file.gz than test if it ends in .gz.

The tab is in the same spirit -- tabular data files should be treated differently from image (at least the sidecar json files have different info and structure), so test if given data file is tabular is useful.

The matlab-bids is more oriented to query, and in context of flat lists of files these switches not really important.

Remi-Gau · 2021-02-10T14:19:11Z

These are just small 'usefull for later' switches, based on object-oriented approach. In my mind the layout is used as structure passed to several bids-related tools or even a pipeline. And for such tools it may be usefull to know if file is compressed, or it is tabular file.

OK that 's what I though but I just wanted to make sure.

Practice example: the Christop's prepare_derivatives function will decompress compressed data (for SPM). And it is easier and more robust to just check if file.gz than test if it ends in .gz.

yup sort of ran into the same problem, so it could make sense to have it either in the structure or have query be able to filter files based on that their zipped status.

PS: @ChristophePhillips I used SPM for years and learned only recently that it can handle zipped file fine: bids-standard/bids-specification#136 (comment)

The tab is in the same spirit -- tabular data files should be treated differently from image (at least the sidecar json files have different info and structure), so test if given data file is tabular is useful.

yup, same here support either in the structure or via query seems to make sense.

Remi-Gau · 2021-02-15T08:05:17Z

@all-contributors please add @nbeliy for code, ideas

allcontributors · 2021-02-15T08:05:26Z

@Remi-Gau

I've put up a pull request to add @nbeliy! 🎉

Remi-Gau · 2021-02-15T08:05:55Z

Will close this one and wrap the other PR so it can at least live "happily" in the dev branch.

layout redone

af2d49b

Remi-Gau reviewed Feb 9, 2021

View reviewed changes

allcontributors bot mentioned this pull request Feb 15, 2021

docs: add nbeliy as a contributor #139

Merged

Remi-Gau closed this Feb 15, 2021

nbeliy deleted the crcLayout branch November 14, 2022 15:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] layout redone #134

[WIP] layout redone #134

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021 •

edited

Loading

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

Remi-Gau Feb 9, 2021

nbeliy commented Feb 10, 2021

Remi-Gau commented Feb 10, 2021

Remi-Gau commented Feb 10, 2021

nbeliy commented Feb 10, 2021

Remi-Gau commented Feb 10, 2021

Remi-Gau commented Feb 15, 2021

allcontributors bot commented Feb 15, 2021

Remi-Gau commented Feb 15, 2021

[WIP] layout redone #134

[WIP] layout redone #134

Conversation

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021 • edited Loading

Remi-Gau commented Feb 9, 2021

nbeliy commented Feb 9, 2021

Remi-Gau commented Feb 9, 2021

Remi-Gau Feb 9, 2021

Choose a reason for hiding this comment

nbeliy commented Feb 10, 2021

Remi-Gau commented Feb 10, 2021

Remi-Gau commented Feb 10, 2021

nbeliy commented Feb 10, 2021

Remi-Gau commented Feb 10, 2021

Remi-Gau commented Feb 15, 2021

allcontributors bot commented Feb 15, 2021

Remi-Gau commented Feb 15, 2021

nbeliy commented Feb 9, 2021 •

edited

Loading