Enable ontology support for directories in addition to File and File[] #1617

Open
kannon92 opened this issue Feb 11, 2022 · 5 comments

@kannon92
Contributor

Hello,

We work in the image processing domain and we would like to be able to guarantee that a user has the correct ontology before submitting a workflow.

I posted this on the Discourse forum, and it turns out that ontology checking currently only works for File and File[]. There is a separate issue to log a warning when someone uses the format field with the Directory class.

I'd like to request the ability to use ontologies with Directory as well.

@kinow
Member

kinow commented Feb 11, 2022

Hi @kannon92 ! 👋

How do you see the ontology being used with a directory? Would it be something like myonto:ImagesListDirectory applied to a Directory, to validate that the directory contains a certain list of files?

I think if we do a few iterations, exercising how we would apply the ontology to a directory, we might be able to either come up with some possible future implementations, or decide to use some other workaround to validate the directories.
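
As a starting point for those iterations, here is a rough Python sketch of one possible reading of the idea: treat the check as "every file inside the Directory has the expected format". This is not how cwltool works today. The `MYONTO` prefix and the `files_not_matching` helper are made up for illustration, and it assumes the Directory object carries an explicit `listing` whose File entries already have a `format` set, which may not hold in practice.

```python
from typing import Any, Dict, List

# Hypothetical ontology prefix, used only for illustration.
MYONTO = "https://example.org/myonto#"


def files_not_matching(directory: Dict[str, Any], expected_format: str) -> List[str]:
    """Return basenames of File entries whose format is not expected_format.

    Assumes `directory` is a CWL-style Directory object with an explicit
    `listing`; nested Directory entries are checked recursively.
    """
    mismatches: List[str] = []
    for entry in directory.get("listing", []):
        if entry.get("class") == "Directory":
            mismatches.extend(files_not_matching(entry, expected_format))
        elif entry.get("class") == "File" and entry.get("format") != expected_format:
            mismatches.append(entry.get("basename", "<unnamed>"))
    return mismatches


images = {
    "class": "Directory",
    "basename": "images",
    "listing": [
        {"class": "File", "basename": "a.tif", "format": MYONTO + "Tiff"},
        {"class": "File", "basename": "b.png", "format": MYONTO + "Png"},
        {
            "class": "Directory",
            "basename": "tiles",
            "listing": [
                {"class": "File", "basename": "c.tif", "format": MYONTO + "Tiff"},
            ],
        },
    ],
}

print(files_not_matching(images, MYONTO + "Tiff"))  # ['b.png']
```

A check like this needs the full listing to be available, so it would be more expensive than the per-File comparison done for File inputs.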

@kannon92
Contributor Author

I am not sure if ontology checking actually looks at the files.

I just want to enforce a failure if someone specifies an output of a certain format and uses it as another input whose format doesn't match. That should be an error.

And yeah, I think eventually we would want to adopt something like [EDAM-BIOIMAGING](https://bioportal.bioontology.org/ontologies/EDAM-BIOIMAGING) for the format field. But right now, we would be okay with a custom list of allowed file formats on each command line tool.
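
To make the request concrete, here is a minimal Python sketch of the kind of check I have in mind. It is only an illustration, not cwltool's actual logic: `check_format` and `FormatMismatchError` are hypothetical names, the `myonto:` formats are placeholders, and formats are compared as plain strings with no ontology reasoning (so a subclass of an accepted format would still be rejected). Treating a source with no declared format as an error is also just one possible policy.

```python
from typing import Iterable, Optional


class FormatMismatchError(ValueError):
    """Raised when a produced format is not accepted by the consuming input."""


def check_format(produced: Optional[str], accepted: Iterable[str]) -> None:
    """Fail if `produced` is not in the consuming input's allowed formats.

    Plain string comparison only; this is an assumption of the sketch.
    """
    accepted_set = set(accepted)
    if not accepted_set:
        # The consuming input declares no format, so there is nothing to enforce.
        return
    if produced is None:
        raise FormatMismatchError(
            f"source declares no format, but one of {sorted(accepted_set)} is required"
        )
    if produced not in accepted_set:
        raise FormatMismatchError(
            f"format {produced!r} is not one of the accepted formats {sorted(accepted_set)}"
        )


# Matching formats pass silently.
check_format("myonto:Tiff", ["myonto:Tiff", "myonto:OmeTiff"])

# A mismatch is reported as an error instead of being ignored.
try:
    check_format("myonto:Png", ["myonto:Tiff"])
except FormatMismatchError as err:
    print(err)
```

The same comparison would apply whether the value being passed along is a File or a Directory, which is all I am asking for here.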

@tetron
Member

tetron commented Feb 11, 2022

@kannon92 possibly you could use secondaryFiles? If there's something you can designate as a "primary" file (usually an entry point or manifest of some sort) you can have all the other files and subdirectories that appear along with it tied in as secondary files.
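
A small Python sketch of what such an input object could look like, in case it helps. The `dataset` input name, the paths, the `primary_with_secondaries` helper and the `https://example.org/myonto#` format IRI are all placeholders; the only thing taken from CWL itself is that a File value can carry a `format` and a `secondaryFiles` list of File and Directory objects.

```python
import json
from typing import Any, Dict, Sequence


def primary_with_secondaries(
    manifest: str,
    file_paths: Sequence[str],
    dir_paths: Sequence[str],
    manifest_format: str,
) -> Dict[str, Any]:
    """Build a File object whose companions travel along as secondaryFiles."""
    secondary = [{"class": "File", "path": p} for p in file_paths]
    secondary += [{"class": "Directory", "path": p} for p in dir_paths]
    return {
        "class": "File",
        "path": manifest,
        # The format check applies to the primary file only.
        "format": manifest_format,
        "secondaryFiles": secondary,
    }


job = {
    "dataset": primary_with_secondaries(
        "dataset/manifest.json",
        ["dataset/img_0001.tif", "dataset/img_0002.tif"],
        ["dataset/thumbnails"],
        "https://example.org/myonto#ImageManifest",
    )
}

# Print the input object as JSON, which is also valid YAML.
print(json.dumps(job, indent=2))
```

The format would then be enforced on the manifest, with the images following it around implicitly.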

@kannon92
Contributor Author

Hello. I don't know if that really helps me out. In my case, these are typically a large number of image files that are all of a specific format.

@mr-c
Member

mr-c commented Feb 17, 2022

> Hello. I don't know if that really helps me out. In my case, these are typically a large number of image files that are all of a specific format.

Is each file a specific format? Or are they all of the same format?

Is there a specification of how this directory is supposed to be structured?

Are there subdirectories?

Do multiple apps use this structure?

Do you need to construct this structure from different parts of the workflow, and/or decompose it for later steps? Or is this directory structure provided as an initial input and some later version of it as a final output?

From a typing perspective, another option is a custom record type with entries for the different types of files (each entry being of type File or an array of Files, both having a format field; or an entry that is another custom record type with its own fields). Some entries can also be optional and/or have secondaryFiles with specific variations on the file suffixes. At execution time this custom record type could be transformed into a directory hierarchy with a particular layout and naming scheme using InitialWorkDirRequirement, but that could get a bit cumbersome.
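
To illustrate only the shape of that transformation, here is a rough Python sketch that turns a record-like value into a Directory object. In a real tool this would be expressed through InitialWorkDirRequirement rather than Python, and the layout chosen here (one subdirectory per record field, named after the field) is purely an assumption, as are the `record_to_directory` helper, the field names and the `myonto:` formats.

```python
from typing import Any, Dict, List


def record_to_directory(record: Dict[str, Any], basename: str) -> Dict[str, Any]:
    """Lay out a record of File / File[] entries as a Directory object.

    Hypothetical layout: each record field becomes a subdirectory that
    holds the field's file(s). Optional entries that are None are skipped.
    """
    subdirs: List[Dict[str, Any]] = []
    for field, value in record.items():
        if value is None:
            continue  # Optional entry that was not provided.
        files = value if isinstance(value, list) else [value]
        subdirs.append({"class": "Directory", "basename": field, "listing": files})
    return {"class": "Directory", "basename": basename, "listing": subdirs}


record = {
    "images": [
        {"class": "File", "path": "img_0001.tif", "format": "myonto:Tiff"},
        {"class": "File", "path": "img_0002.tif", "format": "myonto:Tiff"},
    ],
    "manifest": {"class": "File", "path": "manifest.json", "format": "myonto:Manifest"},
    "mask": None,  # Optional field left unset.
}

layout = record_to_directory(record, "dataset")
print([d["basename"] for d in layout["listing"]])  # ['images', 'manifest']
```

With this approach the format checks stay on the individual File entries, where they already work, and the directory only exists at execution time.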
