Migration guide for workflow outputs #6162
Signed-off-by: Ben Sherman <[email protected]>
✅ Deploy Preview for nextflow-docs-staging ready!
I've suggested headings to the bullet points as I think it's easier to scan.
Before I make suggestions on the example migration: did you want it to be a follow-along, or more of a demonstration with explanations?
This will impact some of the language:
"Declare an output for each channel..." vs "An output channel for each must be declared..."
Thanks Chris, all great improvements. It is mostly a follow-along, with some explanation sprinkled throughout. I need to guide the reader through a specific example while also pausing at times to make more general points. The current version is my best attempt to balance these things.
Great. The steps you have included are already great, so I'll make suggestions that tweak the language to make it consistent as a follow-along.
From our discussion last week, we will follow up this PR with a separate tutorial about the rnaseq-nf pipeline to give more context. Also, this guide is really more of a tutorial, but we already renamed the section from "Tutorials" to "Guides". Once we move to Seqera docs, we should have explicit sections for both. I also wouldn't be opposed to just having a "Guides" and "Tutorials" section in the current docs. I just didn't want to keep renaming the current section back and forth.
I agree with your points. Catching up on some backlog today and will review the current PR in the next day or so.
Sorry it took me so long to get to this.
I've added suggestions to make this an example of a migration. Past tense, "this is what was done to change this pipeline".
Some sections were multi-step, i.e., they showed something and then showed a better way of doing it, and then they switched to "you could do it this way, but it was done this way for these reasons."
Let me know what you think. Happy to keep iterating if you have suggestions.
Some of the suggestions got a little messy, so if you're happy and accept, I'll give it another check to make sure everything is consistent.
docs/guides/workflow-outputs.md
Outdated
### Replacing `publishDir` with workflow outputs

We'll start by removing each `publishDir` directive and publishing the corresponding process output channel in the entry workflow.
Suggested change:
- We'll start by removing each `publishDir` directive and publishing the corresponding process output channel in the entry workflow.
+ The `publishDir` directive is not required when you publish process outputs in the entry workflow. Instead, outputs are emitted in the entry workflow.
docs/guides/workflow-outputs.md
Outdated
We'll start by removing each `publishDir` directive and publishing the corresponding process output channel in the entry workflow.

First, emit the `QUANT` and `FASTQC` outputs separately in the `RNASEQ` workflow:
Suggested change:
- First, emit the `QUANT` and `FASTQC` outputs separately in the `RNASEQ` workflow:
+ For example, the `QUANT` and `FASTQC` outputs are emitted in the `RNASEQ` workflow:
docs/guides/workflow-outputs.md
Outdated
}
```

We use maps instead of tuples so that we can access fields by name, and so that the index file can use the map keys as column names.
Suggested change:
- We use maps instead of tuples so that we can access fields by name, and so that the index file can use the map keys as column names.
+ Maps instead of tuples were used so that fields are accessible by name, and the index file can use the map keys as column names.
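To illustrate the point about map keys becoming column names, here is a minimal Python sketch. It is an analogy only, not Nextflow code and not how Nextflow generates the index file. The keys `id`, `fastqc`, and `quant` and the helper `write_index` are assumptions made for illustration; only the `spleen` row values come from the index file excerpt quoted later in this review.

```python
import csv
import io

# Hypothetical per-sample record; the key names are assumed for illustration.
samples = [
    {"id": "spleen", "fastqc": "results/fastqc/spleen", "quant": "results/quant/spleen"},
]

def write_index(records):
    """Serialize a list of maps to CSV text, using the map keys as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),  # column names come from the map keys
        quoting=csv.QUOTE_ALL,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

index_text = write_index(samples)
print(index_text)
```

With tuples, the header would have to be supplied separately and kept in sync with the field order; with maps, the field names travel with the data.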
docs/guides/workflow-outputs.md
Outdated
We use maps instead of tuples so that we can access fields by name, and so that the index file can use the map keys as column names.

Declare the `samples` output with an index file:
Suggested change:
- Declare the `samples` output with an index file:
+ The `samples` are declared as outputs with an index file:
docs/guides/workflow-outputs.md
Outdated
}
```

Since each channel value now contains multiple files that were going to different subdirectories, we must use *publish statements* in the `path` directive to route each file to the appropriate location.
Suggested change:
- Since each channel value now contains multiple files that were going to different subdirectories, we must use *publish statements* in the `path` directive to route each file to the appropriate location.
+ Since each channel value contains multiple files that were going to different subdirectories, the *publish statements* in the `path` directive were used to route each file to the appropriate location.
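The routing idea in this passage can be sketched in Python as a plain mapping from each file in a sample record to its published location. This is a conceptual analogy, not the Nextflow `path` directive itself. The function `publish_targets`, the `work/...` source paths, and the key names are hypothetical; the destination layout (`results/fastqc/<id>`, `results/quant/<id>`) follows the index file row quoted in this review.

```python
from pathlib import PurePosixPath

def publish_targets(sample, output_dir="results"):
    """Map each file field of a sample record to its destination under output_dir."""
    targets = {}
    for field, source in sample.items():
        if field == "id":
            continue  # the id names the sample; it is not a file to publish
        # Each field is routed to its own subdirectory, keyed by sample id.
        targets[source] = str(PurePosixPath(output_dir) / field / sample["id"])
    return targets

# Hypothetical record: the source paths are made up for illustration.
sample = {"id": "spleen", "fastqc": "work/ab/fastqc_spleen", "quant": "work/cd/spleen"}
targets = publish_targets(sample)
print(targets)
```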
docs/guides/workflow-outputs.md
Outdated
Since each channel value now contains multiple files that were going to different subdirectories, we must use *publish statements* in the `path` directive to route each file to the appropriate location.

Finally, run the pipeline to verify the index file:
Suggested change:
- Finally, run the pipeline to verify the index file:
+ This would produce the following index file:
docs/guides/workflow-outputs.md
Outdated
"spleen","results/fastqc/spleen","results/quant/spleen"
```

In the future, if we add a tool with per-sample outputs, we only need to join the tool output into the `samples_ch` channel and update the output `path` directive accordingly. This approach keeps our output definition concise as we add more tools to the pipeline. Additionally, a single unified index file for all per-sample outputs is easier for downstream pipelines to consume, rather than cross-referencing multiple related index files.
Suggested change:
- In the future, if we add a tool with per-sample outputs, we only need to join the tool output into the `samples_ch` channel and update the output `path` directive accordingly. This approach keeps our output definition concise as we add more tools to the pipeline. Additionally, a single unified index file for all per-sample outputs is easier for downstream pipelines to consume, rather than cross-referencing multiple related index files.
+ In the future, if a tool with per-sample outputs were added, the tool output could join the `samples_ch` channel and update the output `path` directive accordingly. This approach keeps the output definition concise as we add more tools to the pipeline. Additionally, a single unified index file for all per-sample outputs is easier for downstream pipelines to consume, rather than cross-referencing multiple related index files.
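As a rough illustration of why joining a new tool's output keeps the definition concise, the following Python sketch adds one field to each per-sample record by matching on the sample id. It is an analogy for joining into `samples_ch`, not Nextflow code. The tool name `newtool`, its path, the key names, and the helper `join_tool_output` are all hypothetical.

```python
def join_tool_output(samples, tool_paths, field):
    """Inner join on sample id: add the tool's output to each matching record."""
    joined = []
    for sample in samples:
        path = tool_paths.get(sample["id"])
        if path is None:
            continue  # samples without output from the tool are dropped
        joined.append({**sample, field: path})
    return joined

# Hypothetical data: "newtool" and its path are made up for illustration.
samples = [
    {"id": "spleen", "fastqc": "results/fastqc/spleen", "quant": "results/quant/spleen"},
]
tool_paths = {"spleen": "results/newtool/spleen"}

joined = join_tool_output(samples, tool_paths, "newtool")
print(joined)
```

Each record gains one key, so a unified index built from these records gains one column, and nothing else in the output definition needs to change.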
I split the Guides section into Tutorials and Guides. Based on our previous discussion, this migration guide seems to be a "tutorial", since it is now a strict step-by-step guide, but also contains some explanations. Whereas I think your page on spot retries is the only guide in this list?

I don't think the past-tense suggestions are a good fit for what I'm trying to do. It feels too brittle and distant, whereas the present-tense imperative feels more engaging and interactive. A similar approach is taken in the "tutorials" for data lineage and Flux, for similar reasons. So I would prefer to try to make it work with the current approach.