Migration guide for workflow outputs #6162

Open
wants to merge 11 commits into
base: master

bentsherman
Member

No description provided.

@bentsherman requested a review from a team as a code owner on June 5, 2025, 00:42

netlify bot commented Jun 5, 2025

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 107409c
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/685ec035b95c600008f0020c
😎 Deploy Preview https://deploy-preview-6162--nextflow-docs-staging.netlify.app

Collaborator

@christopher-hakkaart left a comment

I've suggested headings for the bullet points, as I think they make the page easier to scan.

Before I make suggestions on the example migration: did you want it to be a follow-along, or more of a demonstration with explanations?

This will impact some of the language:

"Declare an output for each channel..." vs "An output channel for each must be declared..."

@bentsherman
Member Author

Thanks Chris, all great improvements.

It is mostly a follow-along, with some explanation sprinkled throughout. I need to guide the reader through a specific example while also pausing at times to make more general points. The current version is my best attempt to balance these things.

@christopher-hakkaart
Collaborator

christopher-hakkaart commented Jun 5, 2025

Great. The steps you have included are already great, so I'll make suggestions that tweak the language to make it consistent as a follow-along.

@bentsherman
Member Author

From our discussion last week, we will follow up this PR with a separate tutorial about the rnaseq-nf pipeline to give more context.

Also, this guide is really more of a tutorial, but we already renamed the section from "Tutorials" to "Guides". Once we move to Seqera docs, we should have explicit sections for both.

I also wouldn't be opposed to having both a "Guides" and a "Tutorials" section in the current docs. I just didn't want to keep renaming the current section back and forth.

@christopher-hakkaart
Collaborator

I agree with your points. Catching up on some backlog today and will review the current PR in the next day or so.

Collaborator

@christopher-hakkaart left a comment

Sorry it took me so long to get to this.

I've added suggestions to make this an example of a migration. Past tense, "this is what was done to change this pipeline".

Some sections were multi-step, i.e., they showed something and then showed a better way of doing it, and then they switched to "you could do it this way, but it was done this way for these reasons."

Let me know what you think. Happy to keep iterating if you have suggestions.

Some of the suggestions got a little messy, so if you're happy and accept, I'll give it another check to make sure everything is consistent.


### Replacing `publishDir` with workflow outputs

We'll start by removing each `publishDir` directive and publishing the corresponding process output channel in the entry workflow.
Collaborator

Suggested change
We'll start by removing each `publishDir` directive and publishing the corresponding process output channel in the entry workflow.
The `publishDir` directive is not required when you publish process outputs in the entry workflow. Instead, outputs are emitted in the entry workflow.


We'll start by removing each `publishDir` directive and publishing the corresponding process output channel in the entry workflow.

First, emit the `QUANT` and `FASTQC` outputs separately in the `RNASEQ` workflow:
Collaborator

Suggested change
First, emit the `QUANT` and `FASTQC` outputs separately in the `RNASEQ` workflow:
For example, the `QUANT` and `FASTQC` outputs are emitted in the `RNASEQ` workflow:

}
```

We use maps instead of tuples so that we can access fields by name, and so that the index file can use the map keys as column names.
Collaborator

Suggested change
We use maps instead of tuples so that we can access fields by name, and so that the index file can use the map keys as column names.
Maps were used instead of tuples so that fields are accessible by name and the index file can use the map keys as column names.


We use maps instead of tuples so that we can access fields by name, and so that the index file can use the map keys as column names.
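
As a rough illustration of the point about map keys becoming column names, here is a minimal sketch in plain Python (not Nextflow, and not the pipeline's actual code). The `samples` records, the `id` key, the `liver` sample, and the `write_index` helper are all hypothetical; only the `spleen` row mirrors a line quoted later in this thread.

```python
import csv
import io

# Hypothetical per-sample records. Each record is a map, so fields are
# accessed by name (record["fastqc"]) rather than by tuple position.
samples = [
    {"id": "liver", "fastqc": "results/fastqc/liver", "quant": "results/quant/liver"},
    {"id": "spleen", "fastqc": "results/fastqc/spleen", "quant": "results/quant/spleen"},
]


def write_index(records):
    """Render records as CSV text, using the map keys as the header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),  # column names come from the keys
        quoting=csv.QUOTE_ALL,               # quote every field
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


index_text = write_index(samples)
print(index_text, end="")
```

With tuples, the header would have to be maintained separately and kept in sync with field order; with maps, adding a key to each record adds a column.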

Declare the `samples` output with an index file:
Collaborator

Suggested change
Declare the `samples` output with an index file:
The `samples` output is declared with an index file:

}
```

Since each channel value now contains multiple files that were going to different subdirectories, we must use *publish statements* in the `path` directive to route each file to the appropriate location.
Collaborator

Suggested change
Since each channel value now contains multiple files that were going to different subdirectories, we must use *publish statements* in the `path` directive to route each file to the appropriate location.
Since each channel value contains multiple files that were going to different subdirectories, *publish statements* were used in the `path` directive to route each file to the appropriate location.


Since each channel value now contains multiple files that were going to different subdirectories, we must use *publish statements* in the `path` directive to route each file to the appropriate location.

Finally, run the pipeline to verify the index file:
Collaborator

Suggested change
Finally, run the pipeline to verify the index file:
This would produce the following index file:

"spleen","results/fastqc/spleen","results/quant/spleen"
```

In the future, if we add a tool with per-sample outputs, we only need to join the tool output into the `samples_ch` channel and update the output `path` directive accordingly. This approach keeps our output definition concise as we add more tools to the pipeline. Additionally, a single unified index file for all per-sample outputs is easier for downstream pipelines to consume, rather than cross-referencing multiple related index files.
Collaborator

Suggested change
In the future, if we add a tool with per-sample outputs, we only need to join the tool output into the `samples_ch` channel and update the output `path` directive accordingly. This approach keeps our output definition concise as we add more tools to the pipeline. Additionally, a single unified index file for all per-sample outputs is easier for downstream pipelines to consume, rather than cross-referencing multiple related index files.
In the future, if a tool with per-sample outputs were added, the tool output could be joined into the `samples_ch` channel and the output `path` directive updated accordingly. This approach keeps the output definition concise as more tools are added to the pipeline. Additionally, a single unified index file for all per-sample outputs is easier for downstream pipelines to consume than multiple related index files that must be cross-referenced.

@bentsherman
Member Author

I split the Guides section into Tutorials and Guides. Based on our previous discussion, this migration guide seems to be a "tutorial", since it is now a strict step-by-step guide that also contains some explanations. By contrast, I think your page on spot retries is the only guide in this list?

I don't think the past-tense suggestions are a good fit for what I'm trying to do. It feels too brittle and distant, whereas the present-tense imperative feels more engaging and interactive. A similar approach is taken in the "tutorials" for data lineage and Flux, for similar reasons. So I would prefer to try to make it work with the current approach.

3 participants