Skip to content

Conversation

@jarmak-nv
Copy link
Contributor

This pull request introduces a new GitHub Actions workflow for high-level project tracking and provides supporting documentation and scripts. The main changes include the addition of the workflow configuration, a detailed README explaining the workflow, and a helper script to obtain necessary project and field IDs.

New GitHub Actions Workflow:

  • .github/workflows/project-high-level-tracker.yaml: Adds a comprehensive workflow to automate the management of GitHub Projects, including steps to process branch names, determine development stages, and update project fields based on release schedules.

Documentation:

  • docs/project-automation-readme.md: Provides an overview of the new workflow, its architecture, and configuration details. It includes a mermaid diagram to illustrate the workflow and instructions on how to obtain necessary project and field IDs.

Helper Script:

  • docs/project-graphql-helper.py: Adds a script to fetch project and field IDs from the GitHub GraphQL API, which are required for configuring the workflow.

@jarmak-nv jarmak-nv requested a review from a team as a code owner February 28, 2025 21:24
@jarmak-nv jarmak-nv requested a review from gforsyth February 28, 2025 21:24
@jarmak-nv jarmak-nv added non-breaking Introduces a non-breaking change doc Improvements or additions to documentation feature request New feature or request and removed doc Improvements or additions to documentation labels Feb 28, 2025
@jameslamb jameslamb self-requested a review February 28, 2025 21:56
Copy link
Member

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a first pass at this, will do a more thorough review next week. Just publishing a non-blocking review now so you can see my first few comments.

inputs:
PROJECT_ID:
description: "Project ID"
default: "PVT_kwDOAp2shc4ArK9w"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment for all of these default: values with what look like identifiers from a service.... what are they? Where did the values come from? Are they intended to be overridden?

If every user of this is expected to provide these values at the site where this is called, then I think that:

If these values are intended to be used literally and never overridden, then I think that:

If these values could be used literally but might be overridden, then I think that:

  • the inputs should all have required: true
  • docs should be added here explaining how to set values for them
  • code comments should be added explaining where these defaults came from

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the identifiers
They are graphQL nodeIDs; they are static values associated with GitHub projects. Unfortunately, there's no clean way of dealing with these; either we hard-code them to store them, or we re-query their values using logic shown in the helper python script in docs every time we run them. While they look like private keys etc, they're totally public and anyone can query those values.

On defaults:

  1. To make it so caller workflows can be as simple as possible
  2. To allow for flexibility should others want to maintain their own projects using the same structure

If this setup isn't something we're happy with, then I'd go with the hard-code in env. Goal 1 is more important than goal 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyone can query those values.

Ok but how? Let's imagine you're not available and this breaks complaining about a project with this ID not existing... how would I go find the appropriate values?

The other thing I'm confused about... if it's ok to hard-code a single project's ID, then why is this workflow getting checked into shared-workflows (which contains workflows that are intended to be re-used in many places), instead of in whatever repo the project board is associated with?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using logic shown in the helper python script

Ok, I see now that you have docs on how to get these values. Great!

https://github.com/jarmak-nv/shared-workflows/blob/high-level-tracker/docs/project-automation-readme.md#getting-project-and-field-ids

It'd still be helpful to have comments in these files indicated where these specific defaults came from. That's information I'd need to know to update these, or to choose whether or not to override those defaults in calls to these workflows from other repos.

Comment on lines 15 to 16
- Linking PRs to issues and keeping them in sync requires complex operations
- Work often begins in an Issue, but then is completed in the PR, requiring this synchronization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand these workflows and the GraphQL code very well, so apologies if this is an ignorant question... does any of this automation assume that 1 issue will be resolved by exactly 1 PR?

The language here (talking about "the" PR) makes me think that it might.

I think that's an assumption that's likely to be violated often, especially for large efforts like rapidsai/cuvs#735

Copy link
Contributor Author

@jarmak-nv jarmak-nv Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah totally agree it's a little confusing; for any given workflow-triggering PR the update-linked-issues workflow synchronizes all issues that are linked to that PR (assuming they are in the same project as the PR).

Ultimately, GitHub itself doesn't have a strong mechanism for delineating the 1-to-many or many-to-many relationship of PRs and Issues in their linkages. If you link 5 PRs to 1 issue, closing any of the PRs closes all of the issues.

The linkage between the two lives inside the PR, so I say "the PR" referring to the PR that triggers the workflow, but can totally see changing it or adding some clarity!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some text, lmk if it adds any clarity

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some text, lmk if it adds any clarity

If you link 5 PRs to 1 issue, closing any of the PRs closes all of the issues.

Thanks. I re-read the docs today and it's good to see they no longer imply a 1-to-1 relationship between PRs and issues.

I think the biggest thing I'm still really unsure about it what you mean (in comments here and in the docs) by "link to".

For example, if I leave a comment like "Contributes to #17963" in a PR's description or in a comment, I have "linked" the PR to the issue.. GitHub will generate an update on the corresponding issues like these:

image

(ref: microsoft/LightGBM#3867)

But those types of linkages don't result in the behavior "closing the PR closes the issue".

I suspect you mean specifically using one of these keywords: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue

If so, I'd add a link to that in the docs.

Comment on lines +165 to +179
if [[ "$CURRENT_DATE" < "$DEV_START" ]]; then
STAGE_ID="${{ inputs.STAGE_PLANNED_NAME }}"
echo "Stage: Planned"
elif [[ "$CURRENT_DATE" < "$BURNDOWN_START" ]]; then
STAGE_ID="${{ inputs.STAGE_DEVELOPMENT_NAME }}"
echo "Stage: Development"
elif [[ "$CURRENT_DATE" < "$FREEZE_START" ]]; then
STAGE_ID="${{ inputs.STAGE_BURNDOWN_NAME }}"
echo "Stage: Burndown"
elif [[ "$CURRENT_DATE" < "$RELEASE_DATE" ]]; then
STAGE_ID="${{ inputs.STAGE_CODE_FREEZE_NAME }}"
echo "Stage: Code Freeze"
else
STAGE_ID="${{ inputs.STAGE_HOTFIX_NAME }}"
echo "Stage: Hotfix"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move as many of the ${{ }} pre-processor lines as possible to environment variables to prevent script injection attacks.

That goes for this block and any other lines in this workflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair, if these stay as variables vs going hard-coded per James' comment above I'll update.

I don't think this is a vector for script injection though because the only way these variables get set is inside the action (they aren't affected by a user defined field in a workflow-triggering PR). If a bad actor attempted to set the values in their own branch, it wouldn't have an effect unless we merged it.

One of the frustrating parts of these workflows is that testing is challenging because they need to be merged into the default branch to be functional, but it does provide protection against script injection as a benefit in many cases!

bdice and others added 15 commits March 4, 2025 23:22
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
Forward-merge branch-25.04 into branch-25.06
pre-commit-ci bot and others added 27 commits June 24, 2025 14:09
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: James Lamb <[email protected]>
This is an experiment in using `zizmor` (https://docs.zizmor.sh/) to run static analysis on all of the shared workflow files.  This seems like a nice tool to add, especially in this repository.

Some of the classes of errors/warnings that are fixed here (and any associated configuration):

### unpinned-uses
https://docs.zizmor.sh/audits/#unpinned-uses

Any `uses:` mapping should pin to an exact hash to avoid things getting swapped out underneath.
Since we deliberately make use of some of the `uses:` to reference our own `shared-workflows` and `nv-gha-runners` -- these are allowed to point at refs.  Those patterns are configured in `.github/zizmor.yml`

### artipacked
https://docs.zizmor.sh/audits/#artipacked

We should always explicitly set `persist-credentials`, either to `true` or `false`.  Everywhere that the `checkout` action has implicitly set it to `true`, I have made it an explicit `true`.

I suspect some of these need to be true, but probably not all of them.  I'll have to do some testing downstream to see which actions break when those secrets are disabled.

### overprovisioned-secrets
https://docs.zizmor.sh/audits/#overprovisioned-secrets

Looking up secrets like `secrets[some_key]` injects the entire secrets context into the runner, but I don't see a way around that for our current secret lookup patterns. I've just ignored these instances individually.

### unpinned-images
https://docs.zizmor.sh/audits/#unpinned-images

The only unpinned images belong to us and those have been individually allowlisted.


### template-injection
https://docs.zizmor.sh/audits/#template-injection

Generally speaking lines like `run: ${{ inputs.script }}` are susceptible to template injection.  Also, that's the entire point of these lines, so they are individually allowlisted.

For other potentially injectable lines, like `run: rapids-wheels-anaconda "${{ inputs.package-name }}" "${{ inputs.package-type }}" `, I've unpacked the inputs in to the environment and passed through those env-vars instead.

In rapidsai#351, @msarahan and @jameslamb and I determined that we can use full SHAs to specify the image to use for a given step, but leave a trailing same-line comment that points to the corresponding tag.

That let's us use the more secure exact pin for the images that we don't control, but still keeps it human-readable and compatible with dependabot, e.g.

```
- uses: actions/checkout@def456 # v4.2.2

```
Forward-merge branch-25.08 into branch-25.10
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Forward-merge branch-25.08 into branch-25.10
Upload sccache client logs on success and failure, except when the workflow is cancelled.
…he-logs

Always upload sccache logs in `build-in-devcontainer.yaml`
Forward-merge branch-25.08 into branch-25.10
…evcontainers

Disable zizmor for `build-in-devcontainer.yaml`
Forward-merge branch-25.08 into branch-25.10
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…4.3.1 (rapidsai#407)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@jarmak-nv
Copy link
Contributor Author

Closing this PR; docs are still useful and have been updated/moved to #415

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.