@lorenyu lorenyu commented Aug 18, 2025

Ticket

Resolves #{TICKET NUMBER OR URL}

Changes

Context for reviewers

Moving most of the platform operations docs into the repo itself: https://docs.google.com/document/d/1ULutU1nTTNnJswsRD-XEHmjRpz3nclvK7gi-nxCwsVo/edit?tab=t.0

Testing

N/A

@lorenyu lorenyu requested a review from a team as a code owner August 18, 2025 23:51
@lorenyu lorenyu changed the title from "Add docs from platform operations" to "Add platform operations docs" Aug 18, 2025
@lorenyu lorenyu requested a review from doshitan August 18, 2025 23:52

If the platform test repo does not have the latest changes from template-infra, check the infra template’s [Template Deploy](https://github.com/navapbc/template-infra/actions/workflows/template-only-cd.yml) workflow to see if there was any failure in deploying the template to the platform-test, platform-test-flask, or platform-test-nextjs repos.

### Failure due to merge conflict with files that have changed on the project
Contributor Author

Since this is an open source repo, should we consider doing the opposite, i.e. moving the Confluence content to the repo and then linking from Confluence to the repo?

Contributor

It's a fair bit easier to draft/update/collaborate on this kind of documentation outside of the repo (in some ways) and this content is largely applicable to multiple/all the template-* repos, so I think valuable to be centralized. Now that centralization could be in say https://github.com/navapbc/platform (or at least partially in https://github.com/navapbc/platform-cli), but probably needs thought out on how to structure.

@lorenyu lorenyu requested a review from doshitan August 20, 2025 15:48

lorenyu commented Aug 20, 2025

@doshitan OK, made changes, including the link to Confluence. I had one open question for you on that thread, which I left unresolved.

Comment on lines +58 to +61
Note: Loren has a branch called `lorenyu/clean` with two scripts that you can use:

* `template-only-bin/clean-account.sh`
* `template-only-bin/destroy-vpc.sh`
Contributor

Any reason we shouldn't have the scripts in main?

Contributor Author

Good question. My surface reason is that the scripts are kinda poor quality / kinda hacked together, don't have a great DevEx, error handling, etc, so it personally felt awkward to merge them to main alongside code that I feel is much higher quality. That said, curious for your opinion as a neutral third party who didn't write the scripts.

CONTRIBUTING.md Outdated

Sometimes template changes do not propagate cleanly to the platform test repos. See Platform test repo(s) do not have the latest changes from template-infra.

Also, unlike application changes, infrastructure changes aren’t always automatically applied. Make sure to think about how the changes will be applied before merging and make sure the changes get applied after merge. Double check by making sure the latest deploys (including in platform-test-nextjs and platform-test-flask test repos) completed successfully and that the terraform plans on main show no configuration changes.
Contributor

Suggested change
- Also, unlike application changes, infrastructure changes aren’t always automatically applied. Make sure to think about how the changes will be applied before merging and make sure the changes get applied after merge. Double check by making sure the latest deploys (including in platform-test-nextjs and platform-test-flask test repos) completed successfully and that the terraform plans on main show no configuration changes.
+ Also, unlike application changes, infrastructure changes aren't always automatically applied. Make sure to think about how the changes will be applied before merging and make sure the changes get applied after merge. Double check by making sure the latest deploys (including in platform-test-nextjs and platform-test-flask test repos) completed successfully and that the terraform plans on main show no configuration changes.

And also the template changes automatically propagate to more repos than just platform-test-* ones, could maybe just point to the CD action (or Ecosystem page) to avoid having another list of things to keep up to date.
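
As an illustration of the "terraform plans on main show no configuration changes" check in the quoted CONTRIBUTING.md text, here is a minimal sketch in Python. It relies on the `-detailed-exitcode` flag of `terraform plan` (exit code 0 for no changes, 1 for an error, 2 for pending changes). The function names and the `layer_dir` argument are hypothetical. It also assumes the directory is already initialized against the right backend. The project may well wrap this in its own tooling, so treat it as a sketch rather than the documented procedure.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANINGS = {
    0: "no changes",
    1: "error",
    2: "changes present",
}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` exit code into a label."""
    return PLAN_EXIT_MEANINGS.get(code, "unknown")


def plan_is_clean(layer_dir: str) -> bool:
    """Run a plan in layer_dir (a hypothetical, already-initialized directory)
    and report whether it shows no configuration changes."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=layer_dir,
        check=False,  # a non-zero exit is expected when changes are pending
    )
    return interpret_plan_exit_code(result.returncode) == "no changes"
```

Keeping the exit-code interpretation separate from the subprocess call makes the mapping easy to check without Terraform installed.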

For example, if you want to change the name of the ECR image repository, you should break the change down into the following steps involving three PRs.

1. Create PR #1 that adds a new image repository with a new name (This PR only modifies the build repository layer).
2. After merging the PR #1, manually apply the changes to the platform test repos (platform-test, platform-test-nextjs, platform-test-flask) since changes to the build-repository layer aren’t automatically applied as part of the CD workflow.
Contributor

Suggested change
- 2. After merging the PR #1, manually apply the changes to the platform test repos (platform-test, platform-test-nextjs, platform-test-flask) since changes to the build-repository layer aren’t automatically applied as part of the CD workflow.
+ 2. After merging the PR #1, manually apply the changes to the platform test repos (platform-test, platform-test-nextjs, platform-test-flask) since changes to the build-repository layer aren't automatically applied as part of the CD workflow.

And as mentioned elsewhere, the specific list of repos impacted will change over time, so may be best to point to a more authoritative source, or leave more generic with just "platform test repos"/"test repos".

Comment on lines 22 to 24
* “OIDC provider already exists” during the SetUpAccount step
* “IAM role already exists” during the SetUpDevEnvironment step
* “SNS topic already exists” during the SetUpDevEnvironment step
Contributor

Suggested change
- * “OIDC provider already exists” during the SetUpAccount step
- * “IAM role already exists” during the SetUpDevEnvironment step
- * “SNS topic already exists” during the SetUpDevEnvironment step
+ * "OIDC provider already exists" during the SetUpAccount step
+ * "IAM role already exists" during the SetUpDevEnvironment step
+ * "SNS topic already exists" during the SetUpDevEnvironment step
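
To make the list above concrete, here is a small Python sketch that scans log text for the three quoted messages and reports the step each one is associated with. The function and constant names are hypothetical. The match strings are the phrases quoted in the doc, not verified AWS or Terraform error text, so the wording in real logs may differ and the patterns would need adjusting.

```python
import re

# Messages quoted in the doc under review, paired with the step where each appears.
KNOWN_LEFTOVER_ERRORS = [
    ("OIDC provider already exists", "SetUpAccount"),
    ("IAM role already exists", "SetUpDevEnvironment"),
    ("SNS topic already exists", "SetUpDevEnvironment"),
]


def find_leftover_errors(log_text: str) -> list[tuple[str, str]]:
    """Return (message, step) pairs for each known message found in log_text."""
    found = []
    for message, step in KNOWN_LEFTOVER_ERRORS:
        # Case-insensitive literal match; real log wording is an assumption.
        if re.search(re.escape(message), log_text, flags=re.IGNORECASE):
            found.append((message, step))
    return found
```

Calling `find_leftover_errors` on a log excerpt returns an empty list when none of the known messages appear, which suggests the failure has a different cause.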


### Preventing the problem from getting worse

If you notice Template CI Infra Checks failing on main, tell people to pause on doing anything that would trigger a Template CI Infra Check run, since further runs will just create more issues you have to look into and more things you have to clean up.
Contributor

Suggested change
- If you notice Template CI Infra Checks failing on main, tell people to pause on doing anything that would trigger a Template CI Infra Check run, since further runs will just create more issues you have to look into and more things you have to clean up.
+ If you notice Template CI Infra Checks failing on main, tell people to pause on doing anything that would trigger a Template CI Infra Checks run, since further runs will just create more issues you have to look into and more things you have to clean up.

Contributor Author

Fixed

Comment on lines 11 to 14
Things that trigger Template CI Infra Checks runs include:

* Pushes to main branch
* Opening PRs or updating PRs with new commits on template-infra
Contributor

I would maybe just point to https://github.com/navapbc/template-infra/blob/main/.github/workflows/template-only-ci-infra.yml again for an up-to-date list of triggers.

A short summary of things unlikely to change doesn't hurt too much though. Maybe tweak:

Suggested change
- Things that trigger Template CI Infra Checks runs include:
- * Pushes to main branch
- * Opening PRs or updating PRs with new commits on template-infra
+ Things that trigger Template CI Infra Checks runs include:
+ * Pushes to `main` branch
+ * Opening PRs (or updating PRs with new commits) that touch infrastructure/test code

Contributor Author

Done


### Diagnosing the immediate problem

Look in the GitHub logs for the Template CI Infra check that failed. The logs are very long and therefore are collapsed into groups.
Contributor

Suggested change
- Look in the GitHub logs for the Template CI Infra check that failed. The logs are very long and therefore are collapsed into groups.
+ Look in the GitHub logs for the Template CI Infra Checks that failed. The logs are very long and therefore are collapsed into groups.

Contributor Author

Re-reading this sentence, I think I meant it as the lowercase "check". I think it's actually the Template CI Infra Checks check.
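
Since the logs are long and collapsed into groups, a sketch like the following could help narrow down which group contains the failure. It assumes the raw downloaded log marks groups with `##[group]` and `##[endgroup]` lines, which is how GitHub Actions raw logs usually appear. If the workflow's logs use different markers, the two marker strings need changing. The function names and the default `"Error"` search string are hypothetical.

```python
def split_log_groups(raw_log: str) -> dict[str, list[str]]:
    """Split a raw log into {group title: lines inside that group}.

    Assumes groups are delimited by ##[group]<title> and ##[endgroup] lines.
    Lines outside any group are ignored.
    """
    groups: dict[str, list[str]] = {}
    current = None
    for line in raw_log.splitlines():
        if "##[group]" in line:
            # Everything after the marker is treated as the group title.
            current = line.split("##[group]", 1)[1].strip()
            groups.setdefault(current, [])
        elif "##[endgroup]" in line:
            current = None
        elif current is not None:
            groups[current].append(line)
    return groups


def groups_with_errors(raw_log: str, needle: str = "Error") -> list[str]:
    """Return titles of groups that contain at least one line with `needle`."""
    return [
        title
        for title, lines in split_log_groups(raw_log).items()
        if any(needle in line for line in lines)
    ]
```

The returned titles give a short list of groups to expand in the GitHub UI instead of reading the whole log.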


### Diagnosing the root cause

If you have good reason to believe this is a one time thing, then you can skip this step and proceed to clean up the AWS account to unblock the Template CI Infra Checks workflow. Otherwise, it is important to find out what caused the test to not properly clean up and fix that first so that you don’t end up repeating the problem.
Contributor

Suggested change
- If you have good reason to believe this is a one time thing, then you can skip this step and proceed to clean up the AWS account to unblock the Template CI Infra Checks workflow. Otherwise, it is important to find out what caused the test to not properly clean up and fix that first so that you don’t end up repeating the problem.
+ If you have good reason to believe this is a one time thing, then you can skip this step and proceed to clean up the AWS account to unblock the Template CI Infra Checks workflow. Otherwise, it is important to find out what caused the test to not properly clean up and fix that first so that you don't end up repeating the problem.



lorenyu commented Sep 16, 2025

> It's a fair bit easier to draft/update/collaborate on this kind of documentation outside of the repo (in some ways) and this content is largely applicable to multiple/all the template-* repos, so I think valuable to be centralized. Now that centralization could be in say https://github.com/navapbc/platform (or at least partially in https://github.com/navapbc/platform-cli), but probably needs thought out on how to structure.

@doshitan For some reason I can't reply to that comment in a thread. I think @btabaska is working on some open source tools for our repos, and I think one of them is setting up a wiki. Would that help?

@lorenyu lorenyu requested a review from doshitan September 16, 2025 17:14