Add platform operations docs #950
base: main
| If the platform test repo does not have the latest changes from template-infra check the infra template’s [Template Deploy](https://github.com/navapbc/template-infra/actions/workflows/template-only-cd.yml) workflow to see if there was any failure in deploying the template to the platform-test, platform-test-flask, or platform-test-nextjs repos. | ||
| ### Failure due to merge conflict with files that have changed on the project |
Most everything below this is fairly out of date. Maybe just point to: https://navasage.atlassian.net/wiki/spaces/tss/pages/2011922659/Platform+Ecosystem#template-*-changes-fail-to-apply
Since this is an open source repo, should we consider doing the opposite, i.e. moving the Confluence content to the repo and then linking from Confluence to the repo?
It's a fair bit easier to draft/update/collaborate on this kind of documentation outside of the repo (in some ways), and this content is largely applicable to multiple/all of the template-* repos, so I think it's valuable to keep it centralized. That centralization could be in, say, https://github.com/navapbc/platform (or at least partially in https://github.com/navapbc/platform-cli), but how to structure it needs some thought.
@doshitan OK, made changes including the link to Confluence. I had one open question for you on that thread, which I left unresolved.
| Note: Loren has a branch called `lorenyu/clean` with two scripts that you can use: | ||
| * `template-only-bin/clean-account.sh` | ||
| * `template-only-bin/destroy-vpc.sh` |
Any reason we shouldn't have the scripts in main?
Good question. My surface reason is that the scripts are kinda poor quality / kinda hacked together, don't have a great DevEx, error handling, etc, so it personally felt awkward to merge them to main alongside code that I feel is much higher quality. That said, curious for your opinion as a neutral third party who didn't write the scripts.
CONTRIBUTING.md
Outdated
| Sometimes template changes do not propagate cleanly to the platform test repos. See Platform test repo(s) do not have the latest changes from template-infra. | ||
| Also, unlike application changes, infrastructure changes aren’t always automatically applied. Make sure to think about how the changes will be applied before merging and make sure the changes get applied after merge. Double check by making sure the latest deploys (including in platform-test-nextjs and platform-test-flask test repos) completed successfully and that the terraform plans on main show no configuration changes. |
| Also, unlike application changes, infrastructure changes aren’t always automatically applied. Make sure to think about how the changes will be applied before merging and make sure the changes get applied after merge. Double check by making sure the latest deploys (including in platform-test-nextjs and platform-test-flask test repos) completed successfully and that the terraform plans on main show no configuration changes. | |
| Also, unlike application changes, infrastructure changes aren't always automatically applied. Make sure to think about how the changes will be applied before merging and make sure the changes get applied after merge. Double check by making sure the latest deploys (including in platform-test-nextjs and platform-test-flask test repos) completed successfully and that the terraform plans on main show no configuration changes. |
Also, template changes automatically propagate to more repos than just the platform-test-* ones, so this could just point to the CD action (or the Ecosystem page) to avoid having another list of things to keep up to date.
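As an aside on the "terraform plans on main show no configuration changes" check: Terraform's `plan` command accepts a `-detailed-exitcode` flag that exits with 0 when the plan is empty, 1 on error, and 2 when changes are present, which makes the check easy to script. The following is only a minimal sketch, assuming `terraform` is on the PATH and the working directory is already initialized. The function names and the `workdir` argument are hypothetical and not part of template-infra; the project's own wrappers, if any, would replace the direct call.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANINGS = {
    0: "no changes",
    1: "error",
    2: "changes present",
}


def interpret_plan_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a label."""
    return PLAN_EXIT_MEANINGS.get(code, "unknown")


def plan_status(workdir: str) -> str:
    """Run a plan in `workdir` (a hypothetical path) and report the outcome."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

A result of "no changes" from `plan_status` on main would correspond to the "no configuration changes" condition described in the doc.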
| For example, if you want to change the name of the ECR image repository, you should break the change down into the following steps involving three PRs. | ||
| 1. Create PR #1 that adds a new image repository with a new name (This PR only modifies the build repository layer). | ||
| 2. After merging the PR #1, manually apply the changes to the platform test repos (platform-test, platform-test-nextjs, platform-test-flask) since changes to the build-repository layer aren’t automatically applied as part of the CD workflow. |
| 2. After merging the PR #1, manually apply the changes to the platform test repos (platform-test, platform-test-nextjs, platform-test-flask) since changes to the build-repository layer aren’t automatically applied as part of the CD workflow. | |
| 2. After merging the PR #1, manually apply the changes to the platform test repos (platform-test, platform-test-nextjs, platform-test-flask) since changes to the build-repository layer aren't automatically applied as part of the CD workflow. |
As mentioned elsewhere, the specific list of repos impacted will change over time, so it may be best to point to a more authoritative source, or keep it generic with just "platform test repos"/"test repos".
| * “OIDC provider already exists” during the SetUpAccount step | ||
| * “IAM role already exists” during the SetUpDevEnvironment step | ||
| * “SNS topic already exists” during the SetUpDevEnvironment step |
| * “OIDC provider already exists” during the SetUpAccount step | |
| * “IAM role already exists” during the SetUpDevEnvironment step | |
| * “SNS topic already exists” during the SetUpDevEnvironment step | |
| * "OIDC provider already exists" during the SetUpAccount step | |
| * "IAM role already exists" during the SetUpDevEnvironment step | |
| * "SNS topic already exists" during the SetUpDevEnvironment step |
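When scanning a long workflow log for the leftover-resource errors listed above, a small script can help. This is a minimal sketch, assuming the log has been saved as plain text. The regular expression and the sample log lines are illustrative guesses at the wording, not actual workflow output, and `find_leftover_resources` is a hypothetical helper.

```python
import re

# Matches the three resource kinds named above, followed by "already exists".
# The exact log wording is an assumption; adjust the pattern to the real logs.
ALREADY_EXISTS = re.compile(
    r"(OIDC provider|IAM role|SNS topic)\b.*already exists", re.IGNORECASE
)


def find_leftover_resources(log_text: str) -> list[tuple[int, str]]:
    """Return (line number, resource kind) for each matching log line."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = ALREADY_EXISTS.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Made-up sample input for illustration only.
sample = "\n".join([
    "step: SetUpAccount",
    "Error: OIDC provider already exists",
    "step: SetUpDevEnvironment",
    "Error: IAM role foo already exists",
])
print(find_leftover_resources(sample))
```

The line numbers returned make it easier to jump to the right collapsed group in the log.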
| ### Preventing the problem from getting worse | ||
| If you notice Template CI Infra Checks failing on main, tell people to pause on doing anything that would trigger a Template CI Infra Check run, since further runs will just create more issues you have to look into and more things you have to clean up. |
| If you notice Template CI Infra Checks failing on main, tell people to pause on doing anything that would trigger a Template CI Infra Check run, since further runs will just create more issues you have to look into and more things you have to clean up. | |
| If you notice Template CI Infra Checks failing on main, tell people to pause on doing anything that would trigger a Template CI Infra Checks run, since further runs will just create more issues you have to look into and more things you have to clean up. |
Fixed
| Things that trigger Template CI Infra Checks runs include: | ||
| * Pushes to main branch | ||
| * Opening PRs or updating PRs with new commits on template-infra |
I would maybe just point to https://github.com/navapbc/template-infra/blob/main/.github/workflows/template-only-ci-infra.yml again for an up-to-date list of triggers.
A short summary of things unlikely to change doesn't hurt too much though. Maybe tweak:
| Things that trigger Template CI Infra Checks runs include: | |
| * Pushes to main branch | |
| * Opening PRs or updating PRs with new commits on template-infra | |
| Things that trigger Template CI Infra Checks runs include: | |
| * Pushes to `main` branch | |
| * Opening PRs (or updating PRs with new commits) that touch infrastructure/test code |
Done
| ### Diagnosing the immediate problem | ||
| Look in the GitHub logs for the Template CI Infra check that failed. The logs are very long and therefore are collapsed into groups. |
| Look in the GitHub logs for the Template CI Infra check that failed. The logs are very long and therefore are collapsed into groups. | |
| Look in the GitHub logs for the Template CI Infra Checks that failed. The logs are very long and therefore are collapsed into groups. |
Re-reading this sentence, I think I meant the lowercase "check" here. I think it's actually the Template CI Infra Checks check.
| ### Diagnosing the root cause | ||
| If you have good reason to believe this is a one time thing, then you can skip this step and proceed to clean up the AWS account to unblock the Template CI Infra Checks workflow. Otherwise, it is important to find out what caused the test to not properly clean up and fix that first so that you don’t end up repeating the problem. |
| If you have good reason to believe this is a one time thing, then you can skip this step and proceed to clean up the AWS account to unblock the Template CI Infra Checks workflow. Otherwise, it is important to find out what caused the test to not properly clean up and fix that first so that you don’t end up repeating the problem. | |
| If you have good reason to believe this is a one time thing, then you can skip this step and proceed to clean up the AWS account to unblock the Template CI Infra Checks workflow. Otherwise, it is important to find out what caused the test to not properly clean up and fix that first so that you don't end up repeating the problem. |
Use regular quotes
Co-authored-by: Tanner Doshier <[email protected]>
…to lorenyu/docops
@doshitan for some reason I can't reply to that comment in a thread. I think @btabaska is working on some open source tools for our repos, and I think one of them is setting up a wiki. Would that help?
Ticket
Resolves #{TICKET NUMBER OR URL}
Changes
Context for reviewers
Moving most of the platform operations docs to the repo itself https://docs.google.com/document/d/1ULutU1nTTNnJswsRD-XEHmjRpz3nclvK7gi-nxCwsVo/edit?tab=t.0
Testing
N/A