Add v1 Deployment & Ops Skills Taxonomy #19400

Open: 3 commits to merge into base branch main

@rmloveland commented Feb 25, 2025

@rmloveland marked this pull request as draft February 25, 2025 17:19

netlify bot commented Feb 25, 2025

Deploy Preview for cockroachdb-api-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | e8400ab |
| 🔍 Latest deploy log | https://app.netlify.com/sites/cockroachdb-api-docs/deploys/68065dfa7dac240008676c73 |

netlify bot commented Feb 25, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | e8400ab |
| 🔍 Latest deploy log | https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/68065dfa74f9430008e2ef86 |

netlify bot commented Feb 25, 2025

Netlify Preview

| Name | Link |
|---|---|
| 🔨 Latest commit | e8400ab |
| 🔍 Latest deploy log | https://app.netlify.com/sites/cockroachdb-docs/deploys/68065dfa74f9430008e2ef84 |
| 😎 Deploy Preview | https://deploy-preview-19400--cockroachdb-docs.netlify.app |

@rmloveland force-pushed the 20250225-DOC-12354-deployment-ops-skills-taxonomy branch 3 times, most recently from f5ae0bd to 5d71071 February 27, 2025 16:11
Fixes DOC-12354
@rmloveland force-pushed the 20250225-DOC-12354-deployment-ops-skills-taxonomy branch from 5d71071 to 9cab2b7 February 27, 2025 16:33
@rmloveland marked this pull request as ready for review March 12, 2025 20:28
@rmloveland requested a review from mwang1026 March 12, 2025 20:28

@mwang1026 left a comment

Minor comments mainly. I wonder if we should run this by someone from the PS team?

@rmloveland (Contributor Author) commented

> Minor comments mainly. I wonder if we should run this by someone from the PS team?

thanks @mwang1026! updated in latest commit based on your feedback

happy to have someone from the PS team take a look. who do you think we should tag?

@rmloveland (Contributor Author) commented

hi @BramGruneir!

@mwang1026 suggested getting someone from the PS team to look at this docs PR. are you the right person to ask for help finding a reviewer?

context for this docs PR is:

In the January 2025 docs on-site, one of the things we discussed was a project called “Making CockroachDB Ops & Admin More Self-serve”

Associated with that project was a list of tasks (aka a “skills taxonomy”) that users need help learning how to do for themselves

This docs PR is an attempt to gather links to those tasks/skills in one place so that users can quickly find out how to do each of them

@MattWhelan left a comment

LGTM. I think this is a really good thing to have in our docs.

rmloveland commented Apr 17, 2025

TODO based on offline comment from another team member: we should add a link on how to get debug/tsdump

@rmloveland requested review from taroface and jhlodin and removed request for taroface April 21, 2025 18:05

@rmloveland (Contributor Author) commented

@taroface pls ignore the accidental tag for review. Sending this one to @jhlodin for docs review since this is in his area and I wanted him to be informed of the change

@jhlodin left a comment

Left comments, mostly small nits/suggestions

Cockroach Labs offers [Professional Services](https://www.cockroachlabs.com/company/professional-services/) that can assist you with getting applications into production faster and more efficiently.
{{site.data.alerts.end}}

## Configuration

I think a one-liner description of each skill would be helpful under each header to introduce the following list, like:
"The configuration skill involves managing your CockroachDB monitoring and making informed configuration changes based on trends and alerts" or similar.

Similarly, "configuration" in this context seems very specifically geared towards configuration of the underlying deployment infrastructure rather than configuration of the cluster itself. I think the term "configuration" itself is ambiguous, and should maybe be "Infrastructure configuration" or similar?

- [Rolling upgrades]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#perform-a-patch-upgrade)
- Downgrade a cluster from a [patch version]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#roll-back-a-patch-upgrade)
- Downgrade a cluster from a [major version]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#roll-back-a-major-version-upgrade)
- [Change a cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}#change-a-cluster-setting)

See above, this in particular feels like it's related to the "configuration" skill unless you specify that "configuration" is for infrastructure.


- [Shut down a node gracefully]({% link {{ page.version.version }}/node-shutdown.md %})
- [Handling unplanned node outages]({% link {{ page.version.version }}/recommended-production-settings.md %}#load-balancing)
- [Adding nodes]({% link {{ page.version.version }}/cockroach-start.md %}#add-a-node-to-a-cluster)

Lists contain a mix of gerund and imperative verb forms. Suggest "Add nodes"/"Remove nodes" rather than "Adding nodes"/"Removing nodes", etc.

Comment on lines +49 to +55
- Cluster repaving involves the following individual skills, which are also used during [rolling upgrades]({% link {{ page.version.version }}/upgrade-cockroach-version.md %}#perform-a-patch-upgrade):
1. [Shut down a node gracefully]({% link {{ page.version.version }}/node-shutdown.md %})
1. Detach the [persistent volume]({% link {{ page.version.version }}/kubernetes-overview.md %}#kubernetes-terminology) (a.k.a. persistent disk) from the removed node's virtual machine (VM) (this step is optional but recommended)
1. Delete the removed node's VM
1. Start a new VM
1. Reattach the persistent disk to the new VM (necessary if you did step #2)
1. [Add a node to the cluster]({% link {{ page.version.version }}/cockroach-start.md %}#add-a-node-to-a-cluster) from the new VM

The sudden switch to an instructional list feels awkward. Is there not a better place to link to that describes cluster repaving in more depth?
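
For readers following this thread, the repaving sequence quoted in the hunk above can be sketched as a small planner. This is only an illustration of the step ordering: the function name `repave_steps`, the `keep_disk` flag, and the node label `n3` are hypothetical and do not come from the PR or from any CockroachDB tooling. The sketch encodes one detail from the list: detaching the persistent disk is optional, and reattaching is only needed when the detach step was done.

```python
from typing import List


def repave_steps(node: str, keep_disk: bool = True) -> List[str]:
    """Return the ordered repaving steps for one node.

    Hypothetical helper: it only produces human-readable step
    descriptions and does not call any cloud or CockroachDB API.
    """
    steps = [f"Shut down node {node} gracefully"]
    if keep_disk:
        # Optional but recommended in the quoted list.
        steps.append(f"Detach the persistent disk from the VM of node {node}")
    steps.append(f"Delete the VM of node {node}")
    steps.append("Start a new VM")
    if keep_disk:
        # Only necessary when the disk was detached earlier.
        steps.append("Reattach the persistent disk to the new VM")
    steps.append("Add a node to the cluster from the new VM")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(repave_steps("n3"), start=1):
        print(f"{number}. {step}")
```

Calling `repave_steps("n3", keep_disk=False)` drops both disk steps together, which mirrors the dependency between the detach and reattach steps in the quoted list.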

- [Cluster instability: Dead/suspect nodes]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#node-liveness-issues)
- [Out of memory problems]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#out-of-memory-oom-crash)
- [Imbalanced cluster load]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing)
- [EOF errors]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#client-connection-issues)

Should spell this out, "End of file (EOF) errors"

1. Reattach the persistent disk to the new VM (necessary if you did step #2)
1. [Add a node to the cluster]({% link {{ page.version.version }}/cockroach-start.md %}#add-a-node-to-a-cluster) from the new VM

## Troubleshooting

Since this list describes problems rather than tasks, an introductory line for this section like I proposed above is definitely necessary to clarify.

- [Imbalanced cluster load]({% link {{ page.version.version }}/architecture/replication-layer.md %}#load-based-replica-rebalancing)
- [EOF errors]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#client-connection-issues)
- [Changefeed is falling behind]({% link {{ page.version.version }}/advanced-changefeed-configuration.md %}#lagging-ranges)
- [Get a "debug zip" file]({% link {{ page.version.version }}/cockroach-debug-zip.md %})

Is the term "debug zip" common jargon for our customers? I'd think we should spell it out more clearly in this context, like "Download an archive for debugging"

Unless support is regularly asking customers to download a "debug zip" so we know that's the terminology they're looking for.

- [EOF errors]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %}#client-connection-issues)
- [Changefeed is falling behind]({% link {{ page.version.version }}/advanced-changefeed-configuration.md %}#lagging-ranges)
- [Get a "debug zip" file]({% link {{ page.version.version }}/cockroach-debug-zip.md %})
- [Get a "tsdump" (timeseries dump) file]({% link {{ page.version.version }}/cockroach-debug-tsdump.md %})

Similar to above, how about "Collect timestamped diagnostic logs" or similar? This one I'm less inclined to describe because that's a mess.

Comment on lines +72 to +75
- [Create S3 bucket for backup data]({% link {{ page.version.version }}/use-cloud-storage.md %}#amazon-s3-storage-classes)
- [Full cluster backup to S3]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#full-backups)
- [Incremental backup to S3]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#incremental-backups)
- [Cluster restore from AWS S3]({% link {{ page.version.version }}/restore.md %}#restore-a-cluster)

Do we have non-S3 storage topics for this?

Comment on lines +79 to +84
- [Production Checklist]({% link {{ page.version.version }}/recommended-production-settings.md %})
- [Deploy CockroachDB Manually]({% link {{ page.version.version }}/manual-deployment.md %})
- [Deploy a Local Cluster from Binary (Secure)]({% link {{ page.version.version }}/secure-a-cluster.md %})
- [SQL Performance Best Practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %})
- [Performance Tuning Recipes]({% link {{ page.version.version }}/performance-recipes.md %})
- [Troubleshoot Self-Hosted Setup]({% link {{ page.version.version }}/cluster-setup-troubleshooting.md %})

Should be sentence case

4 participants