
DOC-11497 Docs for obs: Enabling troubleshooting hot spots externally (e.g., logs or metrics) #19577


Open: wants to merge 29 commits into main


@florence-crl (Contributor) commented May 1, 2025

Fixes DOC-11497

Added detect-hotspots.md and associated images.

Rendered previews:


github-actions bot commented May 1, 2025

Files changed:


netlify bot commented May 1, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

🔨 Latest commit: f0370b3
🔍 Latest deploy log: https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/6851b312362ada00083641dd


netlify bot commented May 1, 2025

Deploy Preview for cockroachdb-api-docs canceled.

🔨 Latest commit: f0370b3
🔍 Latest deploy log: https://app.netlify.com/projects/cockroachdb-api-docs/deploys/6851b312a578fc00083c1794


netlify bot commented May 1, 2025

Deploy Preview for cockroachdb-docs failed. Why did it fail? →

🔨 Latest commit: 0173eb8
🔍 Latest deploy log: https://app.netlify.com/sites/cockroachdb-docs/deploys/6813b55b6c4a2d00084eadec


netlify bot commented May 1, 2025

Netlify Preview

🔨 Latest commit: f0370b3
🔍 Latest deploy log: https://app.netlify.com/projects/cockroachdb-docs/deploys/6851b3120add9a000825327e
😎 Deploy Preview: https://deploy-preview-19577--cockroachdb-docs.netlify.app

@florence-crl florence-crl requested a review from kevin-v-ngo May 13, 2025 19:17
@florence-crl (Contributor Author) left a comment

Thanks for the first look, @angles-n-daemons. Please review again.

@angles-n-daemons left a comment

Awesome, a couple more quick comments here.

@florence-crl (Contributor Author) left a comment

TFTR

@kevin-v-ngo left a comment

Awesome doc! A few questions and suggestions.

A few questions and suggestions:

  1. Can we simplify this and remove the second box ("Is there a node outlier in the metrics?")?
  2. Are we guaranteed to have a 'hot ranges' log when there is a 'popular key' log for the latch contention workflow? CC @angles-n-daemons

Contributor Author

modified diagram

- Once you identify a relevant log, note the range ID in the tag section of the log.

{{site.data.alerts.callout_info}}
There may be false positives of the `popular key detected` log.


How? If we determined that there is a metric anomaly in latch or CPU, don't we remove the false positives?

Contributor Author

@angles-n-daemons Would you be able to answer the above questions?

@florence-crl (Contributor Author) left a comment

@kevin-v-ngo, thanks for your first review. Please take a second look.

@florence-crl florence-crl requested a review from kevin-v-ngo June 17, 2025 18:37
3 participants