DOC-746 update redpanda_cloud_storage metrics #906

Open: wants to merge 5 commits into base: main
278 changes: 200 additions & 78 deletions modules/reference/pages/public-metrics-reference.adoc

== Cloud storage metrics

include::reference:partial$public_metrics_tip.adoc[]

ifndef::env-cloud[]
[NOTE]
====
Cloud storage metrics are only available if you have:

* xref:manage:tiered-storage.adoc[] enabled
* The cluster property xref:reference:properties/object-storage-properties.adoc#cloud_storage_enabled[cloud_storage_enabled] set to `true`
====
endif::[]
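Redpanda serves these metrics in the Prometheus text exposition format. Many of them carry one time series per label set (for example, per topic). The following Python snippet is a minimal sketch that assumes you have already scraped that text yourself. It adds up every `redpanda_cloud_storage_*` series by metric name. The sample scrape, its values, and the `shard` label are hypothetical. The parser also ignores exposition-format details such as timestamps and escaped label values.

```python
from collections import defaultdict

# Hypothetical scrape in the Prometheus text exposition format. Values and the
# "shard" label are illustrative, not captured from a real cluster.
SAMPLE = """\
# HELP redpanda_cloud_storage_cache_op_hit Number of get requests for objects that are already in cache.
# TYPE redpanda_cloud_storage_cache_op_hit counter
redpanda_cloud_storage_cache_op_hit{shard="0"} 120
redpanda_cloud_storage_cache_op_hit{shard="1"} 80
redpanda_cloud_storage_cache_space_size_bytes{shard="0"} 4096
redpanda_kafka_request_bytes_total{request="produce"} 999
"""

PREFIX = "redpanda_cloud_storage_"


def cloud_storage_totals(text: str) -> dict[str, float]:
    """Sum each redpanda_cloud_storage_* metric across all of its label sets.

    Assumes sample lines have no trailing timestamp and no spaces in label values.
    """
    totals: dict[str, float] = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field; the metric name ends at "{".
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name.startswith(PREFIX):
            totals[name] += float(value)
    return dict(totals)


print(cloud_storage_totals(SAMPLE))
```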

=== redpanda_cloud_storage_active_segments

Number of remote log segments currently hydrated for read.

*Type*: gauge

=== redpanda_cloud_storage_anomalies

Count of missing partition manifest anomalies for the topic.

*Type*: gauge

=== redpanda_cloud_storage_cache_op_hit

Number of get requests for objects that are already in cache.

*Type*: counter

=== redpanda_cloud_storage_cache_op_in_progress_files

Number of files that are being added to the cache.

*Type*: gauge

=== redpanda_cloud_storage_cache_op_miss

Number of get requests that failed because the object was not in the cache.

*Type*: counter
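`cache_op_hit` and `cache_op_miss` are both counters. A cache hit ratio is therefore usually computed from how much each one increased over a time window, not from their lifetime totals. The following Python sketch assumes that every cache get request is counted as either a hit or a miss. It treats a decrease in a counter as a reset (for example, after a restart). The function names are hypothetical.

```python
from typing import Optional


def counter_delta(start: float, end: float) -> float:
    """Increase of a counter between two samples.

    A lower end value is treated as a counter reset, so the end value is used.
    """
    return end if end < start else end - start


def cache_hit_ratio(hit_start: float, hit_end: float,
                    miss_start: float, miss_end: float) -> Optional[float]:
    """Fraction of cache get requests served from the cache during the window."""
    hits = counter_delta(hit_start, hit_end)
    misses = counter_delta(miss_start, miss_end)
    total = hits + misses
    # No get requests in the window means there is no meaningful ratio.
    return hits / total if total > 0 else None


# Hypothetical samples of cache_op_hit and cache_op_miss taken one window apart.
print(cache_hit_ratio(1000, 1900, 200, 300))  # 900 / (900 + 100)
```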

=== redpanda_cloud_storage_cache_op_put

Number of objects written into cache.

*Type*: counter

=== redpanda_cloud_storage_cache_space_files

Number of objects in the cache.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_hwm_files

High watermark of the number of objects in the cache.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_hwm_size_bytes

High watermark of the total size, in bytes, of cached objects.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_size_bytes

Total size, in bytes, of cached objects.

*Type*: gauge
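`cache_space_size_bytes` and `cache_space_hwm_size_bytes` are gauges, so you can compare their current readings directly. The following Python function is a sketch that expresses current and peak usage as fractions of the cache capacity. It assumes you know the capacity you configured for the cache, in bytes. That capacity is an input you supply; it is not one of the metrics in this section. The function name and readings are hypothetical.

```python
def cache_utilization(size_bytes: float, hwm_size_bytes: float,
                      capacity_bytes: float) -> tuple[float, float]:
    """Return (current, peak) cache usage as fractions of a capacity you supply.

    capacity_bytes is provided by you (for example, your configured cache size);
    it is not one of the metrics in this section.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return size_bytes / capacity_bytes, hwm_size_bytes / capacity_bytes


GiB = 1024 ** 3
# Hypothetical readings: 30 GiB cached now, 45 GiB at peak, 50 GiB capacity.
current, peak = cache_utilization(30 * GiB, 45 * GiB, 50 * GiB)
print(f"current {current:.0%}, peak {peak:.0%}")
```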

=== redpanda_cloud_storage_cache_space_tracker_size

Number of entries in the cache access tracker.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_tracker_syncs

Number of times the access tracker was updated with cache disk data.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_carryover_trims

Number of times Redpanda invoked a carryover trim.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_exhaustive_trims

Number of times a fast trim could not free enough space and had to fall back to a slower exhaustive trim.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_failed_trims

Number of times Redpanda could not free the expected amount of space, indicating a possible bug or configuration issue.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_fast_trims

Number of times Redpanda trimmed the cache using the normal (fast) mode.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_in_mem_trims

Number of times Redpanda trimmed the cache using the in-memory access tracker.

*Type*: counter
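The trim counters are most useful when read relative to each other. The following Python sketch takes the increase of the fast, exhaustive, and failed trim counters over the same window; how you obtain those deltas is up to you. It reports how often a fast trim fell back to an exhaustive one. This relies on the assumption that every exhaustive trim is preceded by a fast trim, which the descriptions above suggest but do not guarantee. `TrimDeltas` and `summarize_trims` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrimDeltas:
    """Increase of each cache trim counter over the same time window."""
    fast: float        # cache_trim_fast_trims
    exhaustive: float  # cache_trim_exhaustive_trims
    failed: float      # cache_trim_failed_trims


def summarize_trims(d: TrimDeltas) -> dict:
    # Assumption: each exhaustive trim follows a fast trim that could not free
    # enough space, so exhaustive / fast approximates the fallback rate.
    fallback = d.exhaustive / d.fast if d.fast > 0 else 0.0
    return {
        "fallback_ratio": fallback,
        # Any failed trim may point to a bug or configuration issue.
        "investigate": d.failed > 0,
    }


print(summarize_trims(TrimDeltas(fast=200, exhaustive=10, failed=0)))
```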

=== redpanda_cloud_storage_cloud_log_size

Total size in bytes of the user-visible log for the topic.

*Type*: gauge

=== redpanda_cloud_storage_deleted_segments

Number of segments that have been deleted from cloud storage for the topic. This number may grow because of retention, or because non-compacted segments are replaced with their compacted equivalents.

*Type*: counter

=== redpanda_cloud_storage_errors_total

Number of transmit errors.

*Type*: counter
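Counters such as `errors_total` only increase until the process restarts, so alerting usually looks at their rate of change. In PromQL, `rate()` does this. The following Python sketch illustrates the same idea on a list of timestamped samples and treats any decrease as a counter reset. Unlike `rate()`, it does not extrapolate to the edges of the window. The function name and sample values are hypothetical.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter.

    samples: (unix_timestamp_seconds, counter_value) pairs in time order.
    A drop in value is treated as a counter reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset, the whole current value counts as new increase.
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


# Hypothetical errors_total scrapes 15 s apart, with a reset after the second one.
scrapes = [(0, 40), (15, 46), (30, 3), (45, 9)]
print(per_second_rate(scrapes))  # (6 + 3 + 6) / 45
```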

=== redpanda_cloud_storage_housekeeping_drains

Number of times upload housekeeping queue was drained.

*Type*: gauge

=== redpanda_cloud_storage_housekeeping_jobs_completed

Number of executed housekeeping jobs.

*Type*: counter

=== redpanda_cloud_storage_housekeeping_jobs_failed

Number of failed housekeeping jobs.

*Type*: counter

=== redpanda_cloud_storage_housekeeping_jobs_skipped

Number of skipped housekeeping jobs.

*Type*: counter
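To check whether upload housekeeping is keeping up, you can compare how much `housekeeping_jobs_completed`, `housekeeping_jobs_failed`, and `housekeeping_jobs_skipped` increased over the same window. The following Python sketch assumes that the three counters count disjoint outcomes, which the descriptions above do not confirm. The function name is hypothetical.

```python
def housekeeping_outcomes(completed: float, failed: float,
                          skipped: float) -> dict[str, float]:
    """Share of each outcome among all housekeeping jobs seen in a window.

    Inputs are counter increases over the same window. Assumes the three
    counters are disjoint, which the metric descriptions do not state.
    """
    total = completed + failed + skipped
    if total == 0:
        return {"completed": 0.0, "failed": 0.0, "skipped": 0.0}
    return {
        "completed": completed / total,
        "failed": failed / total,
        "skipped": skipped / total,
    }


print(housekeeping_outcomes(completed=90, failed=2, skipped=8))
```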

=== redpanda_cloud_storage_housekeeping_pauses

Number of times upload housekeeping was paused.

*Type*: gauge

=== redpanda_cloud_storage_housekeeping_requests_throttled_average_rate

Average rate, per shard, of requests from the read and write paths that were throttled by Tiered Storage.

*Type*: gauge

=== redpanda_cloud_storage_housekeeping_resumes

Number of times upload housekeeping was resumed.

*Type*: gauge

=== redpanda_cloud_storage_housekeeping_rounds

Number of upload housekeeping rounds.

*Type*: counter

=== redpanda_cloud_storage_jobs_cloud_segment_reuploads

Number of segment reuploads from cloud storage sources (cloud storage cache or direct download from cloud storage).

*Type*: gauge

=== redpanda_cloud_storage_jobs_local_segment_reuploads

Number of segment reuploads from local data directory.

*Type*: gauge

=== redpanda_cloud_storage_jobs_manifest_reuploads

Number of manifest reuploads performed by all housekeeping jobs.

*Type*: gauge

=== redpanda_cloud_storage_jobs_metadata_syncs

Number of archival configuration updates performed by all housekeeping jobs.

*Type*: gauge

=== redpanda_cloud_storage_jobs_segment_deletions

Number of segments deleted by all housekeeping jobs.

*Type*: gauge

=== redpanda_cloud_storage_limits_downloads_throttled_sum

Total time, in milliseconds, that cloud storage downloads were throttled.

*Type*: counter
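`limits_downloads_throttled_sum` accumulates throttled time in milliseconds. Its increase over a window can therefore be read as the share of that window during which downloads were throttled. The following Python function is a minimal sketch of that calculation. The function name and sample values are hypothetical.

```python
def throttled_fraction(sum_start_ms: float, sum_end_ms: float, window_s: float) -> float:
    """Throttled download time as a fraction of the wall-clock window.

    If you add up several series (for example, across brokers), the result can
    exceed 1.0 because their throttling periods can overlap.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # Treat a decrease as a counter reset.
    delta_ms = sum_end_ms if sum_end_ms < sum_start_ms else sum_end_ms - sum_start_ms
    return delta_ms / (window_s * 1000.0)


# Hypothetical: 12,000 ms of throttling accumulated over a 60 s window.
print(throttled_fraction(50_000, 62_000, 60))
```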

=== redpanda_cloud_storage_partition_manifest_uploads_total

Successful partition manifest uploads.

*Type*: counter

=== redpanda_cloud_storage_partition_readers

Number of partition reader instances, based on the number of current fetch and timequery requests reading from Tiered Storage.

*Type*: gauge

=== redpanda_cloud_storage_partition_readers_delayed

Number of partition readers that were delayed because they hit the reader limit. An increase indicates that the cluster is saturated with Tiered Storage reads.

*Type*: counter

=== redpanda_cloud_storage_readers

Number of segment read cursors for hydrated remote log segments.

*Type*: gauge

=== redpanda_cloud_storage_segment_index_uploads_total

Successful segment index uploads.

*Type*: counter

=== redpanda_cloud_storage_segment_materializations_delayed

Number of segment materializations that were delayed because they hit the reader limit. An increase indicates that the cluster is saturated with Tiered Storage reads.

*Type*: counter

=== redpanda_cloud_storage_segment_readers_delayed

Number of segment readers that were delayed because they hit the reader limit. An increase indicates that the cluster is saturated with Tiered Storage reads.

*Type*: counter
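`partition_readers_delayed`, `segment_materializations_delayed`, and `segment_readers_delayed` all count work that was delayed by the reader limit. The following Python sketch reports which of them increased between two scrapes. It assumes you have already summed each metric across its label sets into a name-to-value dictionary. The function name and sample values are hypothetical.

```python
READER_LIMIT_COUNTERS = (
    "redpanda_cloud_storage_partition_readers_delayed",
    "redpanda_cloud_storage_segment_materializations_delayed",
    "redpanda_cloud_storage_segment_readers_delayed",
)


def saturated_by(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Return the reader-limit counters that increased between two scrapes.

    A missing metric is treated as 0. A lower current value is read as a
    counter reset, and any nonzero value after the reset counts as an increase.
    """
    increased_counters = []
    for name in READER_LIMIT_COUNTERS:
        before = previous.get(name, 0.0)
        after = current.get(name, 0.0)
        increased = after > 0 if after < before else after > before
        if increased:
            increased_counters.append(name)
    return increased_counters


# Hypothetical scrapes, already summed across label sets.
prev = {
    "redpanda_cloud_storage_partition_readers_delayed": 4,
    "redpanda_cloud_storage_segment_readers_delayed": 10,
}
cur = {
    "redpanda_cloud_storage_partition_readers_delayed": 4,
    "redpanda_cloud_storage_segment_materializations_delayed": 1,
    "redpanda_cloud_storage_segment_readers_delayed": 13,
}
print(saturated_by(prev, cur))
```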

=== redpanda_cloud_storage_segment_uploads_total

Successful data segment uploads.

*Type*: counter

=== redpanda_cloud_storage_segments

Total number of accounted segments in the cloud for the topic.

*Type*: gauge

=== redpanda_cloud_storage_segments_pending_deletion

Total number of segments pending deletion from the cloud for the topic.

*Type*: gauge

=== redpanda_cloud_storage_spillover_manifest_uploads_total

Successful spillover manifest uploads.

*Type*: counter

=== redpanda_cloud_storage_spillover_manifests_materialized_bytes

Bytes of memory used for spilled manifests currently cached in memory.

*Type*: gauge

=== redpanda_cloud_storage_spillover_manifests_materialized_count

Number of spilled manifests that are currently cached in memory.

*Type*: gauge
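Dividing `spillover_manifests_materialized_bytes` by `spillover_manifests_materialized_count` gives the average memory used per cached spilled manifest. This can help distinguish many small manifests from a few large ones. The following Python function is a minimal sketch with hypothetical readings. It assumes both gauges are read from the same scrape.

```python
from typing import Optional


def avg_spilled_manifest_bytes(materialized_bytes: float,
                               materialized_count: float) -> Optional[float]:
    """Average memory per spilled manifest currently cached in memory.

    Both inputs are gauges read at the same moment; returns None when no
    spilled manifests are cached.
    """
    if materialized_count <= 0:
        return None
    return materialized_bytes / materialized_count


# Hypothetical readings: 3 MiB across 12 cached spilled manifests.
print(avg_spilled_manifest_bytes(3 * 1024 * 1024, 12))
```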

=== redpanda_cloud_storage_uploaded_bytes

Total number of uploaded bytes for the topic.

*Type*: counter
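`uploaded_bytes` is a per-topic counter. Upload throughput for each topic can therefore be derived from two samples taken a known interval apart. In the following Python sketch, the mapping from topic name to counter value is an input you build yourself; which label carries the topic name is not covered here. The function name and sample values are hypothetical.

```python
def upload_throughput_mib_s(start: dict[str, float], end: dict[str, float],
                            window_s: float) -> dict[str, float]:
    """Per-topic upload throughput in MiB/s from two uploaded_bytes samples.

    start and end map a topic name to its uploaded_bytes value. Topics missing
    from start are treated as starting at 0.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    mib = 1024 * 1024
    result = {}
    for topic, end_bytes in end.items():
        start_bytes = start.get(topic, 0.0)
        # Treat a decrease as a counter reset.
        delta = end_bytes if end_bytes < start_bytes else end_bytes - start_bytes
        result[topic] = delta / mib / window_s
    return result


# Hypothetical samples taken 10 s apart.
before = {"orders": 0}
after = {"orders": 50 * 1024 * 1024, "logs": 10 * 1024 * 1024}
print(upload_throughput_mib_s(before, after, 10))
```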

== Related topics
