DOC-746 update redpanda_cloud_storage metrics #906

Open: wants to merge 5 commits into base: main
278 changes: 200 additions & 78 deletions modules/reference/pages/public-metrics-reference.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -969,182 +969,304 @@ The number of transform processors in a specific state (running, inactive, error

== Cloud storage metrics

include::reference:partial$public_metrics_tip.adoc[]

ifndef::env-cloud[]
NOTE: Cloud storage metrics are only available if you have:

- xref:manage:tiered-storage.adoc[] enabled
- The cluster property xref:reference:properties/object-storage-properties.adoc#cloud_storage_enabled[cloud_storage_enabled] set to `true`
[NOTE]
====
Cloud storage metrics are only available if you have:

* xref:manage:tiered-storage.adoc[Tiered Storage] enabled
* The cluster property xref:reference:properties/object-storage-properties.adoc#cloud_storage_enabled[cloud_storage_enabled] set to `true`
====
endif::[]

=== redpanda_cloud_storage_cache_space_size_bytes
=== redpanda_cloud_storage_active_segments

Sum of size of cached objects.
Number of remote log segments currently hydrated for read.

=== redpanda_cloud_storage_housekeeping_drains
*Type*: gauge

Number of times upload housekeeping queue was drained.
=== redpanda_cloud_storage_anomalies

=== redpanda_cloud_storage_spillover_manifests_materialized_bytes
Number of missing partition manifest anomalies for the topic.

Bytes of memory used for spilled manifests currently cached in memory.
*Type*: gauge

=== redpanda_cloud_storage_cache_op_hit

Number of get requests for objects that are already in cache.

*Type*: counter

=== redpanda_cloud_storage_cache_op_in_progress_files

Number of files that are being put into cache.

*Type*: gauge

=== redpanda_cloud_storage_cache_op_miss

Number of failed get requests due to missing object in the cache.

*Type*: counter
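
As an illustration of how the `redpanda_cloud_storage_cache_op_hit` and `redpanda_cloud_storage_cache_op_miss` counters might be combined, the following minimal sketch computes a cache hit ratio between two scrapes. It assumes the counter values have already been read from the metrics endpoint for the same broker, and that a counter restarts from zero when the broker restarts. The function names `counter_delta` and `cache_hit_ratio` are hypothetical and are not part of Redpanda.

```python
from typing import Optional


def counter_delta(start: float, end: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    Assumption: if the later sample is smaller than the earlier one,
    the counter was reset to zero, so the later value is the increase.
    """
    return end - start if end >= start else end


def cache_hit_ratio(hits_start: float, hits_end: float,
                    misses_start: float, misses_end: float) -> Optional[float]:
    """Fraction of cache get requests served from cache in the interval.

    Returns None when no get requests were observed in the interval.
    """
    hits = counter_delta(hits_start, hits_end)
    misses = counter_delta(misses_start, misses_end)
    total = hits + misses
    if total <= 0:
        return None
    return hits / total


if __name__ == "__main__":
    # Hypothetical sample values from two scrapes of one broker.
    print(cache_hit_ratio(100, 400, 50, 150))
```

A ratio that stays low over many intervals may suggest that the cache is too small for the read workload, although the right threshold depends on the deployment.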

=== redpanda_cloud_storage_cache_op_put

Number of objects written into cache.

=== redpanda_cloud_storage_segments
*Type*: counter

Total number of accounted topic segments in the cloud.
=== redpanda_cloud_storage_cache_space_files

=== redpanda_cloud_storage_jobs_local_segment_reuploads
Number of objects in cache.

Number of segment reuploads from local data directory.
*Type*: gauge

=== redpanda_cloud_storage_cache_trim_failed_trims
=== redpanda_cloud_storage_cache_space_hwm_files

High watermark for number of objects in cache.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_hwm_size_bytes

High watermark for sum of size of cached objects.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_size_bytes

Sum of size of cached objects.

*Type*: gauge

=== redpanda_cloud_storage_cache_space_tracker_size

Number of entries in cache access tracker.

*Type*: gauge

Number of times Redpanda could not free the expected amount of space, indicating possible bug or configuration issue.
=== redpanda_cloud_storage_cache_space_tracker_syncs

Number of times the access tracker has been updated with cache disk data.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_carryover_trims

Number of times carryover trim has been invoked.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_exhaustive_trims

Number of times a fast trim could not free enough space and had to fall back to a slower exhaustive trim.
Number of times a fast trim could not free sufficient space, forcing a fallback to a slower exhaustive trim.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_failed_trims

Number of times the expected amount of space could not be freed up, indicating a possible bug or configuration issue.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_fast_trims

Number of times the cache has been trimmed using the normal (fast) mode.

*Type*: counter

=== redpanda_cloud_storage_cache_trim_in_mem_trims

Number of times the cache has been trimmed using the in-memory access tracker.

*Type*: counter

=== redpanda_cloud_storage_cloud_log_size

Total size in bytes of the user-visible log for the topic.

*Type*: gauge

=== redpanda_cloud_storage_deleted_segments

Count of deleted remote segments.
Number of segments that have been deleted from S3 for the topic. This may grow due to retention or non-compacted segments being replaced with their compacted equivalent.

=== redpanda_cloud_storage_segment_uploads_total
*Type*: counter

Successful data segment uploads.
=== redpanda_cloud_storage_errors_total

=== redpanda_cloud_storage_active_segments
Number of transmit errors.

Number of remote log segments currently hydrated for read.
*Type*: counter

=== redpanda_cloud_storage_cache_trim_fast_trims
=== redpanda_cloud_storage_housekeeping_drains

Number of times upload housekeeping queue was drained.

*Type*: gauge

=== redpanda_cloud_storage_housekeeping_jobs_completed

Number of times Redpanda trimmed the cache using the normal (fast) mode.
Number of executed housekeeping jobs.

*Type*: counter

=== redpanda_cloud_storage_housekeeping_jobs_failed

Number of failed housekeeping jobs.

=== redpanda_cloud_storage_partition_readers_delayed
*Type*: counter

How many partition readers were delayed due to hitting reader limit. This indicates cluster is saturated with Tiered Storage reads.
=== redpanda_cloud_storage_housekeeping_jobs_skipped

=== redpanda_cloud_storage_segments_pending_deletion
Number of skipped housekeeping jobs.

Total number of topic segments pending deletion from the cloud.
*Type*: counter

=== redpanda_cloud_storage_housekeeping_rounds
=== redpanda_cloud_storage_housekeeping_pauses

Number of upload housekeeping rounds.
Number of times upload housekeeping was paused.

=== redpanda_cloud_storage_segment_readers_delayed
*Type*: gauge

Number of segment readers delayed due to hitting reader limit. This indicates cluster is saturated with Tiered Storage reads.
=== redpanda_cloud_storage_housekeeping_requests_throttled_average_rate

=== redpanda_cloud_storage_cache_space_hwm_size_bytes
Average rate of requests from the read and write path that were throttled by Tiered Storage (per shard).

High watermark of sum of size of cached objects.
*Type*: gauge

=== redpanda_cloud_storage_cache_space_hwm_files
=== redpanda_cloud_storage_housekeeping_resumes

Number of times upload housekeeping was resumed.

High watermark of number of objects in cache.
*Type*: gauge

=== redpanda_cloud_storage_cache_op_in_progress_files
=== redpanda_cloud_storage_housekeeping_rounds

Number of files that are being added to cache.
Number of upload housekeeping rounds.

*Type*: counter

=== redpanda_cloud_storage_jobs_cloud_segment_reuploads

Number of segment reuploads from cloud storage sources (cloud storage cache or direct download from cloud storage).

*Type*: gauge

=== redpanda_cloud_storage_jobs_local_segment_reuploads

Number of segment reuploads from local data directory.

*Type*: gauge

=== redpanda_cloud_storage_jobs_manifest_reuploads

Number of manifest reuploads performed by all housekeeping jobs.

=== redpanda_cloud_storage_housekeeping_pauses

Number of times upload housekeeping was paused.
*Type*: gauge

=== redpanda_cloud_storage_segment_index_uploads_total
=== redpanda_cloud_storage_jobs_metadata_syncs

Successful segment index uploads.
Number of archival configuration updates performed by all housekeeping jobs.

=== redpanda_cloud_storage_cache_op_miss
*Type*: gauge

Number of failed get requests because of missing object in the cache.
=== redpanda_cloud_storage_jobs_segment_deletions

=== redpanda_cloud_storage_errors_total
Number of segments deleted by all housekeeping jobs.

Number of transmit errors.
*Type*: gauge

=== redpanda_cloud_storage_spillover_manifest_uploads_total
=== redpanda_cloud_storage_limits_downloads_throttled_sum

Successful spillover manifest uploads.
Total amount of time downloads were throttled (ms).

=== redpanda_cloud_storage_housekeeping_requests_throttled_average_rate
*Type*: counter
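
The following sketch shows one way the throttled-time counter could be turned into a share of wall-clock time. It assumes, based only on the description above, that the value is a running total in milliseconds, and that the two samples come from the same series. The function name `throttled_fraction` is hypothetical.

```python
def throttled_fraction(prev_ms: float, curr_ms: float,
                       interval_seconds: float) -> float:
    """Share of the scrape interval that downloads spent throttled.

    prev_ms and curr_ms are two samples of the counter, in milliseconds.
    Assumption: a smaller later sample means the counter was reset to
    zero, so the later value is taken as the increase.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_ms = curr_ms - prev_ms if curr_ms >= prev_ms else curr_ms
    return delta_ms / (interval_seconds * 1000.0)


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart.
    print(throttled_fraction(1000, 4000, 60))
```

If the counter aggregates time across several concurrent downloads or shards, the result can exceed 1, so treat it as a relative indicator rather than a strict percentage.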

Average rate per shard of requests from the read and write path that were throttled by Tiered Storage.
=== redpanda_cloud_storage_partition_manifest_uploads_total

=== redpanda_cloud_storage_jobs_segment_deletions
Successful partition manifest uploads.

Number of segments deleted by all housekeeping jobs.
*Type*: counter

=== redpanda_cloud_storage_segment_materializations_delayed
=== redpanda_cloud_storage_partition_readers

Number of segment materializations delayed due to hitting reader limit. This indicates cluster is saturated with Tiered Storage reads.
Number of partition reader instances (number of current fetch/timequery requests reading from Tiered Storage).

=== redpanda_cloud_storage_jobs_metadata_syncs
*Type*: gauge

Number of archival configuration updates performed by all housekeeping jobs.
=== redpanda_cloud_storage_partition_readers_delayed

=== redpanda_cloud_storage_housekeeping_jobs_completed
Number of partition reads that were delayed due to hitting the reader limit. This indicates that the cluster is saturated with Tiered Storage reads.

Number of successful housekeeping jobs.
*Type*: counter

=== redpanda_cloud_storage_readers

Number of segment read cursors for hydrated remote log segments.

=== redpanda_cloud_storage_partition_manifest_uploads_total
*Type*: gauge

Successful partition manifest uploads.
=== redpanda_cloud_storage_segment_index_uploads_total

=== redpanda_cloud_storage_limits_downloads_throttled_sum
Successful segment index uploads.

Total amount of throttling applied to cloud storage downloads.
*Type*: counter

=== redpanda_cloud_storage_housekeeping_resumes
=== redpanda_cloud_storage_segment_materializations_delayed

Number of times upload housekeeping was resumed.
Number of segment materializations that were delayed due to hitting the reader limit. This indicates that the cluster is saturated with Tiered Storage reads.

=== redpanda_cloud_storage_cache_op_hit
*Type*: counter

Number of get requests for objects that are already in cache.
=== redpanda_cloud_storage_segment_readers_delayed

=== redpanda_cloud_storage_spillover_manifests_materialized_count
Number of segment readers that were delayed due to hitting the reader limit. This indicates that the cluster is saturated with Tiered Storage reads.

Number of spilled manifests currently cached in memory.
*Type*: counter

=== redpanda_cloud_storage_uploaded_bytes
=== redpanda_cloud_storage_segment_uploads_total

Total number of uploaded bytes for the topic.
Number of successful data segment uploads.

=== redpanda_cloud_storage_cache_space_files
*Type*: counter

Number of objects in cache.
=== redpanda_cloud_storage_segments

=== redpanda_cloud_storage_housekeeping_jobs_skipped
Total number of accounted segments in the cloud for the topic.

Number of skipped housekeeping jobs.
*Type*: gauge

=== redpanda_cloud_storage_partition_readers
=== redpanda_cloud_storage_segments_pending_deletion

Total number of segments pending deletion from the cloud for the topic.

*Type*: gauge

=== redpanda_cloud_storage_spillover_manifest_uploads_total

Number of partition reader instances, based on the number of current fetch/timequery requests reading from Tiered Storage.
Successful spillover manifest uploads.

*Type*: counter

=== redpanda_cloud_storage_spillover_manifests_materialized_bytes

Bytes of memory used for spilled manifests currently cached in memory.

*Type*: gauge

=== redpanda_cloud_storage_spillover_manifests_materialized_count

Number of spilled manifests that are currently cached in memory.

*Type*: gauge

=== redpanda_cloud_storage_uploaded_bytes

Total number of uploaded bytes for the topic.

*Type*: counter

== Related topics
