[BUG] YACE returning last non-null value for S3 replication metric, instead of null/NaN #1626

notaninterestingusername · 2025-01-16T01:19:59Z

Is there an existing issue for this?

I have searched the existing issues

YACE version

0.58.0

Config file

apiVersion: v1alpha1
sts-region: us-east-1
discovery:
  jobs: 
  - type: s3
    regions:
      - us-east-1
    searchTags:
      - key: customKey
        value: customValue
    roles:
      - roleArn: "aws:role_arn"
    delay: 600
    metrics:
      - name: ReplicationLatency
        statistics:
        - Maximum
        - Minimum
        - Average
        - Sum
        period: 600
        length: 600
      - name: BytesPendingReplication
        statistics:
        - Maximum
        - Minimum
        - Average
        - Sum
        period: 600
        length: 600
      - name: OperationsPendingReplication
        statistics:
        - Maximum
        - Minimum
        - Average
        - Sum
        period: 600
        length: 600
      - name: OperationsFailedReplication
        statistics:
        - Maximum
        - Minimum
        - Average
        - Sum
        - SampleCount
        period: 600
        length: 600

Current Behavior

I'm attempting to scrape S3 replication metrics, and have observed some odd behaviour with one of the metrics, OperationsFailedReplication.

On CloudWatch, this metric presents a data point only when there is a replication action (triggered when objects are uploaded to the source bucket, a replication rule is in place, and replication metrics are enabled for the rule), and is null otherwise. If replication succeeds, a zero is produced. If replication fails, the value equals the number of objects that couldn't be replicated.

YACE appears to emit the last non-null value that it encountered. This value only changes at the time of the next replication, when the metric takes the new non-null value and holds it until the next change. This is not desired; we would expect YACE to emit nulls when there isn't a value to report.

On hitting the CloudWatch get-metric-statistics API with an equivalent range (start and end times set to match YACE's current scraping interval, which is 5 minutes, with a delay of 10 minutes, to match YACE's config shown above), period, and all the other necessary parameters (metric name, dimension names and values, region, etc.), an empty list ([]) is returned.
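For anyone who wants to repeat that check, here is a rough sketch of how the request parameters could be built. It assumes the query window ends `delay` seconds before now and spans `length` seconds, which may not be exactly how YACE aligns its window. The dimension names (SourceBucket, DestinationBucket, RuleId) and their values are placeholders to be checked against the metric as it appears in CloudWatch, and `query_window` and `build_request` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, delay_s, length_s):
    """Return (start, end) for a window of length_s seconds ending delay_s seconds before now."""
    end = now - timedelta(seconds=delay_s)
    start = end - timedelta(seconds=length_s)
    return start, end


def build_request(now, delay_s=600, length_s=600, period_s=600):
    """Build keyword arguments shaped like a get-metric-statistics request."""
    start, end = query_window(now, delay_s, length_s)
    return {
        "Namespace": "AWS/S3",
        "MetricName": "OperationsFailedReplication",
        # Assumed dimensions with placeholder values; substitute the real ones.
        "Dimensions": [
            {"Name": "SourceBucket", "Value": "B1"},
            {"Name": "DestinationBucket", "Value": "B2"},
            {"Name": "RuleId", "Value": "example-rule"},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_s,
        "Statistics": ["Sum"],
    }


if __name__ == "__main__":
    request = build_request(datetime.now(timezone.utc))
    # With boto3 installed and credentials configured, this could be sent as:
    #   import boto3
    #   client = boto3.client("cloudwatch", region_name="us-east-1")
    #   print(client.get_metric_statistics(**request)["Datapoints"])
    print(request)
```

An empty `Datapoints` list for a window with no replication activity would match the [] described above.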

I verified that YACE was, in fact, returning the last non-null value by hitting the /metrics endpoint on the container that YACE is running in and, sure enough, it returns the last non-null value for the metric.
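A small sketch of how the /metrics output could be filtered for the metric in question. The metric name `aws_s3_operations_failed_replication_sum`, the label shown, and the sample text are illustrative assumptions rather than captured output, and `samples_for` is a hypothetical helper.

```python
def samples_for(exposition_text, metric_prefix):
    """Collect (series, value) pairs for sample lines whose name starts with metric_prefix."""
    found = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blanks, HELP/TYPE comments, and unrelated metrics.
        if not line or line.startswith("#") or not line.startswith(metric_prefix):
            continue
        close = line.rfind("}")
        if close != -1:
            # Labelled series: everything up to the closing brace is the series.
            series, rest = line[: close + 1], line[close + 1:]
        else:
            series, _, rest = line.partition(" ")
        # The first field after the series is the value; a timestamp may follow.
        found.append((series, float(rest.split()[0])))
    return found


# Illustrative sample only; real names and labels should be read from the endpoint.
SAMPLE = """\
# HELP aws_s3_operations_failed_replication_sum Help text
# TYPE aws_s3_operations_failed_replication_sum gauge
aws_s3_operations_failed_replication_sum{dimension_SourceBucket="B1"} 3
aws_s3_bytes_pending_replication_sum{dimension_SourceBucket="B1"} 0
"""

if __name__ == "__main__":
    # Against a live exporter the text could be fetched with urllib.request.urlopen,
    # using whatever host and port the YACE container exposes.
    for series, value in samples_for(SAMPLE, "aws_s3_operations_failed_replication"):
        print(series, value)
```

Running this on successive scrapes would show whether the value stays constant between replication events.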

What could be causing this? How may this be fixed?

Attached are screenshots from the AWS console (UTC) and Grafana (which relies on metrics coming into Prometheus from YACE, UTC-8).

Expected Behavior

Expect YACE to return null/NaN when there is no data point coming in from CloudWatch.
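To make the expectation concrete, here is a minimal sketch of the mapping I have in mind. It is not YACE's code. `sample_value` is a hypothetical function, and the datapoint shape (a `Timestamp` key plus one key per statistic) is assumed to mirror a get-metric-statistics response.

```python
import math


def sample_value(datapoints, statistic):
    """Return the newest datapoint's statistic, or NaN when no datapoints were returned."""
    if not datapoints:
        # No data in the queried window: report NaN instead of reusing an older value.
        return math.nan
    newest = max(datapoints, key=lambda dp: dp["Timestamp"])
    return float(newest[statistic])


if __name__ == "__main__":
    print(sample_value([], "Sum"))
    print(sample_value([{"Timestamp": 1, "Sum": 3.0}], "Sum"))
```

With this mapping, an empty window produces NaN on every scrape until a new data point arrives.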

Steps To Reproduce

1. Run YACE with the config detailed above.
2. Create two S3 buckets, B1 and B2.
3. Create a replication rule to copy objects from B1 to B2, but use an IAM role with insufficient permissions to do so.
4. Upload objects to B1, which fail to be replicated to B2. Metrics take ~5 minutes to appear on CloudWatch.

Anything else?

No response

notaninterestingusername added the bug label on Jan 16, 2025
