Skip to content

Remove Stats datastructure in favour of OTel #249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

rubvs
Copy link

@rubvs rubvs commented May 2, 2025

Closes #246

Remove the Stats datastructure and leverage the current OTel implementation.

Updated Unit Tests

  • TestAppender
  • TestAppenderRetry
  • TestAppenderAvailableAppenders
  • TestAppenderCompressionLevel
  • TestAppenderFlushRequestError
  • TestAppenderCloseInterruptAdd
  • TestAppenderCloseBusyIndexer
  • TestAppenderScaling

@elastic-observability-automation elastic-observability-automation bot added the safe-to-test Automated label for running bench-diff on forked PRs label May 2, 2025
@rubvs rubvs marked this pull request as ready for review May 5, 2025 17:43
@rubvs rubvs requested a review from a team as a code owner May 5, 2025 17:43
@rubvs rubvs requested a review from kruskall May 5, 2025 17:44
@rubvs rubvs force-pushed the remove-stats-datastructure branch 4 times, most recently from 525b8b3 to 1141f1c Compare May 5, 2025 18:22
@rubvs rubvs force-pushed the remove-stats-datastructure branch from 1141f1c to c6f2f06 Compare May 5, 2025 18:36
1pkg
1pkg previously approved these changes May 5, 2025
Copy link
Member

@1pkg 1pkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes looks good, thank you!

Albeit OTel metrics assertions are quite verbose and repetitive, it would be nice to sugar it up a bit to reduce the tests code. I would recommend to look into expanding docappendertest.AssertOTelMetrics helper to accept a metrics assertion map.

docappendertest.AssertOTelMetricsWithMap(t, rm.ScopeMetrics[0].Metrics, map[string]int64{
	"elasticsearch.bulk_requests.count": 1,
	"elasticsearch.indexer.created": 2,
	"elasticsearch.indexer.destroyed": 3,
})

With such helper most if not all of the tests could be simmered down a bit.

@rubvs rubvs marked this pull request as draft May 11, 2025 18:52
@rubvs rubvs force-pushed the remove-stats-datastructure branch 7 times, most recently from b450780 to 89c79b2 Compare May 11, 2025 23:19
@rubvs rubvs force-pushed the remove-stats-datastructure branch from 89c79b2 to e650c89 Compare May 16, 2025 21:04
@rubvs
Copy link
Author

rubvs commented May 16, 2025

Finally the CI passes. This TestAppenderRetry needs to be rethought. How did we get to 750 here, see Code? cc @kruskall

@rubvs rubvs force-pushed the remove-stats-datastructure branch from e2de81c to f48cdad Compare May 22, 2025 20:37
@rubvs rubvs force-pushed the remove-stats-datastructure branch from f48cdad to bd8e118 Compare May 22, 2025 20:47
@rubvs rubvs marked this pull request as ready for review May 23, 2025 02:38
@rubvs rubvs requested review from kruskall and 1pkg May 23, 2025 14:18
const N = 10
for i := 0; i < N; i++ {
addMinimalDoc(t, indexer, "logs-foo-testing")
}
<-time.After(1 * time.Second)
Copy link
Author

@rubvs rubvs May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Reviewers: I have created this ticket to improve this wait using a channel: #253

}
if status != "" {
a.addCount(int64(n), legacy, a.metrics.docsIndexed,
metric.WithAttributes(attribute.String("status", status), semconv.HTTPResponseStatusCode(errFailed.statusCode)),
a.tooManyRequests.Add(legacy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: I think this is updating the wrong metric. It should only update toomanyrequests in the switch case for tooMany

Copy link

@simitt simitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of smaller questions in the first review.

@@ -274,7 +268,7 @@ func TestAppenderRetry(t *testing.T) {

indexer, err := docappender.New(client, docappender.Config{
FlushInterval: time.Minute,
FlushBytes: 750, // this is enough to flush after 9 documents
FlushBytes: 800, // this is enough to flush after 9 documents
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this value change?

@@ -288,79 +282,74 @@ func TestAppenderRetry(t *testing.T) {
for i := 0; i < N; i++ {
addMinimalDoc(t, indexer, "logs-foo-testing")
}
<-time.After(1 * time.Second)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 1 second enough? it used to be 2 seconds..

}
if status != "" {
a.addCount(int64(n), legacy, a.metrics.docsIndexed,
metric.WithAttributes(attribute.String("status", status), semconv.HTTPResponseStatusCode(errFailed.statusCode)),
a.tooManyRequests.Add(legacy)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add this in the errFailed.tooMany case?


// Collect metrics before flushing.
var rm metricdata.ResourceMetrics
assert.NoError(t, rdr.Collect(context.Background(), &rm))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert.NoError(t, rdr.Collect(context.Background(), &rm))
require.NoError(t, rdr.Collect(context.Background(), &rm))

case "elasticsearch.events.count":
assertCounter(m, N, indexerAttrs)
case "elasticsearch.events.queued":
assertCounter(m, N, indexerAttrs)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far this test tested for docs.active - why is this not necessary anymore?

assert.True(t, exist)
switch status.AsString() {
case "Success":
assert.Equal(t, int64(8), dp.Value)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this number from?

BulkRequests: 2,
Failed: failed,
FailedClient: 1,
FailedServer: 1,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are these checked now?

@@ -92,7 +92,7 @@ func TestAppender(t *testing.T) {

rdr := sdkmetric.NewManualReader(sdkmetric.WithTemporalitySelector(
func(ik sdkmetric.InstrumentKind) metricdata.Temporality {
return metricdata.DeltaTemporality
return metricdata.CumulativeTemporality
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this change to Cumulative?

BulkRequests: 1,
Failed: failed,
FailedClient: 1,
FailedServer: 1,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the failed events checked now? Haven't seen them below

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe-to-test Automated label for running bench-diff on forked PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove legacy stats
4 participants