Thanks for reaching out and writing this up, @mugli. Yes, we've also identified tailsampling at scale as a great use case for Rotel. We've been tracking it internally but hadn't opened a GH issue yet because no one had asked. I've now opened #214 so we can track it.
Re: [2] For tailsampling, we've batted around the idea of writing it as a pure Rust processor rather than with the Python processor SDK. The processor SDK is very fast (it's a Rust-backed extension) and in many cases much faster than Go processors. However, for tailsampling at scale specifically, we were thinking it would be best to provide the most resource-efficient and highest-performance option. We're open to discussing this, though, and even to potentially offering both options.
Re: [3] We've definitely considered stateful use cases like this but have yet to implement one. This is a good candidate for the first one we address.
I think it would be great to collaborate on tailsampling, and we would love to have you PoC it. If you're interested, hop into the Discord at https://rotel.dev/discord and we can discuss more this week.
The OTel collector (contrib) is a vast ecosystem, and it's probably not worth trying to rewrite all of it for performance reasons. But Rotel could shine in resource-constrained environments (like Lambda, which you folks are already working on) and, here is the pitch, in places where OTel collectors need to deal with extremely high volumes of data. Tailsampling is one such area. 💡
[1]
We maintain a tailsampling pipeline with the OTel collector. It works fine, but at scale, performance-focused alternatives to the collector components used for tailsampling (the loadbalancing exporter and the tailsampling processor) could be very interesting.
[2]
Besides performance, here's another interesting take:
Since Rotel allows custom processors to be written with a Python-based SDK, the different sampling configurations of a tailsampling processor could be a very good fit for that, and more flexible.
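To illustrate the kind of flexibility I mean, here is a minimal sketch of a sampling decision written as plain Python. It does not use Rotel's processor SDK (I haven't checked what its API looks like); `Span`, `keep_trace`, and the `slow_ms` threshold are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span record; not an SDK type.
    trace_id: str
    duration_ms: float
    is_error: bool


def keep_trace(spans: list[Span], slow_ms: float = 500.0) -> bool:
    """Keep the whole trace if any of its spans errored or was slow."""
    return any(s.is_error or s.duration_ms >= slow_ms for s in spans)
```

A policy like this is just a function over all buffered spans of a trace, so combining or swapping policies would be ordinary Python rather than collector configuration.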
[3]
A third and very important reason that could make it worthwhile to work on an alternative to the official OTel collector is this: open-telemetry/opentelemetry-collector-contrib#33568
The OTel maintainers mentioned they have no plans to address this (understandably, because it would require rethinking the loadbalancing exporter architecture).
To summarize the issue: OTel loadbalancing exporters are stateless and don't communicate with each other or with the tailsampling collectors. The loadbalancer uses consistent hashing on `trace_id` to distribute spans to the backend tailsampling collectors, so that all spans with the same `trace_id` reach the same destination.
But the unfortunate oversight is that traces from async messaging systems don't always share the same `trace_id`; they can use Span Links instead (an example with Kafka). Currently there's no way to ensure that if a particular trace is sampled in the OTel tailsampler, all related linked traces are sampled as well.
[4]
There are also more areas for improvement around autoscaling the tailsampling collectors, like open-telemetry/opentelemetry-collector-contrib#36717
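To make the Span Links problem from [3] concrete, here is a minimal sketch of consistent hashing on `trace_id`. It is an illustration only, not the loadbalancing exporter's actual implementation; `HashRing`, `colocated`, the backend names, and the virtual-node count are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash, so routing does not change between runs.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring that maps a trace_id to one backend."""

    def __init__(self, backends, vnodes=100):
        # Each backend gets several virtual nodes to even out the distribution.
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def route(self, trace_id: str) -> str:
        # Pick the first virtual node clockwise from the key, wrapping around.
        idx = bisect.bisect(self._keys, _hash(trace_id)) % len(self._ring)
        return self._ring[idx][1]


def colocated(ring: HashRing, trace_a: str, trace_b: str) -> bool:
    """True if two traces are routed to the same tailsampling backend."""
    return ring.route(trace_a) == ring.route(trace_b)
```

Every span of one trace hashes to the same backend, which is what the tailsampler needs. A producer trace and a consumer trace connected only by a Span Link carry different `trace_id` values, so `colocated` is true for them only by chance, and neither backend knows about the other's sampling decision.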
All of these are hard and interesting problems, currently without good solutions. Vector seems to be looking into the tailsampling use case as well, but they have only just started adding OTLP source support, so I assume it will be a long road if they do take on tailsampling.
I don't know what the roadmap for Rotel is; it's still early days. I'm happy to be a PoC subject if you folks consider supporting tailsampling. Let me know if you have any questions.