-
Notifications
You must be signed in to change notification settings - Fork 246
A80: gRPC Metrics for TCP connection #428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM to me as discussed offline, I'll leave others to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(approved by mistake)
…rics-for-tcp-connection.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br /> RTT = packet acked timestamp - packet sent timestamp. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buckets for histogram are explicitly specified for request latency in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md. Do you plan to reuse the same buckets? It might be worth specifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, but using buckets for the min_rtt
metric wouldn't offer much significant insights. The value of min_rtt
is bound to the physical length of the path between sender and receiver or the queueing time caused by throttling and load. A step-change in min_rtt
values usually means that traffic is being throttled or experiencing congestion, or has been re-routed through a different path. I think it's better to opt out buckets in this situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nanahpang If you do not specify buckets, we will get the default OpenTelemetry buckets which is { 0, 5, 10, 25, 50, 75,100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000}. Are there better defaults that we can suggest or do we leave it to the user to figure it out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding OpenTelemetry buckets, if you're referring to a plugin used for exporting metrics, that would be a separate concern from the metric types we're defining here. However, I agree that specifying a smaller range for the min_rtt metric buckets is the right approach, especially since we're using seconds as the unit (to align with the open-source standard) while the underlying measurements are usually in microseconds.
|
||
| Name | Type | Unit | Labels | Description | | ||
| ------------- | ----- | ----- | ------- | ----------- | | ||
| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br /> RTT = packet acked timestamp - packet sent timestamp. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nanahpang If you do not specify buckets, we will get the default OpenTelemetry buckets which is { 0, 5, 10, 25, 50, 75,100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000}. Are there better defaults that we can suggest or do we leave it to the user to figure it out?
Update proposal with more information
Add the proposal of gRPC TCP per-connection metrics.