A89: Backend Service Metric Label #471

Open
wants to merge 8 commits into
base: master

Conversation

ejona86
Member

@ejona86 ejona86 commented Jan 10, 2025

No description provided.

to have a different value. This is the case for locality as well, and the last
pick's value should be used.

The `grpc.xds.cluster` label will be available on the following per-call
Member Author

@ejona86 ejona86 Jan 10, 2025

@yashykt, Java hard-codes support for locality today. I remember there being a conversation about supporting arbitrary optional labels, and I think we had kicked that can down the road.

I'm envisioning just adding another hard-coded case to the otel module for this.

## Implementation

@ejona86 will immediately implement in gRPC Java. Other languages will follow as
able. The implementation is very quick.

"The implementation is very quick" is more rationale than blueprint.

Member Author

It is an answer to "where are the resources coming from." "It is quick, so doesn't need a ton of planning."

Member

I'm not sure that kind of justification is needed for this document. It feels a little out of place in a specification intended for long-term use. But I wouldn't block the PR on it if you feel like it's needed.

I changed the link name so I wouldn't have to specify the link name, and
then forgot to remove the link name.
Member

@markdroth markdroth left a comment

5 stars, would read again. :)

Member

@dfawley dfawley left a comment

Don't forget to wait for the requisite two week comment period before merging, and update "Status:".

This impacts a lot of the language, but I'll fix that up later.
The value will be communicated to the gRPC OpenTelemetry module by the call
attempt tracer. When an LB policy provides the label value to the tracer it
will do so each pick that the information is available, regardless of the pick's
result. This allows DEADLINE_EXCEEDED and UNAVAILABLE failures to include a
Member

I assume this is talking solely about including this information in metrics, right? Do we already include the locality label for DEADLINE_EXCEEDED and UNAVAILABLE? @yashykt

Member Author

Yes, I was meaning in metrics. The current locality implementation can't include this information in erring metrics because xds_cluster_impl is what does the addOptionalLabel() and it only knows xds_wrr_locality's locality decision by looking at the subchannel of a successful pick.

I had some discussion with Yash yesterday about some edge cases and improving things post-A75, just to see how he felt. I was going to bring them up in our Friday meeting. (I just invited Yash to it.)

(Preview: Post-A75, have this logic in the leaf cds policy (the first policy under priority) so we get labels for outlier detection failures. Similarly, we could move the locality handling to weighted_target, or a new policy between xds_wrr_locality/weighted_target and the endpoint-picking policy, to add the locality label for failed picks. The question is mostly "do we care.")
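For readers following the thread, here is a minimal sketch of the behavior under discussion: the label is reported on every pick regardless of the pick's result, the last pick's value wins, and the metrics side handles a hard-coded set of label keys. `AttemptTracer`, `pick`, and `finish` are hypothetical stand-ins written for this illustration, not the real gRPC Java classes; only the method name `addOptionalLabel` and the label key `grpc.xds.cluster` come from the discussion above. The `grpc.lb.locality` key and the `grpc.status` attribute are assumed here to match the existing gRPC metric names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PickLabelSketch {
  /** Hypothetical stand-in for a call attempt tracer; NOT the real gRPC class. */
  static final class AttemptTracer {
    // Assumption: the metrics module only understands a hard-coded set of keys.
    private static final Set<String> KNOWN_KEYS =
        Set.of("grpc.lb.locality", "grpc.xds.cluster");
    private final Map<String, String> labels = new LinkedHashMap<>();

    /** Records a label; a later call with the same key overwrites the earlier value. */
    void addOptionalLabel(String key, String value) {
      if (KNOWN_KEYS.contains(key)) {
        labels.put(key, value);
      }
    }

    /** Returns the attributes that would be attached to the per-attempt metric. */
    Map<String, String> finish(String status) {
      Map<String, String> attributes = new LinkedHashMap<>(labels);
      attributes.put("grpc.status", status);
      return attributes;
    }
  }

  /**
   * Hypothetical pick: the label is reported before the result is known,
   * so it is present even when the pick fails or queues.
   */
  static boolean pick(AttemptTracer tracer, String cluster, boolean hasReadyBackend) {
    tracer.addOptionalLabel("grpc.xds.cluster", cluster);
    return hasReadyBackend;
  }

  public static void main(String[] args) {
    AttemptTracer tracer = new AttemptTracer();
    pick(tracer, "cluster-a", false); // first pick fails
    pick(tracer, "cluster-b", false); // config changed; second pick also fails
    // prints {grpc.xds.cluster=cluster-b, grpc.status=UNAVAILABLE}
    System.out.println(tracer.finish("UNAVAILABLE"));
  }
}
```

If the label were instead attached only after a successful pick, as the current locality handling is described above, the failed attempt in this sketch would be recorded without any cluster label.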

@markdroth markdroth changed the title A89: xDS Cluster Metric Label A89: Backend Service Metric Label Feb 13, 2025

5 participants