Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed feign-micrometer exception handling. #2644

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

r-dimitrov
Copy link

The current MR improves exception handling for feign-micrometer clients. Previously, exceptions such as 404, 400, 500, etc., were not marked with error=true in the Observation and thus were not reported in the trace as errors.

The exception declaration has been changed from (FeignException ex) to (Exception ex), as none of the supported clients (e.g., Apache HttpClient, OkHttp) throws FeignException, but rather client-specific exceptions.

The fix is currently in production in my organization, and it has been working effectively.

Fixes #2566

@velo velo enabled auto-merge (squash) November 22, 2024 12:19
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • annotation-error-decoder/pom.xml
    • lines 49-48
  • apt-test-generator/pom.xml
    • lines 62-61
  • benchmark/pom.xml
    • lines 70-69
  • core/pom.xml
    • lines 37-36
  • googlehttpclient/pom.xml
    • lines 54-53
  • hc5/pom.xml
    • lines 67-66
  • httpclient/pom.xml
    • lines 75-74
  • hystrix/pom.xml
    • lines 78-77
  • jackson-jaxb/pom.xml
    • lines 82-83
  • java11/pom.xml
    • lines 50-49
  • jaxb-jakarta/pom.xml
    • lines 34-35
    • lines 55-57
  • jaxb/pom.xml
    • lines 45-49
  • jaxrs2/pom.xml
    • lines 103-102
  • jaxrs3/pom.xml
    • lines 89-88
  • jaxrs4/pom.xml
    • lines 83-82
  • kotlin/pom.xml
    • lines 71-70
  • okhttp/pom.xml
    • lines 59-58
  • pom.xml
    • lines 579-580
  • reactive/pom.xml
    • lines 98-97
  • ribbon/pom.xml
    • lines 83-82
  • soap-jakarta/pom.xml
    • lines 34-35
    • lines 63-62
    • lines 79-81
  • soap/pom.xml
    • lines 55-60
    • lines 70-71

@r-dimitrov r-dimitrov disabled auto-merge November 22, 2024 12:48
@r-dimitrov
Copy link
Author

I've just added some fixes, as there were exceptions when some clients had defined a custom ErrorDecoder. The input stream is consumed, and the decoder throws an exception. This way, we are copying the input stream so it can be used multiple times.

@velo velo enabled auto-merge (squash) December 9, 2024 01:11
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • annotation-error-decoder/pom.xml
    • lines 49-48
  • apt-test-generator/pom.xml
    • lines 62-61
  • benchmark/pom.xml
    • lines 70-69
  • core/pom.xml
    • lines 37-36
  • googlehttpclient/pom.xml
    • lines 54-53
  • hc5/pom.xml
    • lines 67-66
  • httpclient/pom.xml
    • lines 75-74
  • hystrix/pom.xml
    • lines 78-77
  • jackson-jaxb/pom.xml
    • lines 82-83
  • java11/pom.xml
    • lines 50-49
  • jaxb-jakarta/pom.xml
    • lines 34-35
    • lines 55-57
  • jaxb/pom.xml
    • lines 45-49
  • jaxrs2/pom.xml
    • lines 103-102
  • jaxrs3/pom.xml
    • lines 89-88
  • jaxrs4/pom.xml
    • lines 83-82
  • kotlin/pom.xml
    • lines 71-70
  • okhttp/pom.xml
    • lines 59-58
  • pom.xml
    • lines 579-580
  • reactive/pom.xml
    • lines 98-97
  • ribbon/pom.xml
    • lines 83-82
  • soap-jakarta/pom.xml
    • lines 34-35
    • lines 63-62
    • lines 79-81
  • soap/pom.xml
    • lines 55-60
    • lines 70-71

Copy link
Member

@kdavisk6 kdavisk6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we discuss the need to read the request body into memory more?

Comment on lines 63 to 66
if (response.body() != null) {
response =
response.toBuilder().body(Util.toByteArray(response.body().asInputStream())).build();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your intent is to properly flag the request with the appropriate error based on status, reading the response into memory before checking the status code may be problematic in certain scenarios. Particularly in one where a large number of successful responses are monitored with large data sets.

By doing this, we impress upon our users that their applications must be able to stream all data from all tracked operations into memory in order to observe this.

Can you help me understand the exceptions observed when trying to observe a client with a custom ErrorDecoder?

If this must be part of the solution, we should document this assumption and control this behavior with a configurable feature flag, initially off, and allow our users to make the decision whether to take this on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @kdavisk6 , thank you for your comment. I've made the changes to cache exceptions (errors) only, so successful responses are no longer cached.

Regarding the custom ErrorDecoder, the issue is that when the exception response message body is read, the stream is consumed in MicrometerObservationCapability.java, making it unavailable in the ErrorDecoder. This is problematic because, in most cases, we need to read the response again and map it to another object. When the response is cached, we can read the response as many times as needed. For example, this is useful if we have loggers, additional logic in filters, and so on.

To summarize:
We need to read the response body for exceptions (client/server error codes in the response) to mark the span with error details and exception information. If we don't cache the response, we cannot read it in another layer, such as the ErrorDecoder, because it is already consumed for observability purposes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.

Reviewing MicrometerObservabilityCapability, I'm do not see it explicitly consuming the Response.body, only wrapping it. Do you have an example where the Response is consumed during metrics processing? I'd like to confirm that this is intentional or not. If not, then we should address that, which should remove the need for this workaround.

If it is intentional, I'm still leery of allowing this without some level of user control. I would like to see some way for users to control this behavior. There may be situations where even the error response may be large enough to cause memory pressure and we should not be pressing this onto our users without their consent.

One way is for us to check the Content-Length on the response and if that is over a certain threshold, we should defer streaming the response into memory. Another is to have a configurable option as part of the MicrometerCapability that explicitly enables response caching.

auto-merge was automatically disabled January 10, 2025 17:01

Head branch was pushed to by a user without write access

@r-dimitrov r-dimitrov requested a review from kdavisk6 January 20, 2025 08:37
Copy link
Member

@kdavisk6 kdavisk6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] MicrometerObservationCapability not reporting timeouts
2 participants