Skip to content

Conversation

@yhabteab
Copy link
Member

This PR changes some of our HTTP handlers to stream responses instead of buffering them in memory, just like how @jschmidt-icinga did for the ObjectQueryHandler in #10516. However, this change breaks also some the fine-grained HTTP error code reporting for the /v1/actions handler. Previously, the actual HTTP status code was chosen from all the individual action results, but since we now stream the response, we can't wait until the end to determine the final status code. Thus, I've changed the handlers to use 202 Accepted as the general status code for these endpoints 1, indicating that the request has been accepted for processing, without guaranteeing that the processing will be completed successfully.

fixes #10544
fixes #10407
fixes #9535

Footnotes

  1. https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/202

@yhabteab yhabteab added enhancement New feature or request area/api REST API labels Sep 11, 2025
@cla-bot cla-bot bot added the cla/signed label Sep 11, 2025
Copy link
Contributor

@jschmidt-icinga jschmidt-icinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The change to return http::status::accepted instead seems reasonable, but shouldn't that be documented somewhere?

I didn't find much referencing the status codes to expect from our API, except the example client code for the various languages. At least those should be adapted. Maybe adding a small paragraph/table detailing which handlers return 200 and which return 202 would be a good idea.

Edit: The "Responses" section needs to be adapted.

@yhabteab
Copy link
Member Author

The change to return http::status::accepted instead seems reasonable, but shouldn't that be documented somewhere?

Not that I know of. Of course, this is kinda a breaking change, so it should definitely be mentioned in the changelog, but I don't think that we have a specific place for documenting changes to the API return codes.

Edit: The "Responses" section needs to be adapted.

Ok!

@jschmidt-icinga
Copy link
Contributor

Edit: The "Responses" section needs to be adapted.

On second thought, I think it is basically fine as it is.

The next section "HTTP Statuses" mentions that everything from 200 to 299 is a success. Maybe a short mention that most handlers return either 200 or 202 in case of success, depending on whether the result is known at the time the Header is sent.

But the example code snippets definitely need to be fixed, currently they all check for Ok/200 specifically on "v1/objects/services", which would fail now. Even though that's easy to notice what's wrong, that'd not be a good documentation experience for the user.

Copy link
Contributor

@jschmidt-icinga jschmidt-icinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't verified the changes to the golang snippet, but otherwise the documentation changes look great. I also gave each handler a quick test yesterday and everything works as expected.

After looking through the remaining handlers, another one that could be streamed is /v1/templates. The change would be minimal, I think freezing the Array and adding the yc to SendJsonBody() should be enough now. Same for the GET side of /v1/config/packages and /v1/config/stages. Those are probably not strictly necessary, but I'd say about on the same level as /v1/types and /v1/variables and since you did those, you might as well do these.

@yhabteab
Copy link
Member Author

yhabteab commented Sep 12, 2025

Same for the GET side of /v1/config/packages

That handler holds a locked mutex while generating the body, so can't use streaming. Plus, the response is minimal, so it doesn't need any workaround.

@yhabteab yhabteab force-pushed the stream-http-response branch from 128746b to 5a8c46b Compare September 12, 2025 09:16
@yhabteab
Copy link
Member Author

The change would be minimal, I think freezing the Array and adding the yc to SendJsonBody() should be enough now. Same for the GET side of /v1/config/packages and /v1/config/stages.

Done! I've frozen the results array for the /v1/config/packages handler as well, to allow streaming in case the package has a bunch of orphaned stages that could exceed our flush threshold when accumulated.

Comment on lines 80 to 83
ActionsHandler::AuthenticatedApiUser = user;
Defer a ([]() {
ActionsHandler::AuthenticatedApiUser = nullptr;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently is a thread_local variable:

thread_local ApiUser::Ptr ActionsHandler::AuthenticatedApiUser;

As soon as a yielding operation is performed, this function may be resumed on another thread with this variable being wrong.

Copy link
Member Author

@yhabteab yhabteab Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, I see only two possible ways to resolve this issue. Either all of the API actions accept an additional ApiUser::Ptr argument or we put that user into the already supported params dict and each action can get the user from that if it needs to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I can also reassign the user within the generator function. I think, that would also work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we put that user into the already supported params dict and each action can get the user from that if it needs to

I'd advise against doing that. Yes, it will work, but mixing trusted information into user-controlled input is something you can easily get wrong.1

all of the API actions accept an additional ApiUser::Ptr argument

I can also reassign the user within the generator function.

Both should work. In general, I'd say if a thread-local variable can be avoided, that's nicer, but if this requires too much refactoring, keeping it is also fine.

Footnotes

  1. Just compare it for example to a webserver that's authenticating the user and wants to pass the authenticated user to an application. If that application is accessed via HTTP, you have to be extremely cautious in your webserver to filter out all headers from user requests that might be misinterpreted by the application. Compare that to (Fast)CGI for example where you can distinguish clearly between user headers and additional variables from the webserver (unless it injects them to look just like a header).

@jschmidt-icinga
Copy link
Contributor

jschmidt-icinga commented Sep 12, 2025

Does one of you understand why this test failed? It says AssertServerDisconnected was unsuccessful. I'm generally not especially happy with that code, but it shouldn't cause any problems, 11 seconds is generous as it is. Also this PR touches none of that code and I haven't seen a check fail like that until now.

The exception could be from the HttpServerConnection object outliving the stream/socket object due to the keep_alive in the coroutine. But what is preventing CheckLiveness form initiating the disconnect here?

Edit: It might be because of a clock anomaly where the runner might have adjusted time back a second which would affect HttpServerConnection, but not the test case. Fortunately something like this might be fixed by the changes in #10550 by using std::chrono::steady_clock instead.

@yhabteab
Copy link
Member Author

yhabteab commented Sep 12, 2025

Edit: It might be because of a clock anomaly where the runner might have adjusted time back a second which would affect HttpServerConnection, but not the test case. Fortunately something like this might be fixed by the changes in #10550 by using std::chrono::steady_clock instead.

I think, I have seen the error with #10550 before I changed it to use std::chrono::steady_clock as well and I just retried it without thinking that much about it, so you might be right about this.

@julianbrost
Copy link
Contributor

I think, I have seen the error with #10550 before I changed it to use std::chrono::steady_clock as well and I just retried it without thinking that much about it, so you might be right about this.

Was this also on Windows?

Edit: It might be because of a clock anomaly where the runner might have adjusted time back a second which would affect HttpServerConnection, but not the test case. Fortunately something like this might be fixed by the changes in #10550 by using std::chrono::steady_clock instead.

Sounds plausible, but also like it would take quite some digging to confirm. Anyway, this PR only touches HTTP handlers, which aren't even accessed by that test case (it only creates a idle connection), so really nothing to do here in this PR. Either #10550 just fixes it and we're happy, or if it happens again, we might want to take a closer look.

@yhabteab yhabteab force-pushed the stream-http-response branch from e6d91b2 to 93421a8 Compare September 15, 2025 09:50
@yhabteab
Copy link
Member Author

Rebased and addressed all the requested changes:

  • 68deaaf refactors the existing ValueGenerator class and makes it templated, so that it can be used with any std container.
  • 3f54253 removes the ActionsHandler::AuthenticatedApiUser and passes the api user as param to the actions instead.

@yhabteab yhabteab requested review from jschmidt-icinga and julianbrost and removed request for jschmidt-icinga September 15, 2025 10:11
@yhabteab
Copy link
Member Author

I think, I have seen the error with #10550 before I changed it to use std::chrono::steady_clock as well and I just retried it without thinking that much about it, so you might be right about this.

Was this also on Windows?

Edit: It might be because of a clock anomaly where the runner might have adjusted time back a second which would affect HttpServerConnection, but not the test case. Fortunately something like this might be fixed by the changes in #10550 by using std::chrono::steady_clock instead.

Sounds plausible, but also like it would take quite some digging to confirm. Anyway, this PR only touches HTTP handlers, which aren't even accessed by that test case (it only creates a idle connection), so really nothing to do here in this PR. Either #10550 just fixes it and we're happy, or if it happens again, we might want to take a closer look.

Apparently, Windows has some serious issues. Now, Win64 is failing due to the newly changed timer/invoke intervals. It didn't fail before in the PR that changed the intervals, but now it does. I don't understand this ** OS at all.

Copy link
Contributor

@jschmidt-icinga jschmidt-icinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new ValueGenerator is a good improvement on the interface side without complicating the implementation by much. But I do have one small request about the test cases (see below).

@yhabteab yhabteab force-pushed the stream-http-response branch from 97d1004 to 620949f Compare September 16, 2025 12:05
@yhabteab
Copy link
Member Author

Rebased + addressed the comment from #10554 (comment).

Copy link
Contributor

@jschmidt-icinga jschmidt-icinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now. The concerns about the thread_local global were addressed in the best way possible, by removing the global variable. The refactoring to do this is entirely justified in my eyes.

@julianbrost
Copy link
Contributor

FYI: This PR got a merge conflict now.

This commit refactors the ValueGenerator class to be a template that can
work with any container type. Previously, one has to manually take care
of the used container by lazily iterating over it within a lambda. Now,
the `ValueGenerator` class itself takes care of all the iteration,
making it easier to use and less error-prone. The new base `Generator`
class is required to allow the `JsonEncoder` to handle generators in a
type-erased manner.
This is similar to the already existing `HttpUtility::SendJsonBody()`
function but for streaming.
This better reflects the request handling, since the headers are sent
before the request is fully processed. This status code just indicates
that we successfully accepted the request but is not guaranteed that all
the subsequent actions will also be successful.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api REST API cla/signed enhancement New feature or request

Projects

None yet

5 participants