Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exceptions from send, and expected behaviors on closed connections. #66

Closed
tomchristie opened this issue Sep 26, 2018 · 49 comments · Fixed by #431
Closed

Exceptions from send, and expected behaviors on closed connections. #66

tomchristie opened this issue Sep 26, 2018 · 49 comments · Fixed by #431

Comments

@tomchristie
Copy link
Member

tomchristie commented Sep 26, 2018

Before digging into the meat of this topic I want to start with this assumption:

  • Exceptions from send and receive should be treated as opaque, and should be allowed to bubble back up to the server in order to be logged. They may be intercepted by logging or 500 technical response middlewares, and subsequently re-raised, but applications shouldn't have any expectations about the type that may be raised, since these will differ from implementation to implementation.

That's pretty much how the spec is currently worded, and I think that's all as it should be. An alternative design could choose to have exceptions be used in a limited capacity to communicate information back to the application, but I don't think that's likely to be desirable. An exception raised by the server in send or receive is a black box indicating an error condition with an associated traceback, and we shouldn't expect to catch or handle particular subclasses in differing ways.

The core question in this issue is: 1. What expectations (if any) should the application have about the behavior when it makes a send() to a closed connection? 2. What behavior is most desirable from servers in that case?

There are three possible cases for the application here:

  • The application should expect an exception to be raised, indicating that the connection is closed.
  • The application should expect that no exception be raised.
  • An exception may or may not be raised, but the application cannot rely on either behavior.

In the case of HTTP, we've adopted "Note messages received by a server after the connection has been closed are not considered errors. In this case the send awaitable callable should act as a no-op." See #49

I don't recall where the conversation that lead to #49 is, but I think it's a reasonable behaviour because premature HTTP client disconnects once receiving the start of the response are valid behaviour, and not an error condition. If we raise an exception in that case then we end up with lots of erroneous exceptions being raised in response to perfectly valid server and client behavior.

If we're happy with that decision then there's two questions that it leads on to:

  1. What do we expect to happen in long-polling / SSE connections if send doesn't raise an exception once the connection has been closed?
  2. What should we expect in the case of websockets?

For the case of (1) I think that #49 means we have to have the expectation that the disconnect is communicated through the receive disconnect message. No exception should be raised from the server on send to the closed connection because that'd pollute server logs with false errors in the standard case, and we can't differentiate between long-polling connections and regular HTTP requests/responses.

Mature server implementations probably would want to enforce a timeout limit on the task once the connection has been closed, and that is the guardrail against applications that fail to properly listen for and act on the disconnect.

For the case of (2) it's less obvious. We could perfectly well raise exceptions in response to send on a disconnected connection. (Which I assume is what all current implementations do.) The issue here is that we'd be raising logged error conditions on what is actually perfectly correct behavior... there's no guarantee that an application will have yet "seen" the disconnect message at the point it sends any given outgoing message.

The upshot of all this is that I think the following would be the right sort of recommendations to make:

  • Applications must not assume that an exception will necessarily be raised when sending to a closed connection, or rely on the server doing so. They should listen to a disconnect message and terminate the task when that occurs.
  • Mature server implementations should prefer not to raise exceptions when a send occurs against a closed websocket, but should instead ignore the message, ensure that disconnect is sent on the "receive" channel, and should forcibly kill the task if it has not completed after a reasonable timeout. Server implementations that do raise exceptions in response to send on a closed channel are valid, but may end up falsely logging server errors in response to standard client disconnects.

Anyways, thorny issue, and more than happy to talk this through more, but I'm moderately sure that's the conclusion I'd come to.

@tomchristie
Copy link
Member Author

Prompted by encode/starlette#56 and encode/starlette#58 (comment)

@andrewgodwin
Copy link
Member

I totally agree that it's just going to happen, due to the async nature of the code in question, that you're going to send just after the socket is disconnected in any case - even if you have a perfect disconnect handling loop. I tell client code writers to write code that expects disconnects at any time, we should be thinking the same way.

My questioning of the problem before was around safety - is it better for us to suppress those errors and maybe let bugs go uncaught? Initially I was erring on the side of not suppressing errors, which tends to be my default, but now I think I would agree that if we didn't suppress them, people would just see these errors randomly and they'd become ignored noise, which is perhaps even more dangerous.

Given that, I'm more than happy to enshrine in the spec that "send should not raise errors if used after disconnect". The only exception I might carve out is if the server sent a disconnect packet and then sent something else, but that's more a nice thing that mature servers could maybe raise a warning over rather than an edge case I want to specify out.

@tomchristie
Copy link
Member Author

tomchristie commented Sep 28, 2018

The only exception I might carve out is if the server sent a disconnect packet and then sent something else

Oh absolutely yes. Sending invalid state transitions should raise errors.

if we didn't suppress them, people would just see these errors randomly and they'd become ignored noise

Good point, yes.

I don't think we necessarily need to strongarm "servers must not do this", more that "server implementations should prefer not to do this, and instead prefer to guard against poor app behavior with an enforced timeout after disconnect" (A server implementation could validly raise an error if tasks fail to terminate within the timeframe after a disconnect.)

At some point I'll aim to have a little dig into daphne, hypercorn, and uvicorn's current behavior here, since I doubt it's in line with this proposal. (Paging @pgjones)

@andrewgodwin
Copy link
Member

OK, then I think we're agreed. I think this belongs in the www spec rather than the core one - if we're agreed, I'll get it added in?

@Kludex
Copy link
Contributor

Kludex commented Sep 3, 2023

I've reraised the same concern 5 years later on #405... 🤣

The proposed agreement here doesn't satisfy the case the user doesn't even call receive:

import asyncio

from websockets import exceptions as ex
from wsproto.utilities import LocalProtocolError


async def app(scope, receive, send):
    await send({"type": "websocket.accept"})
    try:
        while True:
            await send({"type": "websocket.send", "text": "potato"})
            await asyncio.sleep(1)
    except ex.WebSocketException:
        print("Server exception leaked to application.")
    except LocalProtocolError:
        print("Server exception leaked to application.")

I've proposed an alternative approach on #405 (comment), but I'm happy with any solution that doesn't leak the server exception to the application - I don't want the ASGI app to try to catch exceptions that are related to the server.

@andrewgodwin
Copy link
Member

My position remains the same as it was on #405 - in cases like this, there should be a single kind of exception that the server may raise if the send() call cannot be done for some reason, and we can collapse any internal server errors down to those.

The key thing I want to preserve is that there is enough debugging information in that error (e.g. in its first message argument) so that a user can work out what's going on.

@Kludex
Copy link
Contributor

Kludex commented Sep 3, 2023

You didn't comment about what I proposed (which seems aligned with what you said, right?).

What are your thoughts about it?

@andrewgodwin
Copy link
Member

Oh, I missed that one in the flurry of comments. I don't think an extension for this makes sense; there's only so far we can push things like this.

@Kludex
Copy link
Contributor

Kludex commented Sep 3, 2023

Well, it doesn't need to be an extension. I just think it makes sense for the web application to be ready to receive the exception, and then raise something more meaningful.

@Kludex
Copy link
Contributor

Kludex commented Sep 3, 2023

What if it's a new field in the scope?

@andrewgodwin
Copy link
Member

I still don't see a particular advantage over just always saying it must subclass IOError - it's a builtin exception class that fits (you're doing IO) and is unlikely to be used by web applications at that point for anything else.

@Kludex
Copy link
Contributor

Kludex commented Sep 4, 2023

Ok. It solves my needs.

I can change the documentation to reflect this decision. Can I?

Is this considered a breaking change or a bug fix in terms of the specification?

@andrewgodwin
Copy link
Member

Of course, please submit a PR and I'll review it.

It's probably worthy of a minor version update, since it's not something that clients/servers can entirely ignore, but we're also not introducing something that will break things in new ways.

@Kludex
Copy link
Contributor

Kludex commented Sep 4, 2023

Thanks. I'm a bit full in the next weeks, but I don't think there's much hurry here.

I'll get back to it in November. :)

Also, I never said this, but I really appreciate our interactions here @andrewgodwin 😁🙏

@andrewgodwin
Copy link
Member

Yes, and I appreciate you helping me work through this!

@adriangb
Copy link
Contributor

adriangb commented Nov 8, 2023

Let me try to summarize the discussion and see if I understood it correctly.

We are going to make the server raise an error if send() is called after the client's disconnected. The error raised should be a subclass of IOError but we don't specify any behavior other than that. In particular, we're not specifying a concrete subclass of IOError because we don't want to make every server depend on asgiref.

A couple of thoughts:

  • If backwards compatibility is a concern we could make the server only raise an error if some key is included in send(), e.g. send({..., "raise_exc_on_disconnect": True}). We could even make it the default in some future version of the spec so we don't have the key hanging around forever.
  • We could specify the behavior we want from the raised exception via a typing.Protocol, e.g. a cause() -> Literal[...] method. That way the server can transmit information to the client without necessitating a concrete subclass to be the API contract. That is, asgiref would provide a typing.Protocol and the server would implement it in it's IOError subclass.

@andrewgodwin
Copy link
Member

I don't think I'm that concerned about backwards compat in this case - the behaviour here is already undefined depending on the server, and I think making that into an explicit error is an improvement.

As for the Protocol - sure, but I'm not sure that's a blocker either way. asgiref has always been a pain to type as it's so dynamic.

@Kludex
Copy link
Contributor

Kludex commented Nov 18, 2023

@andrewgodwin We need to come back to this discussion.

I don't think the IOError applies for HTTP, only for WebSockets. Since HTTP is already covered by http.disconnect. Do you disagree?

@andrewgodwin
Copy link
Member

An application could, in theory, still try to send() after a client disconnection has happened, so I think it can happen with HTTP as well?

@Kludex
Copy link
Contributor

Kludex commented Nov 18, 2023

Yes, you are correct, but that case is already handled by the spec, and it was agreed to be a no-op. It's mentioned in the description of this issue.

This was the PR that included those words: #49

@pgjones If you have time, could you give me your opinion here?

@Kludex
Copy link
Contributor

Kludex commented Nov 19, 2023

@andrewgodwin Are you disagreeing with your previous message: #66 (comment) ?

I'm trying to understand this specific part:

Given that, I'm more than happy to enshrine in the spec that "send should not raise errors if used after disconnect". The only exception I might carve out is if the server sent a disconnect packet and then sent something else, but that's more a nice thing that mature servers could maybe raise a warning over rather than an edge case I want to specify out.

@andrewgodwin
Copy link
Member

To be fair, I wrote that previous message five years ago; in the time since, I have now come to believe that explicit errors in this case could be good, but if we make servers start raising them where they previously did not, that might be considered backwards incompatible, which is why I only want to raise them if the behaviour was undefined enough that not a lot of code was relying on this.

That's a lot easier with, say, websocket code due to the much smaller surface area of who is writing it. Web handling code is trickier, since there's increasing numbers of ASGI-compatible web frameworks (which is great!). From the parts of those that I've seen, raising an error if you try to send after a HTTP disconnection might just be annoying (much like the "client already closed connection" errors you get from intermediary servers with this problem already), which is why I am floating the point that it totally can happen that you can send() after a HTTP disconnect, and that I'm not sure about how I feel that HTTP and websocket would behave differently in this case.

@adriangb
Copy link
Contributor

I think that error would be annoying if it gets to the end user, but it doesn't necessarily need to. If it's part of the spec the ASGI application can catch the error and simply stop streaming the data / doing the work without raising an error.

I do see the point that using the error as a mode of communication between the asgi server and application when the event is part of normal operation and doesn't require a user's intervention or attention is not a good use of exceptions

I think the right thing to do would be for send() to return a SendResult or something that the application could voluntarily check to see if send() has become a no-op but that would require everything in the chain of server -> application to comply with this new API and hence will be harder to get working than an exception. But maybe a backward-compatible albeit longer-term solution is to implement that API and say that servers and middleware SHOULD return and forward a SendResult (effectively immediately making every implementation be slightly out of spec) and let applications that expect a SendResult handle the case where None is returned. Then next time we ratchet up the major version that SHOULD become a MUST and hopefully by that point it won't be a problem.

@andrewgodwin
Copy link
Member

It would never get to the end user as, by its nature, they are disconnected at that point - it would just clutter up server error tracking (which it would then stop doing once people wrote apps to handle it, but you get my point).

SendResult is an interesting idea - it certainly preserves the async idea of returning futures - but we could also just return a Future itself if we wanted to go that route I think?

Also, at the end of the day I want pragmatism rather than perfection, and I think returning an error on websocket send after closure is much more doable in a back-compat way (because it's something you should be expecting to handle anyway) - doing it on HTTP would be nice, but maybe something for a future spec version.

@adriangb
Copy link
Contributor

it would just clutter up server error tracking (which it would then stop doing once people wrote apps to handle it, but you get my point)

right by "end users" I was referring to users of asgi frameworks. and as you say once applications handle it there won't be cluttering of server error logs. I don't think that would be a problem though: getting errors would require you to update your asgi server without updating your asgi web framework. People tend to need to update their web framework more often than their server (to get new features).

SendResult is an interesting idea - it certainly preserves the async idea of returning futures - but we could also just return a Future itself if we wanted to go that route I think?

I'm not sure I'm understanding. I envisioned SendResult as something like {'disconnected': False} a TypedDict with some information (and maybe metrics?) about the sending of the data.

@andrewgodwin
Copy link
Member

Well I am proposing that we instead say you can return a future that says if it's sent or not - servers can immediately set_result with true/false if they want to, or we then have the option where they can do it a little later if there's queues to flush.

@adriangb
Copy link
Contributor

So you're proposing send(Message) -> SomeAwaitable?

@Kludex
Copy link
Contributor

Kludex commented Nov 30, 2023

Also, at the end of the day I want pragmatism rather than perfection, and I think returning an error on websocket send after closure is much more doable in a back-compat way (because it's something you should be expecting to handle anyway) - doing it on HTTP would be nice, but maybe something for a future spec version.

I'll take this part, and create a PR.

@adriangb
Copy link
Contributor

@andrewgodwin we can make the HTTP version non-breaking if the server raises and catches the exception. That is, given any of these scenarios:

  • Server updated, framework not -> server raises exception, bubbles up through framework, server catches it and does what it currently does (Uvicorn emits a trace level log). In fact this is better than the current status quo because you only get 1 trace log while currently, assuming the framework is not also listening on rcv() for a disconnect, you get as many as the framework calls send()
  • Server not updated, framework is. Nothing happens, the exception is never raised.
  • Server updated, framework updated. The framework can catch the exception and stop sending data or take any other action it wants to. This will be the end state.

@andrewgodwin
Copy link
Member

Ah yes, that is an interesting take on it - of course, it does bring us back to the problem of "which exception", since we can't have servers catch and squash anything except a unique exception, but we don't want to make them depend on asgiref...

@adriangb
Copy link
Contributor

How about we make it a protocol of some sort? Say we add an is_asgi_disconnected -> bool property? Does the server even need to know that a disconnect happened? As in, if you call send() and get some generic IOError what action would the application take that differs from what it would do if it knows the client disconnected? I’d imagine that in either case it’d want to do a somewhat graceful stop (maybe cancelling some task), but I don’t see how it would handle them differently. The only difference is that the client disconnecting is much more likely than other sources of errors, so as long as we say “when the client disconnects you’ll get an IOError so you should handle it” we should be okay.

@andrewgodwin
Copy link
Member

It's always been my personal view that you can never assume that a send has happened even if the server thinks it has written bytes to a socket (as the client could disconnect while they're in transit, for example), and so disconnects aren't actually important as any sufficiently well-written protocol should be able to handle random disconnects at any point anyway, and not be saving important per-connection state; that's why I didn't add it to the protocol in the first place.

People seem to want it, though, so it's those use-cases we have to consider in terms of what would work best and what wouldn't. I don't exactly know what the work that people would want to do after receiving a disconnection is?

@adriangb
Copy link
Contributor

adriangb commented Dec 12, 2023

Fair points! Do you feel we need to look into what actions people are taking or want to take, or go with the assumption that since raising the exception gives you the same control as the disconnect message it's valuable as is? Not that we have to remove the disconnect message, we can have both.

Given that we've figured out how to make it a non-breaking change, @Kludex maybe we can start by implementing it in Uvicorn and handling it in Starlette?

@andrewgodwin
Copy link
Member

I don't really know if we can get actually good user research on this, so I'm OK going with the backwards compatible solution if we can correctly specify what the exception should be. I'd suggest "a unique subclass of IOError specified by the server", so that user applications can catch IOError but the server can catch its own specific subclass.

@adriangb
Copy link
Contributor

That sounds good to me!

@adriangb
Copy link
Contributor

adriangb commented Jan 7, 2024

@carltongibson
Copy link
Member

Looking at the linked issues, the spec is pretty clear that the server should wait for the more_body=False flag before beginning to process the request. At that point you can listen to receive() for http.disconnect to know to stop a long running task.

The concerns with consuming the body (potentially unnecessarily, but then why send it?) would be better addressed by allowing the server to pass the request body as (say) a file descriptor, always in a single message, always with more_body=False -- or something in that direction -- rather than introducing a breaking change around http.disconnect (as proposed in #431)

@adriangb
Copy link
Contributor

adriangb commented Jan 8, 2024

Looking at the linked issues, the spec is pretty clear that the server should wait for the more_body=False flag before beginning to process the request

That's not clear to me, could you point out where it says that? And do you mean the framework should wait? It's the server that sets more_body=False so I don't see how the server would wait for that. There's a lot of use cases for streaming a request into something. I feel it would be constrictive to say that any ASGI framework should buffer the entire request into memory before processing it.

There's a trivial use case which users doing a hello world tutorial encounter: a streaming echo endpoint. Currently, that's impossible to write with Starlette because streaming a response listens for a disconnect (so that you stop streaming a response / doing work if the client disconnects) leading to a race condition in reading the request body.

@carltongibson
Copy link
Member

Sure. HTTP spec, receive event docs:

... the body message serves as a way to stream large incoming HTTP bodies in chunks, and as a trigger to actually run request code (as you should not trigger on a connection opening alone).

more_body (bool) – Signifies if there is additional content to come (as part of a Request message). If True, the consuming application should wait until it gets a chunk with this set to False. If False, the request is complete and should be processed.

@adriangb
Copy link
Contributor

adriangb commented Jan 8, 2024

I'm not convinced that's the intention of that paragraph. IMO the wording is not super clear. And if that is what it means, isn't that very limiting? Are we saying that ASGI doesn't support chunk-by-chunk processing of a streaming request?

@carltongibson
Copy link
Member

Well... I think it is pretty clear. It says what it does. (But I don't want to argue that.)

What (I think) we need is (to repeat from my earlier comment):

… better addressed by allowing the server to pass the request body as (say) a file descriptor, always in a single message, always with more_body=False -- or something in that direction

Having to consume the whole body, whether you want it or not isn't ideal, so give me a stream object I can consume on demand. I would prefer that, yes.

Once the body is handled, any other message I get on receive is known to be http.disconnect, so listening for that whilst performing long-running tasks is no problem. (It would be a concern for me if we broke that without exploring other avenues first.)

This thread supposedly concerns send. I'm somewhat curious how it suddenly resulted in a Deprecate a receive message suggestion, but yeah... 🤔

@adriangb
Copy link
Contributor

adriangb commented Jan 8, 2024

This thread supposedly concerns send. I'm somewhat curious how it suddenly resulted in a Deprecate a receive message suggestion, but yeah... 🤔

As per the very long discussion in this thread it seems like the only backward compatible (as far as users are concerned) way to get disconnect information in send() as opposed to receive() is to raise an exception from send() that the server itself can catch. It would seem then that it makes sense to do the same in receive() to keep them symmetrical and the spec simpler.

Having to consume the whole body, whether you want it or not isn't ideal, so give me a stream object I can consume on demand. I would prefer that, yes.

Agreed, I think where we disagree is that I believe that was the intention of the current spec / receive() giving you pieces instead of an already buffered body.

better addressed by allowing the server to pass the request body as (say) a file descriptor, always in a single message, always with more_body=False -- or something in that direction

This seems like a lot more code churn for everyone and I don't see how it'd be backward compatible. I'm not opposed to it but I'd like to see a concrete implementation or proposal.

@carltongibson
Copy link
Member

carltongibson commented Jan 8, 2024

Yes, I've been following the thread for quite some time.

My semi-idea is simply that the first http.request message could have (e.g.) a body_file file descriptor (any filelike), and then more_body=False. It would be about three/four lines in Django to use that if it were there to populate the request object (in addition to allowing the existing flow otherwise).

I don't think orginalist arguments about intentions are that helpful. What we have is the wording of the spec. Removing http.disconnect is not backwards compatible. The suggestion of that breaking change is what I'm objecting to.
From my perspective that's the churn — it would break Django's disconnect handling here.

@andrewgodwin
Copy link
Member

andrewgodwin commented Jan 9, 2024

Even better, we can know the intentions since I wrote most of the spec and I mostly remember my intentions.

I do not agree with removing http.disconnect, as that would be backwards incompatible and that's literally the worst thing we can do to fix anything. Thus, I will veto any change that removes or deprecates it.

Now, I thought we were discussing the specific case of calling send() on a connection which is already closed, and adding a new IOError in that case - that is something I am more supportive of, since it is not really backwards incompatible (yes, it introduces a new possible error, but in a situation where you were already losing data and didn't know about it)

@adriangb
Copy link
Contributor

adriangb commented Jan 9, 2024

I do not agree with removing http.disconnect, as that would be backwards incompatible and that's literally the worst thing we can do to fix anything. Thus, I will veto any change that removes or deprecates it.

Ok no worries, I was proposing removing it in the next major version (which may never come) since it seems to become redundant. I'll remove that wording from the PR.

Even better, we can know the intentions since I wrote most of the spec and I mostly remember my intentions.

I guess it's worth clarifying for the record then: is the intention of the spec that you must buffer a request body in memory entirely before you begin to process it?

@andrewgodwin
Copy link
Member

I guess it's worth clarifying for the record then: is the intention of the spec that you must buffer a request body in memory entirely before you begin to process it?

I had hoped to avoid this being necessary, yes - this is why the request object has more_body in the first place rather than being a giant message, as I wanted to avoid exposing the raw file descriptor as the WSGI workaround did. That said, it obviously hasn't come out perfectly in practice; what we have now lets you stream the incoming data out to a temporary file on disk as you read it, though, which is something.

@carltongibson
Copy link
Member

carltongibson commented Jan 9, 2024

Hey @andrewgodwin.

I wanted to avoid exposing the raw file descriptor as the WSGI workaround did.

That's kind of the semi-idea that I sort of semi-want. ("Semi" here really meaning, I haven't looked into it in sufficient depth.) Could you perhaps outline what the issues are there? (I don't know them.) Thanks 🙏

(There's the whole multiple requests reading from the same input stream thing, leading to the need to ensure requests can only read their data, but I assume there'd be a way of avoiding that... 🤔)

@andrewgodwin
Copy link
Member

Well, the problem with that originally was that I wanted the entire body of asgi messages to be JSON-safe, as I had intended to do a process model where the process terminating the sockets was separate to the one running Django/etc.

I think that's still a valuable property to try and maintain - it means you can prefork and have warm worker processes more easily, for example.

@Kludex
Copy link
Contributor

Kludex commented Feb 15, 2024

We need to update the spec. It should be OSError, and not IOError.

Screenshot 2024-02-15 at 10 57 53

There's also ConnectionError: https://docs.python.org/3/library/exceptions.html#ConnectionError

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants