
Describe large file chunking as a part of protocol #24

Closed
wronglink opened this issue May 15, 2013 · 11 comments

@wronglink

Many frameworks and web servers have a predefined maximum request size. Handling a huge file upload in a single request is not a trivial task. The current version of the protocol says:

Clients SHOULD send all remaining bytes of a resource in a single PATCH request, but MAY also use multiple small requests for scenarios where this is desirable (e.g. NGINX buffering requests before they reach their backend).

But how can the client know the maximum size of a single chunk? In most cases, if the client sends a request that is too big, the server returns a 413 error or something like it, and the client does not know what to do next.

I think that an additional header (let's say Max-Content-Length) that the server returns on the initial POST and HEAD requests can help us with that.

I haven't found any existing headers for such a task, so I suggest using a custom one. Here is a small example (we want to send a 50 MB file):

Request:

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Final-Length: 52428800

Response:

HTTP/1.1 201 Created
Max-Content-Length: 10485760
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216

OK. Now the client knows that only 10 MB per request is allowed, so it sends the file in chunks. If something goes wrong, it makes a HEAD request, detects the offset, and continues.
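The chunking step above can be sketched as a small client-side helper (a hypothetical function; Max-Content-Length is the custom header proposed in this issue, not a standard one):

```python
def chunk_ranges(total_size, max_chunk):
    """Split an upload of total_size bytes into (offset, length) pairs,
    each no larger than max_chunk (the value a server might advertise
    in the proposed Max-Content-Length header)."""
    ranges = []
    offset = 0
    while offset < total_size:
        length = min(max_chunk, total_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges

# A 50 MB file with a 10 MB per-request limit splits into 5 PATCH chunks.
print(len(chunk_ranges(52428800, 10485760)))  # → 5
```

Each (offset, length) pair then maps directly onto one PATCH request with the corresponding Offset header.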

@felixge
Contributor

felixge commented May 17, 2013

But how can the client know the maximum size of a single chunk?

It exposes it as a configuration option to the developer.

I think that an additional header (let's say Max-Content-Length) that the server returns on the initial POST and HEAD requests can help us with that.

I'm not convinced that we need this kind of feature negotiation. I mean I can see how it would be nice to have, but for now I'd like to avoid the complexity this introduces.

What do you think?

@wronglink
Author

On the one hand, it adds some complexity and maybe makes the protocol less flexible.
But on the other hand, it seems to me that this is a really common task, and if it were covered by the protocol description, developers wouldn't be forced to reinvent the wheel every time. Most frameworks I know are not ready to deal with one big file upload, but they can handle a sequence of chunks much more easily.

@felixge
Contributor

felixge commented May 17, 2013

@wronglink Thinking a little more about this, I think I can be convinced that this is a good idea. @vayam, what do you think?

@vayam
Member

vayam commented May 17, 2013

@felixge @wronglink It is probably useful for integrating with legacy frameworks.
IMO it is not a good idea to return it with POST or GET. I propose we return Max-Content-Length with a 413 and immediately close the connection:

Request:
PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 104857600
Offset: 0

Response:

HTTP/1.1 413 Request Entity Too Large
Max-Content-Length: 10485760

Client can then adjust the PATCH request accordingly.
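That adjustment could look like the following sketch (a hypothetical client-side helper; it assumes the proposed Max-Content-Length header is present on the 413 response):

```python
def adjusted_chunk_size(attempted_size, response_status, response_headers):
    """After a PATCH is rejected, pick a new chunk size.

    If the server answered 413 and included the proposed
    Max-Content-Length header, honour that limit; otherwise
    keep the attempted size unchanged."""
    if response_status == 413 and "Max-Content-Length" in response_headers:
        return min(attempted_size, int(response_headers["Max-Content-Length"]))
    return attempted_size

# The 100 MB PATCH above would be retried in 10 MB chunks.
print(adjusted_chunk_size(104857600, 413, {"Max-Content-Length": "10485760"}))
```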
What do you think?

@wronglink
Author

@vayam I thought about that. It sounds reasonable, but there is a problem: the 413 error could be returned by some middleware (a web server, or some logic inside the framework). That can lead to a situation where the client has to resend the beginning of the file, while the protocol's main idea is the opposite: avoiding extra data exchange.

I still think it's a good idea to warn the client about any server restrictions before it tries to send all the data.

BTW, I forgot to note that the Max-Content-Length header could be optional: if the server has no problems handling big files, it doesn't send the header at all.

@vayam
Member

vayam commented May 17, 2013

@wronglink The initial POST could be optional. If the server closes the connection right away, the request can fail fast. I agree it is an extra step. If 413 is not acceptable, how about using 422 Unprocessable Entity?

@wronglink
Author

The initial POST could be optional.

Sorry, I missed that.

But I think I've got it: the server could detect a too-large request from the Content-Length header and return a 4xx error without waiting for the whole request body to be sent. As for the response code, I agree that 413 Request Entity Too Large is the most appropriate.
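On the server side, that early check could be sketched like this (a hypothetical handler fragment; the 10 MB limit is just an example value):

```python
MAX_CONTENT_LENGTH = 10485760  # example server-side limit (10 MB)

def precheck_patch(headers):
    """Decide from the request headers alone, before reading any body
    bytes, whether a PATCH should be rejected as too large.
    Returns (status_code, extra_response_headers); a status of 0
    means no early rejection, so the body may be read."""
    try:
        declared = int(headers.get("Content-Length", "0"))
    except ValueError:
        return 400, {}
    if declared > MAX_CONTENT_LENGTH:
        # Advertise the limit so the client can adjust its chunk size.
        return 413, {"Max-Content-Length": str(MAX_CONTENT_LENGTH)}
    return 0, {}

print(precheck_patch({"Content-Length": "104857600"}))
```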

@felixge
Contributor

felixge commented May 22, 2013

But I think I've got it: the server could detect a too-large request from the Content-Length header and return a 4xx error without waiting for the whole request body to be sent.

Well, this is tricky business. Generally speaking, HTTP clients do not expect to be sent a reply before finishing their body transmission (unless 100 Continue is in play). So doing this usually means killing the connection as well.
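With Expect: 100-continue, the rejection can happen before the body is ever transmitted; one way such an exchange might look (Max-Content-Length remains the custom header proposed in this thread):

```
PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Content-Length: 104857600
Offset: 0
Expect: 100-continue

HTTP/1.1 413 Request Entity Too Large
Max-Content-Length: 10485760
```

Here the client waits for the interim response before sending the 100 MB body, so the server can refuse based on headers alone without discarding data mid-flight.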

Anyway, I spent a bunch of time yesterday thinking about hypermedia/REST and a potentially different way of approaching this protocol. The end result could be something that's compatible with RFC 1867 by default, and could also easily allow the server to control chunk sizes by providing the client with appropriate "form" documents (these could be defined as a JSON media type). I'm not convinced that this is the way to go, but I'll try to come up with a proof of concept soon so we can at least consider it.

@Acconut
Member

Acconut commented Dec 16, 2014

The protocol mustn't define a maximum size by default. The current way we plan to approach your issue is by adding the discovery mechanism as discussed in #29. It will provide a way to get the maximum upload size from the server.

@Acconut Acconut closed this as completed Oct 17, 2015
@g-bull

g-bull commented Mar 19, 2020

I am faced with this exact issue and like the Max-Content-Length proposal by wronglink.
It didn't get adopted because "The current way we plan to approach your issue is by adding the discovery mechanism as discussed in #29", but I think that plan never came to fruition, right?
Since I have control of both the client and the server, I am going with the Max-Content-Length proposal unless someone tells me a better way actually exists.

@Acconut
Member

Acconut commented Mar 22, 2020

@g-bull It sounds as if you are looking for the proposal as in #93. Did you have a look at that?
