Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguous definition of Upload-Metadata #124

Open
jawi opened this issue Apr 22, 2017 · 5 comments
Open

Ambiguous definition of Upload-Metadata #124

jawi opened this issue Apr 22, 2017 · 5 comments

Comments

@jawi
Copy link

jawi commented Apr 22, 2017

The Upload-Metadata header is a bit ill-defined with respect to its encoding: first, it tells that values must be base64 encoded, but there are several variants of base64 encoding. From the code I can deduce that it is using base64 encoding as defined in RFC4648 section 4. To aid interoperability, it might be good to add this detail.

In addition, the value represents an array of bytes encoded as base64. How should we interpret this byte array? Is it to be interpreted by the TUS server or only considered of something of value for the TUS client?

@Acconut
Copy link
Member

Acconut commented Apr 27, 2017

The Upload-Metadata header is a bit ill-defined with respect to its encoding: first, it tells that values must be base64 encoded, but there are several variants of base64 encoding. From the code I can deduce that it is using base64 encoding as defined in RFC4648 section 4. To aid interoperability, it might be good to add this detail.

Good idea!

In addition, the value represents an array of bytes encoded as base64. How should we interpret this byte array?

The protocol does not define how the metadata values should be used or interpreted. Typically they will be converted to a string but in theory a server could also handle the raw byte array, even though I don't see a practical reason for that.

Is it to be interpreted by the TUS server or only considered of something of value for the TUS client?

Both, it's a way for the server to get additional details about an upload from the client as well as a tool for the client to store information together with the upload.

@jawi
Copy link
Author

jawi commented May 1, 2017

The protocol does not define how the metadata values should be used or interpreted. Typically they will be converted to a string but in theory a server could also handle the raw byte array, even though I don't see a practical reason for that.

Ok, understood. However, this does raise the question on what encoding the server (and client) is to expect: if there's nothing specified on how to interpret these bytes, then it must be agreed, probably out-of-band, how the encoding should be. Trying to interpret a Cp1252 encoded string as UTF-8 does not give the expected results and as such it should be possible for the server to tell the client that it has rejected or invalidated the provided metadata.

Both, it's a way for the server to get additional details about an upload from the client as well as a tool for the client to store information together with the upload.

Ok, aside the encoding question, this does open up a potential other problem: the amount of metadata a client can send. There are no official limitations to the length of a HTTP header, but passing in "any" amount of data that is to be controlled from the client is a potential can of worms. It might be a good idea to limit the maximum length of metadata a client can send to the server.

@Acconut
Copy link
Member

Acconut commented May 1, 2017

However, this does raise the question on what encoding the server (and client) is to expect: if there's nothing specified on how to interpret these bytes, then it must be agreed, probably out-of-band, how the encoding should be. Trying to interpret a Cp1252 encoded string as UTF-8 does not give the expected results

You are correct, the client and server should agree on the details outside of the tus protocol. To provide more context: metadata is meant to pass application-specific data from the client to the server. It's not necessarily designed to allow an arbitrary pair of client and server to exchange meta data (that's also why we don't specific and meta data values, such as filename or filetype).

it should be possible for the server to tell the client that it has rejected or invalidated the provided metadata.

The server can reject the upload creation POST request by responding with a 4XX status code, indicating that the request is invalid.

this does open up a potential other problem: the amount of metadata a client can send. There are no official limitations to the length of a HTTP header, but passing in "any" amount of data that is to be controlled from the client is a potential can of worms. It might be a good idea to limit the maximum length of metadata a client can send to the server.

I am not much concerned about this. Every proper webserver/proxy I am aware of has a limit on how big headers may be which is enabled by default. This should represent a good enough protection against such attacks.

@SpotlightKid
Copy link

SpotlightKid commented Mar 18, 2019

Ok, understood. However, this does raise the question on what encoding the server (and client) is to expect: if there's nothing specified on how to interpret these bytes, then it must be agreed, probably out-of-band, how the encoding should be.

You are correct, the client and server should agree on the details outside of the tus protocol.

I find that a bit disappointing. I think at least it should be spec'ed how to indicate the encoding of the value and spec a default encoding.

Imagine, the client and server agreed on a metadata key named filename. The value, if it should be able to reflect real-life filenames, could have any encoding. The server and client could agree to put the encoding at the start of the (base64-decoded) value, but how to encode the encoding itself? As a name in ascii/latin1/utf-8 separated form the actual value by some separator (e.g. a colon)? Then the separator needs to be escaped in the value or you couldn't have a default encoding. Encoded in binary, e.g. in the first two bytes? Then they'd also need to define the mapping between numerical values and encoding names.

I think the TUS protocol should refine the definition of the Upload-Metadata format, otherwise everybody and their dog needs to and will come up with their own way of defining the encoding and we get interoperability problems.

@Acconut
Copy link
Member

Acconut commented Mar 21, 2019

@SpotlightKid I find that a bit disappointing. I think at least it should be spec'ed how to indicate the encoding of the value and spec a default encoding.

I can understand where that opinion is coming from but I don't fully agree with you. Let me try to explain: The entire idea behind the Upload-Metadata header is to provide a common way to attach application-specific data to an upload. So it is completely intentional that the metadata itself is not defined by the specification and therefore the client and server have to agree on what information they want to represent with which metadata key. Since those two factors must already be decided, I don't see a reason why the tus specification should require them to use a specific data format (for example, string encoding) instead of letting the application handle that on its own. The application usually knows best what data format should be used in its environment.

Furthermore, and I am completely open here, I don't know enough about different encoding methods and how they are handled in various environments. Would it be enough to say that they are always UTF-8 encoded? What if we want to transport a bytestream which cannot be UTF-8 encoded? I don't see myself as competent enough to answer those questions and in the past I haven't heard of a single problem that arose from metadata encoding incompatibility.

Of course, I am always happy to discuss this and improve the protocol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants