Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement oc chunking/checksum using tus metadata #1279

Closed
butonic opened this issue Apr 21, 2020 · 21 comments
Closed

implement oc chunking/checksum using tus metadata #1279

butonic opened this issue Apr 21, 2020 · 21 comments
Assignees

Comments

@butonic
Copy link
Member

butonic commented Apr 21, 2020

I ran into checksum problems. Maybe the tus chunking extension is not what we want because it may discard whole chunks?

See clarification in tus/tus-resumable-upload-protocol#143 (comment)

all client disconnects would remove the last chunk.

Yes, that's true and one of the major drawbacks of checksum extension.

  • we can send the checksum in the metadata, but for large files that would
    require reading the file twice.
  • the client might calculate the checksum while sending, the server as well and can return the checksum in the response. but that would require resending the whole file in case of an error.
  • actually this is the right thing to do
  • together with the concatenation extension the full bandwith can be used
  • but tusd does not support it, yet

AFAICT we can send the checksum in the metadata to mimic the current checksum protocol: https://doc.owncloud.org/desktop/2.4/architecture.html#upload

@butonic butonic self-assigned this Apr 21, 2020
@butonic
Copy link
Member Author

butonic commented Apr 21, 2020

add code to cs3org/reva#669

@PVince81
Copy link
Contributor

rename title s/chunking/checksum/ ?

@butonic butonic changed the title implement oc chunking using tus metadata implement oc chunking/checksum using tus metadata Apr 24, 2020
@butonic
Copy link
Member Author

butonic commented Apr 24, 2020

AFAICT we should be good with tus/tus-resumable-upload-protocol#93

  • it allows admins to specify the chunksize for uploads to bypass things like mobile iron who simply die at ~50M I was told. Treat it as rumor ...

but this resume, together with a checksum that is calculated ... during upload and sent as an HTTP trailer would cover the requirements.

I don't know what is going to die when uploading 50GB in the browser ... we need to check.

@butonic butonic transferred this issue from owncloud/ocis-reva Jan 18, 2021
@refs refs added the Topic:TUS label Jan 18, 2021
@settings settings bot removed the Topic:TUS label Jan 29, 2021
@butonic
Copy link
Member Author

butonic commented Feb 5, 2021

@michaelstingl could you comment on tus/tus-resumable-upload-protocol#93 and explain why the Tus-Min/Max-Chunk-Size header makes sense for us? It seems they are looking for good reasons to add it: tus/tusd#353 (comment)

@michaelstingl
Copy link
Contributor

michaelstingl commented Feb 5, 2021

@butonic no need for min/max from my POV. TUS already supports fixed chunk sizes. We can have a quick call later. 🤦

@butonic
Copy link
Member Author

butonic commented Feb 5, 2021

@phil-davis @individual-it I implemented checksumming for TUS uploads in reva but I don't see acceptance tests for that in core. Checksumming when using chunks is still an open task as we need throw away any aborted PATCH request (see tus/tus-resumable-upload-protocol#143 (comment)).

@butonic
Copy link
Member Author

butonic commented Feb 5, 2021

@michaelstingl IIRC you wanted to be able to let the admin configure the chunksize to make uploads pass firewalls and intermediary services ...

@individual-it
Copy link
Member

@butonic I don't think we have done checksums in TUS tests yet.
Please create an issue to create acceptance tests for that with a link to the PR and the part of the TUS protocol docs how it suppose to work and we can add tests for it

@butonic
Copy link
Member Author

butonic commented Feb 5, 2021

Initial Checksum support in cs3org/reva#1400

@michaelstingl
Copy link
Contributor

michaelstingl commented Feb 5, 2021

@michaelstingl IIRC you wanted to be able to let the admin configure the chunksize to make uploads pass firewalls and intermediary services ...

Yes, already solved with fixed-size chunk information in header. Why do we need min/max for that?

Yes, already solved with tus-max-size chunk information in header. Why do we need another min/max for that? 🤦‍♂️

@michaelstingl
Copy link
Contributor

@butonic see my updated comment ⬆️

@michaelstingl
Copy link
Contributor

Yes, already solved with tus-max-size chunk information in header. Why do we need another min/max for that?

Oh, crap. Not my day today. tus-max-size is the "maximum allowed size of an entire upload". I need to check first.

@felix-schwarz
Copy link

felix-schwarz commented Feb 7, 2021

As far as OC is concerned, there already is a max_chunk_size in the files > tus_support part of capabilities that the iOS client is already supporting:

"files" : {
	"tus_support" : {
		"max_chunk_size" : 
		
	}
	
}

IMO in its current form the Checksum extension is not suitable for mobile clients, as it requires that all data of an incomplete PATCH request be discarded, contradicting efforts like the Upload-Tag extension that tries to save bandwidth by not re-transmitting already sent data.

A possible solution I can see is to take a transactional approach, where the checksum is recorded after each PATCH request, allowing a rollback and retransmission, with commits happening whenever checksums match:

  • a chunk is sent via PATCH, including the checksum for the chunk in the header
  • if the request succeeds:
    • if checksum is correct, the chunk gets committed
    • if checksum is incorrect, the chunk gets thrown out and an error returned
    • (so far, that's essentially how the existing Checksum extension works)
  • if the connection breaks down and the request is incomplete:
    • the subsequent HEAD request from the client on the upload resource returns additional info:
      • the offset range (begin offset - end offset) of the incomplete chunk
      • a server-computed checksum over the received parts of the incomplete chunk
    • the client can now use that info to check if the checksum matches that part of the file
      • if it matches, client sends a PATCH request
        • with extra range+checksum metadata commit-partial-upload-range=500-700,commit-partial-upload-checksum=sha256:a76… in the header to confirm the integrity and allow the server to commit the partial upload. Since, in case of success, that range + checksum will match the server's previous response, no re-computation of the checksum will be necessary and a simply comparison will do. But in the unlikely event of the client sending bogus data in the form of a mismatching range or checksum, the server should respond with a 409 Conflict status.
        • otherwise continues the upload from the end of the partial upload (standard tus PATCH syntax)
      • if it does not match, client sends a PATCH request
        • resuming from the start of the corrupted chunk (standard tus PATCH syntax), causing the server to throw out the uncommitted chunk
  • rinse and repeat until the upload is complete

This could be implemented as an extension to the existing Checksum extension.

What do you think @butonic @michaelstingl ?

@butonic
Copy link
Member Author

butonic commented Feb 8, 2021

To make chunked upload requests always land on the same dataprovider, the internal upload url can be used to route all requests to the data provider that is responsible for the upload.

🤔 I think this is already done ;-) @kulmann implmented the datagateway and dataprovider part for that IIRC.

@felix-schwarz
Copy link

I have finished a first draft for the proposal of a Partial Checksum tus extension here:

https://github.com/felix-schwarz/tus-resumable-upload-protocol/blob/extension/partial-checksum/protocol.md#partial-checksum

Feedback welcome! I plan to submit the PR for it soon.

@felix-schwarz
Copy link

After further editing for clarity and some simplifications regarding headers, I just created the PR: tus/tus-resumable-upload-protocol#172

@butonic
Copy link
Member Author

butonic commented Mar 3, 2021

Nothing to add! Great writeup that integrates well with the core resumable upload nature of TUS!

@butonic
Copy link
Member Author

butonic commented Mar 3, 2021

tracking missing TUS headers in #1747

@butonic
Copy link
Member Author

butonic commented Mar 3, 2021

tracking wrong checksum for TUS in #1755

@butonic
Copy link
Member Author

butonic commented Mar 3, 2021

cs3org/reva#1400 got merged, closing this issue as open bugs are tracked in dedicated issues.

@butonic butonic closed this as completed Mar 3, 2021
@michaelstingl
Copy link
Contributor

@michaelstingl could you comment on tus/tus-resumable-upload-protocol#93 and explain why the Tus-Min/Max-Chunk-Size header makes sense for us? It seems they are looking for good reasons to add it: tus/tusd#353 (comment)

Done: tus/tus-resumable-upload-protocol#93 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants