
Add Copy blob to client and use in compose when composing a new blob from a single blob #15


Merged · 9 commits into testdouble:main · Feb 22, 2025

Conversation

ericenns
Contributor

In our app we wanted to reduce data handling in compose when only a single source blob is provided, so I have added support for the copy-blob operation.

Compose now behaves slightly differently: when a single source key is specified, the compose method uses the copy-blob operation; otherwise the existing behaviour remains.
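
Roughly, the dispatch described above might look like this (a minimal sketch; the method names and signature here are assumptions from this discussion, not the merged diff):

```ruby
# Minimal sketch, not the merged code: signature and helper names are assumed.
def compose(source_keys, destination_key)
  if source_keys.one?
    # Single source: one server-side copy, no download/re-upload through the app.
    copy_blob(destination_key, source_keys.first)
  else
    # Multiple sources: keep the existing block-based compose behaviour.
    compose_from_blocks(source_keys, destination_key)
  end
end
```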

@JoeDupuis
Member

Looks good! Thank you for the contribution!

Any reason for using put blob from url instead of copy blob from url?

Copy blob is async, but copy blob from url is synchronous and would raise the limit to 256MB instead of 5MB.

I did some testing; omitting x-ms-blob-type and passing x-ms-requires-sync instead seems to work.
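
For reference, the header swap being described, sketched from the Azure REST docs (the surrounding request plumbing and variable names are assumptions, not the gem's internals):

```ruby
copy_source = "https://#{account_name}.blob.core.windows.net/#{container}/#{source_key}"

# Put Blob From URL (synchronous, up to 5000MiB) wants a blob type:
put_blob_from_url_headers = {
  "x-ms-blob-type"   => "BlockBlob",
  "x-ms-copy-source" => copy_source,
}

# Copy Blob From URL (synchronous, up to 256MiB): drop x-ms-blob-type
# and send x-ms-requires-sync instead.
copy_blob_from_url_headers = {
  "x-ms-requires-sync" => "true",
  "x-ms-copy-source"   => copy_source,
}
```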

We'll need a changelog entry too.

Thank you!

@ericenns
Contributor Author

ericenns commented Feb 6, 2025

> Looks good! Thank you for the contribution!
>
> Any reason for using put blob from url instead of copy blob from url?
>
> Copy blob is async, but copy blob from url is synchronous and would raise the limit to 256MB instead of 5MB.
>
> I did some testing; omitting x-ms-blob-type and passing x-ms-requires-sync instead seems to work.
>
> We'll need a changelog entry too.
>
> Thank you!

I was originally using copy blob from url, but it was failing for some reason; I now realize that was because I was not using a signed URI, which I found I needed for put blob from url.

Even though put blob from url has a 5000MiB limit versus the 256MiB limit for copy blob from url, I have chosen to revert to using copy blob from url, as we have been getting random failures trying to copy large blobs (3GB+). The failures are timeouts, since the put method in this library has a default timeout of 60s. So rather than adding timeout handling to compose when using copy_blob, reducing the limit to 256MiB might just be the best choice.
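
For illustration, that 60s is just the plain Net::HTTP default (assuming the gem's HTTP layer sits on Net::HTTP; the host below is a placeholder):

```ruby
require "net/http"

# A synchronous server-side copy holds the HTTP request open until the
# copy finishes, so a 3GB+ copy can easily outlive the default read
# timeout and raise Net::ReadTimeout on the client side.
http = Net::HTTP.new("myaccount.blob.core.windows.net", 443)
http.use_ssl = true
http.read_timeout # => 60, Ruby's default read timeout in seconds
```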

If you feel this is too much of a risk for this library (using copy_blob in compose, which could potentially time out and fail, instead of streaming, which I guess could have the same issue, though we have never encountered it), let me know.

@ericenns
Contributor Author

ericenns commented Feb 6, 2025

@JoeDupuis what do you think about me adding a faster_compose: true option to the library? If set to true, compose would use copy_blob for files <= 256MiB. The name is just an idea, but then users of this library can decide if they want that behaviour.
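
A rough sketch of what that opt-in could look like (everything here is hypothetical: the option name comes from the comment above, and blob_size stands in for whatever size lookup the gem would use):

```ruby
# Hypothetical sketch; faster_compose and blob_size are illustrative names.
COPY_BLOB_FROM_URL_LIMIT = 256 * 1024 * 1024 # 256MiB, the sync copy ceiling

def compose(source_keys, destination_key, faster_compose: false)
  if faster_compose && source_keys.one? &&
      blob_size(source_keys.first) <= COPY_BLOB_FROM_URL_LIMIT
    # One synchronous server-side copy instead of streaming through the app.
    copy_blob(destination_key, source_keys.first)
  else
    # Existing streaming compose behaviour.
    compose_from_blocks(source_keys, destination_key)
  end
end
```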

@JoeDupuis
Member

Oh! I misread that—5000MiB, not 5MiB! 😅

Sorry for the late reply! I've been quite busy since the start of the year.

I'm not too worried about timeouts for 256MiB copies, but I should really look into adding an option to change the timeout (not in this PR).

I don't think we need an option for cases under 256MiB. Larger copies might require it, as they would also need a longer timeout.

Do you use compose just to copy single files, or do you also do "true" multi-file compose? If it's just single-file copies and you have a lot of larger files (3GB+), perhaps using the AzureBlob client directly for copying would make sense. The configured client is accessible from the AzureBlobService, so hopefully the interface won't be too rough. I know you can't do that now (copy bigger files) because of the timeout, but this should be configurable—I'll look into it.
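
Something like this, roughly (a sketch only; the client reader comes from this comment, and the copy_blob signature is assumed from this PR, not verified against the gem):

```ruby
# Sketch: copy directly through the configured AzureBlob client from a
# Rails app, bypassing ActiveStorage's compose entirely.
client = ActiveStorage::Blob.service.client

client.copy_blob(destination_key, source_key)
```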

Pretty cool to see the NML is using the gem! Hopefully I am not about to have my name on the git blame of some apocalyptic event 😂

@JoeDupuis JoeDupuis merged commit 961c085 into testdouble:main Feb 22, 2025
@JoeDupuis
Member

Thank you! I just released your changes in 0.5.7

@ericenns
Contributor Author

> Thank you! I just released your changes in 0.5.7

Awesome, thanks! I'll switch our dependency over from our fork to this release.

> Oh! I misread that—5000MiB, not 5MiB! 😅
>
> Sorry for the late reply! I've been quite busy since the start of the year.
>
> I'm not too worried about timeouts for 256MiB copies, but I should really look into adding an option to change the timeout (not in this PR).
>
> I don't think we need an option for cases under 256MiB. Larger copies might require it, as they would also need a longer timeout.
>
> Do you use compose just to copy single files, or do you also do "true" multi-file compose? If it's just single-file copies and you have a lot of larger files (3GB+), perhaps using the AzureBlob client directly for copying would make sense. The configured client is accessible from the AzureBlobService, so hopefully the interface won't be too rough. I know you can't do that now (copy bigger files) because of the timeout, but this should be configurable—I'll look into it.
>
> Pretty cool to see the NML is using the gem! Hopefully I am not about to have my name on the git blame of some apocalyptic event 😂

We use compose both for copying single files and for concatenating text-based files. We stuck with compose because our software is open source and health institutions around the world are expected to use it, and they might not be on Azure infrastructure. Compose was the easiest choice for us since it also works across the other cloud providers and local storage.
