feat!(clp-package): Add support for providing multiple S3 object keys as input for compression. #1383
+254
−39
Description
Before this PR, CLP package compression accepted only a single S3 URL as input. The S3 URL is decomposed into a bucket, a region code, and a key prefix, and everything under the key prefix is compressed.
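For illustration, the decomposition described above could look roughly like the sketch below. This is not the package's actual parser: the names `S3Location` and `parse_s3_url` are hypothetical, and the two URL shapes handled here are the standard AWS virtual-hosted and path styles, which may differ from what the package accepts.

```python
import re
from typing import NamedTuple
from urllib.parse import urlparse


class S3Location(NamedTuple):
    """Hypothetical container for the three parts of an S3 URL."""

    region_code: str
    bucket: str
    key_prefix: str


# Virtual-hosted style host: <bucket>.s3.<region>.amazonaws.com
_HOST_STYLE = re.compile(r"^(?P<bucket>.+)\.s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$")
# Path style host: s3.<region>.amazonaws.com (the bucket is the first path segment)
_PATH_STYLE = re.compile(r"^s3\.(?P<region>[a-z0-9-]+)\.amazonaws\.com$")


def parse_s3_url(url: str) -> S3Location:
    """Split an S3 URL into (region_code, bucket, key_prefix)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path = parsed.path.lstrip("/")

    match = _PATH_STYLE.match(host)
    if match:
        # Path style: the first path segment is the bucket, the rest is the key prefix.
        bucket, _, key_prefix = path.partition("/")
        if not bucket:
            raise ValueError(f"Missing bucket in S3 URL: {url}")
        return S3Location(match["region"], bucket, key_prefix)

    match = _HOST_STYLE.match(host)
    if match:
        # Virtual-hosted style: the bucket is in the host, the whole path is the key prefix.
        return S3Location(match["region"], match["bucket"], path)

    raise ValueError(f"Unsupported S3 URL: {url}")
```

With a prefix such as `logs/2024/`, every object whose key starts with that prefix would be selected for compression under the pre-PR behaviour.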
This PR adds support for providing multiple S3 object keys as input for compression. Users can now specify a list of S3 URLs, where each URL is treated as the URL of an actual object key in the bucket. Key-prefix-based ingestion is still supported; however, users must explicitly indicate that the input is a prefix by using the --s3-single-prefix option.
The current implementation has the following requirements for the given input URLs:
To support multiple keys, we update S3InputConfig to store an optional keys field. If this field is not set, prefix-based ingestion is used as before. Otherwise, we traverse the bucket to collect the object metadata of the given keys and create compression jobs.
Checklist
breaking change.
Validation performed