Support per-bucket S3 configuration #6584
Draft
This pull request adds per-bucket configuration for AWS S3 operations in Nextflow, allowing fine-grained control over settings such as region, endpoint, and transfer options for each bucket. The changes refactor the codebase to use these per-bucket settings throughout S3 client creation and `.command.run` generation, improving flexibility and correctness when working with multiple buckets.

A new `buckets` option has been added to the `aws` configuration to define bucket-specific settings. The existing `client` options remain valid and serve as the general configuration, which applies to every bucket that does not have a specific configuration.

A specific bucket property is defined as follows:

    aws.buckets.'ngi-igenomes'.anonymous = true

Specific configurations override or complement the general ones. If a property is defined in `client` but not in `buckets.<bucket.name>`, the bucket is configured with the `client` value. If a property is defined in both, or only in `buckets.<bucket.name>`, the bucket is configured with the value from `buckets.<bucket.name>`.

Per-bucket configuration support
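As an illustration of the override rule described above, here is a minimal configuration sketch. Only `aws.buckets.'ngi-igenomes'.anonymous` is taken from this description. The other per-bucket keys (`region`, `storageClass`) are assumed to mirror the existing `aws.region` and `aws.client` option names, and `my-archive-bucket` is a hypothetical bucket name.

```groovy
// nextflow.config (sketch, not a tested configuration)

// General settings: used by every bucket without a specific entry
aws.region = 'eu-west-1'
aws.client.anonymous = false
aws.client.storageClass = 'STANDARD'

// Overrides client.anonymous for this bucket only;
// storageClass is still taken from aws.client
aws.buckets.'ngi-igenomes'.anonymous = true

// Hypothetical bucket; key names assumed to match the client options
aws.buckets.'my-archive-bucket'.region = 'us-east-1'
aws.buckets.'my-archive-bucket'.storageClass = 'STANDARD_IA'
```

Under these assumptions, `ngi-igenomes` would be accessed anonymously with the `STANDARD` storage class, while any bucket not listed under `aws.buckets` would use the `aws.client` values unchanged.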
- Added the `AwsBucketConfig` model to represent S3 bucket-specific settings (region, endpoint, options, etc.), and updated `AwsConfig` to parse and store a map of bucket configurations. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/config/AwsBucketConfig.groovy`, `plugins/nf-amazon/src/main/nextflow/cloud/aws/config/AwsConfig.groovy`) [1] [2] [3]
- Updated `AwsClientFactory` to accept a bucket name and use the corresponding bucket-specific configuration for credentials, endpoint, and path-style access. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/AwsClientFactory.groovy`) [1] [2] [3]

Task `.command.run` generation changes

- Updated the batch executor (`AwsBatchExecutor`) and the file copy strategy (`AwsBatchFileCopyStrategy`) to inject per-bucket CLI arguments (storage class, encryption, requester pays, etc.) into S3 upload/download commands, using the new configuration model. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchExecutor.groovy`, `plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchFileCopyStrategy.groovy`) [1] [2] [3] [4] [5]
- Removed redundant getters for S3 transfer options from `AwsOptions` and replaced them with methods that generate CLI arguments based on the bucket-specific configuration. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsOptions.groovy`) [1] [2]

Configuration model refactoring
- Renamed `AwsS3Config` to `AwsS3ClientConfig` for clarity and updated references throughout the codebase. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/config/AwsConfig.groovy`) [1] [2]
- Refactored region resolution logic to prioritize, in order, the bucket-specific endpoint, the global client endpoint, the bucket region, `aws.region`, and finally the default region. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/config/AwsConfig.groovy`)
- Improved region resolution to check the `AWS_REGION` environment variable before `AWS_DEFAULT_REGION`, ensuring correct region selection in more scenarios. (`plugins/nf-amazon/src/main/nextflow/cloud/aws/config/AwsConfig.groovy`)

TODO
close #4732
replaces #6553
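
For reference, the region fallback order listed under the configuration model refactoring can be sketched roughly as follows. This is an illustrative helper, not the code in `AwsConfig.groovy`: the name `resolveRegion` is hypothetical, the endpoint-based steps are left out, the environment variables are assumed to supply the "default region", and the final `us-east-1` fallback is an assumption as well.

```groovy
// Hypothetical helper for illustration only; not the implementation in this PR.
// Endpoint-based resolution steps are intentionally omitted.
String resolveRegion(Map bucket, Map aws, Map env) {
    // Candidates in decreasing priority; null-safe access tolerates missing maps
    def candidates = [
        bucket?.region,            // bucket-specific region
        aws?.region,               // global aws.region setting
        env?.AWS_REGION,           // checked before AWS_DEFAULT_REGION
        env?.AWS_DEFAULT_REGION
    ]
    // Pick the first non-empty value; 'us-east-1' is an assumed last resort
    return candidates.find { it } ?: 'us-east-1'
}

// Usage: the bucket-level region wins over the global one
println resolveRegion([region: 'eu-west-1'], [region: 'us-west-2'], [:])
```

Keeping the candidates in a single ordered list makes the precedence easy to read and to extend if further sources are added.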