Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions docs/guides/aws-java-sdk-v2.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,15 +16,17 @@ The HTTP client in SDK v2 does not support overriding certain advanced HTTP opti
- `aws.client.socketSendBufferSizeHint`
- `aws.client.userAgent`

## S3 transfer manager
## S3 concurrency

The *S3 transfer manager* is a subsystem of SDK v2 that handles S3 uploads and downloads.
You can use the `aws.client.maxConnections` config option to control the maximum number of concurrent HTTP connections to S3.

You can configure the concurrency and throughput of the S3 transfer manager manually using the `aws.client.maxConcurrency` and `aws.client.maxNativeMemory` configuration options. Alternatively, you can use the `aws.client.targetThroughputInGbps` option to set both values automatically based on a target throughput.
You can also use the `aws.client.targetThroughputInGbps` option to control the concurrency of S3 uploads and downloads specifically, based on the available network bandwidth. This setting is `10` by default, which means that Nextflow performs S3 transfers concurrently up to 10 Gbps of network throughput, up to the maximum connection limit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
You can also use the `aws.client.targetThroughputInGbps` option to control the concurrency of S3 uploads and downloads specifically, based on the available network bandwidth. This setting is `10` by default, which means that Nextflow performs S3 transfers concurrently up to 10 Gbps of network throughput, up to the maximum connection limit.
Use the `aws.client.targetThroughputInGbps` option to control the concurrency of S3 uploads and downloads based on the available network bandwidth. This setting defaults to `10`, which allows Nextflow to perform concurrent S3 transfers up to 10 Gbps of network throughput, limited by the maximum connection count.


Use these settings with virtual threads to achieve optimal performance for your environment. Increasing these settings beyond their defaults may improve performance for large runs. You can enable virtual threads by setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use these settings with virtual threads to achieve optimal performance for your environment. Increasing these settings beyond their defaults may improve performance for large runs. You can enable virtual threads by setting the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`.
Use these settings with virtual threads to achieve optimal performance for your environment. Increasing these settings beyond their defaults may improve performance for large runs. To enable virtual threads, set the `NXF_ENABLE_VIRTUAL_THREADS` environment variable to `true`.


## Multi-part uploads

Multi-part uploads are handled by the S3 transfer manager. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
Nextflow uploads large files to S3 as multi-part uploads. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Nextflow uploads large files to S3 as multi-part uploads. You can use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` config options to control when and how multi-part uploads are performed.
Nextflow uploads large files to S3 as multi-part uploads. Use the `aws.client.minimumPartSize` and `aws.client.multipartThreshold` configuration options to control when and how multi-part uploads are performed.


The following multi-part upload config options are no longer supported:

Expand Down
23 changes: 4 additions & 19 deletions docs/reference/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -167,31 +167,21 @@ The following settings are available:
`aws.client.endpoint`
: The AWS S3 API entry point e.g. `https://s3-us-west-1.amazonaws.com`. The endpoint must include the protocol prefix e.g. `https://`.

`aws.client.maxConcurrency`
: :::{versionadded} 25.06.0-edge
:::
: The maximum number of concurrent S3 transfers used by the S3 transfer manager. By default, this setting is determined by `aws.client.targetThroughputInGbps`. Modifying this value can affect the amount of memory used for S3 transfers.

`aws.client.maxConnections`
: The maximum number of open HTTP connections used by the S3 transfer manager (default: `50`).
: The maximum number of open HTTP connections used by the S3 client (default: `50`).

`aws.client.maxErrorRetry`
: The maximum number of retry attempts for failed retryable requests (default: `-1`).

`aws.client.maxNativeMemory`
: :::{versionadded} 25.06.0-edge
:::
: The maximum native memory used by the S3 transfer manager. By default, this setting is determined by `aws.client.targetThroughputInGbps`.

`aws.client.minimumPartSize`
: :::{versionadded} 25.06.0-edge
:::
: The minimum part size used by the S3 transfer manager for multi-part uploads (default: `8 MB`).
: The minimum part size used for multi-part uploads to S3 (default: `8 MB`).

`aws.client.multipartThreshold`
: :::{versionadded} 25.06.0-edge
:::
: The object size threshold used by the S3 transfer manager for performing multi-part uploads (default: same as `aws.cllient.minimumPartSize`).
: The object size threshold used for multi-part uploads to S3 (default: same as `aws.cllient.minimumPartSize`).

`aws.client.protocol`
: :::{deprecated} 25.06.0-edge
Expand Down Expand Up @@ -262,12 +252,7 @@ The following settings are available:
`aws.client.targetThroughputInGbps`
: :::{versionadded} 25.06.0-edge
:::
: The target network throughput (in Gbps) used by the S3 transfer manager (default: `10`). This setting is not used when `aws.client.maxConcurrency` and `aws.client.maxNativeMemory` are specified.

`aws.client.transferManagerThreads`
: :::{versionadded} 25.06.0-edge
:::
: The number of threads used by the S3 transfer manager (default: `10`).
: The target network throughput (in Gbps) used for S3 uploads and downloads (default: `10`).

`aws.client.userAgent`
: :::{deprecated} 25.06.0-edge
Expand Down
1 change: 1 addition & 0 deletions modules/nextflow/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ dependencies {
api 'org.pf4j:pf4j:3.12.0'
api 'dev.failsafe:failsafe:3.1.0'
api 'io.seqera:lib-trace:0.1.0'
api 'com.fasterxml.woodstox:woodstox-core:7.1.1'

testImplementation 'org.subethamail:subethasmtp:3.1.7'
testImplementation (project(':nf-lineage'))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ class ThreadPoolManager {
}

return executorService = Threads.useVirtual()
? Executors.newThreadPerTaskExecutor(new CustomThreadFactory(name ?: "nf-thread-pool-${poolCount.getAndIncrement()}".toString()))
? Executors.newThreadPerTaskExecutor(VirtualThreadFactoryBuilder.create(name ?: "nf-thread-pool-${poolCount.getAndIncrement()}".toString()))
: legacyThreadPool()
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
package nextflow.util

import java.util.concurrent.ThreadFactory

/**
* Builder class to create a named virtual thread factory.
* This class is required to avoid failures when using with Java version < 21
*
* @author [email protected]
*/
class VirtualThreadFactoryBuilder {

static ThreadFactory create(String name){
return Thread.ofVirtual().name(name).factory()
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
com.ctc.wstx.stax.WstxInputFactory
Original file line number Diff line number Diff line change
Expand Up @@ -260,10 +260,6 @@ class AwsClientFactory {
if( throughput != null )
builder.targetThroughputInGbps(throughput)

final nativeMemory = s3ClientConfig.getMaxNativeMemoryInBytes()
if (nativeMemory != null )
builder.maxNativeMemoryLimitInBytes(nativeMemory)

final maxConcurrency = s3ClientConfig.getMaxConcurrency()
if( maxConcurrency != null )
builder.maxConcurrency(maxConcurrency)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,13 +60,7 @@ class AwsS3Config implements ConfigScope {

@ConfigOption
@Description("""
The maximum number of concurrent S3 transfers used by the S3 transfer manager. By default, this setting is determined by `aws.client.targetThroughputInGbps`. Modifying this value can affect the amount of memory used for S3 transfers.
""")
final Integer maxConcurrency

@ConfigOption
@Description("""
The maximum number of open HTTP connections used by the S3 transfer manager (default: `50`).
The maximum number of open HTTP connections used by the S3 client (default: `50`).
""")
final Integer maxConnections

Expand All @@ -78,19 +72,13 @@ class AwsS3Config implements ConfigScope {

@ConfigOption
@Description("""
The maximum native memory used by the S3 transfer manager. By default, this setting is determined by `aws.client.targetThroughputInGbps`.
""")
final MemoryUnit maxNativeMemory

@ConfigOption
@Description("""
The minimum part size used by the S3 transfer manager for multi-part uploads (default: `8 MB`).
The minimum part size used for multi-part uploads to S3 (default: `8 MB`).
""")
final MemoryUnit minimumPartSize

@ConfigOption
@Description("""
The object size threshold used by the S3 transfer manager for performing multi-part uploads (default: same as `aws.cllient.minimumPartSize`).
The object size threshold used for multi-part uploads to S3 (default: same as `aws.cllient.minimumPartSize`).
""")
final MemoryUnit multipartThreshold

Expand Down Expand Up @@ -168,16 +156,10 @@ class AwsS3Config implements ConfigScope {

@ConfigOption
@Description("""
The target network throughput (in Gbps) used by the S3 transfer manager (default: `10`). This setting is not used when `aws.client.maxConcurrency` and `aws.client.maxNativeMemory` are specified.
The target network throughput (in Gbps) used for S3 uploads and downloads (default: `10`).
""")
final Double targetThroughputInGbps

@ConfigOption
@Description("""
The number of threads used by the S3 transfer manager (default: `10`).
""")
final Integer transferManagerThreads

// deprecated

@Deprecated
Expand Down Expand Up @@ -222,10 +204,8 @@ class AwsS3Config implements ConfigScope {
this.endpoint = opts.endpoint ?: SysEnv.get('AWS_S3_ENDPOINT')
if( endpoint && FileHelper.getUrlProtocol(endpoint) !in ['http','https'] )
throw new IllegalArgumentException("S3 endpoint must begin with http:// or https:// prefix - offending value: '${endpoint}'")
this.maxConcurrency = opts.maxConcurrency as Integer
this.maxConnections = opts.maxConnections as Integer
this.maxErrorRetry = opts.maxErrorRetry as Integer
this.maxNativeMemory = opts.maxNativeMemory as MemoryUnit
this.minimumPartSize = opts.minimumPartSize as MemoryUnit
this.multipartThreshold = opts.multipartThreshold as MemoryUnit
this.proxyHost = opts.proxyHost
Expand All @@ -241,7 +221,6 @@ class AwsS3Config implements ConfigScope {
this.storageEncryption = parseStorageEncryption(opts.storageEncryption as String)
this.storageKmsKeyId = opts.storageKmsKeyId
this.targetThroughputInGbps = opts.targetThroughputInGbps as Double
this.transferManagerThreads = opts.transferManagerThreads as Integer
this.uploadChunkSize = opts.uploadChunkSize as MemoryUnit
this.uploadMaxAttempts = opts.uploadMaxAttempts as Integer
this.uploadMaxThreads = opts.uploadMaxThreads as Integer
Expand Down Expand Up @@ -281,10 +260,8 @@ class AwsS3Config implements ConfigScope {
Map<String,String> getAwsClientConfig() {
return [
connection_timeout: connectionTimeout?.toString(),
max_concurrency: maxConcurrency?.toString(),
max_connections: maxConnections?.toString(),
max_error_retry: maxErrorRetry?.toString(),
max_native_memory: maxNativeMemory?.toBytes()?.toString(),
minimum_part_size: minimumPartSize?.toBytes()?.toString(),
multipart_threshold: multipartThreshold?.toBytes()?.toString(),
proxy_host: proxyHost?.toString(),
Expand All @@ -298,7 +275,6 @@ class AwsS3Config implements ConfigScope {
storage_encryption: storageEncryption?.toString(),
storage_kms_key_id: storageKmsKeyId?.toString(),
target_throughput_in_gbps: targetThroughputInGbps?.toString(),
transfer_manager_threads: transferManagerThreads?.toString(),
upload_chunk_size: uploadChunkSize?.toBytes()?.toString(),
upload_max_attempts: uploadMaxAttempts?.toString(),
upload_max_threads: uploadMaxThreads?.toString(),
Expand Down
Loading