(AWS) Docs: List all AWS S3 properties from all language impl. #11383

Neuw84 · 2024-10-23T18:28:28Z

Since @hsiang-c opened another pull request that builds a properties table, I kept this one separate to avoid colliding with it.

Fixes List all AWS S3 properties in the docs #10674

Therefore, this PR makes the following changes:

Added Amazon MSK Connect as an option.
Added HTTP client advice for high-throughput scenarios (tune not just the retries but also the number of connections).
Added specific configs for data prefetching on EMR 7.1.0.

In my personal opinion, when the AWS SDKs are used, most of these properties shouldn't be listed there (the SDKs have a standard way to configure them, prioritize them, etc.). However, it is clear that using third-party libraries in different languages requires information like the tables @hsiang-c has built.

The problem with this is that different libraries will have different configs (even within the same language).

Instead of dividing by language, maybe we could divide by library (though here just adding a link to the corresponding doc page may be enough)? And have a separate section/table for the properties supported by the AWS SDKs (anything using the official libraries will have the same config, no matter the language)?

Thanks!

Added Amazon MSK Connect as option. Added HTTP client advice when high throughput scenarios. Added specific configs for data prefetching on EMR 7.1.0

danielcweeks · 2024-10-30T23:21:32Z

docs/docs/aws.md

+For versions after 7.1.0 there is an specific config that can be used to enable data prefecth optimization. You just need to add the following property on your Spark config.
+
+```shell
+spark.sql.iceberg.data-prefetch.enabled=true


I don't believe this is an Iceberg property. If this is specific to EMR, I don't believe it should be included here.

Yes, this is specific to EMR (its internal Iceberg runtime); however, we are on the "aws" docs page.

I think it is good to state that you can add that parameter to improve the performance of Iceberg workloads on EMR.
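For readers who want to try the flag discussed in this thread, a minimal sketch follows. It assumes the property name quoted in the diff is honored by the EMR runtime (per the thread, it is EMR-specific and not an Iceberg property). The helper `render_spark_defaults` is hypothetical; it only formats entries in the whitespace-separated style used by `spark-defaults.conf`.

```python
def render_spark_defaults(conf: dict[str, str]) -> str:
    """Format properties as spark-defaults.conf lines: key, whitespace, value."""
    lines = []
    for key, value in sorted(conf.items()):
        # Guard against typos: every property in this sketch is a spark.* key.
        if not key.startswith("spark."):
            raise ValueError(f"expected a spark.* property name, got: {key}")
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


# Property name taken verbatim from the diff under review (EMR-specific).
emr_prefetch_conf = {"spark.sql.iceberg.data-prefetch.enabled": "true"}

print(render_spark_defaults(emr_prefetch_conf), end="")
```

The rendered line can be appended to a cluster's `spark-defaults.conf`; on runtimes that do not recognize the property, it should not be expected to have any effect.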

danielcweeks · 2024-10-30T23:32:12Z

docs/docs/aws.md

+**Note that for workloads with exceptionally high throughput against tables that S3 where you will likely to increase Retries, you will also like to increase the number of connections for the HTTP client**
+
+```shell
+spark.sql.catalog.my_catalog.http-client.apache.max-connections=200


This doesn't look like an Iceberg setting from what I can tell. If this is EMR specific, it should not be included here.

This comes from the AWS SDK and Spark (it is not specific to EMR). If you run Spark on your laptop writing to S3 and you hit this high-throughput write scenario, you will likely tune this parameter.

Any Spark runtime will use this (maybe the Photon runtime uses another S3 client, but I don't have that info :) ).

On the previous case, I agree with you that it is very specific to EMR, and it may or may not belong on the aws "docs" page.

I mean, this page is about AWS docs, and the parameter is quite specific to the S3 client of the AWS SDK.
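As an illustration of the laptop scenario described above, the sketch below assembles the connection-pool setting from the diff into `spark-submit` arguments. The helper `spark_submit_conf_args`, the catalog name `my_catalog` (taken from the diff), the value `200`, and the script name `job.py` are illustrative assumptions, not recommendations from this PR.

```python
import shlex


def spark_submit_conf_args(conf: dict[str, str]) -> list[str]:
    """Turn a property mapping into repeated `--conf key=value` arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


catalog = "my_catalog"  # catalog name used in the diff under review
high_throughput_conf = {
    # Property quoted in the diff: HTTP client connection pool size.
    f"spark.sql.catalog.{catalog}.http-client.apache.max-connections": "200",
}

# job.py is a hypothetical application script.
command = ["spark-submit", *spark_submit_conf_args(high_throughput_conf), "job.py"]
print(shlex.join(command))
```

Building the arguments from a mapping keeps the catalog name in one place, which helps when the same tuning is applied to several catalogs.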

danielcweeks · 2024-10-30T23:34:02Z

@Neuw84 it looks like we're duplicating what should be EMR documentation here. We already link off to the EMR docs, so I don't feel this is the right place for putting specific configuration info.

Neuw84 · 2024-10-31T10:58:00Z

@danielcweeks let me know your thoughts on the comments (I agree on the EMR-specific one, although to me it does not hurt, as we are on the aws docs page).

What are your thoughts about the S3 clients info?

Update aws.md

d405582

github-actions bot added the docs label Oct 23, 2024

Neuw84 changed the title ~~(AWS) Docs: List all AWS S3 properties from all language impl. #10674~~ (AWS) Docs: List all AWS S3 properties from all language impl. fixes #10674 Oct 23, 2024

Neuw84 changed the title ~~(AWS) Docs: List all AWS S3 properties from all language impl. fixes #10674~~ (AWS) Docs: List all AWS S3 properties from all language impl. Oct 23, 2024

danielcweeks reviewed Oct 30, 2024
