S3FileSystem unable to gain access in the context of a pod on AWS EKS using Pod Identity Association #45603

Open
jleben opened this issue Feb 22, 2025 · 0 comments

jleben commented Feb 22, 2025

Describe the bug, including details regarding any error messages, version, and platform.

I am trying to use pyarrow.fs.S3FileSystem in a pod running on AWS EKS. The cluster is configured so that the pod assumes an IAM role via Pod Identity Association.

S3FileSystem seems to have no way to obtain credentials from the Pod Identity Association directly. When instantiated with no arguments, as in S3FileSystem(), it gains no access (for example, get_file_info fails with ACCESS_DENIED).

I am, however, able to give S3FileSystem access by first manually obtaining temporary credentials (access key, secret key and session token) from the Pod Identity Association (e.g. through boto3) and then either passing them as arguments when instantiating S3FileSystem, or storing them in the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
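The first variant of this workaround (passing the credentials as constructor arguments) might look roughly like the sketch below. It is an illustration, not a confirmed recipe: it assumes boto3 is installed and that its default credential chain resolves the Pod Identity credentials inside the pod. The helper names `s3fs_kwargs` and `make_s3_filesystem` are hypothetical, and the region is whatever the caller supplies.

```python
from typing import Any, Dict


def s3fs_kwargs(frozen_credentials: Any, region: str) -> Dict[str, str]:
    """Map boto3-style frozen credentials onto S3FileSystem keyword arguments."""
    return {
        "access_key": frozen_credentials.access_key,
        "secret_key": frozen_credentials.secret_key,
        "session_token": frozen_credentials.token,
        "region": region,
    }


def make_s3_filesystem(region: str):
    """Build an S3FileSystem from the credentials boto3 currently resolves."""
    # Imported lazily so the mapping helper above works without these packages.
    import boto3
    from pyarrow import fs

    credentials = boto3.Session().get_credentials()
    if credentials is None:
        raise RuntimeError("boto3 could not resolve any AWS credentials")
    # A one-time snapshot of the temporary credentials; it does NOT refresh,
    # so the resulting filesystem stops working once the credentials expire.
    frozen = credentials.get_frozen_credentials()
    return fs.S3FileSystem(**s3fs_kwargs(frozen, region))
```

A filesystem built this way is tied to one set of temporary credentials for its whole lifetime.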

However, in some systems, e.g. Ray Train, an instance of S3FileSystem is created and then potentially used for a very long time (e.g. for the duration of training a large model, which could take days). Relying on a single set of expiring credentials is therefore limiting.

It is worth noting that if S3FileSystem is instantiated without temporary credentials in its constructor arguments, I can keep updating the temporary credentials in the environment variables as they expire, and the S3FileSystem instance will use the refreshed ones on every method call (such as get_file_info). This is the only method I have found to give S3FileSystem long-term access through Pod Identity Association beyond the expiry of a single set of temporary credentials.
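The environment-variable approach could be sketched as follows. The function name `export_credentials` is hypothetical, and the credentials object is assumed to expose `access_key`, `secret_key` and `token` attributes, as boto3's frozen credentials do. How the fresh credentials are fetched is deliberately left to the caller.

```python
import os
from typing import Any, MutableMapping


def export_credentials(
    frozen_credentials: Any,
    environ: MutableMapping[str, str] = os.environ,
) -> None:
    """Write one set of temporary credentials into the AWS environment variables."""
    environ["AWS_ACCESS_KEY_ID"] = frozen_credentials.access_key
    environ["AWS_SECRET_ACCESS_KEY"] = frozen_credentials.secret_key
    if frozen_credentials.token:
        environ["AWS_SESSION_TOKEN"] = frozen_credentials.token
    else:
        # Avoid pairing a stale session token with a new key pair.
        environ.pop("AWS_SESSION_TOKEN", None)
```

One caveat when combining this with boto3: its default credential chain consults these same environment variables before the container credential source, so a later refresh probably needs to query the Pod Identity source directly rather than go through a default `boto3.Session()`.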

However, the method of updating environment variables has its own drawbacks:

  • the environment variables affect the entire Python process, so it is not possible to give specific credentials to one particular instance of S3FileSystem.
  • with some libraries that use S3FileSystem internally during a long operation (e.g. Ray Train), the user may not get a reliable opportunity (e.g. through a callback) to update the environment variables. Updating them from a separate Python thread is always an option, but that risks race conditions.
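For illustration, the separate-thread option could be sketched like this. The name `start_refresher` is hypothetical, and `refresh` stands for any caller-supplied function that fetches new credentials and writes them to the environment. The sketch does nothing to remove the race: the three variables are updated one at a time, so a concurrent S3 call could in principle observe a mix of old and new values.

```python
import logging
import threading
from typing import Callable, Tuple


def start_refresher(
    refresh: Callable[[], None],
    interval_seconds: float,
) -> Tuple[threading.Event, threading.Thread]:
    """Call `refresh` every `interval_seconds` on a daemon thread until stopped."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(interval_seconds):
            try:
                refresh()
            except Exception:
                # Keep the thread alive; the next tick retries.
                logging.exception("credential refresh failed")

    thread = threading.Thread(target=loop, name="aws-credential-refresher", daemon=True)
    thread.start()
    return stop, thread
```

Calling `stop.set()` on the returned event ends the loop. The interval would need to be comfortably shorter than the credential lifetime.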

For the reasons above, it would be very beneficial if S3FileSystem were able to gain access through EKS Pod Identity Association automatically (internally), and to maintain that access beyond the expiry of a single set of temporary credentials.

Component(s)

Python
