
AWS Credentials cache must be protected by a lock #1282

Open
3 of 6 tasks
Veetaha opened this issue Jan 31, 2025 · 3 comments

Comments

Veetaha commented Jan 31, 2025

Describe the bug
Today session.get_credentials() and session.create_client() don't use a lock when sourcing the credentials. This results in a performance problem when multiple AWS clients are created concurrently and awaited with asyncio.gather. This is a real-world problem when the script runs in an environment that uses credential_process that takes an exclusive file lock. For example, it reproduces in my real environment that uses aws-vault SSO config with pass backend.

Here is a minimized reproduction of the bug. Take this Python code as an example (it never closes the AsyncExitStack, but that's irrelevant here):

import asyncio
import time
import aiobotocore.session
from contextlib import AsyncExitStack

async def main():
    session = aiobotocore.session.get_session()

    timer = time.perf_counter()

    exit_stack = AsyncExitStack()

    await asyncio.gather(
        exit_stack.enter_async_context(session.create_client('sts')),
        exit_stack.enter_async_context(session.create_client('organizations')),
        exit_stack.enter_async_context(session.create_client('s3')),
        exit_stack.enter_async_context(session.create_client('ec2')),
        exit_stack.enter_async_context(session.create_client('efs')),
        exit_stack.enter_async_context(session.create_client('fsx')),
    )

    print(f"Client creation took {time.perf_counter() - timer:.2f} seconds")

asyncio.run(main())

Then create an AWS config with the following profile:

[profile test-cred-process]
region = eu-central-1
credential_process = /tmp/cred-process.sh

Put the following bash script into /tmp/cred-process.sh and make it executable. The script takes an advisory file lock on /tmp/lockfile and imitates a 0.5 s delay in resolving credentials (just as aws-vault with an SSO setup has a considerable delay):

#!/usr/bin/env bash

(
    # Block until the advisory lock on fd 200 (backed by /tmp/lockfile) is acquired.
    flock 200

    # Imitate a slow credential resolution, e.g. aws-vault with an SSO setup.
    sleep 0.5

    cat <<'EOF'
{
    "Version": 1,
    "AccessKeyId": "kkk",
    "SecretAccessKey": "aaa",
    "SessionToken": "sss",
    "Expiration": "2025-01-31T11:16:40Z"
}
EOF
) 200>/tmp/lockfile

Now if you run the Python code above, it takes ~3 seconds to execute. This is because every create_client() future invokes its own credential process, and each of them has to wait on the file lock while the previous one finishes its 0.5 s delay. If you call await session.get_credentials() right before the asyncio.gather(), the runtime drops to ~0.6 seconds.

So the problem is that credential loading is not guarded by a lock: when the credential cache is empty, multiple futures each spawn their own credential process, resulting in a significant lag.
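
For reference, here is the workaround mentioned above applied to the reproduction script: awaiting session.get_credentials() once before the asyncio.gather() so that the credential cache is already populated when the concurrent create_client() calls resolve credentials. This is only a caller-side mitigation; the actual fix would be for the session to serialize credential sourcing internally.

import asyncio
import time
import aiobotocore.session
from contextlib import AsyncExitStack

async def main():
    session = aiobotocore.session.get_session()

    timer = time.perf_counter()

    exit_stack = AsyncExitStack()

    # Populate the credential cache once, so the concurrent create_client()
    # calls below don't each spawn their own credential_process.
    await session.get_credentials()

    await asyncio.gather(
        exit_stack.enter_async_context(session.create_client('sts')),
        exit_stack.enter_async_context(session.create_client('organizations')),
        exit_stack.enter_async_context(session.create_client('s3')),
        exit_stack.enter_async_context(session.create_client('ec2')),
        exit_stack.enter_async_context(session.create_client('efs')),
        exit_stack.enter_async_context(session.create_client('fsx')),
    )

    print(f"Client creation took {time.perf_counter() - timer:.2f} seconds")

asyncio.run(main())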

Checklist

  • I have reproduced in environment where pip check passes without errors
  • I have provided pip freeze results
  • I have provided sample code or detailed way to reproduce
  • I have tried the same code in botocore to ensure this is an aiobotocore specific issue
  • I have tried similar code in aiohttp to ensure this is an aiobotocore specific issue
  • I have checked the latest and older versions of aiobotocore/aiohttp/python to see if this is a regression / injection
pip freeze results
aiobotocore==2.19.0
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aioitertools==0.12.0
aiosignal==1.3.2
async-timeout==5.0.1
attrs==25.1.0
botocore==1.36.3
botocore-stubs==1.36.9
frozenlist==1.5.0
idna==3.10
jmespath==1.0.1
multidict==6.1.0
propcache==0.2.1
python-dateutil==2.9.0.post0
six==1.17.0
types-aiobotocore==2.19.0
types-aiobotocore-backup==2.19.0
types-aiobotocore-ec2==2.19.0
types-aiobotocore-efs==2.19.0
types-aiobotocore-fsx==2.19.0
types-aiobotocore-organizations==2.19.0
types-aiobotocore-s3==2.19.0
types-awscrt==0.23.8
typing_extensions==4.12.2
urllib3==2.3.0
wrapt==1.17.2
yarl==1.18.3

Environment:

  • Python Version: 3.10.12
  • OS name and version: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
thehesiod (Collaborator) commented Jan 31, 2025

this is the same thing in botocore if you create multiple botocore clients in different threads; we basically just parallel what botocore does. I'd log an issue there with the threading equivalent; when they fix it there, we'll do it here :) Note that we follow this pattern with refreshable credentials: botocore vs aiobotocore. But please do double-check with botocore and threads.
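
A minimal sketch of that threading double-check (mine, not from this thread), assuming the test-cred-process profile from above is selected, e.g. by running with AWS_PROFILE=test-cred-process:

import time
from concurrent.futures import ThreadPoolExecutor

import botocore.session


def main():
    session = botocore.session.get_session()
    services = ['sts', 'organizations', 's3', 'ec2', 'efs', 'fsx']

    timer = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        # Each worker creates a client from the shared session; without a lock
        # around credential sourcing, each one may spawn its own credential_process.
        clients = list(pool.map(session.create_client, services))

    print(f"Created {len(clients)} clients in {time.perf_counter() - timer:.2f} seconds")


if __name__ == '__main__':
    main()

If this also takes ~3 seconds, botocore's credential cache has the same gap and the issue belongs upstream as well.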

Veetaha (Author) commented Jan 31, 2025

Created boto/botocore#3364. I didn't think that botocore would suffer from the same problem.

thehesiod (Collaborator) commented

ya you can think of aiobotocore as an async parallel to botocore
