Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refresh drives config credentials using a timer #78

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gogakoreli
Copy link
Contributor

@gogakoreli gogakoreli commented Feb 19, 2025

Addresses Issue #74

  • Initialize _load_credentials refresh every 15 minutes so that new boto credentials get picked up before they are expired. Botocore library takes care of renewing the credentials under the hood seamlessly, we just need to read the latest values before the token is expired. The token by default gets expired in 60 minutes.
  • Make DrivesConfig class singleton so that we only have one refresh timer running at a single point of time
  • Verified installing updated package in Sagemaker JupyterLab and once token expired in 60 minutes, S3 Browser extension continued to function as expected, i.e the credentials were refreshed successfully.

Copy link

Binder 👈 Launch a Binder on branch gogakoreli/jupyter-drives/main

@gogakoreli
Copy link
Contributor Author

CI checks are failing due to unrelated reasons:

Error: This request has been automatically failed because it uses a deprecated version of actions/upload-artifact: v3. Learn more: https://github.blog/changelog/2024-04-16-deprecation-notice-v3-of-the-artifact-actions/

raise tornado.web.HTTPError(
status_code= httpx.codes.NOT_IMPLEMENTED,
reason="Listing drives not supported for given provider.",
)

results = []
for drive in drives:
results += drive.list_containers()
for drive in self._drives:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are changing the behavior of list drives function.
old: create a new drive for each call S3Drive
new: used the same cached S3Drive that is periodically refreshed.

Is this ok ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verified that this is okay and also I believe it is meant to be stored for subsequent calls, so it should be an improvement. Only callout would be do the connections get correctly disposed when we refresh to get a new instance and discard the old one.

if self._config.provider == 's3':
if self._config.access_key_id and self._config.secret_access_key:
self._s3_session = get_session()
self._file_system = s3fs.S3FileSystem(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to s3fs page - S3FS builds on [aiobotocore](https://aiobotocore.readthedocs.io/en/latest/) to provide a convenient Python filesystem interface for S3.
ideally this should pick up credentials from SageMaker Studio instance automatically. have you tested it ?

According to this github issue - nsidc/earthaccess#765 it may not work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from my understanding ideally it should work, however with the current implementation I see that we are explicitly providing the access key and token, so from my understanding S3FS will respect these values and won't refresh unless we do it explicitly again.

def _initialize_credentials_refresh(self):
self._drives_refresh_callback()
if not self._config.credentials_already_set:
self._drives_refresh_timer = PeriodicCallback(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is JupyterDrivesManager initialized multiple times ? This will need some deeper refactoring to make sure we dont have multiple refresh callbacks are not invoked for each instance of JupyterDrivesManager

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JupyterDrivesManager gets initialized only once

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants