refresh drives config credentials using a timer #78
base: main
Conversation
CI checks are failing for unrelated reasons:
```python
raise tornado.web.HTTPError(
    status_code=httpx.codes.NOT_IMPLEMENTED,
    reason="Listing drives not supported for given provider.",
)
```
```diff
  results = []
- for drive in drives:
+ for drive in self._drives:
      results += drive.list_containers()
```
We are changing the behavior of the list-drives function.
Old: create a new S3Drive for each call.
New: reuse the same cached S3Drive, which is periodically refreshed.
Is this OK?
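For illustration, a minimal sketch of the two behaviors; the constructor call and method names are assumptions filled in around the truncated diff, not the PR's exact code:

```python
# Old behavior (sketch): a fresh S3Drive is constructed on every call.
def list_drives_old(self):
    drives = [S3Drive(self._config)]  # new instance per request (assumed signature)
    results = []
    for drive in drives:
        results += drive.list_containers()
    return results

# New behavior (sketch): reuse the cached self._drives, which a periodic
# timer refreshes elsewhere, instead of constructing drives per call.
def list_drives_new(self):
    results = []
    for drive in self._drives:
        results += drive.list_containers()
    return results
```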
Verified that this is okay. I also believe the drive is meant to be stored for subsequent calls, so this should be an improvement. The only callout: are the connections correctly disposed of when we refresh to a new instance and discard the old one?
```python
if self._config.provider == 's3':
    if self._config.access_key_id and self._config.secret_access_key:
        self._s3_session = get_session()
        self._file_system = s3fs.S3FileSystem(
```
According to the s3fs page, S3FS builds on [aiobotocore](https://aiobotocore.readthedocs.io/en/latest/) to provide a convenient Python filesystem interface for S3.
Ideally this should pick up credentials from the SageMaker Studio instance automatically. Have you tested it?
According to this GitHub issue, nsidc/earthaccess#765, it may not work.
From my understanding it should ideally work. However, the current implementation explicitly provides the access key and token, so S3FS will respect those values and won't refresh them unless we do so explicitly.
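A short sketch of the distinction (credential values are placeholders): passing keys explicitly pins static credentials, while omitting them lets aiobotocore's default credential chain resolve, and refresh, them on its own.

```python
import s3fs

# Explicit, static credentials: s3fs uses exactly these values and will
# not refresh them when the underlying session token expires.
fs_static = s3fs.S3FileSystem(
    key="<access-key-id>",         # placeholder
    secret="<secret-access-key>",  # placeholder
    token="<session-token>",       # placeholder
)

# No explicit credentials: aiobotocore falls back to its default chain
# (environment variables, instance/role credentials, etc.), which can
# renew expiring credentials automatically.
fs_auto = s3fs.S3FileSystem()
```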
```python
def _initialize_credentials_refresh(self):
    self._drives_refresh_callback()
    if not self._config.credentials_already_set:
        self._drives_refresh_timer = PeriodicCallback(
```
Is JupyterDrivesManager initialized multiple times? This will need some deeper refactoring to make sure multiple refresh callbacks are not invoked, one per instance of JupyterDrivesManager.
JupyterDrivesManager gets initialized only once.
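For illustration, a minimal sketch of what the truncated refresh setup could look like, using Tornado's `PeriodicCallback`. The class-level guard against duplicate timers is an assumption added here, not part of the PR:

```python
from tornado.ioloop import PeriodicCallback

REFRESH_INTERVAL_MS = 15 * 60 * 1000  # refresh every 15 minutes

class JupyterDrivesManager:
    _refresh_timer = None  # class-level guard: at most one timer (assumption)

    def _initialize_credentials_refresh(self):
        # Load credentials once, immediately.
        self._drives_refresh_callback()
        # Only schedule the periodic refresh when credentials were not
        # provided statically by the user.
        if not self._config.credentials_already_set:
            if JupyterDrivesManager._refresh_timer is None:
                JupyterDrivesManager._refresh_timer = PeriodicCallback(
                    self._drives_refresh_callback,
                    REFRESH_INTERVAL_MS,
                )
                JupyterDrivesManager._refresh_timer.start()
```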
Addresses Issue #74
- `_load_credentials` is refreshed every 15 minutes so that new boto credentials get picked up before they expire. The botocore library seamlessly renews the credentials under the hood; we just need to read the latest values before the token expires (by default, after 60 minutes).
- `DrivesConfig` is made a singleton class so that only one refresh timer is running at any point in time.
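A minimal sketch of the singleton pattern described above (the attribute name `_instance` is an assumption):

```python
class DrivesConfig:
    """Sketch of a singleton: __new__ always returns the one shared
    instance, so any refresh timer attached to it exists exactly once."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
```

With this pattern, every `DrivesConfig(...)` call yields the same object, so re-running the initialization path cannot start a second refresh timer.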