More frictionless S3 direct access #431

@abarciauskas-bgse

earthaccess supports filtering datasets by cloud_hosted, discovering the S3 links with data_links(access="direct"), and even downloading. But I'm not able to use earthaccess to open the data directly from S3 on the VEDA JupyterHub. Could this be because the VEDA JupyterHub is associated with a role for Earthdata cloud access?

Right now this is how the code behaves:

import earthaccess

first_result = earthaccess.search_data(
    short_name='MUR-JPL-L4-GLOB-v4.1',
    cloud_hosted=True,
    count=1
)
# Granules found: 7899

direct_link = first_result[0].data_links(access="direct")
direct_link
# ['s3://podaac-ops-cumulus-protected/MUR-JPL-L4-GLOB-v4.1/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc']

earthaccess.open(direct_link)
# We cannot open S3 links when we are not in-region, try using HTTPS links

earthaccess reports that it can't open the dataset, even though this code was run in-region. Because the VEDA hub has direct access, I can fall back on xarray + s3fs to open the link, but it would be good for earthaccess.open to support direct access for in-region users who are not on a NASA-managed hub like VEDA.
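For reference, the xarray + s3fs fallback can be sketched roughly as below. This is a minimal sketch, not a confirmed recipe: the helper name open_granule_from_s3 is hypothetical, and it assumes xarray, h5netcdf, and earthaccess are installed and that the code runs in the same AWS region as the bucket. When no filesystem is passed in, it asks earthaccess for temporary PO.DAAC credentials through earthaccess.get_s3fs_session; on a hub whose role already grants bucket access, a plain s3fs.S3FileSystem() could be passed instead.

```python
def open_granule_from_s3(s3_url, fs=None):
    """Open one NetCDF granule from an s3:// URL with xarray.

    Hypothetical helper for illustration. `fs` is any fsspec-style
    filesystem (for example an s3fs.S3FileSystem); when omitted, one is
    requested from earthaccess for the PO.DAAC buckets.
    """
    if not s3_url.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {s3_url!r}")

    # Imported lazily so that defining the helper needs no third-party packages.
    import xarray as xr

    if fs is None:
        import earthaccess

        earthaccess.login()  # uses Earthdata Login credentials (netrc, env vars, or prompt)
        fs = earthaccess.get_s3fs_session(daac="PODAAC")

    # Assumption: the h5netcdf engine is installed; it can read file-like objects.
    return xr.open_dataset(fs.open(s3_url, mode="rb"), engine="h5netcdf")
```

Calling it with the first entry of direct_link from the snippet above would return a lazily loaded xarray.Dataset, provided the session really is in-region and the credentials are valid.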

Ideally, this search and open would be like:

first_result = earthaccess.search_data(
    short_name='MUR-JPL-L4-GLOB-v4.1',
    cloud_hosted=True,
    count=1,
    access="direct"
)
# Granules found: 7899

earthaccess.open(first_result) # opens the data directly from S3

This is essentially the README example (which lacks the access="direct" parameter), but, at least on the VEDA JupyterHub, the results and .open use an HTTPFileSystem rather than S3.

Perhaps the issue is that earthaccess isn't recognizing that the code is being run in-region?
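One way to test that hypothesis from inside the hub is to check which AWS region the session reports. The sketch below is not how earthaccess itself detects the region (I haven't checked that); the function names are made up for illustration. It assumes the Earthdata Cloud buckets live in us-west-2, looks at the standard AWS_REGION / AWS_DEFAULT_REGION environment variables first, and otherwise queries the EC2 instance metadata service (IMDSv2), which may be unreachable from some containerized hubs.

```python
import os
import urllib.request

EARTHDATA_CLOUD_REGION = "us-west-2"  # assumption: region hosting the Earthdata Cloud buckets


def region_from_env(env=None):
    """Return the AWS region named in the environment, or None if unset."""
    env = os.environ if env is None else env
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


def region_from_instance_metadata(timeout=1.0):
    """Ask the EC2 instance metadata service (IMDSv2) for the region.

    Returns None when the endpoint cannot be reached, e.g. outside AWS.
    """
    base = "http://169.254.169.254/latest"
    try:
        token_req = urllib.request.Request(
            f"{base}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        region_req = urllib.request.Request(
            f"{base}/meta-data/placement/region",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(region_req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return None


def is_in_region(env=None, probe_metadata=True):
    """True if the detected region matches EARTHDATA_CLOUD_REGION."""
    region = region_from_env(env)
    if region is None and probe_metadata:
        region = region_from_instance_metadata()
    return region == EARTHDATA_CLOUD_REGION
```

If is_in_region() returns False on the hub while the s3fs workaround still succeeds, that would suggest the in-region detection, rather than the credentials, is what trips up earthaccess.open.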

Apologies if I missed something about how the library is supposed to work!
