Handle an s3 URI in asdf_cut() #117

Merged: 5 commits merged into spacetelescope:main on May 21, 2024

Conversation

snbianco
Collaborator

If an s3 URI is passed in as the input_file parameter to the asdf_cut function, open the resource with s3fs and use its http URL in asdf.open().

This saves users from having to open the resource themselves before passing the file to the asdf_cut function.
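A minimal sketch of the behavior described above, not the code merged in this PR. The helper name `resolve_input_file` and its `open_s3` parameter are hypothetical, and anonymous access (`anon=True`) is an assumption that only holds for public buckets. `s3fs` is imported lazily so the non-S3 path works without it installed.

```python
from typing import Any, Callable, Optional


def resolve_input_file(input_file: Any, open_s3: Optional[Callable] = None) -> Any:
    """Return an http URL for 's3://' strings; return any other input unchanged.

    `open_s3` is a hypothetical hook that lets callers (or tests) supply
    their own opener instead of a real s3fs filesystem.
    """
    if isinstance(input_file, str) and input_file.startswith("s3://"):
        if open_s3 is None:
            import s3fs  # only needed when an S3 URI is actually passed

            # Assumption: the bucket is public, so anonymous access is enough.
            open_s3 = s3fs.S3FileSystem(anon=True).open
        with open_s3(input_file, "rb") as f:
            # S3File.url() gives the http URL for the object.
            return f.url()
    return input_file
```

The returned value could then be handed to `asdf.open()` in place of the original `input_file`.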

@snbianco snbianco requested a review from havok2063 May 17, 2024 15:04
@snbianco snbianco self-assigned this May 17, 2024
@snbianco snbianco marked this pull request as draft May 17, 2024 15:12
@snbianco snbianco marked this pull request as ready for review May 17, 2024 15:45
@dr-rodriguez dr-rodriguez self-requested a review May 20, 2024 14:11
@dr-rodriguez

Double check if any of the unit tests need to be updated for this.

Contributor

@havok2063 havok2063 left a comment

This looks good to me. I would agree that we should add some explicit tests for string vs s3 path in test_asdf_cut.py.

@@ -247,8 +248,15 @@ def asdf_cut(input_file: str, ra: float, dec: float, cutout_size: int = 20,
an image cutout object
"""

    # if file comes from AWS cloud bucket, get URL
    file = input_file
    if isinstance(input_file, str) and input_file.startswith('s3://'):
Contributor

One thing we may want to do at some point is expand this to support pathlib.Path as input, in addition to a string. I don't think pathlib.Path handles s3 urls, but it looks like there are packages that do, e.g. s3path or s3pathlib. I don't think we need to do it for this PR, but maybe we can create an issue for the future.
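A short illustration of why a plain pathlib.Path is a poor fit for S3 URIs: pathlib collapses the doubled slash after the scheme, so the `s3://` prefix check no longer matches. The function name `classify_input` and the bucket and key names are hypothetical, chosen only for this sketch.

```python
from pathlib import Path, PurePosixPath
from typing import Union


def classify_input(input_file: Union[str, Path]) -> str:
    """Return "s3" for 's3://' URI strings and "local" for everything else."""
    if isinstance(input_file, str) and input_file.startswith("s3://"):
        return "s3"
    return "local"


# pathlib normalizes "//" to "/", which mangles the URI scheme.
mangled = str(PurePosixPath("s3://bucket/key.asdf"))
print(mangled)                                     # s3:/bucket/key.asdf
print(classify_input("s3://bucket/key.asdf"))      # s3
print(classify_input(mangled))                     # local
print(classify_input(Path("data") / "key.asdf"))   # local
```

This is the gap that dedicated packages such as the ones mentioned above would need to fill if Path-like S3 inputs were supported later.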

Collaborator Author

Cool, will make a new issue for this!

Collaborator Author

Issue is ASB-27085

@snbianco
Collaborator Author

snbianco commented May 20, 2024

@dr-rodriguez @havok2063

Trying to write unit tests for this now, and wondering what should be used for the input_file parameter since we don't have an s3 link with a working ASDF file yet. Also, should we be putting in a link to this data at all given that this is a public repo? I read on one of the other pull requests that the Roman test data shouldn't be publicly accessible.

@havok2063
Contributor

@snbianco We probably don't need to test a real file. I think we can get away with a fake file for testing, or is there a public ST S3 file we could use? If we turn lines 251-257 into a tiny function, we could test that functionality separately from any asdf stuff, i.e. check that it returns the correct local file or s3 file for a given input. That would satisfy me for a test.

It looks like there are packages for mocking s3 aws services, like moto. I'm not sure if we need that here or not.

@snbianco snbianco marked this pull request as draft May 20, 2024 17:46
@dr-rodriguez

I generally recommend mocking external resources rather than relying on pings to public files. At a glance, it looked like your approach was good. Basically the goal is to confirm the functions were called with the parameters you expected; the exact logic of the s3fs functions is ideally something that package has verified already.
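One way the mocking approach described above could look, as a sketch rather than the test that was merged. The helper `get_cloud_http` is a hypothetical stand-in for the extracted function, the URL and bucket names are made up, and `anon=True` is an assumption. The `s3fs` module itself is replaced with a `MagicMock`, so no network access or installed `s3fs` is needed.

```python
import sys
from unittest import mock


def get_cloud_http(s3_uri: str) -> str:
    """Hypothetical helper: open an S3 URI with s3fs and return its http URL."""
    import s3fs  # resolved at call time, so a test can substitute a mock module

    fs = s3fs.S3FileSystem(anon=True)  # assumption: public, anonymous access
    with fs.open(s3_uri, "rb") as f:
        return f.url()


def test_get_cloud_http():
    # Build a fake s3fs module whose filesystem returns a fake file object.
    fake_s3fs = mock.MagicMock()
    fake_fs = fake_s3fs.S3FileSystem.return_value
    fake_file = fake_fs.open.return_value.__enter__.return_value
    fake_file.url.return_value = "https://example.invalid/file.asdf"

    with mock.patch.dict(sys.modules, {"s3fs": fake_s3fs}):
        url = get_cloud_http("s3://bucket/file.asdf")

    # Confirm the calls were made with the expected parameters;
    # the behavior of s3fs itself is left to that package's own tests.
    assert url == "https://example.invalid/file.asdf"
    fake_s3fs.S3FileSystem.assert_called_once_with(anon=True)
    fake_fs.open.assert_called_once_with("s3://bucket/file.asdf", "rb")
    fake_file.url.assert_called_once_with()
```

Written this way, the test can be collected by pytest and only checks the wiring between the helper and s3fs.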

@snbianco snbianco marked this pull request as ready for review May 21, 2024 13:36
Contributor

@havok2063 havok2063 left a comment

This looks good to me.

@snbianco snbianco merged commit 56c32a2 into spacetelescope:main May 21, 2024
7 checks passed
3 participants