-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(PXP-5529): implement GET /objects endpoint #15
base: master
Are you sure you want to change the base?
Conversation
src/mds/query.py
Outdated
- GET /objects?filter=(message,:eq,"morning") returns "2" | ||
- GET /objects?filter=(counts.1,:eq,3) returns "3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the design doc doesn't use a filter
param, and the syntax is more simple - we might want to accept {record_or_metadata}.{arbitrary_key}={value}
as param like the design doc describes, and build the filter object based on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request still maintains the {arbitrary_key}={value}
param syntax for the GET /metadata
endpoint. I also don't think it would be too bad to support that syntax for the GET /objects
endpoint in addition to the filter=(counts.1,:eq,3)
style.
src/mds/query.py
Outdated
# operator | ||
# (e.g. GET objects?filter=(_resource_paths,:any,(,:like,"/programs/%")) ) | ||
return operators[operator_name]["sql_clause"]( | ||
json_object, other["op"], other["val"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the filter stays user-provided, we might want to add a syntax validation step somewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like parsimonious
does that for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops, didn't mean to approve
erm... just double-checking. |
Hey @themarcelor this is backwards-compatible with the existing |
src/mds/objects.py
Outdated
request: Request, | ||
data: bool = Query( | ||
True, | ||
description="Switch to returning a list of GUIDs (false), " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description="Switch to returning a list of GUIDs (false), " | |
description="Switch to return a list of GUIDs (false), " |
At least, I think that's right? 😅 🔤
@@ -226,6 +226,161 @@ async def create_object_for_id( | |||
return JSONResponse(response, HTTP_201_CREATED) | |||
|
|||
|
|||
@mod.get("/objects") | |||
async def get_objects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it from the design doc linked in PXP-5529, this is sort of moving us towards the idea of rebranding MDS as a generalized "object management" service, is that right? (Just want to make sure I'm getting the context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that’s right. For the GET /objects
endpoint in particular, note that Indexd records corresponding with metadata objects are returned in the response. I think the idea with the Object Management Service is to bring together various components of Gen3 (MDS, Fence, Indexd, SSJDispatcher, Indexs3client) to support a submission flow in which PFB files can be uploaded to the data lake (which consists of a single s3 bucket at the moment) by running a single gen3-client command.
During submission, both a Metadata object and an Indexd record pair are created and populated with info by an Indexs3client job, the Indexd record with calculated hashes
, size
, and urls
info and the metadata object with _bucket
, _filename
, _file_extension
, and _upload_status
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. That makes a lot of sense to me. Thanks!
src/mds/query.py
Outdated
|
||
# json_value and below was taken from | ||
# https://github.com/erikrose/parsimonious/pull/23/files | ||
grammar = Grammar( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd think parsing the grammar might be expensive. Can we do this once on app init and maintain a singleton grammar
somewhere? Then just call grammar.parse()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should say compiling
above rather than parsing
. I think the Grammar
constructor would be re-compiling the definition every time we call parse_filter
https://github.com/erikrose/parsimonious/blob/master/parsimonious/grammar.py#L68
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! 👍
I think socializing/teaching the query/filtering language to others will be important to make sure people know it exists and we can use that consistently in other places where needed in the future.
src/mds/objects.py
Outdated
filter: str = Query( | ||
"", | ||
description="The filter(s) that will be applied to the " | ||
"result (more detail in the docstring).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"result (more detail in the docstring).", | |
"result (more detail in the endpoint description).", |
this is super nitpicky but "docstring" doesn't make sense in the swagger doc
src/mds/objects.py
Outdated
) -> JSONResponse: | ||
""" | ||
XXX comments | ||
Returns a list of objects and their corresponding Indexd records (please | ||
see URL query documentation for more info on which objects get returned). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when i'm reading the swagger doc for this endpoint, where is the "URL query documentation"?
src/mds/objects.py
Outdated
Returns a list of objects and their corresponding Indexd records (please | ||
see URL query documentation for more info on which objects get returned). | ||
|
||
The filtering functionality was primarily driven by the requirement that a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way this doc is written is aimed at a dev reading the code, not at a user reading the swagger API docs. i'm referring to things like:
- "primarily driven by the requirement that a user be able"
- "how do we design a filtering interface that allows the user to"
- "that's what Postgres uses"
This information is valuable but IMO ideally docstrings would be written for API users (describe how to use the filter), since the docs are generated from docstrings automatically, and technical details would be somewhere else. Not sure where, maybe at the top of the file?
src/mds/objects.py
Outdated
guid: {"record": records[guid] if guid in records else {}, "metadata": o} | ||
for guid, o in metadata_objects.items() | ||
"items": [ | ||
{"record": records[guid] if guid in records else {}, "metadata": o} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"record": records[guid] if guid in records else {}, "metadata": o} | |
{"record": records.get(guid, {}), "metadata": o} |
src/mds/query.py
Outdated
except (IncompleteParseError, ParseError, VisitationError): | ||
raise HTTPException( | ||
HTTP_400_BAD_REQUEST, f"filter URL query param syntax is invalid" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can provide more details to the user? do the IncompleteParseError, ParseError, VisitationError
return useful error messages?
if we don't want to return them to the user for some reason, i think we should at least log them to help us debug (maybe just traceback.print_exc()
)
tests/test_objects.py
Outdated
assert len(resp_json["items"]) == 1 | ||
assert resp_json["items"][0]["metadata"]["message"] == "morning" | ||
finally: | ||
tear_down_metadata_objects(client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you could use a fixture to avoid adding a try/finally to each test?
maybe something like this, without "autouse"?
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
Jira Ticket: PXP-5529
Implement
GET /objects
endpoint with filtering.Not ready to be merged yet.Please seeget_objects
docstring for more information.The PR was becoming a bit long, so I was hoping to get feedback on the filtering interface before adding tests.New Features
GET /objects
endpoint with filteringBreaking Changes
Bug Fixes
Improvements
Dependency updates
parsimonious = "0.8.1"
Deployment changes