(PXP-5529): implement GET /objects endpoint #15

johnfrancismccann · 2020-12-16T15:06:56Z

Jira Ticket: PXP-5529

Implement GET /objects endpoint with filtering. ~~Not ready to be merged yet.~~ Please see get_objects docstring for more information.

~~The PR was becoming a bit long, so I was hoping to get feedback on the filtering interface before adding tests.~~

New Features

Implement GET /objects endpoint with filtering

Breaking Changes

Bug Fixes

Improvements

Dependency updates

use parsimonious = "0.8.1"

Deployment changes

paulineribeyre · 2021-01-04T17:10:05Z

src/mds/query.py

+        - GET /objects?filter=(message,:eq,"morning") returns "2"
+        - GET /objects?filter=(counts.1,:eq,3) returns "3"


the design doc doesn't use a filter param, and its syntax is simpler - we might want to accept {record_or_metadata}.{arbitrary_key}={value} as a param, like the design doc describes, and build the filter object based on that?

This pull request still maintains the {arbitrary_key}={value} param syntax for the GET /metadata endpoint. I also don't think it would be too bad to support that syntax for the GET /objects endpoint in addition to the filter=(counts.1,:eq,3) style.
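
To make the two syntaxes in this thread concrete, here is a minimal sketch of how simple `{arbitrary_key}={value}` params could be translated into `(key,:eq,value)` filter strings. The helper name `params_to_filters` is hypothetical and not part of this PR, and the typing rule (keep a value's JSON type when the raw string parses as JSON, otherwise treat it as text) is an assumption.

```python
import json


def params_to_filters(params):
    """Translate key=value params into (key,:eq,value) filter strings.

    Hypothetical helper, not part of this PR. Assumes values arrive as raw
    strings from the URL and that anything that parses as JSON (numbers,
    booleans) should keep that type.
    """
    filters = []
    for key, raw in params.items():
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            value = raw  # plain text such as morning stays a string
        # json.dumps adds the double quotes that string values need
        filters.append(f"({key},:eq,{json.dumps(value)})")
    return filters


print(params_to_filters({"message": "morning", "counts.1": "3"}))
```

With the two example params this prints the same filter strings used in the docstring examples quoted above. How several filters would be combined into one `filter` value is left open here.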

paulineribeyre · 2021-01-04T17:12:55Z

src/mds/query.py

+            #  operator
+            #  (e.g. GET objects?filter=(_resource_paths,:any,(,:like,"/programs/%")) )
+            return operators[operator_name]["sql_clause"](
+                json_object, other["op"], other["val"]


if the filter stays user-provided, we might want to add a syntax validation step somewhere

looks like parsimonious does that for us

paulineribeyre

woops, didn't mean to approve

themarcelor · 2021-03-18T23:14:49Z

erm... just double-checking.
But is that backwards compatible with the existing /mds/metadata endpoint?
Just double-checking because of the current logic in: https://github.com/uc-cdis/gen3-qa/blob/master/suites/apis/metadataIngestionTest.js

johnfrancismccann · 2021-03-19T16:52:22Z

Hey @themarcelor, this is backwards-compatible with the existing /mds/metadata endpoint in the sense that it leaves that endpoint in place and doesn't modify its behavior at all. However, the new /mds/objects endpoint has its own filtering interface that is completely different from the one used by /mds/metadata (i.e. using the same filter parameters for both endpoints won't work).

williamhaley · 2021-03-24T20:23:39Z

src/mds/objects.py

+    request: Request,
+    data: bool = Query(
+        True,
+        description="Switch to returning a list of GUIDs (false), "


Suggested change

description="Switch to returning a list of GUIDs (false), "

description="Switch to return a list of GUIDs (false), "

At least, I think that's right? 😅 🔤

williamhaley · 2021-03-24T20:40:35Z

src/mds/objects.py

@@ -226,6 +226,161 @@ async def create_object_for_id(
    return JSONResponse(response, HTTP_201_CREATED)


+@mod.get("/objects")
+async def get_objects(


If I understand it from the design doc linked in PXP-5529, this is sort of moving us towards the idea of rebranding MDS as a generalized "object management" service, is that right? (Just want to make sure I'm getting the context)

Yeah that’s right. For the GET /objects endpoint in particular, note that Indexd records corresponding with metadata objects are returned in the response. I think the idea with the Object Management Service is to bring together various components of Gen3 (MDS, Fence, Indexd, SSJDispatcher, Indexs3client) to support a submission flow in which PFB files can be uploaded to the data lake (which consists of a single s3 bucket at the moment) by running a single gen3-client command.

During submission, a Metadata object and an Indexd record are created as a pair and populated with info by an Indexs3client job: the Indexd record with calculated hashes, size, and urls info, and the metadata object with _bucket, _filename, _file_extension, and _upload_status.

Gotcha. That makes a lot of sense to me. Thanks!

williamhaley · 2021-03-25T14:16:02Z

src/mds/query.py

+
+        #  json_value and below was taken from
+        #  https://github.com/erikrose/parsimonious/pull/23/files
+        grammar = Grammar(


I'd think parsing the grammar might be expensive. Can we do this once on app init and maintain a singleton grammar somewhere? Then just call grammar.parse() here?

I should say compiling above rather than parsing. I think the Grammar constructor would be re-compiling the definition every time we call parse_filter: https://github.com/erikrose/parsimonious/blob/master/parsimonious/grammar.py#L68

Really nice catch!
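
A minimal sketch of the "compile once" idea from this thread. `StandInGrammar` is a stand-in for parsimonious' `Grammar` that only counts how often it is constructed, and `get_filter_grammar` plus the placeholder definition string are hypothetical names, not the code that landed in this PR.

```python
from functools import lru_cache


class StandInGrammar:
    """Stand-in for parsimonious.Grammar; counts how often it is built."""

    builds = 0

    def __init__(self, definition):
        StandInGrammar.builds += 1
        self.definition = definition

    def parse(self, text):
        # A real grammar would return a parse tree; this only echoes input.
        return text


# Placeholder only; the PR's actual grammar definition is not reproduced here.
FILTER_GRAMMAR_DEFINITION = "<grammar definition goes here>"


@lru_cache(maxsize=None)
def get_filter_grammar():
    """Build the grammar on first use and reuse the same object afterwards."""
    return StandInGrammar(FILTER_GRAMMAR_DEFINITION)


def parse_filter(filter_string):
    # No construction cost after the first call.
    return get_filter_grammar().parse(filter_string)


for _ in range(3):
    parse_filter('(message,:eq,"morning")')
print(StandInGrammar.builds)  # 1
```

A module-level constant built at import time would work equally well; the cached accessor just defers the cost until the first request that uses a filter.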

williamhaley

lgtm! 👍

I think socializing/teaching the query/filtering language to others will be important to make sure people know it exists and we can use that consistently in other places where needed in the future.

paulineribeyre · 2021-05-27T13:30:15Z

src/mds/objects.py

+    filter: str = Query(
+        "",
+        description="The filter(s) that will be applied to the "
+        "result (more detail in the docstring).",


Suggested change

"result (more detail in the docstring).",

"result (more detail in the endpoint description).",

this is super nitpicky but "docstring" doesn't make sense in the swagger doc

paulineribeyre · 2021-05-27T13:32:14Z

src/mds/objects.py

 ) -> JSONResponse:
    """
-    XXX comments
+    Returns a list of objects and their corresponding Indexd records (please
+    see URL query documentation for more info on which objects get returned).


when i'm reading the swagger doc for this endpoint, where is the "URL query documentation"?

paulineribeyre · 2021-05-27T13:38:10Z

src/mds/objects.py

+    Returns a list of objects and their corresponding Indexd records (please
+    see URL query documentation for more info on which objects get returned).
+
+    The filtering functionality was primarily driven by the requirement that a


the way this doc is written is aimed at a dev reading the code, not at a user reading the swagger API docs. i'm referring to things like:

"primarily driven by the requirement that a user be able"

"how do we design a filtering interface that allows the user to"

"that's what Postgres uses"

This information is valuable but IMO ideally docstrings would be written for API users (describe how to use the filter), since the docs are generated from docstrings automatically, and technical details would be somewhere else. Not sure where, maybe at the top of the file?

paulineribeyre · 2021-05-27T13:42:46Z

src/mds/objects.py

-            guid: {"record": records[guid] if guid in records else {}, "metadata": o}
-            for guid, o in metadata_objects.items()
+            "items": [
+                {"record": records[guid] if guid in records else {}, "metadata": o}


Suggested change

{"record": records[guid] if guid in records else {}, "metadata": o}

{"record": records.get(guid, {}), "metadata": o}
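
For reference, a small check that the suggested `records.get(guid, {})` produces the same `items` entries as the conditional expression it replaces. The data and the `build_items` helper are made up for illustration.

```python
# Indexd records keyed by GUID (made-up data)
records = {"guid-1": {"size": 42}}
metadata_objects = {
    "guid-1": {"message": "morning"},
    "guid-2": {"message": "evening"},  # no matching Indexd record
}


def build_items(records, metadata_objects):
    """Pair each metadata object with its record, or {} when there is none."""
    return [
        {"record": records.get(guid, {}), "metadata": o}
        for guid, o in metadata_objects.items()
    ]


# The original, more verbose form from the diff
verbose = [
    {"record": records[guid] if guid in records else {}, "metadata": o}
    for guid, o in metadata_objects.items()
]

assert build_items(records, metadata_objects) == verbose
```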

paulineribeyre · 2021-05-27T14:05:19Z

src/mds/query.py

+    except (IncompleteParseError, ParseError, VisitationError):
+        raise HTTPException(
+            HTTP_400_BAD_REQUEST, f"filter URL query param syntax is invalid"


maybe we can provide more details to the user? do the IncompleteParseError, ParseError, VisitationError return useful error messages?
if we don't want to return them to the user for some reason, i think we should at least log them to help us debug (maybe just traceback.print_exc())
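
A minimal sketch of doing both things suggested here: logging the traceback and passing the parser's message on to the caller. `FilterSyntaxError` stands in for parsimonious' error classes, `BadRequest` stands in for the HTTP 400 exception, and `toy_parse` is a toy check; all three are assumptions made so that the example runs with the standard library only.

```python
import logging

logger = logging.getLogger("mds.query")


class FilterSyntaxError(Exception):
    """Stand-in for the parser's error classes."""


class BadRequest(Exception):
    """Stand-in for an HTTP 400 response carrying a detail message."""

    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail


def toy_parse(filter_string):
    """Toy check: only verifies the surrounding parentheses."""
    if not (filter_string.startswith("(") and filter_string.endswith(")")):
        raise FilterSyntaxError(f"expected '(' ... ')' around {filter_string!r}")
    return filter_string


def parse_filter(filter_string):
    try:
        return toy_parse(filter_string)
    except FilterSyntaxError as err:
        # Keep the full traceback in the server logs for debugging
        logger.exception("invalid filter param: %r", filter_string)
        # and include the parser's message in the response detail
        raise BadRequest(
            f"filter URL query param syntax is invalid: {err}"
        ) from err
```

Whether the raw parser message is suitable to show to API users depends on how readable it is; logging it is useful either way.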

paulineribeyre · 2021-05-27T14:14:17Z

tests/test_objects.py

+        assert len(resp_json["items"]) == 1
+        assert resp_json["items"][0]["metadata"]["message"] == "morning"
+    finally:
+        tear_down_metadata_objects(client)


maybe you could use a fixture to avoid adding a try/finally to each test?
maybe something like this, without "autouse"?
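
A sketch of the setup/teardown shape being suggested. It is written with `contextlib.contextmanager` so that it runs with the standard library alone; in the test suite the same generator body would be registered with `@pytest.fixture` (without `autouse`, so only tests that request it by name get it). `FakeClient` and `set_up_metadata_objects` are hypothetical stand-ins, and this `tear_down_metadata_objects` is a simplified version of the helper called in the diff above.

```python
import contextlib


class FakeClient:
    """Hypothetical in-memory stand-in for the test client."""

    def __init__(self):
        self.objects = {}


def set_up_metadata_objects(client):
    # Made-up test data
    client.objects = {"1": {"message": "hello"}, "2": {"message": "morning"}}


def tear_down_metadata_objects(client):
    client.objects.clear()


@contextlib.contextmanager
def metadata_objects(client):
    """Set up before the yield, tear down after it, even if the body fails."""
    set_up_metadata_objects(client)
    try:
        yield client
    finally:
        tear_down_metadata_objects(client)


client = FakeClient()
with metadata_objects(client) as c:
    assert len(c.objects) == 2
assert client.objects == {}
```

Each test then contains only its assertions, and the cleanup lives in one place instead of a `try/finally` per test.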

github-actions · 2021-10-11T03:28:20Z

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

johnfrancismccann added 15 commits November 18, 2020 22:05

feat(endpoint): add GET /objects

2d7666a

feat(GET /objects): use search_metadata_helper

fe95adf

feat(GET /objects): filter resource paths LIKE val

da7f330

feat(GET /objects): map filter operators w/ dict

d639b2b

feat(GET /objects): filter all of array where op

6a6093f

feat(GET /objects): POST /bulk/documents in indexd

683ed5d

feat(GET /objects): filter w/ () syntax (x,:eq,42)

05c879c

feat(GET /objects): parse filter with parsimonious

bf3a6f6

feat(GET /objects): DRY up sqlalchemy clauses

06cc2f0

feat(GET /objects): remove unnecessary comments

a8d24e5

feat(GET /objects): support boolean SQL clauses

f86f036

feat(GET /objects): use filter_dict

4e0191d

feat(GET /objects): account for empty filter param

f48a015

feat(GET /objects): remove commented code

e774189

feat(GET /objects): document examples

a87bfaa

github-actions bot added the test-apis-metadataIngestionTest label Dec 16, 2020

Apply automatic documentation changes

a75914d

johnfrancismccann requested review from Avantol13, fantix, paulineribeyre and mpingram December 16, 2020 15:14

paulineribeyre approved these changes Jan 4, 2021

paulineribeyre requested changes Jan 4, 2021

johnfrancismccann added 7 commits March 8, 2021 13:26

feat(GET /objects): use data param

71506e6

feat(GET /objects): return items list in response

79e2c7d

feat(GET /objects): restore search_metadata

80a12d5

feat(GET /objects): clean up parsing grammar

879e64a

test(GET /objects): add filter tests

1b00631

test(GET /objects): use resp_json variable

9a350c4

docs(GET /objects): add function docstrings

7c90096

johnfrancismccann and others added 7 commits March 16, 2021 04:51

Apply automatic documentation changes

aec9b9e

feat(GET /objects): change limit from 2000 to 1024

167d39a

Apply automatic documentation changes

60b9be5

docs(GET /objects): add docstrings to tests

4ff094d

test(GET /objects): add :lte and :gt tests

3e574f8

feat(GET /objects): use search_metadata_objects

5d7deba

Apply automatic documentation changes

f688cd3

williamhaley reviewed Mar 24, 2021

williamhaley reviewed Mar 24, 2021

williamhaley reviewed Mar 25, 2021

johnfrancismccann and others added 3 commits April 1, 2021 08:41

Merge branch 'master' into feat/get-objects

df595e8

feat(GET /objects): compile filter grammar once

e35d9bf

Apply automatic documentation changes

d313286

williamhaley previously approved these changes Apr 2, 2021

paulineribeyre reviewed May 27, 2021

johnfrancismccann added 5 commits October 8, 2021 11:50

Merge branch 'master' into feat/get-objects

e82e87c

docs(GET /objects): move dev info from swagger

23cb565

style(GET /objects): use get on records dict

8f91573

test(GET /objects): use fixture to set up objects

7ed158e

feat(GET /objects): add detail in filter error res

8636930

Apply automatic documentation changes

82cb87e

mfshao dismissed williamhaley’s stale review via 82cb87e January 26, 2022 22:40

Avantol13 removed request for fantix, mpingram and Avantol13 October 3, 2022 18:05
