Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new SEARCH/QUERY HTTP method to pass arguments in request body for read request #2125

Open
telewak opened this issue Jan 15, 2022 · 28 comments
Labels
http http compliance idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@telewak
Copy link

telewak commented Jan 15, 2022

Thank you for this wonderful and magical api... i would like to send alot of arguments while doing a GET request.
Does PostgREST as an API have a limit on the url length ? Is there possibility to POST a json object to the GET endpoint?

@wolfgangwalther
Copy link
Member

See PostgREST/postgrest-docs#417.

POST to GET is not implemented. Will keep this open to track the QUERY proposal by @steve-chavez in the thread above.

@wolfgangwalther wolfgangwalther added the idea Needs of discussion to become an enhancement, not ready for implementation label Jan 16, 2022
@wolfgangwalther wolfgangwalther changed the title URL Length Implement new QUERY method to pass arguments in request body for read request Jan 16, 2022
@steve-chavez
Copy link
Member

Even the new QUERY http method won't help with cases where DELETE has a big in filter. PATCH would suffer from this too.

Perhaps we should have a POST with a special body + content type(similar to #465) for these cases?

Come to think of it, QUERY would help because it will contain the DELETE/UPDATE inside.

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Feb 18, 2022

Even the new QUERY http method won't help with cases where DELETE has a big in filter. PATCH would suffer from this too.

Makes sense.

Perhaps we should have a POST with a special body + content type(similar to #465) for these cases?

Why POST? We could invent a special body format, that would contain our filters. We'd use that for QUERY and DELETE by default - but would allow adding it to PATCH via content-type header. In the PATCH case the original payload would then be included in this body, too. We don't need special headers for QUERY or DELETE, because there is no other body associated with those requests anyway.

Not sure whether it could be used for PUT, too.

Come to think of it, QUERY would help because it will contain the DELETE/UPDATE inside.

I don't understand what you're saying here.

@steve-chavez
Copy link
Member

Come to think of it, QUERY would help because it will contain the DELETE/UPDATE inside.
I don't understand what you're saying here.

The example here:

QUERY /contacts HTTP/1.1
Host: example.org
Content-Type: example/query
Accept: text/csv

select surname, givenname, email limit 10

Hints that we could have a delete where in (1,2,3,4,...) or update set name = 'xxx', status = 'yy' where in (1,2,3,4,..) in the body? Maybe I'm wrong..

@steve-chavez
Copy link
Member

steve-chavez commented Feb 18, 2022

We'd use that for QUERY and DELETE by default -

You mean we would do an HTTP DELETE with a body? I thought that was frowned upon according to this.

Perhaps some proxies don't even pass the body in a DELETE, we'd have to test that.

@wolfgangwalther
Copy link
Member

We'd use that for QUERY and DELETE by default -

You mean we would do an HTTP DELETE with a body? I thought that was frowned upon according to this.

Perhaps some proxies don't even pass the body in a DELETE, we'd have to test that.

I did a quick search for it, but only found other posts that said it was fine. Your link has a much more authoritative source, though...

So DELETE not allowing a body would be the reason why we have to switch to POST. Got it.

So what goes in the body for that POST and what doesn't? Is it only the filters that go in there or the full path? I feel like it's hard to draw a line here - so what I think would be possible:
Allow sending a POST to the root endpoint. This request would contain basically the whole request, including method/action, filters, headers, body etc.

This would even work with http semantics somehow: We are creating a short-lived new resource of type "query". Since a query is not part of any of the exposed URIs, but rather part of the whole api, it makes sense to put it at the root endpoint.


Hints that we could have a delete where in (1,2,3,4,...) or update set name = 'xxx', status = 'yy' where in (1,2,3,4,..) in the body? Maybe I'm wrong..

Having that in the body would not be a problem at all, I think. But I understand the QUERY method to not allow any sort of mutation on the server. From the draft:

This specification defines a new HTTP method, QUERY, as a safe, idempotent request method that can carry request content.

"safe" is defined as:

An HTTP method is safe if it doesn't alter the state of the server. In other words, a method is safe if it leads to a read-only operation.

@steve-chavez
Copy link
Member

Allow sending a POST to the root endpoint. This request would contain basically the whole request, including method/action, filters, headers, body etc.

Not adamantly opposed, but that solution feels like "chucking it all".

Is it only the filters that go in there or the full path? I feel like it's hard to draw a line here

The problem really is the in filter, the only I've seen reported to grow that big(like a thousand values) for both GET and DELETE.

So perhaps we could do something like:

POST /projects?status=eq.active
Content-Type: application/vnd.pgrst.big-in-filter+json

{
  "method": "DELETE"
, "in": [1,2,3,4,5,6]
}

That way we get to reuse path, headers and querystring.

@wolfgangwalther
Copy link
Member

Hm. For such a specific solution that only targets a single filter, you can easily create an RPC - that should be good enough already.

@steve-chavez
Copy link
Member

Yeah, but we kinda miss the "low-code" advantage with that. I thought we could have the above interface so client libraries can abstract the in filter problem.

@steve-chavez
Copy link
Member

A multipart type could be an option to the above.

@wolfgangwalther
Copy link
Member

wolfgangwalther commented Feb 20, 2022

The problem really is the in filter, the only I've seen reported to grow that big(like a thousand values) for both GET and DELETE.

I can see other use-cases, too: Passing a value in a JSON body should allow to work around some of the quoting/escaping issues/challenges we have in the query string.

So perhaps we could do something like:

POST /projects?status=eq.active
Content-Type: application/vnd.pgrst.big-in-filter+json

{
  "method": "DELETE"
, "in": [1,2,3,4,5,6]
}

Only supporting that special cases just seems so... arbitrary.

That way we get to reuse path, headers and querystring.

I can see how that's a positive. What about:

POST /projects?status=eq.active&id=in.refs->my_ids
Content-Type: application/vnd.pgrst.params-in-body+json

{
  "method": "DELETE"
, "body": "... original body ..."
, "refs": {
    "my_ids": [1,2,3,4,5,6]
  }
}

(Note: The "original body" was just meant for PATCH etc. - not in this specific example of DELETE)

The idea would be to somehow allow referencing values in the body in the query string. This way they could be re-used for all kinds of filters, not only for in or so. And the general query structure is still entirely visible from the query string - including all filters.

Some ideas for syntax:

  • variants for each operator, that would use a reference: &id=in-ref.my_ids
  • a meta operator: &id=in.ref.my_ids
  • a special character for access: &id=in.$my_ids or &id=in.@my_ids

Those would not be breaking changes, because they would only be interpreted if the special Content-Type is set. I think I like the last suggestion most...

@steve-chavez
Copy link
Member

steve-chavez commented Feb 20, 2022

The idea would be to somehow allow referencing values in the body in the query string.

Nice! I like that direction.


There's another concern I have. This type of bodies will definitely be big and parsing them in Haskell would lead us to have the same problem as #690. So ideally we want to send the body directly to postgres and avoid processing it. With the application/vnd.pgrst.params-in-body+json, I believe we can work the query to somehow pass the refs->my_ids to the WHERE clause but this would not be possible for the method field since PostgREST would need it in advance..

Perhaps some proxies don't even pass the body in a DELETE, we'd have to test that.
I did a quick search for it, but only found other posts that said it was fine. Your link has a much more authoritative source, though...

I grow dissatisfied with answers with just authoritative/purist points of view and it has come to my attention that Elasticsearch allows doing a GET with a request body.

How about if we support for GET/DELETE with a request body for these exceptional cases? Seems javascript fetch would be an issue(GET doesn't pass a request body) but I'm lead to believe that, most importantly, proxies would work fine.

References:

Edit: Passing the method in the header would be another option.

@wolfgangwalther
Copy link
Member

There's another concern I have. This type of bodies will definitely be big and parsing them in Haskell would lead us to have the same problem as #690. So ideally we want to send the body directly to postgres and avoid processing it. With the application/vnd.pgrst.params-in-body+json, I believe we can work the query to somehow pass the refs->my_ids to the WHERE clause but this would not be possible for the method field since PostgREST would need it in advance..

Fully agree.

I grow dissatisfied with answers with just authoritative/purist points of view and it has come to my attention that Elasticsearch allows doing a GET with a request body.

How about if we support for GET/DELETE with a request body for these exceptional cases? Seems javascript fetch would be an issue(GET doesn't pass a request body) but I'm lead to believe that, most importantly, proxies would work fine.

Do we need to support GET with a request body at all? The idea was to use QUERY for that. As long as DELETE with a body works fine, we can avoid passing the method in the body.

I think I'd be fine with that.


For completeness, there is also a different way, which would be the most "REST-like", I guess:

When passing those parameters in the body, we break the "the same URI describes the same resource" promise - because with a different body, a completely different resource is targeted. This can be avoided by creating temporary resources in two http requests:

  • First a POST request with the to-be-used filter in the body. Something like a temporary "params" resource on a special endpoint.
  • Then run the main request and use the temporary resource (which should have a unique identifier - unique forever) as a filter. Something like an inner join or exists filter, basically.

The second request would then have the unique identifier in the URI and would not break that promise above anymore.

Those temporary resources could be represented as real database objects, e.g. temp tables or functions in the temporary schema. Although I don't have an idea, yet, how that would work with a connection pool.

@steve-chavez
Copy link
Member

steve-chavez commented Feb 25, 2022

Do we need to support GET with a request body at all? The idea was to use QUERY for that.

Another option could be the REPORT method:

A REPORT request is an extensible mechanism for obtaining information
about a resource. Unlike a resource property, which has a single
value, the value of a report can depend on additional information
specified in the REPORT request body and in the REPORT request
headers.

According to this blog post it already works everywhere.

When passing those parameters in the body, we break the "the same URI describes the same resource" promise - because with a different body, a completely different resource is targeted

Looks like the REPORT method addresses that? The resource can depend on the request body.

A REPORT request is an extensible mechanism for obtaining information
about a resource. Unlike a resource property, which has a single
value, the value of a report can depend on additional information
specified in the REPORT request body and in the REPORT request
headers.

https://www.rfc-editor.org/rfc/rfc3253.html#page-25

Other references:

@steve-chavez
Copy link
Member

Branched off the DELETE with a body discussion to #2314

@telewak telewak closed this as completed Jun 14, 2022
@telewak telewak reopened this Jun 14, 2022
@telewak
Copy link
Author

telewak commented Jun 14, 2022

Thanks guys for the good work so far. Any idea when this feature could be implemented? i am waiting for it, passing arguments in request body when GETing or QUERYing for data

@steve-chavez
Copy link
Member

steve-chavez commented Jun 14, 2022

DELETE with body will definitely come in v10. This one would be for the next version.

I think we should go with the SEARCH method as it's older and likely better supported. With that the above proposal would be:

SEARCH /projects?status=eq.active&id=in.refs->my_ids
Content-Type: application/json

{
 "refs": {
    "my_ids": [1,2,3,4,5,6]
  }
}

Or to avoid an special case in our filter operators we could:

SEARCH /projects?status=eq.active&id=_in.my_ids
Content-Type: application/json

{
  "my_ids": [1,2,3,4,5,6]
}

Where every operator with an underscore(_op) gets its value from a json key in the body.


SEARCH with the default application/json type could behave the same as the Bulk Update/Delete we're planning. It would just search by the ids in the payload, in addition to the query filters.

@steve-chavez steve-chavez changed the title Implement new QUERY method to pass arguments in request body for read request Implement new SEARCH/QUERY method to pass arguments in request body for read request Jun 14, 2022
@steve-chavez steve-chavez changed the title Implement new SEARCH/QUERY method to pass arguments in request body for read request Implement new SEARCH/QUERY HTTP method to pass arguments in request body for read request Jun 14, 2022
@steve-chavez
Copy link
Member

steve-chavez commented Jul 1, 2022

The SEARCH method could also allow us to pass parameters that are more bulky in nature, like for PostGIS operators:

SEARCH /projects?range_area=_st_within.area
Content-Type: application/json

{
  "area": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -71.10034391283989,
          42.37385299961788
        ],
        [
          -71.10036939382553,
          42.373756895982865
        ],
        [
          -71.1002916097641,
          42.373745997623224
        ],
        [
          -71.1002641171217,
          42.37384408279195
        ],
        [
          -71.10034391283989,
          42.37385299961788
        ]
      ]
    ]
  }
}

@steve-chavez
Copy link
Member

steve-chavez commented Jul 30, 2022

Right now we can only use our operators against other values, we're missing operating against other identifiers.

Underscore operators could be used to operate on columns besides referencing values on the request body. This would allow doing column to column comparison(asked before) without computed columns:

SEARCH /PredictedGenes?Strand=_eq.+&GeneStart=_gt.GeneEnd

Operating on a body value would still be allowed, body would be a reserved identifier:

SEARCH /PredictedGenes?Strand=_eq.body->strand&GeneStart=_lt.GeneEnd
{ "Strand": "-"}

This would favor PATCH/DELETE as well.


Why POST? We could invent a special body format, that would contain our filters. We'd use that for QUERY and DELETE by default - but would allow adding it to PATCH via content-type header. In the PATCH case the original payload would then be included in this body, too

@wolfgangwalther Just realized that If we implement the set operator for PATCH, we wouldn't need a new content-type for it.


Underscore operators is arbitrary, they could also be dollar operators - $eq, $gt - but this syntax feels more "loaded". Open to other suggestions.


Also, like set, they could potentially take an expression:

SEARCH /users?createdAt=_gt.(body->start::timestamptz,minus,body->lapse)
{"start": "now", "lapse": "3 days"}

SELECT * FROM users WHERE "createdAt" > 'now'::timestamptz - '3 days';

This was referenced Jul 30, 2022
@steve-chavez
Copy link
Member

steve-chavez commented Aug 3, 2022

Another option for underscore operators is to use a jsonpath expression as the operand.

DELETE /tbl?id=_eq.$[*].id

[
 {"id": 1},
 {"id": 2}
]

One good thing is that we'd get to reuse existing pg functionality. Another one that since $ is invalid for the first char of an identifier, we can clearly separate json path values from identifiers(so column to column comparison would still be possible).


One difficulty I've noted with this is that we'd need the schema cache to cast the json to the column type.

select * from test.projects where id = (('{"a": 3}'::json)->>'a');
ERROR:  operator does not exist: integer = text

select * from test.projects where id = (('{"a": 3}'::json)->>'a')::int;
 id | name | client_id
----+------+-----------
  3 | IOS  |         2
(1 row)

It doesn't seem possible to obtain the json with an unknown casting.

Another drawback would be that json path includes brackets [] that need url encoding.

Still seems useful, it might be possible to offer json path depending on a different content-type.

json path does too much imo, some comments on #1883 (comment).

@wolfgangwalther
Copy link
Member

Another one that since $ is invalid for the first char of an identifier, we can clearly separate json path values from identifiers(so column to column comparison would still be possible).

I don't really speak jsonpath, yet. Does an expression always have to start with $? I can't take that away immediately from skimming the docs on it.

@steve-chavez
Copy link
Member

Does an expression always have to start with $?

Yes, otherwise there was a syntax error IIRC. You can also use $var for functions that pass variables.

I no longer think that json path is safe to expose though, made a comment on #1883 (comment)

@chelodegli
Copy link

chelodegli commented Oct 17, 2022

I'd love to see this feature.
I have found aditional information for the new Query method, maybe it's somehow useful:
https://datatracker.ietf.org/doc/draft-ietf-httpbis-safe-method-w-body

@steve-chavez
Copy link
Member

steve-chavez commented Jan 10, 2023

Elaborating on the above. Instead of underscore operators, I think dollar operators would be better:

SEARCH /PredictedGenes?Strand=$eq.$body.strand&GeneStart=$lt.GeneEnd
{ "Strand": "-"}

It looks more differentiable/clearer that it's our own syntax. And instead of having body as a reserved identifier, we can have namespaces(that can be used in other cases #2578 (comment) #915 (comment)) like $body.col1 and also support a shortened version like $b.col1.

@steve-chavez
Copy link
Member

steve-chavez commented Jun 13, 2023

What if we use the our query grammar in the media type parameters. Like:

SEARCH /PredictedGenes
Content-Type: application/vnd.pgrst.search+json; Strand.eq=$Strand; GeneStart.lt=GeneEnd

{ "Strand": "-"}

This would make it easier to implement as the query string grammar is not modified.

Media type parameters are also proposed for PATCH #465 (comment).


Also, maybe later this could be used for solving client-side transactions(#286).

POST / HTTP/1.1
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Type: application/vnd.pgrst.search+json; col.eq=$field;
Content-Disposition: form-data; name="tbl";

{..}

--boundary
Content-Type: application/vnd.pgrst.patch+json; another_col.eq=$another_field;
Content-Disposition: form-data; name="other_tbl";

{...}

Maybe we can have another multipart media type for clarity. Like multipart/vnd.pgrst.transaction.

@steve-chavez
Copy link
Member

steve-chavez commented Jun 29, 2023

On #2310 (comment) Wolfgang, mentions:

We need to differentiate between two types of request bodies here: The request bodies we use currently pass a whole "document" or "entity", where everything in there is the resource. Those should go through the data representation transformation chain as a whole. However, the request bodies passed in a SEARCH request are not that. They use a container format of application/json as a replacement for the size-limited query string. This document should never go through a data representation transformation as a whole. The extracted values (text) should go through it, though. (btw: Why not just use application/x-www-form-urlencoded as the body format for those SEARCH requests? This would make the concept much clearer and be a lot easier for client-libraries to implement: Just build the query string and if it's too long, put it exactly like this in the body..)

Just using application/x-www-form-urlencoded in the body is an interesting idea. Probably the simplest implementation.

SEARCH /projects
Content-Type: application/x-www-form-urlencoded

status=eq.active&id=in.[1,2,3,4,5,6,....]

(Not sure how would this look with PATCH though)

One problem is that then doing #2816 won't be possible, we'd have to parse the body before sending the query. On the contrary if we had:

SEARCH /projects
Content-Type: application/vnd.pgrst.filters+json;status.eq=$status;id.in=$id

{"status": "active", "id": [1,2,3,4,5,6]}

Then we wouldn't have to parse the body at all. We could just hash the URL + the header.


Although the application/x-www-form-urlencoded doesn't preclude the other application/vnd.pgrst.filters+json content type. We could support both.

@steve-chavez
Copy link
Member

Wondering if we should do this. This recent article:

https://bessey.dev/blog/2024/05/24/why-im-over-graphql/#query-parsing

Mentions how sending a long list over the POST body can be abused to cause OOM, and how mitigating this is a pain in GraphQL.

So for us, since we use the URL for our query language, the URL size limit has always served as a built-in defense mechanism.

In a way, this limitation has always been a feature. Perhaps we should document it as such? And try to give a better error message if it happens?

@wolfgangwalther
Copy link
Member

Mentions how sending a long list over the POST body can be abused to cause OOM, and how mitigating this is a pain in GraphQL.

It seems like this is not specific to POST. It's just a very special syntax, which amplifies memory usage way beyond the payload size. I don't see a problem here for us, yet. Especially because we have not a single parser anywhere, which would continue on an error and collect multiple of them. Everything we have so far throws on the first error immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http http compliance idea Needs of discussion to become an enhancement, not ready for implementation
Development

No branches or pull requests

4 participants