Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not able to get Spring GraphQL adhere to GraphQL Over Http Spec #783

Closed
JayGhiya opened this issue Aug 28, 2023 · 18 comments
Closed

Not able to get Spring GraphQL adhere to GraphQL Over Http Spec #783

JayGhiya opened this issue Aug 28, 2023 · 18 comments
Labels
for: external-project Needs a change in external project status: invalid An issue that we don't feel is valid

Comments

@JayGhiya
Copy link

JayGhiya commented Aug 28, 2023

As per GraphQL over HTTP spec when using the application/graphql-response+json(https://graphql.github.io/graphql-over-http/draft/#sec-application-graphql-response-json) during a situation when the graphQL response does not contain the data entry then the server must reply with a 4xx or 5xx as appropriate.

Here is the poc : https://github.com/JayGhiya/Spring-Graphql-Experiments where we have a graphql query fetching data from sql db. During fetch if the database is unavailable then the operation itself has failed and we have to return 5xx. So The code uses a custom web interceptor (https://github.com/JayGhiya/Spring-Graphql-Experiments/blob/main/src/main/java/com/graphql/demo/CustomGraphQLWebInterceptor.java) to modify the response to remove data as it holds no significance. Still post modifying the response using webinterceptor the spring graphql http servlet returns 2xx which is wrong.

Post intercepting spring graphql servlet should have made 5xx response as there is no data field. Could anybody point me to a example of using spring graphql http servet to modify http status code as this results in a observability nightmare. As we are using HTTP as the transport protocol we would want to adhere to RFC7231 - HTTP semantics and content.

@JayGhiya JayGhiya changed the title Spring GraphQL not compliant with GraphQL Over Http Spec Not able to get Spring GraphQL adhere to GraphQL Over Http Spec Aug 28, 2023
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Aug 28, 2023
@bclozel
Copy link
Member

bclozel commented Aug 28, 2023

Thanks but your sample project is not using spring graphql but graphql kickstart which is a different project. I'm closing this issue as a result.

@bclozel bclozel closed this as not planned Won't fix, can't repro, duplicate, stale Aug 28, 2023
@bclozel bclozel added status: invalid An issue that we don't feel is valid for: external-project Needs a change in external project and removed status: waiting-for-triage An issue we've not yet triaged labels Aug 28, 2023
@JayGhiya
Copy link
Author

@bclozel i might have added those dependencies but using only spring graphql . i might have not pushed the latest pom. let me do that in some time. request you to open the issue post that.

@JayGhiya
Copy link
Author

@bclozel i looked at the code the code makes use of only spring graphql. the pom is just old. will update.

@bclozel
Copy link
Member

bclozel commented Aug 28, 2023

The sample is using and registering a "graphql.kickstart.servlet.GraphQLHttpServlet".

@JayGhiya
Copy link
Author

yes that is correct. but the result is same . apologies for inconvenience. i will remove the graphql-java lib code completely and update the readme with result @bclozel :)

@JayGhiya
Copy link
Author

@bclozel https://github.com/JayGhiya/Spring-Graphql-Experiments/blob/main/README.md . It is updated. Right now the project only uses spring-graphql only and readme is updated with steps to reproduce. please have a look and reopen the issue thanks in advance

@rstoyanchev
Copy link
Contributor

I believe this is the same discussion, or at least related to #568 where we decided to hold off on making these changes for now.

@JayGhiya
Copy link
Author

@rstoyanchev @bclozel sure even if due to some reasons the team has decided to hold off on making these changes but why does the custom web interceptor or even the controller advise with graphql exception handler not allow to modify http response codes. in a increasingly cloud native world api gateways/service meshes all of it get affected by this non compliance with http rfc when you have a failure in connecting to downstream services. Is there any other way to modify the http response through a custom config that you are aware of ?

@bclozel
Copy link
Member

bclozel commented Aug 29, 2023

The current draft version of the spec says the following.

This response should have a 4xx or 5xx response status.

{
  "errors": [
    {
      "message": "Could not connect to database",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "books"
      ]
    }
  ]
}

This response should have a 200 response status.

{
  "data": null,
  "errors": [
    {
      "message": "Could not connect to database",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "books"
      ]
    }
  ]
}

As far as I understand the only difference is about when the error happen (before or during the request execution, and this doesn't seem really required by the original GraphQL spec).

In a increasingly cloud native world, HTTP is not the only transport out there and I think that the transport-agnostic nature of GraphQL is one of its strength. How should this behavior be mapped to RSocket, WebSocket, other messaging protocols and other transports? This is why so far we chose to decouple transports and the actual request execution. This allows us to support multiple transports and variants (Servlet-based, Reactive Streams based) without leaking abstractions.

As for the observability angle you've shared, I believe our observability story is quite compelling so far. Relying on HTTP statuses for that would be a mistake since the behavior you're sharing in the spec also depends on the response content-type (you would not get the same behavior if clients request the popular "application/json" media type).

@JayGhiya
Copy link
Author

JayGhiya commented Aug 29, 2023

@bclozel I do understand if data is null it shud be 2xx but as you could see we are intercepting and modifying the response as it does not make sense if the connection to downstream service such as db failed then making the parameter null itself makes no sense. It is a straight server error. we are in agreement that graphql is transport agnostic . What I am saying -> the framework should be flexible to allow the users to modify the response code based on their underlying transport chosen. Why that is not the case. And also you cannot bend the world of service mesh observability / golden signals pattern / api gateways through spring observability . You just can't. Things that do centralised monitoring rely on status codes. What is the way out ? do we have a roadmap where we allow users to modify status codes in cases of real server errors and graphql json only containing errors. There is no partial data to share. it does not make sense. Also leme know if there is a better forum to share and discuss the same. if you have a working group or weekly open source calls on the same. @rstoyanchev your thoughts?

@bclozel
Copy link
Member

bclozel commented Aug 29, 2023

I guess we could consider extending the WebGraphQlResponse to include the response status. At this point I'm still not convinced this is the right call for the project. I don't really understand the purpose of that custom interceptor (in the repro project) since it turns a valid graphql response (according to the spec) into an invalid one.

And also you cannot bend the world of service mesh observability / golden signals pattern / api gateways through spring observability . You just can't. Things that do centralised monitoring rely on status codes.

Sorry I don't understand this last paragraph. Can you share which service mesh or product are you using? How is it enforcing the graphql over http spec?

According to the graphql over http draft spec, these http status changes only apply to the new media type which is far from being the norm right now. We are in touch with several companies using a lot of graphql, monitoring and observability and this point never came up. How are you monitoring graphql services right now in production? Are they only relying on http status codes? How does this work when most error responses are http 200?

@JayGhiya
Copy link
Author

JayGhiya commented Sep 3, 2023

hey @bclozel apologies for delayed response. yea so lot of commercial companies use service meshes like istio , consul etc to do error reporting across enterprise automatically. And they rely on transport protocols. They work through a kubernetes sidecar pattern. same applies for products using api gateway to monitor http requests/responses. This all does not work. Companies that do platform engineering will prefer to observe traffic across enterprise through transport protocols. The spec has to evolve for usecases where entire operation fails and still it is populating 2xx. I might not understand coming from kubernetes any value add it brings. So that is the reason.

@oliverjumpertz
Copy link

Coming here from #568.

According to the graphql over http draft spec, these http status changes only apply to the new media type which is far from being the norm right now. We are in touch with several companies using a lot of graphql, monitoring and observability and this point never came up. How are you monitoring graphql services right now in production? Are they only relying on http status codes? How does this work when most error responses are http 200?

Speaking as a platform engineer for a relatively large-scale system (a few hundred nodes each in multiple Kubernetes clusters, vod and live streaming space):

We rely on service meshes (Istio) and status codes to monitor our traffic on a base level, which is why we adopted the 2.0 draft of the GraphQL over HTTP spec as soon as possible because it massively helps in outlier detection. We suffered quite heavily (multiple times) in the past because we didn't identify errors correctly and had to (partially) implement custom parsers to extract information from GraphQL responses (after a wrongly configured rate limiter that rendered a service useless for a few hours, which we didn't notice because status code-wise all seemed good).

So, in summary: The new spec version really helps with that.

@bclozel
Copy link
Member

bclozel commented Jan 23, 2025

@oliverjumpertz that's interesting. Could you elaborate?
How is the spec helping in the case of #783 (comment) ? As far as I understand, applying that part of the spec won't help in case of errors as long as the "data" part of the response is present, even if it's empty. Could you explain what those custom parsers were doing? If they were looking at the "errors" map, I'm afraid this part of the spec won't help.

I get that as a platform engineer you'd want to know how things are going in your deployment. We're shipping a complete observability instrumentation and I'd think that metrics would fulfill that need. While HTTP status is widely supported at the platform level, I feel like this particular case is going to miss most of the issues in production. For example, a data fetcher failing consistently while the rest works fine would not be shown in the HTTP response status.

@frittentheke
Copy link

frittentheke commented Jan 23, 2025

@bclozel would you potentially reopen this one as it appears to be a more active and fruitful discussion as in my newly opened #1113

@oliverjumpertz
Copy link

Hey @bclozel , I'd absolutely love to contribute more information because after thinking about it for a little longer, I might have identified a blocker in communication, which might cause some misunderstandings. :)

A bit about our architecture:

Our main system runs on AWS EKS with a few hundred nodes. We use Apollo Federation with a customized version of Apollo Router as a form of API Gateway behind an Istio Gateway (which in return sits behind CloudFront).

In case the technology is not known to you (which is perfectly fine), let me quickly tell you about the functionality (here is also some documentation):

GraphQL is usually not well-suited for inter-service communication. In case of Apollo Router, though, individual services (called subgraphs) all supply their own dedicated GraphQL API. All schemas of these subgraphs are processed into a single API schema at build time, which then makes up the so-called supergraph at runtime. In our case, for example, most subgraphs run on Spring.

The supergraph schema is what the Apollo Router exposes to the outside. When a client requests data, the Router query plans the execution path of that query and then queries one to multiple individual subgraphs through GraphQL over HTTP to create a single unified response.

This is where support for the 2.0 version of the GraphQL over HTTP spec, and especially the application/graphql-response+json comes into play.

For us, it's way easier to monitor HTTP traffic through Istio and create dashboards and alerts based on that data. While each individual team can, of course, use the great Spring ecosystem to get their own observability.

At least in our organization, though, we have learned that trust is good but control is better. So, it's always good to have a standard way that always works, and give teams the freedom to additionally employ their own solutions (like from the Spring ecosystem).

Now, regarding why exactly the spec helps us, and to maybe reiterate what we'd imagine:

To my understanding, right now, it requires custom solutions to remap error responses (in the usual 4xx to 5xx range) to actual HTTP error responses because spec-compliantly, GraphQL errors are returned if something fails (status code 200, response body according to the spec).

Many teams are usually not that well-aware of specs and mostly trust the frameworks they use, which then leads to issues like the ones we have faced, where team monitoring failed, and the platform could not pick up because we hadn't analyzed response bodies, yet.

It would be great to have a (maybe configurable?) way to activate the application/graphql-response+json way and allow "real" HTTP error responses to be returned by a service, while taking care of setting body and status code accordingly.

Here is an example that one of my colleagues has created in Kotlin:

package <redacted>

import com.netflix.graphql.types.errors.ErrorType
import org.springframework.graphql.server.WebGraphQlHandler
import org.springframework.graphql.server.WebGraphQlResponse
import org.springframework.graphql.server.webflux.GraphQlHttpHandler
import org.springframework.http.HttpHeaders
import org.springframework.http.HttpStatus
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.server.ServerRequest
import org.springframework.web.reactive.function.server.ServerResponse
import reactor.core.publisher.Mono

class CustomGraphQlHttpHandler(graphQlHandler: WebGraphQlHandler) : GraphQlHttpHandler(graphQlHandler) {

    override fun prepareResponse(request: ServerRequest, response: WebGraphQlResponse): Mono<ServerResponse> {
        val builder = ServerResponse.status(getStatusFromGraphQlResponse(response))
        builder.headers { headers: HttpHeaders -> headers.putAll(response.responseHeaders) }
        builder.contentType(MediaType.APPLICATION_GRAPHQL_RESPONSE)
        return builder.bodyValue(encodeResponseIfNecessary(response))
    }

    private fun getStatusFromGraphQlResponse(response: WebGraphQlResponse): HttpStatus {
        val error = response.errors.firstOrNull()?.errorType as? ErrorType
        return when (error) {
            ErrorType.BAD_REQUEST -> HttpStatus.BAD_REQUEST
            ErrorType.INTERNAL -> HttpStatus.INTERNAL_SERVER_ERROR
            ErrorType.UNAVAILABLE -> HttpStatus.SERVICE_UNAVAILABLE
            else -> HttpStatus.OK
        }
    }
}

I think in the end, it's not about creating a default built-in way because that's usually pretty difficult. I think it's about giving more flexibility.
While the custom solution above works and we could well distribute that in our organization, having something like that provided by Spring, with a probably better and more thought-out API, could potentially help many (platform and dev) teams.

Does this give more context, and does this help? :)

@bclozel
Copy link
Member

bclozel commented Jan 23, 2025

Thanks for the feedback!

I'll discuss this further with the team. By the way, I'm not sure the http handler you've shared here is spec compliant. This sets the HTTP response status using the first error it's finding. Is it more about taking control over the HTTP status even if the response is not spec compliant in the end?

@oliverjumpertz
Copy link

Yes, it is not fully spec-compliant, that is correct. It's only an example a colleague has created in a few minutes. But I think it showcases how difficult it is to correctly adhere to the spec. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for: external-project Needs a change in external project status: invalid An issue that we don't feel is valid
Projects
None yet
Development

No branches or pull requests

6 participants