Adding a GET Interface to Inference Would Allow for Better Performance #20

Open
fstakem opened this issue Feb 19, 2024 · 0 comments

fstakem commented Feb 19, 2024

The current specification does not allow effective use of HTTP Cache-Control (i.e., client-side caching), which is inefficient in production environments. The specification should add a GET request for inference so that clients can cache responses under Cache-Control. Let me explain in more detail.

If a user queries a deterministic model, the response from the endpoint should be the same each time until the model is retrained, at which point the model should get a new version. (For non-deterministic models, such as simulations, the current interface is fine.) The current interface only has an HTTP POST for querying the model for inference. If an HTTP GET is used with proper Cache-Control settings, the load on the server can be reduced. Cache-Control lets the client cache responses while the server controls the cache settings. Because the server controls the cache, other server-side systems such as experimentation can be used without worry that the client will get the wrong response. The RFC on HTTP caching probably explains this better than I can and is linked below.

RFC on HTTP caching: here
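
To make the idea concrete, here is a rough Python sketch of what a cacheable GET could look like. It is only an illustration under assumptions: the URL layout, the `inputs` query parameter, the helper names `inference_get_url` and `cache_headers`, and the `private, max-age=3600` policy are all hypothetical and not part of the current specification. The key points are that identical inputs map to an identical URL (so caches can key on it) and that the model version feeds into the validator, so a retrained model invalidates old entries.

```python
import hashlib
import json
from urllib.parse import urlencode


def _canonical(inputs):
    # Sorted keys and fixed separators make equal inputs serialize identically.
    return json.dumps(inputs, sort_keys=True, separators=(",", ":"))


def inference_get_url(base_url, model_name, model_version, inputs):
    """Build a GET URL for inference (hypothetical path and parameter name)."""
    query = urlencode({"inputs": _canonical(inputs)})
    return f"{base_url}/models/{model_name}/versions/{model_version}/infer?{query}"


def cache_headers(model_version, inputs, max_age_seconds=3600):
    """Response headers a server might send for a deterministic model."""
    # The ETag changes whenever the model version or the inputs change.
    raw = f"{model_version}:{_canonical(inputs)}".encode("utf-8")
    etag = hashlib.sha256(raw).hexdigest()[:16]
    return {
        # Assumed policy: cache in the client only, for max_age_seconds.
        "Cache-Control": f"private, max-age={max_age_seconds}",
        "ETag": f'"{etag}"',
    }
```

For a non-deterministic model the server could instead send `Cache-Control: no-store`, which keeps the decision on the server side as described above.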

Currently, different implementations of this specification rely on less efficient server-side caching. Although server-side caching can reduce the load on the server, it does not eliminate the network bandwidth and round-trip delay of the POST request. A good production system should use both client-side and server-side caching for optimal results.

Here is an example of an implementation that uses server side caching: here
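
For comparison, here is a minimal sketch of the client-side half, assuming the server sends a `Cache-Control: max-age=N` header on GET responses. The `CachingClient` class and its injected `fetch` callable are hypothetical stand-ins for a real HTTP client with a cache; real clients also handle revalidation, `Vary`, and other directives that are skipped here. The point is only that a fresh cache hit never touches the network, which server-side caching alone cannot achieve.

```python
import re
import time


class CachingClient:
    """Toy client cache that honors max-age and no-store only."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # callable: url -> (headers dict, body)
        self._clock = clock    # injectable so expiry can be tested
        self._store = {}       # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]    # fresh hit: no request is sent at all

        headers, body = self._fetch(url)
        cache_control = headers.get("Cache-Control", "")
        match = re.search(r"max-age=(\d+)", cache_control)
        if match and "no-store" not in cache_control:
            # The server decides how long the client may reuse the response.
            self._store[url] = (now + int(match.group(1)), body)
        return body
```

Because the lifetime comes from the response header, the server can shorten or disable caching (for example during an experiment) without any client change.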
