Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update API object when configuration changes #1636

Open
tomkralidis opened this issue Apr 20, 2024 · 9 comments
Open

update API object when configuration changes #1636

tomkralidis opened this issue Apr 20, 2024 · 9 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@tomkralidis
Copy link
Member

Description
Ensure that changes to the configuration are visible without requiring a service restart.

Steps to Reproduce

  • use the admin api to POST/PUT/PATCH/DELETE a resource
  • inspect updates

Expected behavior
pygeoapi always provides service based on the latest configuration changes

Screenshots/Tracebacks

Environment

  • OS: all
  • Python version: 3.10
  • pygeoapi version: mastet

Additional context
cc @totycro

@tomkralidis tomkralidis added the bug Something isn't working label Apr 20, 2024
@tomkralidis tomkralidis added this to the 0.17.0 milestone Apr 20, 2024
@tomkralidis tomkralidis self-assigned this Apr 20, 2024
@totycro
Copy link
Contributor

totycro commented Apr 24, 2024

Some considerations on this:

The config is read in the global scope in e.g. flask_app.py, so every worker processes has its own copy.

If there's only one worker process (with possibly threads), it would be possible to change the config in place by doing something like self.config.update(data) when receiving update requests. However the docker entrypoint starts gunicorn with 4 processes, so only one of them would effectively receive the update in this case.

To update all workers, a graceful restart is necessary and in gunicorn can be done by using kill -HUP masterpid . To get the masterpid, we can run gunicorn with --pid some-file-path.pid to make it write its pid.
This approach seems very feasible, but relies on the pid file setup. Note that other servers like uWSGI also support reloading via the HUP signal, so it's probably a reasonable to assume support for this. It's however not supported with the dev flask server.

Other possibilities are to read the config on reach request, which would be really slow. We could also cache it for e.g. 1 minute in each worker or even read it from some hot cache like redis (probably overkill).

@webb-ben
Copy link
Member

webb-ben commented Apr 29, 2024

You can use the gunicon reload function to perform this as described here: https://docs.pygeoapi.io/en/latest/admin-api.html#pygeoapi-hot-reloading-in-gunicorn. The docker example for pygeoapi admin implements this strategy in the entrypoint, as does the entrypoint in wis2box-api.

There is a security concern for users who do not want to update their configuration which is why it is not in the default Docker image of pygeoapi. If it is their wish to not update the configuration, i.e. the Admin API is not enabled, I do not think pygeoapi should hot-reload changes made to the configuration.

I am partial to some solution that exists below the (Flask, Starlette, Django) framework to avoid all these potential deployment variations.

@matthesrieke
Copy link

Maybe it could be an option to consider a database-driven configuration (at least for the resources)? I described something similar a while ago in a feature request (#1351). Restarting/hot-reloading seems like a workaround to me. I do not see an issue in performance when introducing a lightweight database. This would solve the thread issues as well.

@webb-ben
Copy link
Member

Another note, anytime the configuration gets updated, the Open API document must also be recreated and read into memory. Considerations need to be taken for both PYGEOAPI_CONFIG and PYGEOAPI_OPENAPI.

@tomkralidis tomkralidis modified the milestones: 0.17.0, 0.18.0 Jul 4, 2024
@tomkralidis tomkralidis modified the milestones: 0.18.0, 0.19.0 Aug 22, 2024
Copy link

This Issue has been inactive for 90 days. As per RFC4, in order to manage maintenance burden, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale Issue marked stale by stale-bot label Nov 24, 2024
@webb-ben
Copy link
Member

@tomkralidis please confirm this is still ongoing

@webb-ben webb-ben removed the stale Issue marked stale by stale-bot label Nov 25, 2024
@matthesrieke
Copy link

There is a related PR. I did not test it yet but from looking at the changes it seems to not update the OpenAPI (I might be wrong, @Alex-NRCan).

@Alex-NRCan
Copy link
Contributor

Alex-NRCan commented Nov 26, 2024

Thanks for hooking me into this issue @matthesrieke .

I've read the comments in this thread.

What I could add on to the discussion is that my PR:

  • Extends the API class to easily tweak its behavior and how it loads/reloads its collections. You simply override 2 methods and you're good to go:

    • on_load_resources_check in which you implement a check to see if the resources should be hot-reloaded and you return True when they must be. You can implement the check however you want it in your override based on your environment/context. This check is automatically called on every collections call;
    • on_load_resources in which you do the actual fetching of the resources (e.g. from an SQL database) and essentially build the resources dictionary which you return in your implementation of the method. The mother class (the API) automatically will take care of updating the actual self.config['resources'] object.
  • Supports multiple instances of pygeoapi workers (supports gunicorn (or others) load-balancing patterns) as each worker performs its own checks-and-reloads with its in-memory implementation. We could use help testing this, but seems to be working fine for us right now.

  • Is non-breaking at all with the native behavior of pygeoapi. When the methods aren't overridden, pygeoapi simply acts like regular.

  • Seems to provide an implementation that aligns with issue Module for resource management / dynamic resources #1351, indeed.

As for the OpenAPI spec, that needed to be refreshed on every collection that was added/removed on the fly, yes, that was a tricky one. In our case, we ended up creating a 'generic' openapi spec yaml. So, instead of having 1 "paragraph" per collection, we have 1 generic paragraph with the collectionId in the query path. It looks like this:
image

*Note: It does have the caveat that we had to consider a user error message handling when one tries to hit a /coverage endpoint on a feature type collection for example. We thought about this and, for our case, we still prefer to have a generic OpenAPI spec, for these reasons:

  • We have so many collections, that we consider it's better like this for us, it saves us from having a big Swagger UI to scroll;
  • We still wanted to fool-proof our endpoints (Swagger UI or not), as anybody can try any sort of endpoint manually by typing it in any URL (for example: http://url/some_feature_collection/covage (which wasn't planned to be called, but can be called));
  • We don't have to bother with regenerating the spec by running the pygeoapi openapi generate ${PYGEOAPI_CONFIG} --output-file ${PYGEOAPI_OPENAPI} command when deploying anymore (the line in the docker\entrypoint.sh file is commented out for us).

@matthesrieke
Copy link

matthesrieke commented Dec 4, 2024

@Alex-NRCan really interesting, and reasonable decisions on the OpenAPI part. I generally support the approach of a generic document (as you said, loading times, complexity, ...), but am wondering if there are use cases where an explicit OpenAPI with all collections is required. @tomkralidis @webb-ben thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants