enhance support for limits (RFC5) (#1856) #1892

Open
wants to merge 1 commit into
base: master

2 changes: 1 addition & 1 deletion .github/workflows/containers.yml
@@ -25,7 +25,7 @@ jobs:
contents: read
steps:
- name: Check out the repo
uses: actions/checkout@v3
uses: actions/checkout@master

- name: Set up QEMU
uses: docker/[email protected]
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -23,8 +23,8 @@ jobs:
include:
- python-version: '3.10'
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@master
- uses: actions/setup-python@v5
name: Setup Python ${{ matrix.python-version }}
with:
python-version: ${{ matrix.python-version }}
4 changes: 2 additions & 2 deletions .github/workflows/flake8.yml
@@ -7,8 +7,8 @@ jobs:
flake8_py3:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
- uses: actions/checkout@master
- uses: actions/setup-python@v5
name: setup Python
with:
python-version: '3.10'
4 changes: 3 additions & 1 deletion docker/default.config.yml
@@ -46,7 +46,9 @@ server:
cors: true
pretty_print: true
admin: ${PYGEOAPI_SERVER_ADMIN:-false}
limit: 10
limits:
default_items: 10
max_items: 50
# templates: /path/to/templates
map:
url: https://tile.openstreetmap.org/{z}/{x}/{y}.png
133 changes: 85 additions & 48 deletions docs/source/configuration.rst
@@ -49,7 +49,15 @@ For more information related to API design rules (the ``api_rules`` property in
gzip: false # default server config to gzip/compress responses to requests with gzip in the Accept-Encoding header
cors: true # boolean on whether server should support CORS
pretty_print: true # whether JSON responses should be pretty-printed
limit: 10 # server limit on number of items to return

limits: # server limits on number of items to return. This property can also be defined at the resource level to override global server settings
default_items: 50
max_items: 1000
max_distance_x: 25
max_distance_y: 25
max_distance_units: m
on_exceed: throttle # throttle or error (default=throttle)

admin: false # whether to enable the Admin API

# optional configuration to specify a different set of templates for HTML pages. Recommend using absolute paths. Omit this to use the default provided templates
@@ -254,6 +262,41 @@ default.
.. seealso::
:ref:`plugins` for more information on plugins

Using environment variables
---------------------------

pygeoapi configuration supports using system environment variables, which can be helpful
for deploying into `12 factor <https://12factor.net/>`_ environments for example.

Below is an example of how to integrate system environment variables in pygeoapi.

.. code-block:: yaml

server:
bind:
host: ${MY_HOST}
port: ${MY_PORT}

Multiple environment variables are supported as follows:

.. code-block:: yaml

data: ${MY_HOST}:${MY_PORT}

It is also possible to define a default value for a variable in case it does not exist in
the environment using a syntax like: ``value: ${ENV_VAR:-the default}``

.. code-block:: yaml

server:
bind:
host: ${MY_HOST:-localhost}
port: ${MY_PORT:-5000}
metadata:
identification:
title:
en: This is pygeoapi host ${MY_HOST} and port ${MY_PORT:-5000}, nice to meet you!
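
The substitution rules described above can be illustrated with a minimal Python sketch. The regular expression, the ``expand_env`` helper and the choice to raise ``KeyError`` for an undefined variable without a default are assumptions made for this example; they are not taken from pygeoapi's own loader.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}. The pattern is an assumption made
# for this sketch, not pygeoapi's actual implementation.
ENV_PATTERN = re.compile(r'\$\{(\w+)(?::-([^}]*))?\}')


def expand_env(value: str, env=None) -> str:
    """Replace every ${NAME} / ${NAME:-default} token in a string."""
    env = os.environ if env is None else env

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            # variable is not set, fall back to the inline default
            return default
        raise KeyError(f'Undefined environment variable: {name}')

    return ENV_PATTERN.sub(_substitute, value)


# MY_HOST is defined, MY_PORT is not and falls back to its default
env = {'MY_HOST': 'example.org'}
print(expand_env('${MY_HOST}:${MY_PORT:-5000}', env))  # example.org:5000
```

Passing a plain ``dict`` as ``env`` keeps the sketch testable without touching the real process environment.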

Adding links to collections
---------------------------

@@ -389,53 +432,6 @@ If omitted, no header will be added. Common names for this header are ``API-Vers
Note that pygeoapi already adds a ``X-Powered-By`` header by default that includes the software version number.


Validating the configuration
----------------------------

To ensure your configuration is valid, pygeoapi provides a validation
utility that can be run as follows:

.. code-block:: bash

pygeoapi config validate -c /path/to/my-pygeoapi-config.yml


Using environment variables
---------------------------

pygeoapi configuration supports using system environment variables, which can be helpful
for deploying into `12 factor <https://12factor.net/>`_ environments for example.

Below is an example of how to integrate system environment variables in pygeoapi.

.. code-block:: yaml

server:
bind:
host: ${MY_HOST}
port: ${MY_PORT}

Multiple environment variables are supported as follows:

.. code-block:: yaml

data: ${MY_HOST}:${MY_PORT}

It is also possible to define a default value for a variable in case it does not exist in
the environment using a syntax like: ``value: ${ENV_VAR:-the default}``

.. code-block:: yaml

server:
bind:
host: ${MY_HOST:-localhost}
port: ${MY_PORT:-5000}
metadata:
identification:
title:
en: This is pygeoapi host ${MY_HOST} and port ${MY_PORT:-5000}, nice to meet you!


Hierarchical collections
------------------------

@@ -507,6 +503,36 @@ Examples:
curl https://example.org/collections/lakes/items # only the name attribute is returned in properties
curl https://example.org/collections/lakes/items/{item_id} # only the name attribute is returned in properties

Limiting data responses
-----------------------

pygeoapi provides a ``limits`` configuration parameter for defining default and maximum limits on the amount of data returned, per data type. It is defined at the server level (``server.limits``) and can be overridden at the resource level (``resources[*].limits``). An example of this setting is shown below:

.. code-block:: yaml

limits:
default_items: 10 # applies to vector data
max_items: 500 # applies to vector data
max_distance_x: 123 # applies to all datasets
max_distance_y: 456 # applies to all datasets
max_distance_units: m # as per UCUM https://ucum.org/ucum#section-Tables-of-Terminal-Symbols
on_exceed: error # one of error, throttle

The ``limits`` setting is applied as follows:

- can be defined at both the server and resource levels, with resource limits overriding server-wide limits
- ``on_exceed`` can be set to ``error`` or ``throttle`` (default). If a client-specified limit exceeds the configured maximum:
- when set to ``error``, an exception is returned
- when set to ``throttle``, the maximum data allowed by the collection/server/provider is returned
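
The override rule can be sketched with ``collections.ChainMap``, which is also what ``evaluate_limit`` in this pull request uses. The two dictionaries below are hypothetical configuration fragments chosen for illustration.

```python
from collections import ChainMap

# Hypothetical configuration fragments, for illustration only
server_limits = {'default_items': 10, 'max_items': 500, 'on_exceed': 'throttle'}
resource_limits = {'max_items': 50, 'on_exceed': 'error'}

# ChainMap looks keys up from left to right, so resource settings win
# and anything the resource leaves out is inherited from the server
effective = ChainMap(resource_limits, server_limits)

print(effective['default_items'])  # 10 (inherited from the server)
print(effective['max_items'])      # 50 (resource override)
print(effective['on_exceed'])      # error (resource override)
```

Because the lookup is per key, a resource only needs to list the settings it wants to change.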

Vector data (features, records)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- when a limit is not specified by the client, ``limits.default_items`` sets the result set size
- when a limit is specified by the client, the result set size is the minimum of the ``limit`` parameter and ``limits.max_items``

Raster data (coverages, environmental data retrieval)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- when a bbox or spatial subset is specified by the client, ``limits.max_distance_x``, ``limits.max_distance_y`` and ``limits.max_distance_units`` are used to determine whether the request asks for more data than the collection is configured to provide, and the server responds accordingly (via ``on_exceed``)
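
The hunks in this pull request do not show how the distance limits are enforced, so the following is only a rough sketch of one possible check. The ``check_extent`` helper is hypothetical. It assumes the bbox is already expressed in ``max_distance_units``, that a missing setting means no limit on that axis, and that ``throttle`` clips the extent from its lower-left corner.

```python
def check_extent(bbox, limits):
    """Return a bbox that honours max_distance_x / max_distance_y.

    Assumes bbox = [minx, miny, maxx, maxy] is already expressed in
    limits['max_distance_units'] (no unit conversion is attempted).
    """
    minx, miny, maxx, maxy = bbox
    width, height = maxx - minx, maxy - miny

    # A missing setting is treated as "no limit" on that axis (assumption)
    max_x = limits.get('max_distance_x', width)
    max_y = limits.get('max_distance_y', height)

    if width <= max_x and height <= max_y:
        return list(bbox)

    if limits.get('on_exceed', 'throttle') == 'error':
        raise ValueError('requested extent exceeds configured limits')

    # throttle: clip the extent, keeping the lower-left corner fixed
    return [minx, miny, minx + min(width, max_x), miny + min(height, max_y)]


limits = {'max_distance_x': 25, 'max_distance_y': 25, 'max_distance_units': 'm'}
print(check_extent([0, 0, 10, 10], limits))   # [0, 0, 10, 10]
print(check_extent([0, 0, 100, 10], limits))  # [0, 0, 25, 10]
```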

Linked Data
-----------
@@ -638,6 +664,17 @@ deployment flexibility, the path can be specified with string interpolation of e
The template ``tests/data/base.jsonld`` renders the unmodified JSON-LD. For more information on the capacities
of Jinja2 templates, see :ref:`html-templating`.

Validating the configuration
----------------------------

To ensure your configuration is valid, pygeoapi provides a validation
utility that can be run as follows:

.. code-block:: bash

pygeoapi config validate -c /path/to/my-pygeoapi-config.yml


Summary
-------

6 changes: 4 additions & 2 deletions pygeoapi-config.yml
@@ -2,7 +2,7 @@
#
# Authors: Tom Kralidis <[email protected]>
#
# Copyright (c) 2020 Tom Kralidis
# Copyright (c) 2025 Tom Kralidis
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
@@ -41,7 +41,9 @@ server:
- fr-CA
# cors: true
pretty_print: true
limit: 10
limits:
default_items: 20
max_items: 50
# templates:
# path: /path/to/Jinja2/templates
# static: /path/to/static/folder # css/js/img
42 changes: 41 additions & 1 deletion pygeoapi/api/__init__.py
@@ -40,7 +40,7 @@
Returns content from plugins and sets responses.
"""

from collections import OrderedDict
from collections import ChainMap, OrderedDict
from copy import deepcopy
from datetime import datetime
from functools import partial
@@ -1609,3 +1609,43 @@ def validate_subset(value: str) -> dict:
subsets[subset_name] = list(map(get_typed_value, values))

return subsets


def evaluate_limit(requested: Union[None, int], server_limits: dict,
collection_limits: dict) -> int:
"""
Helper function to evaluate limit parameter

:param requested: the limit requested by the client
:param server_limits: `dict` of server limits
:param collection_limits: `dict` of collection limits

:returns: `int` of evaluated limit
"""

effective_limits = ChainMap(collection_limits, server_limits)

default = effective_limits.get('default_items', 10)
max_ = effective_limits.get('max_items', 10)
on_exceed = effective_limits.get('on_exceed', 'throttle')

LOGGER.debug(f'Requested limit: {requested}')
LOGGER.debug(f'Default limit: {default}')
LOGGER.debug(f'Maximum limit: {max_}')
LOGGER.debug(f'On exceed: {on_exceed}')

if requested is None:
LOGGER.debug('no limit requested; returning default')
return default

requested2 = get_typed_value(requested)
if not isinstance(requested2, int):
raise ValueError('limit value should be an integer')

if requested2 <= 0:
raise ValueError('limit value should be strictly positive')
elif requested2 > max_ and on_exceed == 'error':
raise RuntimeError('Limit exceeded; throwing error')
else:
LOGGER.debug('limit requested')
return min(requested2, max_)
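
A few worked cases may help when reviewing the function above. The sketch below is a condensed, self-contained restatement rather than the code from the diff. ``evaluate_limit_sketch`` is a hypothetical name, and ``int()`` stands in for ``get_typed_value`` plus the ``isinstance`` check, which is a simplification.

```python
from collections import ChainMap


def evaluate_limit_sketch(requested, server_limits, collection_limits):
    """Condensed restatement of evaluate_limit, for illustration only."""
    limits = ChainMap(collection_limits, server_limits)
    default = limits.get('default_items', 10)
    max_ = limits.get('max_items', 10)

    if requested is None:
        return default

    requested = int(requested)  # raises ValueError for non-integer strings
    if requested <= 0:
        raise ValueError('limit value should be strictly positive')
    if requested > max_ and limits.get('on_exceed', 'throttle') == 'error':
        raise RuntimeError('Limit exceeded')

    return min(requested, max_)


server = {'default_items': 20, 'max_items': 50}

print(evaluate_limit_sketch(None, server, {}))   # 20: no limit requested
print(evaluate_limit_sketch('30', server, {}))   # 30: within the maximum
print(evaluate_limit_sketch('100', server, {}))  # 50: throttled to max_items
```

With ``on_exceed: error`` at the collection level, the third call would raise ``RuntimeError`` instead. In the hunks shown here, the callers catch only ``ValueError``, so it may be worth checking that this exception is handled somewhere before it reaches the client.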
20 changes: 18 additions & 2 deletions pygeoapi/api/environmental_data_retrieval.py
@@ -47,6 +47,7 @@
from shapely.wkt import loads as shapely_loads

from pygeoapi import l10n
from pygeoapi.api import evaluate_limit
from pygeoapi.plugin import load_plugin, PLUGINS
from pygeoapi.provider.base import (
ProviderGenericError, ProviderItemNotFoundError)
@@ -342,6 +343,21 @@ def get_collection_edr_query(api: API, request: APIRequest,
HTTPStatus.BAD_REQUEST, headers, request.format,
'InvalidParameterValue', msg)

LOGGER.debug('Processing limit parameter')
if api.config['server'].get('limit') is not None:
msg = ('server.limit is no longer supported! '
'Please use limits at the server or collection '
'level (RFC5)')
LOGGER.warning(msg)
try:
limit = evaluate_limit(request.params.get('limit'),
api.config['server'].get('limits', {}),
collections[dataset].get('limits', {}))
except ValueError as err:
return api.get_exception(
HTTPStatus.BAD_REQUEST, headers, request.format,
'InvalidParameterValue', str(err))

query_args = dict(
query_type=query_type,
instance=instance,
@@ -353,8 +369,8 @@ def get_collection_edr_query(api: API, request: APIRequest,
bbox=bbox,
within=within,
within_units=within_units,
limit=int(api.config['server']['limit']),
location_id=location_id,
limit=limit,
location_id=location_id
)

try:
31 changes: 14 additions & 17 deletions pygeoapi/api/itemtypes.py
@@ -48,6 +48,7 @@
from pyproj.exceptions import CRSError

from pygeoapi import l10n
from pygeoapi.api import evaluate_limit
from pygeoapi.formatter.base import FormatterSerializationError
from pygeoapi.linked_data import geojson2jsonld
from pygeoapi.plugin import load_plugin, PLUGINS
@@ -239,33 +240,29 @@ def get_collection_items(
return api.get_exception(
HTTPStatus.BAD_REQUEST, headers, request.format,
'InvalidParameterValue', msg)
except TypeError as err:
LOGGER.warning(err)
offset = 0
except ValueError:
msg = 'offset value should be an integer'
return api.get_exception(
HTTPStatus.BAD_REQUEST, headers, request.format,
'InvalidParameterValue', msg)
except TypeError as err:
LOGGER.warning(err)
offset = 0

LOGGER.debug('Processing limit parameter')
if api.config['server'].get('limit') is not None:
msg = ('server.limit is no longer supported! '
'Please use limits at the server or collection '
'level (RFC5)')
LOGGER.warning(msg)
try:
limit = int(request.params.get('limit'))
# TODO: We should do more validation, against the min and max
# allowed by the server configuration
if limit <= 0:
msg = 'limit value should be strictly positive'
return api.get_exception(
HTTPStatus.BAD_REQUEST, headers, request.format,
'InvalidParameterValue', msg)
except TypeError as err:
LOGGER.warning(err)
limit = int(api.config['server']['limit'])
except ValueError:
msg = 'limit value should be an integer'
limit = evaluate_limit(request.params.get('limit'),
api.config['server'].get('limits', {}),
collections[dataset].get('limits', {}))
except ValueError as err:
return api.get_exception(
HTTPStatus.BAD_REQUEST, headers, request.format,
'InvalidParameterValue', msg)
'InvalidParameterValue', str(err))

resulttype = request.params.get('resulttype') or 'results'
