We have kopf 1.37.1 running a watch stream on a single custom resource, cluster-wide. Eventually we received a 429 Too Many Requests error from the kube-apiserver. The kopf watch did not restart after this error: it shut down, while the kopf liveness and readiness probe endpoints kept being served.
Based on the logs, it does not look like there were too many requests at all.
We had the following updates to the CRs (log line numbers and timestamps):
Line 813: 2024-02-26T06:02:22.849697098Z
Line 1256: 2024-02-26T07:02:27.741847287Z
Line 1383: 2024-02-26T07:18:02.638854440Z
Line 1706: 2024-02-26T08:02:29.404285926Z
Line 2148: 2024-02-26T09:02:30.995018345Z
Line 2583: 2024-02-26T10:02:32.614030753Z
Line 3059: 2024-02-26T11:02:36.687348782Z
Line 3249: 2024-02-26T11:27:30.696478118Z
Line 3479: 2024-02-26T11:54:46.913868429Z
Line 3958: 2024-02-26T12:54:51.819996162Z
Then we hit the following error from kopf:
2024-02-26T13:43:38.870237767Z [2024-02-26 13:43:38,870] kopf._cogs.clients.w [DEBUG ] Stopping the watch-stream for applications.v1alpha1.application.isf.ibm.com cluster-wide.
2024-02-26T13:43:38.872325055Z [2024-02-26 13:43:38,870] kopf._core.reactor.o [ERROR ] Watcher for applications.v1alpha1.application.isf.ibm.com@none has failed: (None, None)
2024-02-26T13:43:38.872325055Z Traceback (most recent call last):
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
2024-02-26T13:43:38.872325055Z response.raise_for_status()
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/aiohttp/client_reqrep.py", line 1060, in raise_for_status
2024-02-26T13:43:38.872325055Z raise ClientResponseError(
2024-02-26T13:43:38.872325055Z aiohttp.client_exceptions.ClientResponseError: 429, message='Too Many Requests', url=URL('https://172.30.0.1:443/apis/application.isf.ibm.com/v1alpha1/applications?watch=true&resourceVersion=9409415&timeoutSeconds=600')
2024-02-26T13:43:38.872325055Z
2024-02-26T13:43:38.872325055Z The above exception was the direct cause of the following exception:
2024-02-26T13:43:38.872325055Z
2024-02-26T13:43:38.872325055Z Traceback (most recent call last):
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 96, in guard
2024-02-26T13:43:38.872325055Z await coro
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
2024-02-26T13:43:38.872325055Z async for raw_event in stream:
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
2024-02-26T13:43:38.872325055Z async for raw_event in stream:
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
2024-02-26T13:43:38.872325055Z async for raw_input in stream:
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
2024-02-26T13:43:38.872325055Z async for raw_input in api.stream(
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/api.py", line 200, in stream
2024-02-26T13:43:38.872325055Z response = await request(
2024-02-26T13:43:38.872325055Z ^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
2024-02-26T13:43:38.872325055Z return await fn(*args, **kwargs, context=context)
2024-02-26T13:43:38.872325055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
2024-02-26T13:43:38.872325055Z await errors.check_response(response) # but do not parse it!
2024-02-26T13:43:38.872325055Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
2024-02-26T13:43:38.872325055Z raise cls(payload, status=response.status) from e
2024-02-26T13:43:38.872325055Z kopf._cogs.clients.errors.APIClientError: (None, None)
2024-02-26T13:44:16.563238107Z [2024-02-26 13:44:16,563] root [INFO ] Probe response True, {'connected': True}
2024-02-26T13:44:16.564237153Z [2024-02-26 13:44:16,564] kopf.activities.prob [INFO ] Activity 'check_kafka_health' succeeded.
2024-02-26T13:44:16.564269573Z [2024-02-26 13:44:16,564] kopf.activities.prob [DEBUG ] Activity 'check_rbacs' is invoked.
2024-02-26T13:44:16.606139781Z [2024-02-26 13:44:16,606] kopf.activities.prob [INFO ] Activity 'check_rbacs' succeeded.
All of the available watching timeout settings have been set, but no restart occurs. The failure is silent: the kopf process keeps running instead of exiting, which makes it impossible to recover from.
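For reference, by "all of the available watching timeout settings" we mean a startup configuration roughly like the following. This is only a sketch with illustrative values (the server_timeout of 600 matches the timeoutSeconds=600 visible in the watch URL above), not our exact production configuration:

import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Server-side watch timeout: becomes ?timeoutSeconds= on the watch request.
    settings.watching.server_timeout = 600
    # Client-side (aiohttp) timeout; slightly longer than the server-side one.
    settings.watching.client_timeout = 610
    # Pause before re-establishing the watch after a normal disconnect.
    settings.watching.reconnect_backoff = 1.0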
I see the same problem. Once a "Watcher for xyz has failed" error occurs, the game is basically over. There can be other errors leading to the same situation: #1077
It would be great if one of the following could be achieved:
Retry and recover automatically, using the configured backoff
Liveness probe takes the watchers into consideration
We could somehow monitor the internal watchers ourselves using custom logic (see the sketch below)
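As a stopgap for that last option, a heartbeat can be exposed through a probe handler. This is only a sketch under assumptions of our own: the handler ids and the module-level heartbeat are illustrative, the resource coordinates are taken from the logs above, and a long idle time cannot distinguish a dead watcher from a resource that simply has not changed.

import time
import kopf

_last_event = {'ts': time.monotonic()}

@kopf.on.event('application.isf.ibm.com', 'v1alpha1', 'applications')
def note_activity(**_):
    # Any raw event on the watched resource refreshes the heartbeat.
    _last_event['ts'] = time.monotonic()

@kopf.on.probe(id='watch_idle_seconds')
def watch_idle_seconds(**_):
    # Reported in the /healthz payload; an external check (or the liveness probe
    # command itself) can alarm if this grows far beyond the expected update rate.
    return time.monotonic() - _last_event['ts']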
Same issue when starting kopf with a namespace wildcard scope:
kopf run --namespace=dev-*
With kopf handlers on N resources (mostly CRDs) and M namespaces, kopf seems to start N x M watchers at once, resulting in 429s from the K8s API for most of them. kopf keeps running, but the handlers for most resources are never fired.
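If these are plain 429s at startup, one mitigation worth trying is to stretch kopf's API error backoffs so that throttled requests are retried over a longer window instead of exhausting the short default sequence. This is a sketch with illustrative values, and given the silent watcher death reported above it is not certain that this backoff path covers the failing watch request:

import random
import kopf

@kopf.on.startup()
def tune_backoffs(settings: kopf.OperatorSettings, **_):
    # Default is a short sequence of a few seconds; a longer, slightly jittered
    # sequence spreads retries of a startup burst of 429s over a couple of minutes.
    settings.networking.error_backoffs = [1, 2, 5, 10, 30, 60 + random.random()]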
Is there a setting we are missing that would deal with this issue? Are there any operational recommendations to avoid it?
Kopf version
1.37.1
Kubernetes version
v1.27.8+4fab27b (Red Hat OpenShift 4.14.8)
Python version
3.11
Code
Our kopf configuration. We have never been able to reproduce this issue reliably.
Logs
Additional information
No response