Kopf does not restart after 429 too many requests error #1108

msfrucht · 2024-02-27T21:08:51Z

Long story short

We have kopf 1.37.1 running a watch stream on a single custom resource cluster wide. Eventually we received a 429 too many requests error from the kube-apiserver. The kopf watch did not restart after this error. It shutdown and continued running the kopf liveness and readiness probe locations.

Based on the logs it doesn't look too many requests at all.
We had the following updates to the CR list.
Line 813: 2024-02-26T06:02:22.849697098Z
Line 1256: 2024-02-26T07:02:27.741847287Z
Line 1383: 2024-02-26T07:18:02.638854440Z
Line 1706: 2024-02-26T08:02:29.404285926Z
Line 2148: 2024-02-26T09:02:30.995018345Z
Line 2583: 2024-02-26T10:02:32.614030753Z
Line 3059: 2024-02-26T11:02:36.687348782Z
Line 3249: 2024-02-26T11:27:30.696478118Z
Line 3479: 2024-02-26T11:54:46.913868429Z
Line 3958: 2024-02-26T12:54:51.819996162Z

Then we hit the following error from kopf.

2024-02-26T13:43:38.870237767Z [2024-02-26 13:43:38,870] kopf._cogs.clients.w [DEBUG   ] Stopping the watch-stream for applications.v1alpha1.application.isf.ibm.com cluster-wide.
2024-02-26T13:43:38.872325055Z [2024-02-26 13:43:38,870] kopf._core.reactor.o [ERROR   ] Watcher for applications.v1alpha1.application.isf.ibm.com@none has failed: (None, None)
2024-02-26T13:43:38.872325055Z Traceback (most recent call last):
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
2024-02-26T13:43:38.872325055Z     response.raise_for_status()
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/aiohttp/client_reqrep.py", line 1060, in raise_for_status
2024-02-26T13:43:38.872325055Z     raise ClientResponseError(
2024-02-26T13:43:38.872325055Z aiohttp.client_exceptions.ClientResponseError: 429, message='Too Many Requests', url=URL('https://172.30.0.1:443/apis/application.isf.ibm.com/v1alpha1/applications?watch=true&resourceVersion=9409415&timeoutSeconds=600')
2024-02-26T13:43:38.872325055Z 
2024-02-26T13:43:38.872325055Z The above exception was the direct cause of the following exception:
2024-02-26T13:43:38.872325055Z 
2024-02-26T13:43:38.872325055Z Traceback (most recent call last):
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 96, in guard
2024-02-26T13:43:38.872325055Z     await coro
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
2024-02-26T13:43:38.872325055Z     async for raw_event in stream:
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
2024-02-26T13:43:38.872325055Z     async for raw_event in stream:
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
2024-02-26T13:43:38.872325055Z     async for raw_input in stream:
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
2024-02-26T13:43:38.872325055Z     async for raw_input in api.stream(
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/api.py", line 200, in stream
2024-02-26T13:43:38.872325055Z     response = await request(
2024-02-26T13:43:38.872325055Z                ^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
2024-02-26T13:43:38.872325055Z     return await fn(*args, **kwargs, context=context)
2024-02-26T13:43:38.872325055Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
2024-02-26T13:43:38.872325055Z     await errors.check_response(response)  # but do not parse it!
2024-02-26T13:43:38.872325055Z     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
2024-02-26T13:43:38.872325055Z     raise cls(payload, status=response.status) from e
2024-02-26T13:43:38.872325055Z kopf._cogs.clients.errors.APIClientError: (None, None)
2024-02-26T13:44:16.563238107Z [2024-02-26 13:44:16,563] root                 [INFO    ] Probe response True, {'connected': True}
2024-02-26T13:44:16.564237153Z [2024-02-26 13:44:16,564] kopf.activities.prob [INFO    ] Activity 'check_kafka_health' succeeded.
2024-02-26T13:44:16.564269573Z [2024-02-26 13:44:16,564] kopf.activities.prob [DEBUG   ] Activity 'check_rbacs' is invoked.
2024-02-26T13:44:16.606139781Z [2024-02-26 13:44:16,606] kopf.activities.prob [INFO    ] Activity 'check_rbacs' succeeded.

All of the available watching timeout settings have been set. But no restart occurs. This fails silently without ending the kopf executable making it impossible to recover from.

settings.watching.connect_timeout = 60
settings.watching.server_timeout = 600
settings.watching.client_timeout = 660
settings.watching.reconnect_backoff = 1

Is there some setting or something we are missing to deal with this issue? Any execution recommendations that can be done to avoid this issue?

Kopf version

1.37.1

Kubernetes version

v1.27.8+4fab27b - Red Hat Openshift 4.14.8

Python version

3.11

Code

Our kopf configuration. Never been able to reproduce this issue reliably.

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.posting.level = logging.DEBUG
    # timeouts are in seconds
    settings.watching.connect_timeout = 60
    settings.watching.server_timeout = 600
    settings.watching.client_timeout = 660
    settings.watching.reconnect_backoff = 1
    settings.networking.error_backoffs = InfiniteBackoff()
    settings.batching.error_delays = [10, 20, 30, 40]
    settings.peering.name = "application-peers"
    settings.peering.stealth = True
    settings.persistence.finalizer = "application-controller"
    settings.posting.enabled = False
    application_discovery_state.isDiscoveryCompleted = Fals

Logs

..setup at 2024-02-26T06:01:32.599757687Z
...probe repeats
2024-02-26T11:54:46.913868429Z [2024-02-26 11:54:46,913] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Updating is in progress: {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'kind': 'Application', 'metadata': {'annotations': {'isf.ibm.com/created-event': 'created', 'kopf.zalando.org/last-handled-configuration': '{"spec":{"enableDR":false,"includedNamespaces":["clusters-hcp24feb"]},"metadata":{"labels":{"isf.ibm.com/clusterId":"b8f4b4bf-b113-417a-8708-3fc1ce10d9ce"},"annotations":{"isf.ibm.com/created-event":"created","usedcapacity":"1884758016"},"uid":"1b79fa23-5b73-4a46-806f-73f3dad59db3"}}\n', 'usedcapacity': '1884819456'}, 'creationTimestamp': '2024-02-24T04:17:17Z', 'finalizers': ['application-controller'], 'generation': 1, 'labels': {'isf.ibm.com/clusterId': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce'}, 'managedFields': [{'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'f:kopf.zalando.org/last-handled-configuration': {}}, 'f:finalizers': {'.': {}, 'v:"application-controller"': {}}}}, 'manager': 'kopf', 'operation': 'Update', 'time': '2024-02-26T11:02:36Z'}, {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:status': {'.': {}, 'f:activeRestore': {}, 'f:availableBackups': {}, 'f:backupStatus': {}, 'f:drStatus': {'.': {}, 'f:initialSyncComplete': {}}, 'f:lastSuccessfulBackupTime': {}, 'f:observedGeneration': {}, 'f:pvcStatuses': {}, 'f:storageClassCapacities': {'.': {}, 'f:lvms-hcp-etcd': {'.': {}, 'f:numPVCs': {}, 'f:totalCapacity': {}, 'f:usedCapacity': {}}, 'f:ocs-storagecluster-ceph-rbd': {'.': {}, 'f:numPVCs': {}, 'f:totalCapacity': {}, 'f:usedCapacity': {}}}, 'f:totalBackups': {}, 'f:totalCapacity': {}, 'f:totalPVCCapacity': {}, 'f:usedCapacity': {}, 'f:usedPVCCapacity': {}}}, 'manager': 'manager', 'operation': 'Update', 'subresource': 'status', 'time': '2024-02-26T11:02:38Z'}, {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:isf.ibm.com/created-event': {}, 'f:usedcapacity': {}}, 'f:labels': {'.': {}, 'f:isf.ibm.com/clusterId': {}}}, 'f:spec': {'.': {}, 'f:enableDR': {}, 'f:includedNamespaces': {}}}, 'manager': 'manager', 'operation': 'Update', 'time': '2024-02-26T11:54:46Z'}], 'name': 'clusters-hcp24feb', 'namespace': 'ibm-spectrum-fusion-ns', 'resourceVersion': '9215010', 'uid': '1b79fa23-5b73-4a46-806f-73f3dad59db3'}, 'spec': {'enableDR': False, 'includedNamespaces': ['clusters-hcp24feb']}, 'status': {'activeRestore': {}, 'availableBackups': 0, 'backupStatus': [], 'drStatus': {'initialSyncComplete': False}, 'lastSuccessfulBackupTime': 0, 'observedGeneration': 1, 'pvcStatuses': [{'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-0', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-1', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-2', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-82k9b-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-dpfqh-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-lvzfh-rhcos', 'pvcStatus': 'Bound'}], 'storageClassCapacities': {'lvms-hcp-etcd': {'numPVCs': 3, 'totalCapacity': 25769803776, 'usedCapacity': 1884758016}, 'ocs-storagecluster-ceph-rbd': {'numPVCs': 3, 'totalCapacity': 103079215104, 'usedCapacity': 0}}, 'totalBackups': 0, 'totalCapacity': 128849018880, 'totalPVCCapacity': 128849018880, 'usedCapacity': 1884758016, 'usedPVCCapacity': 1884758016}}
2024-02-26T11:54:46.913938731Z [2024-02-26 11:54:46,913] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Updating diff: (('change', ('metadata', 'annotations', 'usedcapacity'), '1884758016', '1884819456'),)
2024-02-26T11:54:46.913972995Z [2024-02-26 11:54:46,913] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handler 'on_metadata_annotations/metadata.annotations' is invoked.
2024-02-26T11:54:46.914303997Z [2024-02-26 11:54:46,914] root                 [DEBUG   ] Modified application metadata.annotations request received with name=clusters-hcp24feb, namespace=ibm-spectrum-fusion-ns, spec={'enableDR': False, 'includedNamespaces': ['clusters-hcp24feb']}, body={'apiVersion': 'application.isf.ibm.com/v1alpha1', 'kind': 'Application', 'metadata': {'annotations': {'isf.ibm.com/created-event': 'created', 'kopf.zalando.org/last-handled-configuration': '{"spec":{"enableDR":false,"includedNamespaces":["clusters-hcp24feb"]},"metadata":{"labels":{"isf.ibm.com/clusterId":"b8f4b4bf-b113-417a-8708-3fc1ce10d9ce"},"annotations":{"isf.ibm.com/created-event":"created","usedcapacity":"1884758016"},"uid":"1b79fa23-5b73-4a46-806f-73f3dad59db3"}}\n', 'usedcapacity': '1884819456'}, 'creationTimestamp': '2024-02-24T04:17:17Z', 'finalizers': ['application-controller'], 'generation': 1, 'labels': {'isf.ibm.com/clusterId': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce'}}, 'manager': 'manager', 'operation': 'Update', 'time': '2024-02-26T11:54:46Z'}], 'name': 'clusters-hcp24feb', 'namespace': 'ibm-spectrum-fusion-ns', 'resourceVersion': '9215010', 'uid': '1b79fa23-5b73-4a46-806f-73f3dad59db3'}, 'spec': {'enableDR': False, 'includedNamespaces': ['clusters-hcp24feb']}, 'status': {'activeRestore': {}, 'availableBackups': 0, 'backupStatus': [], 'drStatus': {'initialSyncComplete': False}, 'lastSuccessfulBackupTime': 0, 'observedGeneration': 1, 'pvcStatuses': [{'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-0', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-1', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-2', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-82k9b-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-dpfqh-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-lvzfh-rhcos', 'pvcStatus': 'Bound'}], 'storageClassCapacities': {'lvms-hcp-etcd': {'numPVCs': 3, 'totalCapacity': 25769803776, 'usedCapacity': 1884758016}, 'ocs-storagecluster-ceph-rbd': {'numPVCs': 3, 'totalCapacity': 103079215104, 'usedCapacity': 0}}, 'totalBackups': 0, 'totalCapacity': 128849018880, 'totalPVCCapacity': 128849018880, 'usedCapacity': 1884758016, 'usedPVCCapacity': 1884758016}}
2024-02-26T11:54:46.914351195Z [2024-02-26 11:54:46,914] root                 [DEBUG   ] Received request to create application message.
2024-02-26T11:54:46.914422589Z [2024-02-26 11:54:46,914] root                 [DEBUG   ] Reading from application
2024-02-26T11:54:46.914462334Z [2024-02-26 11:54:46,914] root                 [INFO    ] Sending application discovery Kafka message for application ibm-spectrum-fusion-ns/clusters-hcp24feb
2024-02-26T11:54:46.914535311Z [2024-02-26 11:54:46,914] root                 [DEBUG   ] Kafka Message Sending: topic=application key={'uuid': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce'} value={'action': 'MODIFY', 'clusterId': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce', 'applicationName': 'ibm-spectrum-fusion-ns/clusters-hcp24feb', 'applicationId': '1b79fa23-5b73-4a46-806f-73f3dad59db3', 'annotations': "{'isf.ibm.com/created-event': 'created', 'usedcapacity': '1884819456'}", 'discoveryTimestamp': '2024-02-26T06:01:32Z', 'isDiscoveryCompleted': False}
2024-02-26T11:54:46.915049095Z [2024-02-26 11:54:46,914] kopf.objects         [INFO    ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handler 'on_metadata_annotations/metadata.annotations' succeeded.
2024-02-26T11:54:46.915199538Z [2024-02-26 11:54:46,915] kopf.objects         [INFO    ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Updating is processed: 1 succeeded; 0 failed.
2024-02-26T11:54:46.915403771Z [2024-02-26 11:54:46,915] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/last-handled-configuration': '{"spec":{"enableDR":false,"includedNamespaces":["clusters-hcp24feb"]},"metadata":{"labels":{"isf.ibm.com/clusterId":"b8f4b4bf-b113-417a-8708-3fc1ce10d9ce"},"annotations":{"isf.ibm.com/created-event":"created","usedcapacity":"1884819456"},"uid":"1b79fa23-5b73-4a46-806f-73f3dad59db3"}}\n'}}}
2024-02-26T11:54:47.043473198Z [2024-02-26 11:54:47,043] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Something has changed, but we are not interested (the essence is the same).
2024-02-26T11:54:47.043518694Z [2024-02-26 11:54:47,043] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handling cycle is finished, waiting for new changes.
2024-02-26T11:54:49.302440211Z [2024-02-26 11:54:49,302] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Something has changed, but we are not interested (the essence is the same).
2024-02-26T11:54:49.302489643Z [2024-02-26 11:54:49,302] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handling cycle is finished, waiting for new changes.
2024-02-26T11:55:20.707305716Z [2024-02-26 11:55:20,707] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/cephdemo] Something has changed, but we are not interested (the essence is the same).
2024-02-26T11:55:20.707354387Z [2024-02-26 11:55:20,707] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/cephdemo] Handling cycle is finished, waiting for new changes.
2024-02-26T11:55:26.120333469Z [2024-02-26 11:55:26,120] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/cephdemo] Something has changed, but we are not interested (the essence is the same).
2024-02-26T11:55:26.120333469Z [2024-02-26 11:55:26,120] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/cephdemo] Handling cycle is finished, waiting for new changes.
...probe repeats
2024-02-26T12:54:51.819996162Z [2024-02-26 12:54:51,819] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Updating is in progress: {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'kind': 'Application', 'metadata': {'annotations': {'isf.ibm.com/created-event': 'created', 'kopf.zalando.org/last-handled-configuration': '{"spec":{"enableDR":false,"includedNamespaces":["clusters-hcp24feb"]},"metadata":{"labels":{"isf.ibm.com/clusterId":"b8f4b4bf-b113-417a-8708-3fc1ce10d9ce"},"annotations":{"isf.ibm.com/created-event":"created","usedcapacity":"1884819456"},"uid":"1b79fa23-5b73-4a46-806f-73f3dad59db3"}}\n', 'usedcapacity': '1884651520'}, 'creationTimestamp': '2024-02-24T04:17:17Z', 'finalizers': ['application-controller'], 'generation': 1, 'labels': {'isf.ibm.com/clusterId': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce'}, 'managedFields': [{'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'f:kopf.zalando.org/last-handled-configuration': {}}, 'f:finalizers': {'.': {}, 'v:"application-controller"': {}}}}, 'manager': 'kopf', 'operation': 'Update', 'time': '2024-02-26T11:54:46Z'}, {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:status': {'.': {}, 'f:activeRestore': {}, 'f:availableBackups': {}, 'f:backupStatus': {}, 'f:drStatus': {'.': {}, 'f:initialSyncComplete': {}}, 'f:lastSuccessfulBackupTime': {}, 'f:observedGeneration': {}, 'f:pvcStatuses': {}, 'f:storageClassCapacities': {'.': {}, 'f:lvms-hcp-etcd': {'.': {}, 'f:numPVCs': {}, 'f:totalCapacity': {}, 'f:usedCapacity': {}}, 'f:ocs-storagecluster-ceph-rbd': {'.': {}, 'f:numPVCs': {}, 'f:totalCapacity': {}, 'f:usedCapacity': {}}}, 'f:totalBackups': {}, 'f:totalCapacity': {}, 'f:totalPVCCapacity': {}, 'f:usedCapacity': {}, 'f:usedPVCCapacity': {}}}, 'manager': 'manager', 'operation': 'Update', 'subresource': 'status', 'time': '2024-02-26T11:54:49Z'}, {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:isf.ibm.com/created-event': {}, 'f:usedcapacity': {}}, 'f:labels': {'.': {}, 'f:isf.ibm.com/clusterId': {}}}, 'f:spec': {'.': {}, 'f:enableDR': {}, 'f:includedNamespaces': {}}}, 'manager': 'manager', 'operation': 'Update', 'time': '2024-02-26T12:54:51Z'}], 'name': 'clusters-hcp24feb', 'namespace': 'ibm-spectrum-fusion-ns', 'resourceVersion': '9354249', 'uid': '1b79fa23-5b73-4a46-806f-73f3dad59db3'}, 'spec': {'enableDR': False, 'includedNamespaces': ['clusters-hcp24feb']}, 'status': {'activeRestore': {}, 'availableBackups': 0, 'backupStatus': [], 'drStatus': {'initialSyncComplete': False}, 'lastSuccessfulBackupTime': 0, 'observedGeneration': 1, 'pvcStatuses': [{'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-0', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-1', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-2', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-82k9b-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-dpfqh-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-lvzfh-rhcos', 'pvcStatus': 'Bound'}], 'storageClassCapacities': {'lvms-hcp-etcd': {'numPVCs': 3, 'totalCapacity': 25769803776, 'usedCapacity': 1884819456}, 'ocs-storagecluster-ceph-rbd': {'numPVCs': 3, 'totalCapacity': 103079215104, 'usedCapacity': 0}}, 'totalBackups': 0, 'totalCapacity': 128849018880, 'totalPVCCapacity': 128849018880, 'usedCapacity': 1884819456, 'usedPVCCapacity': 1884819456}}
2024-02-26T12:54:51.819996162Z [2024-02-26 12:54:51,819] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Updating diff: (('change', ('metadata', 'annotations', 'usedcapacity'), '1884819456', '1884651520'),)
2024-02-26T12:54:51.820076553Z [2024-02-26 12:54:51,820] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handler 'on_metadata_annotations/metadata.annotations' is invoked.
2024-02-26T12:54:51.820634761Z [2024-02-26 12:54:51,820] root                 [DEBUG   ] Modified application metadata.annotations request received with name=clusters-hcp24feb, namespace=ibm-spectrum-fusion-ns, spec={'enableDR': False, 'includedNamespaces': ['clusters-hcp24feb']}, body={'apiVersion': 'application.isf.ibm.com/v1alpha1', 'kind': 'Application', 'metadata': {'annotations': {'isf.ibm.com/created-event': 'created', 'kopf.zalando.org/last-handled-configuration': '{"spec":{"enableDR":false,"includedNamespaces":["clusters-hcp24feb"]},"metadata":{"labels":{"isf.ibm.com/clusterId":"b8f4b4bf-b113-417a-8708-3fc1ce10d9ce"},"annotations":{"isf.ibm.com/created-event":"created","usedcapacity":"1884819456"},"uid":"1b79fa23-5b73-4a46-806f-73f3dad59db3"}}\n', 'usedcapacity': '1884651520'}, 'creationTimestamp': '2024-02-24T04:17:17Z', 'finalizers': ['application-controller'], 'generation': 1, 'labels': {'isf.ibm.com/clusterId': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce'}, 'managedFields': [{'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'f:kopf.zalando.org/last-handled-configuration': {}}, 'f:finalizers': {'.': {}, 'v:"application-controller"': {}}}}, 'manager': 'kopf', 'operation': 'Update', 'time': '2024-02-26T11:54:46Z'}, {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:status': {'.': {}, 'f:activeRestore': {}, 'f:availableBackups': {}, 'f:backupStatus': {}, 'f:drStatus': {'.': {}, 'f:initialSyncComplete': {}}, 'f:lastSuccessfulBackupTime': {}, 'f:observedGeneration': {}, 'f:pvcStatuses': {}, 'f:storageClassCapacities': {'.': {}, 'f:lvms-hcp-etcd': {'.': {}, 'f:numPVCs': {}, 'f:totalCapacity': {}, 'f:usedCapacity': {}}, 'f:ocs-storagecluster-ceph-rbd': {'.': {}, 'f:numPVCs': {}, 'f:totalCapacity': {}, 'f:usedCapacity': {}}}, 'f:totalBackups': {}, 'f:totalCapacity': {}, 'f:totalPVCCapacity': {}, 'f:usedCapacity': {}, 'f:usedPVCCapacity': {}}}, 'manager': 'manager', 'operation': 'Update', 'subresource': 'status', 'time': '2024-02-26T11:54:49Z'}, {'apiVersion': 'application.isf.ibm.com/v1alpha1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:annotations': {'.': {}, 'f:isf.ibm.com/created-event': {}, 'f:usedcapacity': {}}, 'f:labels': {'.': {}, 'f:isf.ibm.com/clusterId': {}}}, 'f:spec': {'.': {}, 'f:enableDR': {}, 'f:includedNamespaces': {}}}, 'manager': 'manager', 'operation': 'Update', 'time': '2024-02-26T12:54:51Z'}], 'name': 'clusters-hcp24feb', 'namespace': 'ibm-spectrum-fusion-ns', 'resourceVersion': '9354249', 'uid': '1b79fa23-5b73-4a46-806f-73f3dad59db3'}, 'spec': {'enableDR': False, 'includedNamespaces': ['clusters-hcp24feb']}, 'status': {'activeRestore': {}, 'availableBackups': 0, 'backupStatus': [], 'drStatus': {'initialSyncComplete': False}, 'lastSuccessfulBackupTime': 0, 'observedGeneration': 1, 'pvcStatuses': [{'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-0', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-1', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'data-etcd-2', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-82k9b-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-dpfqh-rhcos', 'pvcStatus': 'Bound'}, {'namespace': 'clusters-hcp24feb', 'pvcName': 'hcp24feb-6febef8a-lvzfh-rhcos', 'pvcStatus': 'Bound'}], 'storageClassCapacities': {'lvms-hcp-etcd': {'numPVCs': 3, 'totalCapacity': 25769803776, 'usedCapacity': 1884819456}, 'ocs-storagecluster-ceph-rbd': {'numPVCs': 3, 'totalCapacity': 103079215104, 'usedCapacity': 0}}, 'totalBackups': 0, 'totalCapacity': 128849018880, 'totalPVCCapacity': 128849018880, 'usedCapacity': 1884819456, 'usedPVCCapacity': 1884819456}}
2024-02-26T12:54:51.820691968Z [2024-02-26 12:54:51,820] root                 [DEBUG   ] Received request to create application message.
2024-02-26T12:54:51.820752341Z [2024-02-26 12:54:51,820] root                 [DEBUG   ] Reading from application
2024-02-26T12:54:51.820794751Z [2024-02-26 12:54:51,820] root                 [INFO    ] Sending application discovery Kafka message for application ibm-spectrum-fusion-ns/clusters-hcp24feb
2024-02-26T12:54:51.820881864Z [2024-02-26 12:54:51,820] root                 [DEBUG   ] Kafka Message Sending: topic=application key={'uuid': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce'} value={'action': 'MODIFY', 'clusterId': 'b8f4b4bf-b113-417a-8708-3fc1ce10d9ce', 'applicationName': 'ibm-spectrum-fusion-ns/clusters-hcp24feb', 'applicationId': '1b79fa23-5b73-4a46-806f-73f3dad59db3', 'annotations': "{'isf.ibm.com/created-event': 'created', 'usedcapacity': '1884651520'}", 'discoveryTimestamp': '2024-02-26T06:01:32Z', 'isDiscoveryCompleted': False}
2024-02-26T12:54:51.821499163Z [2024-02-26 12:54:51,821] kopf.objects         [INFO    ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handler 'on_metadata_annotations/metadata.annotations' succeeded.
2024-02-26T12:54:51.821668912Z [2024-02-26 12:54:51,821] kopf.objects         [INFO    ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Updating is processed: 1 succeeded; 0 failed.
2024-02-26T12:54:51.821886440Z [2024-02-26 12:54:51,821] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Patching with: {'metadata': {'annotations': {'kopf.zalando.org/last-handled-configuration': '{"spec":{"enableDR":false,"includedNamespaces":["clusters-hcp24feb"]},"metadata":{"labels":{"isf.ibm.com/clusterId":"b8f4b4bf-b113-417a-8708-3fc1ce10d9ce"},"annotations":{"isf.ibm.com/created-event":"created","usedcapacity":"1884651520"},"uid":"1b79fa23-5b73-4a46-806f-73f3dad59db3"}}\n'}}}
2024-02-26T12:54:51.989526589Z [2024-02-26 12:54:51,989] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Something has changed, but we are not interested (the essence is the same).
2024-02-26T12:54:51.989583606Z [2024-02-26 12:54:51,989] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handling cycle is finished, waiting for new changes.
2024-02-26T12:54:53.325395344Z [2024-02-26 12:54:53,325] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Something has changed, but we are not interested (the essence is the same).
2024-02-26T12:54:53.325395344Z [2024-02-26 12:54:53,325] kopf.objects         [DEBUG   ] [ibm-spectrum-fusion-ns/clusters-hcp24feb] Handling cycle is finished, waiting for new changes.
... probe repeats
2024-02-26T13:42:31.312701197Z [2024-02-26 13:42:31,312] aiohttp.access       [INFO    ] 10.128.2.2 [26/Feb/2024:13:42:29 +0000] "GET /healthz HTTP/1.1" 200 245 "-" "kube-probe/1.27"
2024-02-26T13:42:31.312890452Z [2024-02-26 13:42:31,312] aiohttp.access       [INFO    ] 10.128.2.2 [26/Feb/2024:13:42:29 +0000] "GET /healthz HTTP/1.1" 200 245 "-" "kube-probe/1.27"
2024-02-26T13:42:59.685426651Z [2024-02-26 13:42:59,685] kafka.client         [WARNING ] <BrokerConnection node_id=0 host=guardian-kafka-cluster-kafka-0.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <connected> [IPv4 ('10.129.2.76', 9093)]> timed out after 30000 ms. Closing connection.
2024-02-26T13:42:59.685483998Z [2024-02-26 13:42:59,685] kafka.conn           [INFO    ] <BrokerConnection node_id=0 host=guardian-kafka-cluster-kafka-0.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <connected> [IPv4 ('10.129.2.76', 9093)]>: Closing connection. [Error 7] RequestTimedOutError: Request timed out after 30000 ms
2024-02-26T13:42:59.685901943Z [2024-02-26 13:42:59,685] kafka.client         [WARNING ] Node 0 connection failed -- refreshing metadata
2024-02-26T13:42:59.842812784Z [2024-02-26 13:42:59,842] kafka.conn           [INFO    ] <BrokerConnection node_id=0 host=guardian-kafka-cluster-kafka-0.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <connecting> [IPv4 ('10.129.2.76', 9093)]>: connecting to guardian-kafka-cluster-kafka-0.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 [('10.129.2.76', 9093) IPv4]
2024-02-26T13:43:29.620885986Z [2024-02-26 13:43:29,620] kopf.activities.prob [DEBUG   ] Activity 'check_kafka_health' is invoked.
2024-02-26T13:43:29.853219419Z [2024-02-26 13:43:29,852] kafka.conn           [ERROR   ] Connection attempt to <BrokerConnection node_id=0 host=guardian-kafka-cluster-kafka-0.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <connecting> [IPv4 ('10.129.2.76', 9093)]> timed out
2024-02-26T13:43:29.853219419Z [2024-02-26 13:43:29,852] kafka.conn           [INFO    ] <BrokerConnection node_id=0 host=guardian-kafka-cluster-kafka-0.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <connecting> [IPv4 ('10.129.2.76', 9093)]>: Closing connection. KafkaConnectionError: timeout
2024-02-26T13:43:29.853219419Z [2024-02-26 13:43:29,852] kafka.client         [WARNING ] Node 0 connection failed -- refreshing metadata
2024-02-26T13:43:29.907597266Z [2024-02-26 13:43:29,907] kafka.conn           [INFO    ] <BrokerConnection node_id=1 host=guardian-kafka-cluster-kafka-1.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <connecting> [IPv4 ('10.129.2.77', 9093)]>: connecting to guardian-kafka-cluster-kafka-1.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 [('10.129.2.77', 9093) IPv4]
2024-02-26T13:43:30.950918274Z [2024-02-26 13:43:30,950] kafka.conn           [INFO    ] <BrokerConnection node_id=1 host=guardian-kafka-cluster-kafka-1.guardian-kafka-cluster-kafka-brokers.ibm-backup-restore.svc:9093 <handshake> [IPv4 ('10.129.2.77', 9093)]>: Connection complete.
2024-02-26T13:43:38.870237767Z [2024-02-26 13:43:38,870] kopf._cogs.clients.w [DEBUG   ] Stopping the watch-stream for applications.v1alpha1.application.isf.ibm.com cluster-wide.
2024-02-26T13:43:38.872325055Z [2024-02-26 13:43:38,870] kopf._core.reactor.o [ERROR   ] Watcher for applications.v1alpha1.application.isf.ibm.com@none has failed: (None, None)
2024-02-26T13:43:38.872325055Z Traceback (most recent call last):
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 148, in check_response
2024-02-26T13:43:38.872325055Z     response.raise_for_status()
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/aiohttp/client_reqrep.py", line 1060, in raise_for_status
2024-02-26T13:43:38.872325055Z     raise ClientResponseError(
2024-02-26T13:43:38.872325055Z aiohttp.client_exceptions.ClientResponseError: 429, message='Too Many Requests', url=URL('https://172.30.0.1:443/apis/application.isf.ibm.com/v1alpha1/applications?watch=true&resourceVersion=9409415&timeoutSeconds=600')
2024-02-26T13:43:38.872325055Z 
2024-02-26T13:43:38.872325055Z The above exception was the direct cause of the following exception:
2024-02-26T13:43:38.872325055Z 
2024-02-26T13:43:38.872325055Z Traceback (most recent call last):
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/aiokits/aiotasks.py", line 96, in guard
2024-02-26T13:43:38.872325055Z     await coro
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_core/reactor/queueing.py", line 175, in watcher
2024-02-26T13:43:38.872325055Z     async for raw_event in stream:
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 86, in infinite_watch
2024-02-26T13:43:38.872325055Z     async for raw_event in stream:
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 201, in continuous_watch
2024-02-26T13:43:38.872325055Z     async for raw_input in stream:
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/watching.py", line 266, in watch_objs
2024-02-26T13:43:38.872325055Z     async for raw_input in api.stream(
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/api.py", line 200, in stream
2024-02-26T13:43:38.872325055Z     response = await request(
2024-02-26T13:43:38.872325055Z                ^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/auth.py", line 45, in wrapper
2024-02-26T13:43:38.872325055Z     return await fn(*args, **kwargs, context=context)
2024-02-26T13:43:38.872325055Z            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/api.py", line 85, in request
2024-02-26T13:43:38.872325055Z     await errors.check_response(response)  # but do not parse it!
2024-02-26T13:43:38.872325055Z     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-26T13:43:38.872325055Z   File "/usr/src/app/code/.venv/lib64/python3.11/site-packages/kopf/_cogs/clients/errors.py", line 150, in check_response
2024-02-26T13:43:38.872325055Z     raise cls(payload, status=response.status) from e
2024-02-26T13:43:38.872325055Z kopf._cogs.clients.errors.APIClientError: (None, None)
2024-02-26T13:44:16.563238107Z [2024-02-26 13:44:16,563] root                 [INFO    ] Probe response True, {'connected': True}
2024-02-26T13:44:16.564237153Z [2024-02-26 13:44:16,564] kopf.activities.prob [INFO    ] Activity 'check_kafka_health' succeeded.
2024-02-26T13:44:16.564269573Z [2024-02-26 13:44:16,564] kopf.activities.prob [DEBUG   ] Activity 'check_rbacs' is invoked.
...probe repeats

Additional information

No response

The text was updated successfully, but these errors were encountered:

teriyakichild · 2024-03-25T18:53:52Z

I'm seeing the same issue on 1.36.2.

dheeg · 2024-08-01T07:42:39Z

I see the same problem. Once an Watcher for xyz has failed error occured the game is basically over. There can be other errors leading to the same situation #1077

It would be great if one of the following could be achieved:

Retry and recover automatically, using the configured backoff
Liveness probe takes the watchers into consideration
We could somehow monitor the internal watchers ourself using a custom logic

cpnielsen · 2024-08-06T08:37:17Z

@nolar Any input / update on this? If you know where the potential issue may be, can you give some pointers for me or someone else to potentially fix?

valen7in · 2024-11-12T16:37:28Z

Same issues starting kopf with namespace wildcard scope:

kopf run --namespace=dev-*

Having kopf handlers on N resources (mostly CRD) and M namespaces, it seems like kopf tries to immediately start N x M listeners resulting in 429 from K8s API for most of them. kopf is running, but handlers for most resources are not fired.

msfrucht added the bug Something isn't working label Feb 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kopf does not restart after 429 too many requests error #1108

Kopf does not restart after 429 too many requests error #1108

msfrucht commented Feb 27, 2024 •

edited

Loading

teriyakichild commented Mar 25, 2024

dheeg commented Aug 1, 2024

cpnielsen commented Aug 6, 2024

valen7in commented Nov 12, 2024 •

edited

Loading

Kopf does not restart after 429 too many requests error #1108

Kopf does not restart after 429 too many requests error #1108

Comments

msfrucht commented Feb 27, 2024 • edited Loading

Long story short

Kopf version

Kubernetes version

Python version

Code

Logs

Additional information

teriyakichild commented Mar 25, 2024

dheeg commented Aug 1, 2024

cpnielsen commented Aug 6, 2024

valen7in commented Nov 12, 2024 • edited Loading

msfrucht commented Feb 27, 2024 •

edited

Loading

valen7in commented Nov 12, 2024 •

edited

Loading