
kopf suddenly stop detecting any changes approximately after 20 mins #1120

Open
RSE132 opened this issue Aug 3, 2024 · 5 comments
Labels
bug Something isn't working

Comments

RSE132 commented Aug 3, 2024

Long story short

Kopf suddenly stops detecting any changes after approximately 20 minutes; this behaviour is unexpected. I am using the latest version, 1.37.2.

Kopf version

1.37.2

Kubernetes version

1.29

Python version

3.12

Code

import logging

import kopf

# vabInit and vaultOnboarding are the reporter's own helpers (not shown here).

@kopf.on.create('perpetual.com', 'v1', 'vab')
def create_fn(spec, name, namespace, **kwargs):
    logging.info(f"New VaultAuthBackend resource found: {name},{namespace}")
    vabInit(spec, name, namespace, **kwargs)


@kopf.on.delete('perpetual.com', 'v1', 'vab')
def delete_fn(spec, name, namespace, **kwargs):
    specConfig = spec.get('config', {})
    clusterName = specConfig.get('cluster', '')
    vaultOwner = spec.get('vaultowner', '')
    logging.info(f"Deleting VaultAuthBackend resource: {name},{namespace}")
    vaultOnboarding.deactivate(clusterName, vaultOwner)

Logs

2024-08-03 01:06:14,866 :: DEBUG :: Handler 'create_fn' is invoked.
2024-08-03 01:06:14,867 :: INFO :: New VaultAuthBackend resource found: landingzone-snadbox-pox-vabrr,landing-zone
2024-08-03 01:06:16,403 :: INFO :: VaultauthBackend unregistered successfully from MSS. cluster=landingzone-snadbox-pox
2024-08-03 01:06:16,518 :: INFO :: Handler 'delete_fn' succeeded.
2024-08-03 01:06:16,518 :: INFO :: Deletion is processed: 1 succeeded; 0 failed.
2024-08-03 01:06:16,518 :: DEBUG :: Removing the finalizer, thus allowing the actual deletion.
2024-08-03 01:06:16,518 :: DEBUG :: Patching with: {'metadata': {'finalizers': []}}
2024-08-03 01:06:16,641 :: INFO :: Service Account already exist sa=:vault-auth
2024-08-03 01:06:16,668 :: DEBUG :: Deleted, really deleted, and we are notified.
2024-08-03 01:06:16,669 :: DEBUG :: Removing the finalizer, thus allowing the actual deletion.
2024-08-03 01:06:16,673 :: INFO :: Secret already exist. secret=:vault-auth-sa-token-secret
2024-08-03 01:06:19,758 :: INFO :: Cluster rolebinding 'role-tokenreview-binding' already exist for the sa=vault-auth
2024-08-03 01:06:21,563 :: INFO :: Successfully enabled the Kubernetes auth method.
2024-08-03 01:06:21,797 :: INFO :: Successfully configured the Kubernetes auth method.
2024-08-03 01:06:24,672 :: INFO :: Kubernetes role=vault-secrets-webhook for auth_backend=landingzone-snadbox-pox created successfully
2024-08-03 01:06:25,074 :: INFO :: Handler 'create_fn' succeeded.
2024-08-03 01:06:25,074 :: INFO :: Creation is processed: 1 succeeded; 0 failed.
2024-08-03 01:06:25,075 :: DEBUG :: Patching with: {'metadata': {'annotations': {'kopf.zalando.org/last-handled-configuration': '{"spec":{"config":{"authmethod":"kubernetes","cluster":"landingzone-snadbox-pox","clusterenv":"nonprod","kubeconfig":"landingzone-snadbox-pox-kubeconfig"},"vaultaddress":"https://vault.maersk-digital.net","vaultowner":"perpetual"},"metadata":{"annotations":{"perpetual.com/reconcile-policy":"detach-on-delete"}}}\n'}}}
2024-08-03 01:06:25,216 :: DEBUG :: Something has changed, but we are not interested (the essence is the same).
2024-08-03 01:06:25,216 :: DEBUG :: Handling cycle is finished, waiting for new changes.

Additional information

After "Handling cycle is finished, waiting for new changes", kopf does not detect any new changes once roughly 20 minutes of idle time have passed.

RSE132 added the bug label on Aug 3, 2024
dsarazin commented Sep 12, 2024

Hello guys,
I am also facing the same issue using kopf 1.37.2 on Python 3.12 and Kubernetes 1.29.
It was working perfectly well until we upgraded from Kubernetes 1.28 to 1.29 (1.29.7-gke.1274000 on Google).

SteinRobert commented

We see the same issue in multiple kopf-based operators. Any idea about the root cause of the problem? My team and I would be happy to contribute.

francescotimperi commented

Hello,

We faced similar issues using AKS, and we worked around them by enforcing a server-side watch timeout, using something like:

import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Close each watch stream server-side after 210s so kopf reconnects.
    settings.watching.server_timeout = 210

Give it a try.

caffeinism commented

Same symptoms, AKS, Kubernetes version 1.29

SteinRobert commented

We have been evaluating the changed timeouts for a month now, and for us it really did the trick!
Thank you @francescotimperi!
https://kopf.readthedocs.io/en/stable/configuration/#networking-timeouts

Since we set the timeouts in multiple operators, we've gotten rid of that particular problem.
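
For reference, a minimal sketch of the networking-timeout settings described in the linked documentation; the concrete values are illustrative assumptions, not recommendations from this thread:

import kopf


@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Fail fast if the initial connection to the API server cannot be made.
    settings.watching.connect_timeout = 60
    # Ask the API server to end each watch stream after ~3.5 minutes.
    settings.watching.server_timeout = 210
    # Give up client-side slightly later than the server, so a silently
    # dropped connection is re-established instead of hanging forever.
    settings.watching.client_timeout = 240

Keeping client_timeout a bit larger than server_timeout lets the server end the stream normally, while the client-side limit still guards against connections that die without a proper close.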
