-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/retries on conflict error #74
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just so you know, the workflows were broken for forks (thats why they were failing). We've fixed it but it will now run every workflow twice in this and the other pr. If you want you can rebase the branches with main and the errors will go away.
for { | ||
err := scanWorkload(workload, client, ctx, layerCli, layerEnv) | ||
if err != nil { | ||
if strings.Contains(err.Error(), "the object has been modified") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid checking the error string here, since it can change at any time. I think it would be better to see if we can get the underlying error type and check if the error is an instance of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok ive looked and there is no higher level underlying error type we could use for this. What i did find is where the message is coming from which means, that we can reference that instead of using just a string. With that said, I would change this to:
import "k8s.io/apiserver/pkg/registry/generic/registry"
if strings.Contains(err.Error(), "the object has been modified") { | |
if strings.Contains(err.Error(), registry.OptimisticLockErrorMsg) { |
if err != nil { | ||
slog.Error("failed to scan workload", "error", err, "workload", workload.GetName(), "namespace", workload.GetNamespace()) | ||
return | ||
attempts := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attempts := 0 | |
var attempts int |
slog.Error("failed to scan workload", "error", err, "workload", workload.GetName(), "namespace", workload.GetNamespace()) | ||
return | ||
attempts := 0 | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would change this to a for i as we can break out either way and then dont have to handle incrementing nor declaring the attempts variable
if strings.Contains(err.Error(), "the object has been modified") { | ||
if attempts >= maxRetriesOnConflict { | ||
if maxRetriesOnConflict > 0 { | ||
slog.Error("max retries reached, will try again in the next cycle", "workload", workload.GetName(), "namespace", workload.GetNamespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slog.Error("max retries reached, will try again in the next cycle", "workload", workload.GetName(), "namespace", workload.GetNamespace()) | |
slog.Error("max retries reached, will try again in the next scan", "workload", workload.GetName(), "namespace", workload.GetNamespace()) |
return | ||
} | ||
workload = updatedWorkload | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use guard clauses where applicable
} | ||
result = &suspendScaledWorkload{&cronJob{cronjob}} | ||
return result, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really think we shouldn't have a whole other function just to get just 1 of the resource.
} | |
// getCronJobs is the getResourceFunc for CronJobs | |
func getCronJobs(name, namespace string, clientsets *Clientsets, ctx context.Context) ([]Workload, error) { | |
var results []Workload | |
if name != "" { | |
cronjob, err := clientsets.Kubernetes.BatchV1().CronJobs(namespace).Get(ctx, name, metav1.GetOptions{}) | |
if err != nil { | |
return result, fmt.Errorf("failed to get cronjob: %w", err) | |
} | |
results = append(results, &suspendScaledWorkload{&cronJob{cronjob}}) | |
return results | |
} | |
cronjobs, err := clientsets.Kubernetes.BatchV1().CronJobs(namespace).List(ctx, metav1.ListOptions{TimeoutSeconds: &timeout}) | |
if err != nil { | |
return nil, fmt.Errorf("failed to get cronjobs: %w", err) | |
} | |
for _, item := range cronjobs.Items { | |
results = append(results, &suspendScaledWorkload{&cronJob{&item}}) | |
} | |
return results, nil | |
} |
I think we should keep the client.GetWorkload and GetWorkloads in single functions. This will also help abstract this configuration away.
Motivation
Give the user the possibility to choose how to handle HTTP 409 conflict errors. Such conflicts typically occur when another entity (such as an HPA, CI/CD pipeline, or manual intervention) modifies a resource just before KubeDownscaler processes it
See #68 or caas-team/py-kube-downscaler#111
Changes
--max-retries-on-conflict
argument like in Py-Kube-Downscalerkubectl get deploy -n default
). the old GetWorkload() was renamed to GetWorkloads() to reflect the changes--max-retries-on-conflict
Tests done
TODO