storageClusterPeer: add logic for storageclusterpeer controller #2678

Open
wants to merge 1 commit into
base: main

Conversation

rewantsoni
Member

@rewantsoni rewantsoni commented Jul 1, 2024

@rewantsoni
Member Author

/retest-required

config/rbac/role.yaml (outdated, resolved)
controllers/util/k8sutil.go (outdated, resolved)
controllers/util/k8sutil.go (outdated, resolved)
api/v1/storageclusterpeer_types.go (outdated, resolved)
@openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jul 24, 2024
@openshift-merge-robot removed the needs-rebase label Jul 29, 2024
@rewantsoni force-pushed the scp-controller branch 6 times, most recently from c31ebbe to cd4371e, July 31, 2024 10:56
@openshift-merge-robot added the needs-rebase label Aug 10, 2024
@openshift-merge-robot removed the needs-rebase label Oct 10, 2024
@rewantsoni force-pushed the scp-controller branch 4 times, most recently from 0fe696a to cfbc4bf, November 5, 2024 15:03
@openshift-merge-robot added the needs-rebase label Nov 5, 2024
@openshift-merge-robot removed the needs-rebase label Nov 5, 2024
@rewantsoni force-pushed the scp-controller branch 2 times, most recently from 44b0276 to dc8a088, November 6, 2024 13:51
@openshift-merge-robot added the needs-rebase label Nov 6, 2024
@openshift-merge-robot removed the needs-rebase label Nov 6, 2024
@rewantsoni
Member Author

/retest-required

controllers/storagecluster/reconcile.go (outdated, resolved)
Contributor

openshift-ci bot commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rewantsoni
Once this PR has been reviewed and has the lgtm label, please ask for approval from nb-ohad. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rewantsoni force-pushed the scp-controller branch 8 times, most recently from 8d456a4 to 99e35e9, November 7, 2024 15:09
Comment on lines 204 to 213
storageClusterPeerList := ocsv1.StorageClusterPeerList{}
if err := r.List(ctx, &storageClusterPeerList, client.InNamespace(sc.Namespace)); err != nil {
	return reconcile.Result{}, err
}
for i := range storageClusterPeerList.Items {
	storageClusterPeer := &storageClusterPeerList.Items[i]
	if err = controllerutil.SetOwnerReference(sc, storageClusterPeer, r.Client.Scheme()); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to set owner reference on storageClusterPeer %s: %w", storageClusterPeer.Name, err)
	}
}
Contributor

This does not belong here. It should be part of a reconcile function that runs as part of the main storage cluster reconcile.

r.log.Error(err, "failed to fetch StorageCluster for StorageClusterPeer found in the same namespace.")
if k8serrors.IsNotFound(err) {
storageClusterPeer.Status.State = ocsv1.ReconcileFailed
_ = r.updateStatus(storageClusterPeer)
Contributor

Don't update the status inside the reconciliation. Have a central point outside of it that ensures a status update happens no matter how we exit the reconcile. We take a similar approach in all of our controllers across the majority of our projects.

Otherwise, every contributor has to remember to send a status update on every return path, which is error-prone.
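
One way to read the suggestion above is a thin outer `Reconcile` that defers a single status write and delegates the real work to an inner function. The following is only a minimal, stdlib-only sketch of that shape: `Peer`, `statusWriter`, `reconcilePhases`, and the state strings are hypothetical stand-ins, not the operator's actual types or the controller-runtime API.

```go
package main

import (
	"errors"
	"fmt"
)

// Peer is a hypothetical stand-in for the StorageClusterPeer resource.
type Peer struct {
	State string
}

// statusWriter is a hypothetical stand-in for the client's status update call.
// It counts writes so the "exactly once" property is observable.
type statusWriter struct {
	writes int
	last   string
}

func (w *statusWriter) Update(p *Peer) error {
	w.writes++
	w.last = p.State
	return nil
}

// Reconcile is the single entry point. The deferred call persists the status
// once, regardless of which return path reconcilePhases takes.
func Reconcile(w *statusWriter, p *Peer, failFetch bool) (err error) {
	defer func() {
		// Keep both errors if the reconcile and the status write both fail.
		err = errors.Join(err, w.Update(p))
	}()
	return reconcilePhases(p, failFetch)
}

// reconcilePhases only mutates the in-memory status; it never writes it.
func reconcilePhases(p *Peer, failFetch bool) error {
	if failFetch {
		p.State = "Failed" // assumed state name, for illustration only
		return errors.New("storage cluster not found")
	}
	p.State = "Peered" // assumed state name, for illustration only
	return nil
}

func main() {
	w := &statusWriter{}
	p := &Peer{}
	err := Reconcile(w, p, true)
	fmt.Println(err, w.writes, w.last)
}
```

With this shape, an early `return` inside `reconcilePhases` cannot skip the status write, because the write no longer lives on any individual return path.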

Comment on lines +108 to +131
storageClusterPeer.Status.State = ocsv1.ReconcileFailed
return ctrl.Result{}, err
Contributor

As a continuation of my last comment, here is an example of a missed update that will not get persisted.

}
storageClusterPeer.Status.State = ocsv1.StorageClusterPeerStateInitializing

peerStorageClusterUID, err := readStorageClusterUIDFromTicket(storageClusterPeer.Spec.OnboardingToken)
Contributor

  1. We use "token", not "ticket", in the code.
  2. This is only needed once. Why does it need a separate function?
Suggested change
peerStorageClusterUID, err := readStorageClusterUIDFromTicket(storageClusterPeer.Spec.OnboardingToken)
peerStorageClusterUID, err := readStorageClusterUIDFromTicket(storageClusterPeer.Spec.OnboardingToken)

if storageClusterPeer.Status.PeerInfo == nil {
	storageClusterPeer.Status.PeerInfo = &ocsv1.PeerInfo{}
}
storageClusterPeer.Status.PeerInfo.StorageClusterUid = string(peerStorageClusterUID)
Contributor

If you are updating the storage cluster UID in the peer info, why do you assume the rest of the information does not need to be wiped out? If the existing storage cluster UID is different, the old info is meaningless and even misleading.
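
The concern above could be addressed by replacing the whole `PeerInfo` whenever the UID changes, instead of overwriting one field. This is a rough sketch under assumptions: the `PeerInfo` struct here is a local stand-in, `Endpoint` is an invented field standing in for "all other information", and `setPeerUID` is a hypothetical helper rather than anything in the PR.

```go
package main

import "fmt"

// PeerInfo is a hypothetical stand-in for ocsv1.PeerInfo. Only
// StorageClusterUid appears in the snippet under review; Endpoint is an
// invented field representing any other previously recorded data.
type PeerInfo struct {
	StorageClusterUid string
	Endpoint          string
}

// setPeerUID returns the PeerInfo to store for the given UID. If the UID is
// unchanged, the existing info is kept. If it is missing or different, a
// fresh PeerInfo is returned so stale fields from the old peer are dropped.
func setPeerUID(current *PeerInfo, uid string) *PeerInfo {
	if current != nil && current.StorageClusterUid == uid {
		return current
	}
	return &PeerInfo{StorageClusterUid: uid}
}

func main() {
	info := &PeerInfo{StorageClusterUid: "uid-a", Endpoint: "old.example:50051"}
	info = setPeerUID(info, "uid-b")
	fmt.Printf("%+v\n", *info)
}
```

Returning a new struct rather than clearing fields one by one means fields added to `PeerInfo` later are reset automatically.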


_, err = ocsClient.PeerStorageCluster(r.ctx, storageClusterPeer.Spec.OnboardingToken, string(storageCluster.UID))
if err != nil {
r.log.Error(err, fmt.Sprintf("failed to Peer Storage Cluster, code: %v.", err))
Contributor

Suggested change
r.log.Error(err, fmt.Sprintf("failed to Peer Storage Cluster, code: %v.", err))
r.log.Error(err, fmt.Sprintf("failed to Peer Storage Cluster, reason: %v.", err))

_, err = ocsClient.PeerStorageCluster(r.ctx, storageClusterPeer.Spec.OnboardingToken, string(storageCluster.UID))
if err != nil {
	r.log.Error(err, fmt.Sprintf("failed to Peer Storage Cluster, code: %v.", err))
	st, _ := status.FromError(err)
Contributor

You should not assume zero errors.

Comment on lines 165 to 188
func (r *StorageClusterPeerReconciler) updateStatus(obj client.Object) error {
	return r.Client.Status().Update(r.ctx, obj)
}
Contributor

Do we really need this? The status update should live in a single place in the code, where you exit the reconcile logic, and should run only once per reconcile.

var ticketData services.OnboardingTicket
err = json.Unmarshal(message, &ticketData)
if err != nil {
return "", fmt.Errorf("failed to unmarshal onboarding ticket message. %v", err)
Contributor

Suggested change
return "", fmt.Errorf("failed to unmarshal onboarding ticket message. %v", err)
return "", fmt.Errorf("Onboarding ticket message is not a valid JSON. %v", err)

@rewantsoni force-pushed the scp-controller branch 4 times, most recently from c37c951 to 44f8993, November 7, 2024 17:06
5 participants