
Existing Backstage operand not upgraded (stuck on mounting a ConfigMap) after upgrading operator from 1.1.x to 1.2.x #382

Closed
rm3l opened this issue Jun 7, 2024 · 0 comments · Fixed by #384
Labels
jira (Issue will be sync'ed to Red Hat JIRA), kind/bug (Categorizes issue or PR as related to a bug)

Comments


rm3l commented Jun 7, 2024

/kind bug

What did you do exactly?

  • From the operator repo, switch to the 1.1.x branch and deploy the RHDH 1.1 operator. The IMG arg is set explicitly because the branch's default image (quay.io/janus-idp/operator:0.1.3) has expired and no longer exists on quay.io:
git switch 1.1.x
make deploy IMG=quay.io/rhdh/rhdh-rhel9-operator:1.1
kubectl apply -f examples/bs1.yaml
  • Wait a few seconds until all the resources are created

  • Check the Backstage Custom Resource status. Reason should be DeployOK:

$ kubectl describe backstage bs1
Name:         bs1
Namespace:    my-ns
Labels:       <none>
Annotations:  <none>
API Version:  rhdh.redhat.com/v1alpha1
Kind:         Backstage
Metadata:
  Creation Timestamp:  2024-06-07T12:21:58Z
  Generation:          1
  Resource Version:    48634
  UID:                 e4e9766f-6c32-4b44-85cd-1eac93b56f16
Status:
  Conditions:
    Last Transition Time:  2024-06-07T12:21:58Z
    Message:
    Reason:                DeployOK
    Status:                True
    Type:                  Deployed
Events:                    <none>
  • Switch to the 1.2.x branch and deploy the upcoming 1.2 operator
git switch 1.2.x
make deploy
  • Wait a few seconds until the new version of the operator pod is running and the existing CR is reconciled again. Then check the CR status again. The Reason will be DeployFailed, with an error:
$ kubectl describe backstage bs1
Name:         bs1
Namespace:    my-ns
Labels:       <none>
Annotations:  <none>
API Version:  rhdh.redhat.com/v1alpha1
Kind:         Backstage
Metadata:
  Creation Timestamp:  2024-06-07T13:44:15Z
  Generation:          1
  Resource Version:    2846
  UID:                 5cdad2eb-0840-4c8e-8c2b-08be46e3856a
Status:
  Conditions:
    Last Transition Time:  2024-06-07T13:49:06Z
    Message:               failed to apply backstage objects failed to patch object &Service{ObjectMeta:{backstage-psql-bs1  my-ns   2100 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/instance:bs1 app.kubernetes.io/name:backstage rhdh.redhat.com/app:backstage-psql-bs1] map[] [{rhdh.redhat.com/v1alpha1 Backstage bs1 5cdad2eb-0840-4c8e-8c2b-08be46e3856a 0xc000590f55 0xc000590f54}] [] []},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:,Protocol:,Port:5432,TargetPort:{0 0 },NodePort:0,AppProtocol:nil,},},Selector:map[string]string{rhdh.redhat.com/app: backstage-psql-bs1,},ClusterIP:None,Type:,ExternalIPs:[],SessionAffinity:,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:nil,ClusterIPs:[],IPFamilies:[],AllocateLoadBalancerNodePorts:nil,LoadBalancerClass:nil,InternalTrafficPolicy:nil,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},}: failed to patch object *v1.Service: Service "backstage-psql-bs1" is invalid: spec.clusterIPs[0]: Invalid value: []string{"None"}: may not change once set
    Reason:                DeployFailed
    Status:                False
    Type:                  Deployed
Events:                    <none>

Actual behavior

It seems the existing CR could not be reconciled successfully by the new version of the operator, because the operator was unable to patch the existing database Service object.
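
The key part of the error is the API server rejecting a change to spec.clusterIPs, which may not change once a Service has been created. The database Service is headless (ClusterIP: None), and the object the operator tries to patch (see the dump in the status above) carries ClusterIP: None but an empty ClusterIPs list, whereas the live Service presumably has clusterIPs populated by the API server. As a quick check (illustrative only, output will vary), the live values can be inspected with:

$ kubectl get service backstage-psql-bs1 -n my-ns -o jsonpath='{.spec.clusterIP} {.spec.clusterIPs}{"\n"}'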

If we take a look at the resources, a new Backstage pod is in the process of being created, but is stuck trying to mount a ConfigMap (which could not be created because of the failure to patch the DB Service):

$ kubectl get pod

NAME                             READY   STATUS     RESTARTS   AGE
backstage-psql-bs1-0             1/1     Running    0          9m22s
backstage-bs1-655f659ddc-n7grw   1/1     Running    0          9m22s
backstage-bs1-6469fdd48f-ldq5h   0/1     Init:0/1   0          4m31s

$ kubectl describe pod backstage-bs1-6469fdd48f-ldq5h

[...]
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  5m18s                 default-scheduler  0/1 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "backstage-bs1-6469fdd48f-ldq5h-dynamic-plugins-root". preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
  Normal   Scheduled         5m11s                 default-scheduler  Successfully assigned my-ns/backstage-bs1-6469fdd48f-ldq5h to k3d-k3s-default-server-0
  Warning  FailedMount       3m9s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[backstage-appconfig-bs1], unattached volumes=[backstage-appconfig-bs1 dynamic-plugins-root dynamic-plugins-npmrc backstage-dynamic-plugins-bs1]: timed out waiting for the condition
  Warning  FailedMount       62s (x10 over 5m12s)  kubelet            MountVolume.SetUp failed for volume "backstage-appconfig-bs1" : configmap "backstage-appconfig-bs1" not found
  Warning  FailedMount       51s                   kubelet            Unable to attach or mount volumes: unmounted volumes=[backstage-appconfig-bs1], unattached volumes=[dynamic-plugins-root dynamic-plugins-npmrc backstage-dynamic-plugins-bs1 backstage-appconfig-bs1]: timed out waiting for the condition
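
As a sanity check (nothing operator-specific here), the ConfigMap the pod is waiting for can be queried directly; it is expected to return a NotFound error, consistent with the FailedMount events above, since reconciliation aborted before it could be created:

$ kubectl get configmap backstage-appconfig-bs1 -n my-ns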

Note that the same issue happens when upgrading from the operator channels on OpenShift (https://github.com/janus-idp/operator/blob/main/.rhdh/docs/installing-ci-builds.adoc).

Expected behavior

CR reconciliation should be successful, and the application should be upgraded.
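
For context, one possible direction for such a fix is sketched below. This is purely illustrative (it is not necessarily what #384 does, and the package/function names are made up): when re-applying the database Service during reconciliation, carry over the networking fields that the API server forbids changing after creation, so the update no longer attempts to modify spec.clusterIPs.

// Hypothetical sketch using controller-runtime; not the actual operator code.
package servicepatch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyDatabaseService creates the desired headless Service if it does not
// exist yet; otherwise it updates the existing one while preserving the
// immutable networking fields from the live object.
func applyDatabaseService(ctx context.Context, c client.Client, desired *corev1.Service) error {
	var live corev1.Service
	if err := c.Get(ctx, client.ObjectKeyFromObject(desired), &live); err != nil {
		if client.IgnoreNotFound(err) != nil {
			return err
		}
		// Service does not exist yet: create it as-is.
		return c.Create(ctx, desired)
	}
	// These fields may not change once set; copy them from the live Service
	// so the API server accepts the update.
	desired.ResourceVersion = live.ResourceVersion
	desired.Spec.ClusterIP = live.Spec.ClusterIP
	desired.Spec.ClusterIPs = live.Spec.ClusterIPs
	desired.Spec.IPFamilies = live.Spec.IPFamilies
	desired.Spec.IPFamilyPolicy = live.Spec.IPFamilyPolicy
	return c.Update(ctx, desired)
}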

@openshift-ci openshift-ci bot added the kind/bug label Jun 7, 2024
@github-actions github-actions bot added the jira label Jun 7, 2024
@rm3l rm3l self-assigned this Jun 11, 2024