OCPBUGS-48469: GCP: Update /etc/hosts file when ClusterHostedDNS is enabled #4800

sadasu · 2025-01-22T17:22:39Z

Append entries to /etc/hosts that resolve the cluster api and api-int URLs. /etc/hosts provides resolution for these URLs until the kubelet joins the cluster and runs its CoreDNS pod, which then takes over resolution of those two URLs.

- What I did

- How to verify it

- Description for the changelog

openshift-ci-robot · 2025-01-22T17:23:32Z

@sadasu: This pull request references Jira Issue OCPBUGS-48469, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.19.0) matches configured target version for branch (4.19.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @jianli-wei

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


zaneb · 2025-01-23T22:52:52Z

templates/common/gcp/units/gcp-update-etc-hosts.service.yaml

+  apiServerIntURL={{ .Infra.Status.APIServerInternalURL }}
+  # Add the/etc/hosts configuration file
+  mkdir -p /etc/hosts/conf.d
+  cat <<EOF | tee /etc/hosts/conf.d/etc-hosts.conf


Could maybe name the file api.conf or something.

I was hoping that naming it etc-hosts.conf would make it obvious that this file contains configuration for /etc/hosts. Happy to call it api.conf, since it contains information for resolving the API and API-Int URLs.

zaneb · 2025-01-23T23:07:01Z

templates/common/gcp/units/gcp-update-etc-hosts.service.yaml

+{{ else }}
+  exit 0
+{{ end }}
+  if [ -z "${apiIntLBIPs}" ]; then


Should this be the following?

Suggested change:

-  if [ -z "${apiIntLBIPs}" ]; then
+  if [ -z "{{$apiIntLBIPs}}" ]; then

We appear to be missing a line like:

apiIntLBIPs={{$apiIntLBIPs}}

to get the template variable that we defined on line 19 into a bash variable.
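For context on the template/bash boundary: assuming the helper returns a string slice (the rendered unit later in this thread shows `apiIntLBIPs=[10.0.0.2]`), Go's text/template prints the slice with fmt formatting, brackets included. The bash snippet below is illustrative only and shows how bash treats such a rendered line: the value is a plain string rather than an array, and an empty slice renders as `[]`, which a `-z` test does not consider empty.

```shell
#!/bin/bash
# Illustrative only: roughly what bash receives after the template renders
# a one-element string slice (compare apiIntLBIPs=[10.0.0.2] in the
# rendered unit shown later in the thread).
apiIntLBIPs=[10.0.0.2]

# The right-hand side is one literal string, brackets included. It is not a
# bash array, so ${apiIntLBIPs[0]} expands to that same whole string.
first_ip="${apiIntLBIPs[0]}"
echo "first_ip=${first_ip}"   # prints: first_ip=[10.0.0.2]

# An empty slice renders as "[]", a non-empty string, so [ -z ... ]
# cannot be used to detect "no load balancer IPs".
rendered_empty="[]"
if [ -z "${rendered_empty}" ]; then
  empty_check="empty"
else
  empty_check="not empty"
fi
echo "empty_check=${empty_check}"   # prints: empty_check=not empty
```

Doing the emptiness check and the element selection inside the template avoids relying on how the slice happens to be printed.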

zaneb · 2025-01-23T23:11:38Z

templates/common/gcp/units/gcp-update-etc-hosts.service.yaml

+  mkdir -p /etc/hosts/conf.d
+  cat <<EOF | tee /etc/hosts/conf.d/etc-hosts.conf
+  # Added by OpenShift
+  ${apiLBIPs[0]}    ${apiServerURL}


We are treating apiLBIPs as a bash array here. That's fine if it is, but bash arrays always seem like kind of a pain to set up to me. It might be simpler to do this in the template:

Suggested change:

-  ${apiLBIPs[0]}    ${apiServerURL}
+  {{index $apiLBIPs 0}}    ${apiServerURL}

zaneb · 2025-01-23T23:23:20Z

templates/common/gcp/files/usr-local-bin-update-etc-hosts.yaml

@@ -0,0 +1,11 @@
+mode: 0755
+path: "/usr/local/bin/update-etc-hosts"
+contents:


Potentially a simpler way, avoiding the script and systemd service:

path: "/etc/hosts"
append:
  - inline: |
      {{ if and (eq .Infra.Status.PlatformStatus.Type "GCP") (.Infra.Status.PlatformStatus.GCP) (.Infra.Status.PlatformStatus.GCP.CloudLoadBalancerConfig) (eq .Infra.Status.PlatformStatus.GCP.CloudLoadBalancerConfig.DNSType "ClusterHosted") }}
      {{ $apiIntLBIPs := cloudPlatformAPIIntLoadBalancerIPs . }}
      {{ if gt (len $apiIntLBIPs) 0 }}
      {{ $apiLBIPs := cloudPlatformAPILoadBalancerIPs . }}
      {{ if gt (len $apiLBIPs) 0 }}{{ index $apiLBIPs 0 }}{{ else }}{{ index $apiIntLBIPs 0 }}{{ end }} {{ .Infra.Status.APIServerURL }}
      {{ index $apiIntLBIPs 0 }} {{ .Infra.Status.APIServerInternalURL }}
      {{ end }}
      {{ end }}

The systemd service gives us a way to run this script before the kubelet starts.


yuqi-zhang

Logically seems fine, although I am unsure how to test, so let me know if you'd like any QE pre-merge testing on this

yuqi-zhang · 2025-01-28T01:16:54Z

pkg/controller/template/render.go

@@ -777,6 +778,33 @@ func cloudPlatformIngressLoadBalancerIPs(cfg RenderConfig) (interface{}, error)
 	}
 }

+// cloudPlatformLBIPAvailable returns true when DNSType is set to `ClusterHosted`


Curious: based on the comment, I'd expect some check for `ClusterHosted` in the function. I guess it's implicit since the service enablement is dependent on this field?

(I know we do the same elsewhere in the template rendering, so I'm fine with it as is)

openshift-ci · 2025-01-28T01:17:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sadasu, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

gpei · 2025-01-28T07:38:23Z

@sadasu Hi, I just tried to use the latest commit to test the installation of GCP custom DNS. The gcp-update-etc-hosts.service couldn't be started on masters because of the following syntax error:

[core@gpei-0128-gcpdns-5zs6h-master-0 ~]$ journalctl -u gcp-update-etc-hosts.service --no-pager
Jan 28 05:58:05 gpei-0128-gcpdns-5zs6h-master-0 systemd[1]: Starting Update Default GCP /etc/hosts...
Jan 28 05:58:05 gpei-0128-gcpdns-5zs6h-master-0 bash[1233]: /bin/bash: -c: line 1: syntax error near unexpected token `then'
Jan 28 05:58:05 gpei-0128-gcpdns-5zs6h-master-0 bash[1233]: /bin/bash: -c: line 1: `    apiIntLBIPs=[10.0.0.2]      apiLBIPs=[34.54.248.13]    if [ -z $apiLBIPs ]; then    apiLBIPs=$apiIntLBIPs  fi  apiServerURL=https://api.gpei-0128-gcpdns.qe.gcp.devcluster.openshift.com:6443  apiServerIntURL=https://api-int.gpei-0128-gcpdns.qe.gcp.devcluster.openshift.com:6443  mkdir -p /etc/conf.d  cat <<EOF | tee /etc/conf.d/etc-hosts.conf              EOF  /usr/local/bin/update-etc-hosts'
Jan 28 05:58:05 gpei-0128-gcpdns-5zs6h-master-0 systemd[1]: gcp-update-etc-hosts.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 28 05:58:05 gpei-0128-gcpdns-5zs6h-master-0 systemd[1]: gcp-update-etc-hosts.service: Failed with result 'exit-code'.
Jan 28 05:58:05 gpei-0128-gcpdns-5zs6h-master-0 systemd[1]: Failed to start Update Default GCP /etc/hosts.
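A likely explanation, based on systemd's documented unit-file parsing: a line ending in a backslash is joined with the next line, with the backslash replaced by a space, so the multi-line ExecStart= script reaches bash as the single line visible in the log. Once a word such as `if` follows variable assignments in the same command, bash no longer treats it as a keyword, and the later `then` becomes a syntax error. The snippet reproduces this with a simplified, hypothetical line and shows that the same commands parse once they are separated by newlines.

```shell
#!/bin/bash
# Simplified stand-in for the joined ExecStart= line in the journal above.
joined='apiIntLBIP=10.0.0.2 apiLBIP= if [ -z "$apiLBIP" ]; then apiLBIP=$apiIntLBIP; fi; echo "$apiLBIP"'
bash -c "$joined" 2>/dev/null
joined_status=$?
echo "joined line exit status: ${joined_status}"   # syntax error near `then'

# The same commands, one per line, parse and run as intended.
printf -v multiline '%s\n' \
  'apiIntLBIP=10.0.0.2' \
  'apiLBIP=' \
  'if [ -z "$apiLBIP" ]; then apiLBIP=$apiIntLBIP; fi' \
  'echo "$apiLBIP"'
multiline_out="$(bash -c "$multiline")"
echo "multi-line output: ${multiline_out}"   # prints: multi-line output: 10.0.0.2
```

This is why moving the logic into a real script file (as the thread eventually does) sidesteps the problem: the file keeps its newlines.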

gpei · 2025-01-29T08:43:28Z

Hi @sadasu, there is still something wrong with the bash syntax in gcp-update-etc-hosts.service:

[core@gpei-0129-gcpdns-lf995-master-0 ~]$ journalctl -u gcp-update-etc-hosts.service --no-pager
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 systemd[1]: Starting Update Default GCP /etc/hosts...
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 bash[1230]: /bin/bash: line 1: warning: here-document at line 1 delimited by end-of-file (wanted `EOF')
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 bash[1231]: /bin/bash: line 1: if: command not found
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 bash[1232]: tee: /etc/conf.d/etc-hosts.conf: No such file or directory
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 bash[1232]: tee: EOF: Operation not permitted
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 systemd[1]: gcp-update-etc-hosts.service: Main process exited, code=exited, status=1/FAILURE
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 systemd[1]: gcp-update-etc-hosts.service: Failed with result 'exit-code'.
Jan 29 06:22:45 gpei-0129-gcpdns-lf995-master-0 systemd[1]: Failed to start Update Default GCP /etc/hosts.
[core@gpei-0129-gcpdns-lf995-master-0 ~]$ systemctl cat gcp-update-etc-hosts.service
# /etc/systemd/system/gcp-update-etc-hosts.service
[Unit]
Description=Update Default GCP /etc/hosts
# We don't need to do this on the firstboot
After=firstboot-osupdate.target
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet-dependencies.target

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/bin/bash -c " \
 \
apiIntLBIPs=[10.0.0.2] \
apiIntLBIP=10.0.0.2 \
 \
 \
apiLBIPs=[34.102.208.176] \
apiLBIP=34.102.208.176 \
 \
if [ -z "$apiLBIPs" ] \
then \
  # apiLBIPs will not be set on private clusters
  apiLBIPs=$apiIntLBIPs \
fi \
apiServerURL=https://api.gpei-0129-gcpdns.qe.gcp.devcluster.openshift.com:6443 \
apiServerIntURL=https://api-int.gpei-0129-gcpdns.qe.gcp.devcluster.openshift.com:6443 \
mkdir -p /etc/conf.d \
cat <<EOF | tee /etc/conf.d/etc-hosts.conf \
${apiLBIPs[0]}    ${apiServerURL} \
${apiIntLBIPs[0]}    ${apiServerIntURL} \
EOF \
# Update /etc/hosts \
/usr/local/bin/update-etc-hosts"

[Install]
RequiredBy=kubelet-dependencies.target

sadasu · 2025-01-30T03:58:42Z

@gpei thanks for testing the 2 previous versions. I don't see why the bash script with correct syntax has failures when run within the systemd unit. I have reorganized my code and posted another version.

gpei · 2025-01-30T07:58:23Z

@sadasu thanks for the update. With the latest code we can move one step further, but the /usr/local/bin/update-etc-hosts script still does not seem to work as expected.

Here are the contents of some key files on the master for your reference:

[root@gpei-0130-gcpdns-hq6q8-master-2 ~]# cat /etc/conf.d/etc-hosts.conf
    
    
[root@gpei-0130-gcpdns-hq6q8-master-2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    
    
[root@gpei-0130-gcpdns-hq6q8-master-2 ~]# journalctl -u gcp-update-etc-hosts.service
Jan 30 06:11:27 gpei-0130-gcpdns-hq6q8-master-2 systemd[1]: Starting Update Default GCP /etc/hosts...
Jan 30 06:11:27 gpei-0130-gcpdns-hq6q8-master-2 bash[1235]:     
Jan 30 06:11:27 gpei-0130-gcpdns-hq6q8-master-2 bash[1235]:     
Jan 30 06:11:27 gpei-0130-gcpdns-hq6q8-master-2 bash[1229]: Done updating /etc/hosts
Jan 30 06:11:27 gpei-0130-gcpdns-hq6q8-master-2 systemd[1]: gcp-update-etc-hosts.service: Deactivated successfully.
Jan 30 06:11:27 gpei-0130-gcpdns-hq6q8-master-2 systemd[1]: Finished Update Default GCP /etc/hosts.
[root@gpei-0130-gcpdns-hq6q8-master-2 ~]# cat /usr/local/bin/update-etc-hosts
#!/bin/bash
apiLBIP=${1}
apiURL=${2}
apiIntLBIP=${3}
apiIntURL=${4}
if [ -z "$apiLBIP" ]; then
  # apiLBIPs are not expected to be set on private clusters
  apiLBIP=$apiIntLBIP
fi
mkdir -p /etc/conf.d
etc_hosts_config_filename="/etc/conf.d/etc-hosts.conf"
echo "${apiLBIP}    ${apiURL%:*}" >> ${etc_hosts_config_filename}
echo "${apiIntLBIP}    ${apiIntURL%:*}" >> ${etc_hosts_config_filename}
cat /etc/conf.d/etc-hosts.conf
cat /etc/conf.d/etc-hosts.conf >> /etc/hosts
echo "Done updating /etc/hosts"
[root@gpei-0130-gcpdns-hq6q8-master-2 ~]# systemctl cat gcp-update-etc-hosts.service
# /etc/systemd/system/gcp-update-etc-hosts.service
[Unit]
Description=Update Default GCP /etc/hosts
# We don't need to do this on the firstboot
After=firstboot-osupdate.target
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet-dependencies.target

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/bin/bash -c " \
 \
apiIntLBIP=10.0.0.2 \
 \
 \
apiLBIP=34.160.207.88 \
 \
apiServerURL=https://api.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com:6443 \
apiServerIntURL=https://api-int.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com:6443 \
/usr/local/bin/update-etc-hosts ${apiLBIP} ${apiServerURL} ${apiIntLBIP} ${apiServerIntURL}"

[Install]
RequiredBy=kubelet-dependencies.target

It looks like two empty lines were added to /etc/conf.d/etc-hosts.conf: the values of ${apiLBIP}, ${apiServerURL}, ${apiIntLBIP} and ${apiServerIntURL} were not passed in when /usr/local/bin/update-etc-hosts ran. Maybe we also need to define them with "Environment" in the gcp-update-etc-hosts systemd service, or find some other way to make sure the variables are set correctly.
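Two mechanisms probably combine here. First, systemd performs its own substitution of `${VAR}` in ExecStart= command lines, using variables from the unit's environment (for example Environment=), before bash starts; an unset variable becomes empty, and a literal `$` for bash must be written as `$$`. Second, even inside bash, assignments that precede a command on the same line only populate that command's environment, while the command's own arguments are expanded beforehand. The bash snippet below demonstrates the second point; the IP address is the one from the unit above.

```shell
#!/bin/bash
# On one line, the assignment only reaches echo's environment; "${apiLBIP}"
# in echo's arguments was already expanded (to nothing) beforehand.
same_line="$(bash -c 'apiLBIP=34.160.207.88 echo "[${apiLBIP}]"')"
echo "same line: ${same_line}"            # prints: same line: []

# Terminating the assignment with ";" (or a newline) makes it a shell
# variable that later commands can expand.
separate_cmd="$(bash -c 'apiLBIP=34.160.207.88; echo "[${apiLBIP}]"')"
echo "separate command: ${separate_cmd}"  # prints: separate command: [34.160.207.88]
```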

gpei · 2025-01-30T08:08:17Z

templates/common/gcp/files/usr-local-bin-update-etc-hosts.yaml

+    fi
+    mkdir -p /etc/conf.d
+    etc_hosts_config_filename="/etc/conf.d/etc-hosts.conf"
+    echo "${apiLBIP}    ${apiURL%:*}" >> ${etc_hosts_config_filename}


Another issue I'm seeing: when I manually ran the following command,

[root@gpei-0130-gcpdns-hq6q8-master-0 ~]# /usr/local/bin/update-etc-hosts 34.160.207.88 https://api.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com:6443 10.0.0.2 https://api-int.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com:6443
34.160.207.88 https://api.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com
10.0.0.2 https://api-int.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com
Done updating /etc/hosts

It added records like

10.0.0.2 https://api-int.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com

to the /etc/hosts file, but an entry like that can't help resolve the hostname, so it looks like we also need to remove the leading "https://".

Something like the following worked in my test, just for your reference:

apiHostname=${apiURL#*//}
apiIntHostname=${apiIntURL#*//}
echo "${apiLBIP} ${apiHostname%%:*}" >> "${etc_hosts_config_filename}"
echo "${apiIntLBIP} ${apiIntHostname%%:*}" >> "${etc_hosts_config_filename}"

Thanks! I had removed the trailing port number previously. Now updated to also remove the leading https://.
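The same scheme-and-port stripping can be wrapped in a small function. This is a sketch: `url_to_host` is a hypothetical name, and it assumes URLs of the form `https://host:port[/path]` as seen in this thread (an IPv6 literal such as `[::1]:6443` would need different handling).

```shell
#!/bin/bash
# Hypothetical helper: reduce an API URL to the bare hostname that an
# /etc/hosts entry needs.
url_to_host() {
  local url="$1"
  local hostport="${url#*//}"       # drop the scheme, e.g. "https://"
  hostport="${hostport%%/*}"        # drop any path, e.g. "/config/worker"
  printf '%s\n' "${hostport%%:*}"   # drop the port, e.g. ":6443"
}

url_to_host "https://api-int.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com:6443"
# prints: api-int.gpei-0130-gcpdns.qe.gcp.devcluster.openshift.com
```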

sadasu · 2025-01-30T17:09:31Z

@gpei thanks for your attention to this work. I ended up moving some simple bash code back to templates/common/gcp/units/gcp-update-etc-hosts.service.yaml. This is my last attempt before I switch to using "Environment".

sadasu · 2025-01-31T00:59:42Z

/retest-required

gpei · 2025-01-31T07:25:43Z

@sadasu Now a new error is raised in gcp-update-etc-hosts.service:

[core@gpei-0131a-gcpdns-chwdj-master-2 ~]$ journalctl -u gcp-update-etc-hosts.service --no-pager
Jan 31 01:15:37 gpei-0131a-gcpdns-chwdj-master-2 systemd[1]: Starting Update Default GCP /etc/hosts...
Jan 31 01:15:37 gpei-0131a-gcpdns-chwdj-master-2 bash[1224]: mkdir: cannot create directory ‘etc_hosts_config_filename=’: Operation not permitted
Jan 31 01:15:37 gpei-0131a-gcpdns-chwdj-master-2 bash[1224]: mkdir: cannot create directory ‘echo’: Operation not permitted
Jan 31 01:15:37 gpei-0131a-gcpdns-chwdj-master-2 systemd[1]: gcp-update-etc-hosts.service: Main process exited, code=exited, status=1/FAILURE
Jan 31 01:15:37 gpei-0131a-gcpdns-chwdj-master-2 systemd[1]: gcp-update-etc-hosts.service: Failed with result 'exit-code'.
Jan 31 01:15:37 gpei-0131a-gcpdns-chwdj-master-2 systemd[1]: Failed to start Update Default GCP /etc/hosts.

[core@gpei-0131a-gcpdns-chwdj-master-2 ~]$ systemctl cat gcp-update-etc-hosts.service
# /etc/systemd/system/gcp-update-etc-hosts.service
[Unit]
Description=Update Default GCP /etc/hosts
# We don't need to do this on the firstboot
After=firstboot-osupdate.target
# Wait for NetworkManager to report it's online
After=NetworkManager-wait-online.service
# Run before kubelet
Before=kubelet-dependencies.target

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/bin/bash -c " \
 \
apiIntLBIP=10.0.0.2  \
 \
apiLBIP=34.117.235.251 \
 \
apiServerURL=https://api.gpei-0131a-gcpdns.qe.gcp.devcluster.openshift.com:6443 \
apiServerIntURL=https://api-int.gpei-0131a-gcpdns.qe.gcp.devcluster.openshift.com:6443 \
apiServerHostPort=${apiServerURL#*//} \
apiServerIntHostPort=${apiServerIntURL#*//} \
apiServerHostname=${apiServerHostPort%:*} \
apiIntServerHostname=${apiServerIntHostPort%:*} \
mkdir -p /etc/conf.d \
etc_hosts_config_filename="/etc/conf.d/etc-hosts.conf" \
echo "${apiLBIP}    ${apiServerHostname}" >> ${etc_hosts_config_filename} \
echo "${apiIntLBIP}    ${apiIntServerHostname}" >> ${etc_hosts_config_filename} \
/usr/local/bin/update-etc-hosts"

[Install]
RequiredBy=kubelet-dependencies.target

I tried moving the creation of the /etc/conf.d directory into ExecStartPre, as follows. That step works, but still nothing was written to "/etc/conf.d/etc-hosts.conf".

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStartPre=/usr/bin/mkdir -p /etc/conf.d
ExecStart=/bin/bash -c " \
...

[root@gpei-0131a-gcpdns-chwdj-master-2 core]# ls -al /etc/conf.d/
total 12
drwxr-xr-x.   2 root root    6 Jan 31 01:15 .
drwxr-xr-x. 100 root root 8192 Jan 31 01:15 ..

I'm not sure whether it's related to write permissions on that directory, but even when I specified User=root in the service file, the "/etc/conf.d/etc-hosts.conf" file was still not created.

sadasu · 2025-01-31T15:07:41Z

I tried to put the creating /etc/conf.d directory task into ExecStartPre as following, it's working but still nothing was written to "/etc/conf.d/etc-hosts.conf"

What if we created the file in ExecStartPre too? For example:

ExecStartPre=/usr/bin/mkdir -p /etc/conf.d
ExecStartPre=/bin/bash -c 'echo "#Created by OpenShift" >> /etc/conf.d/etc-hosts.conf'

It might be worth a try.

I am working on another version that moves all file operations back to templates/common/gcp/files/usr-local-bin-update-etc-hosts.yaml.
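One caveat on the ExecStartPre= idea: systemd runs ExecStartPre= and ExecStart= commands directly rather than through a shell, and its documentation states that redirections such as `>>` and pipes are not supported, so a `>>` on a bare ExecStartPre= line would be handed to the program as an ordinary argument. The snippet emulates that by quoting `>>`, then shows the redirection working once bash interprets the command (it writes to a temporary directory instead of /etc/conf.d).

```shell
#!/bin/bash
# Emulate "no shell": ">>" arrives as a plain argument and is simply printed.
no_shell_out="$(echo "#Created by OpenShift" ">>" /etc/conf.d/etc-hosts.conf)"
echo "${no_shell_out}"   # prints: #Created by OpenShift >> /etc/conf.d/etc-hosts.conf

# With bash -c, the shell performs the redirection (temp dir used for safety).
tmpdir="$(mktemp -d)"
bash -c "echo '#Created by OpenShift' >> '${tmpdir}/etc-hosts.conf'"
written="$(cat "${tmpdir}/etc-hosts.conf")"
echo "${written}"        # prints: #Created by OpenShift
rm -rf "${tmpdir}"
```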


openshift-ci · 2025-02-01T02:22:30Z

@sadasu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/okd-scos-e2e-aws-ovn	`bd13be9`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/bootstrap-unit	`bd13be9`	link	false	`/test bootstrap-unit`
ci/prow/e2e-gcp-op-single-node	`bd13be9`	link	true	`/test e2e-gcp-op-single-node`
ci/prow/e2e-azure-ovn-upgrade-out-of-change	`bd13be9`	link	false	`/test e2e-azure-ovn-upgrade-out-of-change`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

gpei · 2025-02-01T06:04:43Z

@sadasu I think it's working as expected this time 🎉

The IP address mappings for API and API-Int were both added to /etc/hosts, and the masters joined the cluster.

[root@gpei-0201-gcpdns-mfggf-master-0 ~]# cat /etc/conf.d/etc-hosts.conf
35.241.55.195    api.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com
10.0.0.2    api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com
[root@gpei-0201-gcpdns-mfggf-master-0 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
35.241.55.195    api.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com
10.0.0.2    api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com
[root@gpei-0201-gcpdns-mfggf-master-0 ~]# 
[root@gpei-0201-gcpdns-mfggf-master-0 ~]# systemctl status gcp-update-etc-hosts.service
○ gcp-update-etc-hosts.service - Update Default GCP /etc/hosts
     Loaded: loaded (/etc/systemd/system/gcp-update-etc-hosts.service; enabled; preset: disabled)
     Active: inactive (dead) since Sat 2025-02-01 04:59:38 UTC; 41min ago
   Main PID: 1227 (code=exited, status=0/SUCCESS)
        CPU: 6ms

Feb 01 04:59:38 gpei-0201-gcpdns-mfggf-master-0 systemd[1]: Starting Update Default GCP /etc/hosts...
Feb 01 04:59:38 gpei-0201-gcpdns-mfggf-master-0 update-etc-hosts[1227]: Done updating /etc/hosts
Feb 01 04:59:38 gpei-0201-gcpdns-mfggf-master-0 systemd[1]: gcp-update-etc-hosts.service: Deactivated successfully.
Feb 01 04:59:38 gpei-0201-gcpdns-mfggf-master-0 systemd[1]: Finished Update Default GCP /etc/hosts.
[root@gpei-0201-gcpdns-mfggf-master-0 ~]# cat /usr/local/bin/update-etc-hosts
#!/bin/bash


apiIntLBIP=10.0.0.2 

apiLBIP=35.241.55.195

apiServerURL=https://api.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com:6443
apiServerIntURL=https://api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com:6443
apiServerHostPort=${apiServerURL#*//}
apiServerIntHostPort=${apiServerIntURL#*//}
apiServerHostname=${apiServerHostPort%:*}
apiIntServerHostname=${apiServerIntHostPort%:*}
mkdir -p /etc/conf.d
etc_hosts_config_filename="/etc/conf.d/etc-hosts.conf"
echo "${apiLBIP}    ${apiServerHostname}" >> ${etc_hosts_config_filename}
echo "${apiIntLBIP}    ${apiIntServerHostname}" >> ${etc_hosts_config_filename}
if [ -f ${etc_hosts_config_filename} ]
then
  cat /etc/conf.d/etc-hosts.conf >> /etc/hosts
  echo "Done updating /etc/hosts"
fi

[root@preserve-gpei-worker k_files]# oc get node
NAME                              STATUS   ROLES                  AGE   VERSION
gpei-0201-gcpdns-mfggf-master-0   Ready    control-plane,master   52m   v1.32.1
gpei-0201-gcpdns-mfggf-master-1   Ready    control-plane,master   51m   v1.32.1
gpei-0201-gcpdns-mfggf-master-2   Ready    control-plane,master   51m   v1.32.1

[root@preserve-gpei-worker k_files]# oc get co
NAME                                       VERSION                                                AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   False       True          True       49m     APIServicesAvailable: PreconditionNotReady...
baremetal                                  4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
cloud-controller-manager                   4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      52m     
cloud-credential                           4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      52m     
cluster-api                                4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      48m     
cluster-autoscaler                                                                                True        False         True       47m     machine-api not ready
config-operator                            4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      49m     
console                                                                                                                                        
control-plane-machine-set                  4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
csi-snapshot-controller                    4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      49m     
dns                                        4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      48m     
etcd                                       4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        True          False      47m     NodeInstallerProgressing: 1 node is at revision 0; 1 node is at revision 1; 1 node is at revision 3; 0 nodes have achieved new revision 7
image-registry                                                                                                                                 
ingress                                                                                           False       True          True       47m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights                                   4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      48m     
kube-apiserver                             4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        True          True       45m     GuardControllerDegraded: Missing operand on node gpei-0201-gcpdns-mfggf-master-2
kube-controller-manager                    4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        True          False      45m     NodeInstallerProgressing: 3 nodes are at revision 4; 0 nodes have achieved new revision 5
kube-scheduler                             4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        True          False      45m     NodeInstallerProgressing: 1 node is at revision 0; 2 nodes are at revision 5
kube-storage-version-migrator              4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      49m     
machine-api                                                                                       False       True          True       47m     Operator is initializing
machine-approver                           4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      48m     
machine-config                             4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
marketplace                                4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
monitoring                                                                                        False       True          True       32m     UpdatingPrometheusOperator: reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded: got 2 unavailable replicas
network                                    4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      51m     
node-tuning                                4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      48m     
olm                                        4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   False       True          True       49m     CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment...
openshift-apiserver                        4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   False       True          True       49m     APIServicesAvailable: PreconditionNotReady
openshift-controller-manager               4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
openshift-samples                                                                                                                              
operator-lifecycle-manager                 4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
operator-lifecycle-manager-catalog         4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      47m     
operator-lifecycle-manager-packageserver   4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      41m     
service-ca                                 4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      49m     
storage                                    4.19.0-0.test-2025-02-01-015036-ci-ln-7f63wsk-latest   True        False         False      48m
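One thing worth checking in the working script above: both the conf file and /etc/hosts are appended to with `>>`, so if the unit runs on every boot, each run would add the API entries again (and re-copy the growing conf file into /etc/hosts). A possible idempotent variant, sketched here with a hypothetical function name and hypothetical marker comments, rewrites a delimited block instead of appending; it is demonstrated on a temporary file rather than the real /etc/hosts.

```shell
#!/bin/bash
# Hypothetical helper: replace (rather than append) a marker-delimited block
# of "IP hostname" lines in a hosts-format file.
update_hosts_block() {
  local hosts_file="$1"
  shift
  local begin="# BEGIN OpenShift API entries"   # hypothetical marker text
  local end="# END OpenShift API entries"
  local tmp
  tmp="$(mktemp)"
  # Keep every line except a previously written managed block.
  awk -v b="$begin" -v e="$end" '
    $0 == b { skip = 1; next }
    $0 == e { skip = 0; next }
    !skip   { print }
  ' "$hosts_file" > "$tmp"
  {
    printf '%s\n' "$begin"
    printf '%s\n' "$@"        # one argument per "IP hostname" entry
    printf '%s\n' "$end"
  } >> "$tmp"
  cat "$tmp" > "$hosts_file"  # rewrite contents in place; inode and mode are kept
  rm -f "$tmp"
}

# Demo on a scratch file: running twice leaves a single copy of each entry.
demo_hosts="$(mktemp)"
echo "127.0.0.1   localhost" > "$demo_hosts"
for _ in 1 2; do
  update_hosts_block "$demo_hosts" \
    "35.241.55.195    api.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com" \
    "10.0.0.2    api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com"
done
cat "$demo_hosts"
```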

gpei · 2025-02-01T06:12:01Z

The new problem I see now is that the workers try to fetch their ignition config from the original MCS address "https://api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com:22623/config/worker", which can't be resolved. It looks like we'll need to replace that URL with the internal LB IP address?

Here's the worker's serial console output:

[*     ] A start job is running for Ignition (fetch) (50min 30s / no limit)
[ 3034.007742] ignition[823]: GET https://api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com:22623/config/worker: attempt #607
[ 3034.044185] ignition[823]: GET error: Get "https://api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com:22623/config/worker": dial tcp: lookup api-int.gpei-0201-gcpdns.qe.gcp.devcluster.openshift.com on 169.254.169.254:53: no such host

I believe we can create another bug to track this issue.
(Update: created https://issues.redhat.com/browse/OCPBUGS-49737 to track the worker ignition fetch failure separately.)

sadasu changed the title ~~GCP: Update /etc/hosts file when ClusterHostedDNS is enabled~~ WIP: OCPBUGS-48469: GCP: Update /etc/hosts file when ClusterHostedDNS is enabled Jan 22, 2025

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 22, 2025

openshift-ci bot requested review from jianli-wei, djoshy and RishabhSaini January 22, 2025 17:23

sadasu force-pushed the gcp-update-etc-hosts branch 9 times, most recently from 1bc96ca to 3280ff8 January 23, 2025 21:50

zaneb reviewed Jan 24, 2025


Add a template method to determine if in-cluster DNS is enabled

2e39cbc

Checks whether the feature to run in-cluster DNS on GCP and AWS is enabled by testing whether the value of `PlatformStatus.GCP.CloudLoadBalancerConfig.DNSType` is set to `ClusterHosted`.

sadasu force-pushed the gcp-update-etc-hosts branch 2 times, most recently from 15c0227 to 3d451a6 January 27, 2025 19:21

sadasu changed the title ~~WIP: OCPBUGS-48469: GCP: Update /etc/hosts file when ClusterHostedDNS is enabled~~ OCPBUGS-48469: GCP: Update /etc/hosts file when ClusterHostedDNS is enabled Jan 27, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2025

sadasu force-pushed the gcp-update-etc-hosts branch 5 times, most recently from 91d44ca to 1b13a9c January 27, 2025 21:57

yuqi-zhang approved these changes Jan 28, 2025


openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2025

sadasu force-pushed the gcp-update-etc-hosts branch from 1b13a9c to 77c93b5 January 29, 2025 04:34

sadasu force-pushed the gcp-update-etc-hosts branch from 77c93b5 to 171e94a January 30, 2025 03:55

gpei reviewed Jan 30, 2025


sadasu force-pushed the gcp-update-etc-hosts branch 2 times, most recently from f790dfb to 30ea55d January 30, 2025 17:07

sadasu force-pushed the gcp-update-etc-hosts branch from 30ea55d to b02b304 January 30, 2025 19:15

sadasu added 2 commits January 31, 2025 17:02

GCP: Update test code for rendering Machine configs

bd13be9

Added tests to accommodate GCP in-cluster DNS config

sadasu force-pushed the gcp-update-etc-hosts branch from 8a70084 to bd13be9 January 31, 2025 22:03
