Failed on upgrading BOSH Director from v271.2.0 to v280.0.14 #2490

Open
phong2tran opened this issue Jan 27, 2024 · 4 comments

Comments

@phong2tran

phong2tran commented Jan 27, 2024

Describe the bug
Failed on upgrading BOSH Director from v271.2.0 to v280.0.14

To Reproduce
Steps to reproduce the behavior (example):
Deploy a bosh director v271.2.0 on vSphere:

$ ./create-env.sh sandbox-cfar 271.2.0
Deployment manifest: '/SANDBOX-CFAR/bosh-director/bosh-deployment-271.2.0/bosh.yml'
Deployment state: '/SANDBOX-CFAR/bosh-director/sandbox-cfar-state.json'

Started validating
  Downloading release 'bosh'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh'... Finished (00:00:03)
  Downloading release 'bpm'... Finished (00:00:03)
  Validating release 'bpm'... Finished (00:00:02)
  Downloading release 'bosh-vsphere-cpi'... Finished (00:00:00)
  Validating release 'bosh-vsphere-cpi'... Finished (00:00:01)
  Downloading release 'uaa'... Finished (00:00:09)
  Validating release 'uaa'... Finished (00:00:05)
  Downloading release 'credhub'... Finished (00:00:03)
  Validating release 'credhub'... Finished (00:00:02)
  Downloading release 'os-conf'... Finished (00:00:00)
  Validating release 'os-conf'... Finished (00:00:00)
  Downloading release 'backup-and-restore-sdk'... Finished (00:00:05)
  Validating release 'backup-and-restore-sdk'... Finished (00:00:09)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Finished (00:00:12)
  Validating stemcell... Finished (00:00:05)
Finished validating (00:01:26)

Started installing CPI
  Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:04:19)
  Compiling package 'vsphere_cpi/5dffb632edb799be8e2c7aeed263409627b201d6143ce427621f40d6dd461993'... Finished (00:01:53)
  Compiling package 'iso9660wrap/b9eee11ca7251f93ef853db345596783012ae26b5d6ec5cb3d29bf295899c973'... Finished (00:00:00)
  Installing packages... Finished (00:00:01)
  Rendering job templates... Finished (00:00:00)
  Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:06:15)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-bionic-go_agent/1.92'... Finished (00:01:26)

Started deploying
  Creating VM for instance 'bosh/0' from stemcell 'sc-74133471-3d5c-4444-8ae0-1b749056bf79'... Finished (00:01:16)
  Waiting for the agent on VM 'vm-2e30ee54-968d-4407-b0e2-0a2c448f6695' to be ready... Finished (00:00:10)
  Creating disk... Finished (00:00:28)
  Attaching disk 'disk-36f89546-442f-4600-b482-ed148588a756' to VM 'vm-2e30ee54-968d-4407-b0e2-0a2c448f6695'... Finished (00:01:08)
  Rendering job templates... Finished (00:00:22)
  Compiling package 'golang/7b633f7a140b41ef9427109d0f3032cf81445ead'... Finished (00:00:27)
  Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:03:18)
  Compiling package 'mysql/788d06685e1ea1d316759eeeb506782ec7f9302f8c21e2ff04cd4703579f0935'... Finished (00:00:46)
  Compiling package 'libpq/ecbfa62322b4124f25372a19d68b83295b4d290503153667ec378e3196c45f69'... Finished (00:00:28)
  Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:03:15)
  Compiling package 'database-backup-restorer-boost/05f72399bdd8d91643f42ac411ba65befb78ac0334484dbc3ca95c5286ab7680'... Finished (00:00:19)
  Compiling package 'tini/3d7b02f3eeb480b9581bec4a0096dab9ebdfa4bc'... Finished (00:00:02)
  Compiling package 'bpm-runc/3dcaebacd63b8adc75c5f32954f11041885347b1'... Finished (00:01:47)
  Compiling package 'openjdk_1.8.0/225f67373c9ad0a1da464aeb92f06207bd3e8da1'... Finished (00:00:08)
  Compiling package 'golang-1-linux/7fdbb13e913f2f05232da046b27642ceebab32adf2e78ef3582b63ae6d60df96'... Finished (00:00:27)
  Compiling package 'libpcre2/d5cd2e4263fda94bfeec68d2a388b9e6bb17fa15e28e09c99ebe6a4faa3328f5'... Finished (00:00:14)
  Compiling package 'director/f32385256198535b797059dd4990fcb3b65c0c07337990163c24275a7a29b7e1'... Finished (00:01:25)
  Compiling package 'verify_multidigest/64d1958934e10a0eccc05ddf0d7ba0c8215e6f6d4c227cb93998087335378fa8'... Finished (00:00:01)
  Compiling package 'vsphere_cpi/5dffb632edb799be8e2c7aeed263409627b201d6143ce427621f40d6dd461993'... Finished (00:01:18)
  Compiling package 'davcli/58f558960854f58c55e3d506d3906019178dbc189fbbed1616b8b3c7c02142ea'... Finished (00:00:01)
  Compiling package 'gonats/f58980bd4b0436ff65f588627116dfff63f346f4d13175b7ba47380ab89e08a6'... Finished (00:00:01)
  Compiling package 'database-backup-restorer-postgres-9.4/70d321821ff300fbaef47d64fb7f7b5d33ede23c2349cbf1950886c40f25c2e8'... Finished (00:04:36)
  Compiling package 'database-backup-restorer-postgres-10/41f9bdf0c158e18e850a5744250a39b425f385529b234941c9acf1f6631a3424'... Finished (00:05:14)
  Compiling package 'database-backup-restorer-mysql-5.7/81418214987edce3b03159014ac68449689086d696be746e14857f7551f8f3f6'... Finished (00:02:51)
  Compiling package 'nginx/d4cf69d3e81bed005ebba5bc0bc8d2c28252e70ad47ff455479a9838d5f9b0e4'... Finished (00:01:02)
  Compiling package 'database-backup-restorer-postgres-13/0c18508216826e03c23c623d2f1989405831375c9d457e0ac619125c32b15371'... Finished (00:06:01)
  Compiling package 'database-backup-restorer-postgres-11/be5ee4b5015679ea4d92295ea1eb9a58480c3fff155f69cd1a92f800c11a0c91'... Finished (00:05:38)
  Compiling package 'bpm/818bd9ec39fa5e179c5406c1690fb7c6deb0fc4d'... Finished (00:00:11)
  Compiling package 'postgres-9.4/601f3635b43d0e7ba3ae866e3bd69425cdf33f7fb34a7f1bb21cc26818fb598e'... Finished (00:04:31)
  Compiling package 'credhub/33ea568aad1d35e9522c56f792d3d4fc3cd5975d'... Finished (00:00:07)
  Compiling package 's3cli/7e752dee192da026f6a0cdf2653b855cc6efbe6b041564660f8520c39ddd5a78'... Finished (00:00:02)
  Compiling package 'health_monitor/dd842698e83edeae08bdcc6e672429a5cee3b755645d2024d97b6213f1281d44'... Finished (00:00:34)
  Compiling package 'database-backup-restorer/7c0d80a713009aecb8d6533918a2bf45f7ad0319f50ecca1789fc230aa6d5dd9'... Finished (00:00:06)
  Compiling package 'database-backup-restorer-mariadb/af78e79c98c11c29a721b1d7ba554dd7d0bf25e2789fa933b96bbfd67d697465'... Finished (00:02:12)
  Compiling package 'luna-hsm-client-7.4/746f3c30aadc0af7afc2d5cddcc16d8836a8f845'... Finished (00:00:04)
  Compiling package 'postgres-10/708f8446db4ac7bb21bddce9938e217c741a6e6f82f6209f7e6f6a2b5b25eed3'... Finished (00:05:05)
  Compiling package 'bosh-gcscli/52223432539bbd0607db053f542440869688b4404dd65f2ddf33c2d195b1b891'... Finished (00:00:02)
  Compiling package 'uaa/4f77a97610b962f50d0c21067b48bd467db6066855318c766af8bc1cb990e799'... Finished (00:00:35)
  Compiling package 'iso9660wrap/b9eee11ca7251f93ef853db345596783012ae26b5d6ec5cb3d29bf295899c973'... Finished (00:00:01)
  Compiling package 'database-backup-restorer-mysql-5.6/01bf18f19277261bcccac9736d7634b49eb184a93cd6549b78f4e1d75eabe35a'... Finished (00:02:14)
  Compiling package 'database-backup-restorer-postgres-9.6/6a8fcf2d66b67507403df885b84c4b7cc1d66289f2d7efc5914b43dd2305491c'... Finished (00:05:07)
  Updating instance 'bosh/0'... Finished (00:03:08)
  Waiting for instance 'bosh/0' to be running... Finished (00:01:46)
  Running the post-start scripts 'bosh/0'... Finished (00:00:21)
Finished deploying (01:09:07)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Succeeded

root@0036416c4de8:/SANDBOX-CFAR/bosh-director# . bosh-env.sh sandbox-cfar 271.2.0
root@0036416c4de8:/SANDBOX-CFAR/bosh-director# bosh env
Using environment '10.9.202.186' as client 'admin'

Name               sandbox-cfar
UUID               a234617f-6e58-462f-ac51-52c722c3834b
Version            271.2.0 (00000000)
Director Stemcell  ubuntu-bionic/1.92
CPI                vsphere_cpi
Features           compiled_package_cache: disabled
                   config_server: enabled
                   local_dns: enabled
                   power_dns: disabled
                   snapshots: disabled
User               admin

Succeeded

Upload stemcell ubuntu-bionic 1.92

Deploy cf-deployment 21.5.0.

Upgrade the current bosh director v271.2.0 to v280.0.14

$ ./create-env.sh sandbox-cfar 280.0.14
Deployment manifest: '/var/vcap/store/deployment-vm/home/ptran/workspace/SANDBOX-CFAR/bosh-director/bosh-deployment-280.0.14/bosh.yml'
Deployment state: '/var/vcap/store/deployment-vm/home/ptran/workspace/SANDBOX-CFAR/bosh-director/sandbox-cfar-state.json'

Started validating
  Downloading release 'bosh'... Finished (00:00:01)
  Validating release 'bosh'... Finished (00:00:01)
  Downloading release 'bpm'... Finished (00:00:00)
  Validating release 'bpm'... Finished (00:00:00)
  Downloading release 'bosh-vsphere-cpi'... Finished (00:00:01)
  Validating release 'bosh-vsphere-cpi'... Finished (00:00:02)
  Downloading release 'uaa'... Finished (00:00:03)
  Validating release 'uaa'... Finished (00:00:02)
  Downloading release 'credhub'... Finished (00:00:01)
  Validating release 'credhub'... Finished (00:00:01)
  Downloading release 'os-conf'... Finished (00:00:00)
  Validating release 'os-conf'... Finished (00:00:00)
  Downloading release 'backup-and-restore-sdk'... Finished (00:00:04)
  Validating release 'backup-and-restore-sdk'... Finished (00:00:03)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Skipped [Found in local cache] (00:00:00)
  Validating stemcell... Finished (00:00:12)
Finished validating (00:00:39)

Started installing CPI
  Compiling package 'ruby-3.1/8b225e7cc2608305a7b784b5828b2b4b7c7adc3eb14af46e313d64a9e14a3ad6'... Finished (00:03:39)
  Compiling package 'golang-1-darwin/e6383fc2adbcb1dc5ab18d32b737b1729ff3226b774a358504a44bc5d6bd097f'... Finished (00:00:23)
  Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Finished (00:00:24)
  Compiling package 'vsphere_cpi/54bcc7a48ba47cc7df2b8dd4704bc8dbb46b945b1a91cbc147262803557a6a7a'... Finished (00:00:35)
  Compiling package 'iso9660wrap/b351c796826a0a3a57e13bad036c12a3958c38f9370bbb50540e782582baaf79'... Finished (00:00:31)
  Installing packages... Finished (00:00:07)
  Rendering job templates... Finished (00:00:00)
  Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:05:41)

Uploading stemcell 'bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.340'... Skipped [Stemcell already uploaded] (00:00:00)

Started deploying
  Waiting for the agent on VM 'vm-aef0966d-e843-41ff-873d-2acfe6ee88bb'... Finished (00:00:00)
  Draining jobs on instance 'unknown/0'... Finished (00:00:07)
  Stopping jobs on instance 'unknown/0'... Finished (00:00:00)
  Unmounting disk 'disk-36f89546-442f-4600-b482-ed148588a756'... Finished (00:00:01)
  Deleting VM 'vm-aef0966d-e843-41ff-873d-2acfe6ee88bb'... Finished (00:00:22)
  Creating VM for instance 'bosh/0' from stemcell 'sc-74437d41-122f-4224-a3e1-6266ff62e4df'... Finished (00:00:58)
  Waiting for the agent on VM 'vm-57d3af3a-29bf-4b39-944b-3bcb03d5a164' to be ready... Finished (00:00:29)
  Attaching disk 'disk-36f89546-442f-4600-b482-ed148588a756' to VM 'vm-57d3af3a-29bf-4b39-944b-3bcb03d5a164'... Finished (00:00:40)
  Rendering job templates... Finished (00:00:28)
  Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'golang-1-darwin/e6383fc2adbcb1dc5ab18d32b737b1729ff3226b774a358504a44bc5d6bd097f'... Finished (00:00:36)
  Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Finished (00:00:35)
  Compiling package 'ruby-3.1/8b225e7cc2608305a7b784b5828b2b4b7c7adc3eb14af46e313d64a9e14a3ad6'... Finished (00:15:25)
  Compiling package 'director-ruby-3.2/84ee2f9d0485530a75822fa03e7fd0c73544aa4c2f6fe24aaebebe1757195efe'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'tini/3d7b02f3eeb480b9581bec4a0096dab9ebdfa4bc'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'bpm-runc/923e2cae4f8f54cd58de0349352bb14f8662cfa5'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'libopenssl1/7f27f8cdc6cd6f6f865bfbe67ab853977e1505d2ca558415df9bf692eb1b0d63'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'openjdk_17.0/a805b67e0bbf99e97ca878960971301e56d951f67ab5ca14be11553b356556e8'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-boost/05f72399bdd8d91643f42ac411ba65befb78ac0334484dbc3ca95c5286ab7680'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'libpcre2/22fb4c5ee63919fa1e4b1e720fe048f8c55d8998858aeb8172ca67cbdcd0e6de'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'mysql/7ec79ca2b57047da0b337c62944439493b60c1bd5a2767444362cfd1c7b2bbd9'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'libpq/b309a72768019e24e2c592f3f25ded2679e98cbb90f774c3a4d6b7745760079f'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'golang-1-linux/c2342901fca75f4c7ec3f32e6a757e923089c6c50d8eb3effd2c25eac1009e31'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'postgres-15/1059ac62d543dc19011001f80f8c0bb99cc3a9ea4f8c14736e480701051ce9f0'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-postgres-15/162c4cca97dcfd5b12d4241bf40ae421cb3c4fbdbf215ce601f3267865501f66'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'luna-hsm-client-7.4/5956cbd4d17c28c2e4c29f3906e3faddc1d7b921708740f1a532a37d5b6fbe29'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'iso9660wrap/b351c796826a0a3a57e13bad036c12a3958c38f9370bbb50540e782582baaf79'... Finished (00:00:29)
  Compiling package 'vsphere_cpi/54bcc7a48ba47cc7df2b8dd4704bc8dbb46b945b1a91cbc147262803557a6a7a'... Finished (00:01:07)
  Compiling package 'database-backup-restorer-mysql-8.0/488fb8d45895a348f88ca2984fa36939687ad6978deebabd8ee70a1514776f17'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'nats/52d36e5308f7aeced172092016c0fd34f9195ff2788d3106fc2d5cf1ac192c1a'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'bpm/a37a126c1b31da99ab252f4668953a38c4748864'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-mysql-5.6/86603abfbb0d59ebf924449e97fecc422af66d7941bf5498a05099b653a8d3eb'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-postgres-13/ea27ff50286f247ab3acdb3c7cc2101c6d7a666a4eec7c669f7e34e3ef1b51e6'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-postgres-11/b9125bf430a1cf1d00ab83c72e4c5be26f6de52c5315b82beda286d31f4e7cc1'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'davcli/ca2605d13c62b479a215162ea17769326d6f7e37d1002c85816534013235b7d4'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'credhub/e3913a55fb5116fdca99c6403a19a94e7e051e4cd255ab972be279f86ef50de9'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'azure-storage-cli/90a54f4a65a0bfa7d1dc7c651467c1d1b19a009ccbb071ec4ccae42ba903c811'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-mysql-5.7/b1576d316b0046ec60cbbc3ef148eed266daca19992d5b228167a7dfb7059c34'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-mariadb/f66c894e04cf0b91155bf3a3c0af46ff3ce6957ea5f2c07112ba3ead4a185513'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'postgres-10/e3f2ed31116e1a0c929ae6fcdde983a9d6c000c25cafde8a784fd126e06400f9'... Skipped [Package already compiled] (00:00:00)
  Compiling package 's3cli/93d30c08e76d18cf878007359b18c1d1c1c0fb92c757d06bb0bb09de60f2c765'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'verify_multidigest/ffa02c5cc46c56c8006a5c081a16e76b4353f99de7ccc1605c01a95ae47f2fbd'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'health_monitor/5a419aae8750e7fe3f368f6695f8c60fc7d80e8a547d542137d6fbf782cee7fa'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'director/31ce6b1831288b9080178caf68f40d7c59d0743b2f736b449aab842d199fbc4c'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'uaa/2210f02ea85373965968f01d0291a1208d4b6e2e85616a95b477a4354cb93674'... Skipped [Package already compiled] (00:00:03)
  Compiling package 'nginx/82a22b536cf378d354f9325dadcbcb2fa70b1ce9e37eb65a8a7a97cd35e8fc45'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer/84b24a5d9b0a1c07b6484bf908700e2d7990b718e4fd2ce5ee4545337109df2f'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'bosh-gcscli/6394d55f449cad79d0f825815777c3f9f06efcae67850796e905e6aab7e9335b'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'postgres-13/a3141b9f3664abe145c6fb452a54b3bbc4b772933083c2c1ef725c0a7c71824f'... Skipped [Package already compiled] (00:00:00)
  Compiling package 'database-backup-restorer-postgres-10/f4a7d1e2aaad5f2aabb6b0dcbcaedb49305f0d62373af72e2ee8f01eaa595be9'... Skipped [Package already compiled] (00:00:00)
  Updating instance 'bosh/0'... Failed (00:04:49)
Failed deploying (00:26:41)

Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
  Running the pre-start script:
    Sending 'get_task' to the agent:
      Agent responded with error: Action Failed get_task: Task 288aece3-c64b-4578-5bf7-c6a7c8058142 result: 1 of 8 pre-start scripts failed. Failed Jobs: postgres. Successful Jobs: blobstore, nats, bpm, director, user_add, credhub, uaa.

Exit code 1

The pre-start script of the postgres job failed.
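
The agent error names the failed jobs, and each job's pre-start output lands under /var/vcap/sys/log/<job>/ on the director VM. As a rough sketch (the helper name `failed_job_logs` is made up for illustration and is not part of BOSH), the log files to inspect can be derived from the error message like this:

```shell
# Sketch: print the pre-start log paths for every job listed after
# "Failed Jobs:" in the agent error. failed_job_logs is a hypothetical helper.
failed_job_logs() (
  message="$1"
  # Keep only the text between "Failed Jobs: " and the next period.
  jobs="$(printf '%s\n' "$message" | sed -n 's/.*Failed Jobs: \([^.]*\)\..*/\1/p')"
  # Job names are comma-separated; emit the stdout and stderr log for each one.
  printf '%s\n' "$jobs" | tr ',' '\n' | while read -r job; do
    [ -n "$job" ] || continue
    printf '/var/vcap/sys/log/%s/pre-start.stdout.log\n' "$job"
    printf '/var/vcap/sys/log/%s/pre-start.stderr.log\n' "$job"
  done
)
```

Passing the error text above as the single argument prints the two postgres pre-start log paths, which can then be read with `cat` on the director VM.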

Expected behavior
BOSH Director should be successfully upgraded from v271.2.0 to v280.0.14

Logs
When sshing into the BOSH Director VM, I found this error in /var/vcap/sys/log/postgres/pre-start.stdout.log:

bosh/0:~$ sudo -i
bosh/0:~# monit summary
/var/vcap/bosh/etc/monitrc:8: Warning: include files not found '/var/vcap/monit/job/*.monitrc'
The Monit daemon 5.2.5 uptime: 20m 

System 'system_8c7a4cee-d163-4cd5-4d8c-cd2c5d15cd6f' running

bosh/0:~# ls /var/vcap/sys/log/postgres/ -hal
total 12K
drwxrwx---  2 root vcap 4.0K Jan 27 05:44 .
drwxr-x--- 16 root vcap 4.0K Jan 27 05:44 ..
-rw-r-----  1 root root    0 Jan 27 05:44 pre-start.stderr.log
-rw-r-----  1 root root  283 Jan 27 05:44 pre-start.stdout.log

bosh/0:~# cat /var/vcap/sys/log/postgres/pre-start.stderr.log 

bosh/0:~# cat /var/vcap/sys/log/postgres/pre-start.stdout.log 
kernel.shmmax = 67108864
copying contents of postgres-10 to postgres-15 for postgres upgrade...
Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

The source cluster was not shut down cleanly.
Failure, exiting

While migrating the database from Postgres 10 to Postgres 15 during the upgrade, the postgres pre-start script fails because the source cluster (Postgres 10?) was not shut down cleanly. I reran the BOSH Director upgrade several times, but it did not help.
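
The "not shut down cleanly" message reflects the cluster state recorded in the Postgres control file, which `pg_controldata` prints as the "Database cluster state" field. A minimal sketch of checking that field follows; the helper names `cluster_state` and `is_clean_shutdown` are hypothetical, and treating only "shut down" as clean is an assumption that fits a single, non-standby director database:

```shell
# Sketch: read pg_controldata output on stdin and report the cluster state.
# cluster_state and is_clean_shutdown are hypothetical helper names.
cluster_state() {
  # Print the value of the "Database cluster state" field, trimmed.
  sed -n 's/^Database cluster state:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p'
}

is_clean_shutdown() {
  # Succeed only when the recorded state is exactly "shut down".
  [ "$(cluster_state)" = "shut down" ]
}
```

On the director VM the input would come from the Postgres 10 `pg_controldata` binary, for example `su - vcap -c "/var/vcap/packages/postgres-10/bin/pg_controldata -D /var/vcap/store/postgres-10" | is_clean_shutdown`, using the paths that appear later in this thread.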

Versions (please complete the following information):

  • Infrastructure: vSphere
  • BOSH versions: from 271.2.0 to 280.0.14
  • BOSH CLI version:
    $ bosh -v
    version 6.1.1-a0c78bc2-2019-10-25T22:16:25Z
    Succeeded
  • Stemcell versions:
    ubuntu-bionic/1.92 for current BOSH Director v271.2.0
    ubuntu-jammy/1.340 for new BOSH Director v280.0.14
  • Versions of other releases in use:
    $ yq '.releases' releases-280.0.14/interpolated-bosh-director-280.0.14.yml
- name: bosh
  sha1: f7fd9b040ab56b9c88dd6c4dfc23fdf682c7d4ad
  url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/bosh-280.0.14-ubuntu-jammy-1.340-20240111-153544-517049233-20240111153545.tgz
  version: 280.0.14
- name: bpm
  sha1: 6ac7f9a016075ed69b6808dfb544146a73565a9f
  url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/bpm-1.2.13-ubuntu-jammy-1.340-20240110-224040-652943252-20240110224041.tgz
  version: 1.2.13
- name: bosh-vsphere-cpi
  sha1: ddcf851983f672b1186590244d94f7dffb959ff2
  url: https://bosh.io/d/github.com/cloudfoundry/bosh-vsphere-cpi-release?v=97.0.5
  version: 97.0.5
- name: uaa
  sha1: a8d7847cf4b5829bcfc085565dfb78697fbc3bb5
  url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/uaa-76.31.0-ubuntu-jammy-1.340-20240119-145417-377757494-20240119145421.tgz
  version: 76.31.0
- name: credhub
  sha1: e9229b2bb5681f9ef8911e653e9719de628b3904
  url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/credhub-2.12.58-ubuntu-jammy-1.340-20240111-190030-621523752-20240111190032.tgz
  version: 2.12.58
- name: os-conf
  sha1: daf34e35f1ac678ba05db3496c4226064b99b3e4
  url: https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=22.2.1
  version: 22.2.1
- name: backup-and-restore-sdk
  sha1: 28ea9cbf00d89d4d4c363f4459d79268e44ac65f
  url: https://s3.amazonaws.com/bosh-compiled-release-tarballs/backup-and-restore-sdk-1.18.116-ubuntu-jammy-1.340-20240115-082356-879977937-20240115082400.tgz
  version: 1.18.116

Deployment info:
We're using "bosh create-env" command with bosh-deployment to create and upgrade BOSH Director environment.
BOSH Director creation script:

#!/usr/bin/env bash

BIN_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd -P)"

if [[ $# -lt 2 ]]
then
  echo "Usage: $0 <env_name> <bosh_director_version>" 1>&2
  echo "Example: $0 sandbox-cfar 280.0.14" 1>&2
  exit 1
fi

env_name=${1}
bosh_director_version=${2}

bosh create-env ${BIN_DIR}/bosh-deployment-${bosh_director_version}/bosh.yml \
    --state=${BIN_DIR}/${env_name}-state.json \
    --vars-store=${BIN_DIR}/${env_name}-creds.yml \
    -l ${BIN_DIR}/${env_name}-vars-${bosh_director_version}.yml \
    -o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/vsphere/cpi.yml \
    -o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/uaa.yml \
    -o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/credhub.yml \
    -o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/jumpbox-user.yml \
    -o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/bbr.yml \
    -o ${BIN_DIR}/bosh-deployment-${bosh_director_version}/experimental/enable-metrics.yml \
    -o ${BIN_DIR}/ops/configure-uaa-ldap.yml \
    -o ${BIN_DIR}/ops/change-uaa-login-prompt.yml \
    -o ${BIN_DIR}/ops/map-ldap-to-uaa-groups.yml \
    -o ${BIN_DIR}/ops/use-bosh-compiled-releases-from-artifactory-${bosh_director_version}.yml \
    -o ${BIN_DIR}/ops/use-bosh-stemcell-from-artifactory-${bosh_director_version}.yml \
    -o ${BIN_DIR}/ops/vsphere.yml \
    -o ${BIN_DIR}/ops/dns.yml \
    -o ${BIN_DIR}/ops/ntp.yml \
    -o ${BIN_DIR}/ops/passwd.yml \
    -o ${BIN_DIR}/ops/disk-pools.yml \
    -o ${BIN_DIR}/ops/set-credhub-minimum-certificate-duration.yml

new bosh-deployment: https://github.com/cloudfoundry/bosh-deployment/tree/15cbd254db78ab49ef957f2d80ffd2901b09d6e5


@rkoster
Contributor

rkoster commented Feb 1, 2024

It seems like you are upgrading from an ancient version of Postgres. This issue was fixed here: cloudfoundry/bpm-release#152

@rkoster rkoster moved this from Inbox to Pending Review | Discussion in Foundational Infrastructure Working Group Feb 1, 2024
@phong2tran
Author

Thank you so much for the response @rkoster! Indeed, we're operating an "outdated" BOSH environment and have not upgraded it as regularly as we should. We have seen this issue intermittently in a few of our BOSH Director upgrade test runs.

How can we move forward with this BOSH Director v280.0.14 upgrade and ensure that this issue won't happen in our existing production BOSH environments?

Option 1: Can we manually shut down Postgres 10 on the BOSH Director VM before attempting the BOSH Director upgrade? If so, which command sequence should be used to properly shut down Postgres 10 and the other BOSH Director services?

Option 2: First update the BPM component on the current BOSH Director v271.2.0 to v1.1.14 or higher, which includes the fix (cloudfoundry/bpm-release#152 (comment)), before upgrading to BOSH Director v280.0.14.

Any other options? Greatly appreciate your suggestions here.
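
For Option 2, whether a given manifest already pins a BPM release at or above the version cited above can be checked with a small version comparison. This is only a sketch: `version_ge` is a hypothetical helper, and it assumes `sort -V` from GNU coreutils is available.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# version_ge is a hypothetical helper; it relies on GNU sort's -V option.
version_ge() {
  # The smaller of the two versions sorts first; $1 >= $2 when that is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 1.2.13 1.1.14` succeeds for the bpm 1.2.13 release listed in the 280.0.14 manifest above.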

@rkoster
Contributor

rkoster commented Feb 1, 2024

Updating BPM would still be an update of the instance, and as such carries a chance of an improper Postgres shutdown.

@bgandon do you remember if there was a workaround that was used before the fix was implemented?

@phong2tran
Author

phong2tran commented Feb 10, 2024

Hi @bgandon,
As @rkoster confirmed, Option 2 will likely run into the same improper Postgres shutdown. Could you please advise on the workaround you used before the BPM fix was implemented, if possible?

We're thinking of using Option 1 as a workaround: manually shutting down Postgres 10 on the BOSH Director VM before attempting the BOSH Director upgrade. Please help confirm whether the following steps will work.

  1. SSH into BOSH Director VM.
  2. Monit stop all other processes except Postgres.
bosh/0:~# for name in "credhub" "uaa" "health_monitor" "director_nginx" "director_sync_dns" "director_scheduler" "blobstore_nginx" "nats" "director"; do monit stop "${name}"; done
bosh/0:~# monit summary
The Monit daemon 5.2.5 uptime: 7d 2h 19m 

Process 'nats'                      not monitored
Process 'postgres'                  running
Process 'blobstore_nginx'           not monitored
Process 'director'                  not monitored
Process 'worker_1'                  not monitored
Process 'worker_2'                  not monitored
Process 'worker_3'                  not monitored
Process 'worker_4'                  not monitored
Process 'director_scheduler'        not monitored
Process 'director_sync_dns'         not monitored
Process 'director_nginx'            not monitored
Process 'health_monitor'            not monitored
Process 'uaa'                       not monitored
Process 'credhub'                   not monitored
System 'system_be0914a6-1473-47f1-58d9-4f3aacbe2ab5' running
  3. Unmonitor the Postgres process so that monit won't restart it when Postgres is later shut down directly with the "kill" command.
bosh/0:~# monit unmonitor postgres
bosh/0:~# monit summary
The Monit daemon 5.2.5 uptime: 7d 2h 54m 

Process 'nats'                      not monitored
Process 'postgres'                  not monitored
Process 'blobstore_nginx'           not monitored
Process 'director'                  not monitored
Process 'worker_1'                  not monitored
Process 'worker_2'                  not monitored
Process 'worker_3'                  not monitored
Process 'worker_4'                  not monitored
Process 'director_scheduler'        not monitored
Process 'director_sync_dns'         not monitored
Process 'director_nginx'            not monitored
Process 'health_monitor'            not monitored
Process 'uaa'                       not monitored
Process 'credhub'                   not monitored
System 'system_be0914a6-1473-47f1-58d9-4f3aacbe2ab5' running
  4. Shut down Postgres by sending SIGINT with the "kill" command (fast shutdown mode).
bosh/0:~# postgres_pid=$(/var/vcap/packages/bpm/bin/bpm pid postgres-10) && kill -s SIGINT "${postgres_pid}"
  5. Check the Postgres database cluster state and ensure it has shut down properly, reporting "shut down" instead of "in production".
bosh/0:~# su - vcap -c "/var/vcap/packages/postgres-10/bin/pg_controldata -D /var/vcap/store/postgres-10" | grep -F "Database cluster state"
Database cluster state:               shut down
  6. If the Postgres database cluster state is "shut down", exit the BOSH Director VM and proceed with the BOSH Director upgrade as usual.
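
The steps above could be collected into a few shell functions along the following lines. This is an unconfirmed sketch, not a procedure endorsed by the maintainers: the process names, the bpm call and the postgres-10 paths are copied from the commands in this comment, while the function names and the `DRY_RUN` and `CONTROLDATA_CMD` variables are hypothetical. Because Postgres may need a moment to finish after receiving SIGINT, the state check polls instead of running once.

```shell
#!/usr/bin/env bash
# Sketch of the manual shutdown steps listed above; run as root on the director VM.
# Process names, the bpm call and the postgres-10 paths are copied from this
# comment. The function names, DRY_RUN and CONTROLDATA_CMD are hypothetical.

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop every monit process except Postgres.
stop_director_processes() {
  for name in credhub uaa health_monitor director_nginx director_sync_dns \
              director_scheduler blobstore_nginx nats director; do
    run monit stop "$name"
  done
}

# Unmonitor Postgres so monit does not restart it, then request a fast
# shutdown by sending SIGINT to the postmaster.
stop_postgres() {
  run monit unmonitor postgres
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' 'kill -s INT <pid reported by bpm for postgres-10>'
    return 0
  fi
  postgres_pid="$(/var/vcap/packages/bpm/bin/bpm pid postgres-10)"
  kill -s INT "$postgres_pid"
}

# Default source of pg_controldata output for the Postgres 10 data directory.
controldata() {
  su - vcap -c "/var/vcap/packages/postgres-10/bin/pg_controldata -D /var/vcap/store/postgres-10"
}

# Poll until the cluster reports "shut down"; give up after $1 attempts
# (default 30, one second apart) and return non-zero.
wait_for_clean_shutdown() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    if "${CONTROLDATA_CMD:-controldata}" |
        grep -Eq '^Database cluster state:[[:space:]]+shut down[[:space:]]*$'; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

Calling `stop_director_processes`, `stop_postgres` and `wait_for_clean_shutdown` in that order mirrors the list; the upgrade would only be started if the last call succeeds. Setting `DRY_RUN=1` first prints the monit commands without executing them.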
