Chef backend ctl scenarios for back up and restore testing
Vinay Satish edited this page Jan 11, 2022 · 6 revisions
SCENARIO=chef-backend PLATFORM=ubuntu-18.04 INSTALL_VERSION=14.11.31 UPGRADE_VERSION=14.11.36 BACKEND_VERSION=2.2.0 ENABLE_IPV6=false ENABLE_ADDON_PUSH_JOBS=false ENABLE_GATHER_LOGS_TEST=false ENABLE_PEDANT_TEST=false ENABLE_PSQL_TEST=false ENABLE_SMOKE_TEST=false make apply
- Log in to the front end
chef-server-ctl user-create -f /tmp/admin.pem admin Admin User [email protected] password; chef-server-ctl org-create -f /tmp/test-validator.pem test Test; chef-server-ctl org-user-add test admin;
mkdir ~/.chef; cp /tmp/admin.pem ~/.chef/; vi ~/.chef/knife.rb
export PATH=$PATH:/opt/opscode/embedded/bin; knife ssl fetch; knife node create FOO -d; knife node create Foo -d; knife node create foo -d; knife node create bar -d;
chef-server-ctl user-list; chef-server-ctl org-list; knife node list;
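The contents of the knife.rb edited above are not shown on this page. A minimal sketch of what it might contain, assuming the `admin` user and `test` org created above; the helper name `write_knife_rb` and the front-end URL are placeholders, not part of the original steps:

```shell
# Hypothetical helper: writes a minimal knife.rb into the directory given as $1.
# $2 is the front-end base URL (an assumption; use the real front-end address).
# The user "admin" and the org "test" come from the commands above.
write_knife_rb() {
  dir=$1
  server=$2
  mkdir -p "$dir"
  cat > "$dir/knife.rb" <<EOF
node_name       "admin"
client_key      "$dir/admin.pem"
chef_server_url "$server/organizations/test"
EOF
}

# Example (placeholder URL):
# write_knife_rb "$HOME/.chef" "https://frontend.example.com"
```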
- Log in to one of the followers
- Run this
chef-backend-ctl backup
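The archive name carries a timestamp, so the path used for the restore differs on every run. A small sketch for picking the newest archive, assuming the archives live in /var/opt/chef-backup and follow the chef-backup-*.tgz naming seen on this page; `latest_backup` is a hypothetical helper name:

```shell
# Hypothetical helper: print the newest chef-backup-*.tgz in directory $1
# (defaults to /var/opt/chef-backup, the directory used on this page).
latest_backup() {
  dir=${1:-/var/opt/chef-backup}
  # The YYYY-MM-DD-HH-MM-SS timestamp sorts lexicographically, so the last
  # entry after sorting is the most recent archive.
  ls "$dir"/chef-backup-*.tgz 2>/dev/null | sort | tail -n 1
}

# Example:
# chef-backend-ctl restore "$(latest_backup)"
```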
- Run this
chef-backend-ctl restore /var/opt/chef-backup/chef-backup-2022-01-07-11-52-19.tgz
The restore failed with the following error:
Would you like to proceed? (y/n)
y
✓ Verifying backup has required components
✓ Verifying backup has required components
✓ Unpacking backup to temporary directory
✓ Removing existing data directoriest node
✓ Rewriting configuration for current node
✓ Restoring configuration cluster
✗ Create new Chef Backend cluster
Restoring PostgreSQL data
Starting up Chef Backendles
✓ Cleaning Up Temporary Files
An error occurred during this operation:
Restore failed:
Timed out waiting for cluster to be ready.
root@ip-10-0-10-189:~#
- The status after the restore was as follows
root@ip-10-0-10-189:~#
root@ip-10-0-10-189:~#
root@ip-10-0-10-189:~# chef-backend-ctl cluster-status
Name IP GUID Role PG ES Blocked Eligible
ip-10-0-10-189 10.0.10.189 e62b212424b293375261e5d5ce0bf81e leader leader master not_blocked true
root@ip-10-0-10-189:~#
root@ip-10-0-10-189:~#
root@ip-10-0-10-189:~#
root@ip-10-0-10-189:~# chef-backend-ctl status
Service Local Status Time in State Distributed Node Status
leaderl running (pid 10560) 0d 0h 6m 4s Error: no cluster configured
epmd running (pid 10410) 0d 0h 6m 15s Error: no cluster configured
etcd running (pid 10352) 0d 0h 6m 17s Error: no cluster configured
postgresql running (pid 10631) 0d 0h 6m 2s Error: no cluster configured
elasticsearch running (pid 10440) 0d 0h 6m 14s Error: no cluster configured
System Local Status Distributed Node Status
disks /var/log/chef-backend: OK; /var/opt/chef-backend: OK health: green; healthy nodes: 1/1
root@ip-10-0-10-189:~#
root@ip-10-0-10-189:~#
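The "no cluster configured" state shown above can be detected from a script. A minimal sketch, assuming the error text appears exactly as in the status output above; `cluster_configured` is a hypothetical helper name:

```shell
# Hypothetical helper: reads `chef-backend-ctl status` output on stdin and
# succeeds only if no service reports "no cluster configured".
cluster_configured() {
  ! grep -q 'no cluster configured'
}

# Example:
# chef-backend-ctl status | cluster_configured || echo "cluster not configured yet"
```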
- Copy /etc/chef-backend/chef-backend-secrets.json from the first node to /tmp/chef-backend-secrets.json on the joining node, then run the following on the joining node
chef-backend-ctl cleanse
chef-backend-ctl join-cluster --accept-license --yes --quiet 10.0.10.189 -p 10.0.4.86 -s /tmp/chef-backend-secrets.json
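The join command above takes three node-specific values: the IP of an existing cluster member, the IP the joining node publishes (-p), and the path to the copied secrets file (-s). A sketch that prints the command for review instead of running it; `join_command` is a hypothetical helper name, and the `scp` line in the comment assumes root SSH access between the nodes:

```shell
# Hypothetical helper: prints the join command for a node instead of running it.
#   $1 = IP of an existing cluster member
#   $2 = IP that the joining node publishes (-p)
#   $3 = path to the secrets file copied from the first node (-s)
join_command() {
  printf 'chef-backend-ctl join-cluster --accept-license --yes --quiet %s -p %s -s %s\n' "$1" "$2" "$3"
}

# Example, run on the joining node (assumes root SSH access to the first node):
# scp root@10.0.10.189:/etc/chef-backend/chef-backend-secrets.json /tmp/chef-backend-secrets.json
# join_command 10.0.10.189 10.0.4.86 /tmp/chef-backend-secrets.json
```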
- This was successful, and the status was as follows
root@ip-10-0-4-86:~#
root@ip-10-0-4-86:~# chef-backend-ctl cluster-status
Name IP GUID Role PG ES Blocked Eligible
ip-10-0-4-86 10.0.4.86 8d7db929361e812c3e0964f17b90096a follower follower not_master not_blocked true
ip-10-0-10-189 10.0.10.189 e62b212424b293375261e5d5ce0bf81e leader leader master not_blocked true
root@ip-10-0-4-86:~#
root@ip-10-0-4-86:~#
root@ip-10-0-4-86:~#
root@ip-10-0-4-86:~# chef-backend-ctl status
Service Local Status Time in State Distributed Node Status
leaderl running (pid 13107) 0d 0h 0m 40s leader: 1; waiting: 0; follower: 1; total: 2
epmd running (pid 13083) 0d 0h 0m 41s status: local-only
etcd running (pid 12963) 0d 0h 1m 16s health: green; healthy nodes: 2/2
postgresql running (pid 13246) 0d 0h 0m 36s leader: 1; offline: 0; syncing: 0; synced: 1
elasticsearch running (pid 13104) 0d 0h 0m 42s state: green; nodes online: 2/2
System Local Status Distributed Node Status
disks /var/log/chef-backend: OK; /var/opt/chef-backend: OK health: green; healthy nodes: 2/2
root@ip-10-0-4-86:~#
- The third node was the leader of the previous cluster.
- Tried the same steps on it. Initially the join failed because of a wrong leader IP address, so I proceeded with the steps below first.
- Tried again after completing the next step below and correcting the IP address (thank you, Prajaktha!), and the join succeeded.
- Status at the end
root@ip-10-0-1-226:~#
root@ip-10-0-1-226:~# chef-backend-ctl status
Service Local Status Time in State Distributed Node Status
leaderl running (pid 13273) 0d 0h 0m 50s leader: 1; waiting: 0; follower: 2; total: 3
epmd running (pid 13248) 0d 0h 0m 52s status: local-only
etcd running (pid 13128) 0d 0h 1m 25s health: green; healthy nodes: 3/3
postgresql running (pid 13412) 0d 0h 0m 46s leader: 1; offline: 0; syncing: 0; synced: 2
elasticsearch running (pid 13270) 0d 0h 0m 52s state: green; nodes online: 3/3
System Local Status Distributed Node Status
disks /var/log/chef-backend: OK; /var/opt/chef-backend: OK health: green; healthy nodes: 3/3
root@ip-10-0-1-226:~#
root@ip-10-0-1-226:~#
root@ip-10-0-1-226:~# chef-backend-ctl cluster-status
Name IP GUID Role PG ES Blocked Eligible
ip-10-0-1-226 10.0.1.226 37b13f086ea76a33cc635da13322888a follower follower not_master not_blocked true
ip-10-0-10-189 10.0.10.189 e62b212424b293375261e5d5ce0bf81e leader leader master not_blocked true
ip-10-0-4-86 10.0.4.86 8d7db929361e812c3e0964f17b90096a follower follower not_master not_blocked true
root@ip-10-0-1-226:~#
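A quick way to check the expected shape of the cluster (one leader, two followers) is to count roles in the cluster-status output. A sketch, assuming the Role value is the fourth whitespace-separated column as in the output above; `count_role` is a hypothetical helper name:

```shell
# Hypothetical helper: reads `chef-backend-ctl cluster-status` output on stdin
# and prints how many nodes have the role given as $1 (column 4).
count_role() {
  awk -v role="$1" '$4 == role { n++ } END { print n + 0 }'
}

# Example:
# chef-backend-ctl cluster-status | count_role follower
```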
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
Internal Services
-------------------
run: bookshelf: (pid 18182) 5093s; run: log: (pid 17978) 5140s
run: haproxy: (pid 18131) 5094s; run: log: (pid 3311) 5194s
run: nginx: (pid 2380) 4703s; run: log: (pid 18113) 5108s
run: oc_bifrost: (pid 18136) 5094s; run: log: (pid 17792) 5174s
run: oc_id: (pid 18159) 5093s; run: log: (pid 17825) 5166s
run: opscode-erchef: (pid 18265) 5092s; run: log: (pid 18075) 5136s
run: redis_lb: (pid 2099) 4757s; run: log: (pid 18321) 5091s
-------------------
External Services
-------------------
down: elasticsearch: failed to connect to http://127.0.0.1:9200: 404 "Not Found"
run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
WARN: Server returned error 503 for https://127.0.0.1/users, retrying 1/5 in 4s
WARN: Server returned error 503 for https://127.0.0.1/users, retrying 2/5 in 8s
^CTraceback (most recent call last):
7: from /usr/bin/chef-server-ctl:180:in `<main>'
6: from /usr/bin/chef-server-ctl:180:in `load'
5: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/chef-server-ctl-1.1.0/bin/chef-server-ctl:337:in `<top (required)>'
4: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:745:in `run'
3: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:203:in `block in add_command_under_category'
2: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/chef-server-ctl-1.1.0/plugins/wrap-knife-opc.rb:43:in `block (2 levels) in load_file'
1: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:237:in `run_command'
/opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:237:in `system': Interrupt
root@ip-10-0-10-216:~#
- Update chef_backend_members in /etc/opscode/chef-server.rb so that it lists only the working nodes of the new cluster, then run
chef-server-ctl reconfigure
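For reference, the chef_backend_members setting is a list of backend IPs. A sketch that prints the line for a given set of nodes, assuming the quoted-array form that `chef-backend-ctl gen-server-config` is understood to generate; `members_line` is a hypothetical helper name, and the output is meant to be pasted into chef-server.rb by hand:

```shell
# Hypothetical helper: prints a chef_backend_members line for the given IPs.
# The array syntax is an assumption based on generated chef-server.rb files.
members_line() {
  list=""
  for ip in "$@"; do
    # Append each IP in double quotes, comma-separated after the first one.
    list="${list:+$list, }\"$ip\""
  done
  printf 'chef_backend_members [%s]\n' "$list"
}

# Example for the two-node cluster above:
# members_line 10.0.10.189 10.0.4.86
```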
- The status is as follows
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
Internal Services
-------------------
run: bookshelf: (pid 18182) 5372s; run: log: (pid 17978) 5419s
run: haproxy: (pid 18893) 129s; run: log: (pid 3311) 5473s
run: nginx: (pid 18896) 128s; run: log: (pid 18113) 5387s
run: oc_bifrost: (pid 18136) 5373s; run: log: (pid 17792) 5453s
run: oc_id: (pid 18159) 5372s; run: log: (pid 17825) 5445s
run: opscode-erchef: (pid 18265) 5371s; run: log: (pid 18075) 5415s
run: redis_lb: (pid 18888) 129s; run: log: (pid 18321) 5370s
-------------------
External Services
-------------------
run: elasticsearch: connected OK to http://127.0.0.1:9200
run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
ERROR: Failed to authenticate to https://127.0.0.1:443 as pivotal with key /tmp/latovip20220111-18917-ywu2fo
Response: Failed to authenticate as 'pivotal'. Ensure that your node_name and client key are correct.
ERROR: Failed to authenticate to https://127.0.0.1:443 as pivotal with key /tmp/latovip20220111-18921-t26put
Response: Failed to authenticate as 'pivotal'. Ensure that your node_name and client key are correct.
root@ip-10-0-10-216:~#
- Cleanse and reconfigure: run
chef-server-ctl cleanse
then make sure the proper chef-server.rb is in place, and run
chef-server-ctl reconfigure
- The status is fine, but the data is missing
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
Internal Services
-------------------
run: bookshelf: (pid 19206) 37s; run: log: (pid 18839) 210s
run: haproxy: (pid 19099) 39s; run: log: (pid 3694) 238s
run: nginx: (pid 19202) 37s; run: log: (pid 19068) 48s
run: oc_bifrost: (pid 19104) 39s; run: log: (pid 18659) 222s
run: oc_id: (pid 19138) 38s; run: log: (pid 18771) 216s
run: opscode-erchef: (pid 19214) 37s; run: log: (pid 18838) 210s
run: redis_lb: (pid 19094) 41s; run: log: (pid 19249) 36s
-------------------
External Services
-------------------
run: elasticsearch: connected OK to http://127.0.0.1:9200
run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
pivotal
root@ip-10-0-10-216:~#
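To make "the data is missing" concrete, the listings taken before the backup can be saved to a file and compared with the listings after the restore. A sketch under that assumption; `missing_entries` and the /tmp file names are hypothetical:

```shell
# Hypothetical helper: prints every line of the "before" listing ($1) that is
# absent from the "after" listing ($2).
missing_entries() {
  # -v: keep non-matching lines; -x: whole-line match;
  # -F: fixed strings; -f: read patterns from the "after" file
  grep -vxFf "$2" "$1"
}

# Example:
# { chef-server-ctl user-list; chef-server-ctl org-list; } > /tmp/before.txt   # before the backup
# { chef-server-ctl user-list; chef-server-ctl org-list; } > /tmp/after.txt    # after the restore
# missing_entries /tmp/before.txt /tmp/after.txt
```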
- Tried again after connecting the third node: added the node IP to chef-server.rb and reconfigured. The status at this point is the same, with the data still missing
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
Internal Services
-------------------
run: bookshelf: (pid 19206) 29546s; run: log: (pid 18839) 29719s
run: haproxy: (pid 3060) 429s; run: log: (pid 3694) 29747s
run: nginx: (pid 3063) 429s; run: log: (pid 19068) 29557s
run: oc_bifrost: (pid 19104) 29548s; run: log: (pid 18659) 29731s
run: oc_id: (pid 19138) 29547s; run: log: (pid 18771) 29725s
run: opscode-erchef: (pid 19214) 29546s; run: log: (pid 18838) 29719s
run: redis_lb: (pid 3055) 430s; run: log: (pid 19249) 29545s
-------------------
External Services
-------------------
run: elasticsearch: connected OK to http://127.0.0.1:9200
run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~#
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
pivotal
root@ip-10-0-10-216:~#
- Spin up a bare-metal instance for the front end (install chef-server) and two instances for the backend (install chef-backend)