forked from pivotal-cf/docs-pcf-upgrade
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathchecklist.html.md.erb
323 lines (292 loc) · 19.5 KB
/
checklist.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
---
title: Upgrade Checklist for PCF v2.2
owner: Ops Manager
---
This topic serves as a checklist for upgrading Pivotal Cloud Foundry (PCF) from v2.1 to v2.2.
## <a id="compatibility"></a>Tile Compatibility
Before you upgrade to PCF v2.2, please check whether PCF v2.2 supports the service tile versions that you currently have deployed.
To check PCF version support for any service tile, from Pivotal or a Pivotal partner:
* Navigate to the tile's download page on [Pivotal Network](https://network.pivotal.io).
* Select the tile version in the **Releases** dropdown.
* See the **Depends On** section under **Release Details**. For more information, refer to the tile's release notes.
If the currently-deployed version of a tile is not compatible with PCF v2.2, you must upgrade the tile before you upgrade PCF. You do not need to upgrade tiles that are compatible with both PCF v2.1 and v2.2.
The [Product Compatibility Matrix](
http://docs.pivotal.io/resources/product-compatibility-matrix.pdf) provides an overview of which PCF versions support which versions of the most popular service tiles from Pivotal.
When you have completed your upgrade please take some time to complete [this survey](https://docs.google.com/forms/d/e/1FAIpQLScf9KuTbF3B9CjldMGqQxF-GA72xMR0DGzaGmhX4uWMrD8pOA/viewform?c=0&w=1).
## <a id="env"></a>Environment Details
Pivotal provides the empty table below as a model to print out or adapt for recording and tracking the tile versions that you have deployed in all of your environments.
| | | **Sandbox** | **Non-Prod** | **Prod** | **Other...** |
| ----------------------------------------------------------------------- | ---------------------------------------------------------------- | ------------ | ------------- | -------- | ------------ |
| **Pivotal Cloud Foundry** | [Ops Manager](https://network.pivotal.io/products/ops-manager) | | | | |
| | [Pivotal Application Service](https://network.pivotal.io/products/elastic-runtime) (PAS) | | | | |
| **Pivotal Cloud Foundry Services** | [MySQL v2](https://network.pivotal.io/products/pivotal-mysql) | | | | |
| | [Redis](https://network.pivotal.io/products/p-redis) | | | | |
| | [RabbitMQ](https://network.pivotal.io/products/p-rabbitmq) | | | | |
| | [Single Sign On](https://network.pivotal.io/products/p-identity) (SSO) | | | | |
| | [Spring Cloud Services](https://network.pivotal.io/products/p-spring-cloud-services) | | | | |
| | [Concourse](https://network.pivotal.io/products/p-concourse) | | | | |
| | ... | | | | |
| **Pivotal Cloud Foundry Partner Services** | [New Relic](https://network.pivotal.io/products/p-new-relic) | | | | |
| | ... | | | | |
## <a id="known-issues"></a>PAS v2.2 Known Issues in the Pivotal Knowledge Base
The following Pivotal Knowledge Base (KB) articles describe known issues for PCF v2.2:
* [Operations Manager Validation Returns TLS Error When Configuring BOSH Director S3 Blobstore](https://community.pivotal.io/s/article/Operations-Manager-Validation-returns-TLS-error-when-configuring-Bosh-Director-S3-blobstore)
* [Some Components Use Go v1.9 to Avoid x.509 Cert Errors in v1.10](https://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/runtime-rn.html#ki-golang-issue)
* [Apps Serve Routes Even After App Deletion Due To MySQL Errors](https://community.pivotal.io/s/article/Apps-Continue-to-Serve-Routes-even-after-App-Deletion-due-to-Mysql2Error-This-connection-is-in-use-by--Thread-errors)
* [Deployment Fails Because Monit Reports Job as Failed](https://community.pivotal.io/s/article/Deployment-fails-because-monit-repots-job-as-failed)
* [Deploying PAS for Windows Tile Results in an Access is Denied Error](https://community.pivotal.io/s/article/Deploying-PAS-for-Windows-Tile-results-in-access-denied-error)
* [Timeouts Connecting to GCS Blobstores when Configured to use Service Account Key authentication](https://community.pivotal.io/s/article/Timeouts-connecting-to-GCS-blobstores-when-configured-to-use-service-account-key-authentication)
* [Spring Cloud Services Deployment fails with can't resolve link error in PAS 2.2](https://community.pivotal.io/s/article/Spring-Cloud-Services-Deployment-fails-with)
## <a id="breaking"></a>PCF v2.2 Breaking Changes
PCF 2.2 has the following breaking changes. For more information, see [PCF v2.2 Breaking Changes](http://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/breaking-changes.html) in the PCF Release Notes.
<table>
<tbody>
<tr>
<th><strong>Breaking Change</strong></th>
<th><strong>Details</strong></th>
</tr>
<tr>
<td>Log Aggregation Systems May Parse Diego Timestamps Incorrectly</td>
<td>The timestamps in the Diego component logs are now in a format compatible with <a href="https://tools.ietf.org/html/rfc3339">RFC 3339</a>. Before enabling the new timestamp format for Diego logs, ensure that your log aggregation system anticipates the timestamp format change.</td>
</tr>
<tr>
<td>SSH Proxy May Restrict Client Access</td>
<td>The SSH proxy now accepts a narrower range of ciphers, MACs, and key exchanges. This feature may have complications if you use a SSH client other than <code>cf ssh</code> to connect.</td>
</tr>
<tr>
<td>Temporary Downtime when Migrating Internal System Databases</td>
<td><p>Upgrading existing internal system databases to the new option of <strong>Internal Databases - MySQL - Percona XtraDB Cluster</strong> causes temporary downtime of PAS system functions. It does not interrupt apps hosted by PAS.</p>
</td>
</tr>
<tr>
<td>App Autoscaler Fails When Loggregator's Log Cache Is Disabled</td>
<td>App Autoscaler relies on API endpoints in Loggregator's Log Cache component. If you disable Log Cache, App Autoscaler will fail.</td>
</tr>
<tr>
<td>Redeploy All Products After Upgrading to Ops Manager v2.2</td>
<td>When upgrading to PCF v2.2, redeploy all product tiles by clicking <strong>Apply Changes</strong> in the Ops Manager installation dashboard. A universal redeploy ensures that the updated BOSH DNS release applies consistently to all products.</td>
</tr>
<tr>
<td>Ops Manager Logs Have Changed Locations</td>
<td>Ops Manager centralized its logs so that you can now configure a syslog server in the Ops Manager <strong>Settings</strong>. Because of this feature change for Ops Manager v2.2, you cannot run scripts to fetch Ops Manager logs at their previous locations.</td>
</tr>
<tr>
<td>Ops Manager API Does Not Return Secret Properties</td>
<td>The Ops Manager API v2.2 <code>GET</code> response from the <code>/api/v0/staged/director/properties</code> endpoint no longer includes the GCS blobstore service key, Health Monitor emailer SMTP password, or Health Monitor PagerDuty service key properties. This change may break any code that relies on retrieving these properties.</td>
</tr>
</tbody>
</table>
## <a id="before"></a>Before Upgrade
<table>
<tbody>
<tr>
<th><strong>Step</strong></th>
<th><strong>Note/Reason </strong></th>
<th><strong>Product or Component</strong></th>
</tr>
<tr>
<td>Upgrade all service tiles to versions compatible with PCF v2.2.</td>
<td>Review the <a href="http://docs.pivotal.io/resources/product-compatibility-matrix.pdf">Compatibility Matrix</a> and tile documentation to check version compatibility.</td>
<td>Service Tiles </td>
</tr>
<tr>
<td><p>Back up your PCF deployment. See <a href="https://docs.pivotal.io/pivotalcf/2-2/customizing/backup-restore/index.html">Backing Up and Restoring Pivotal Cloud Foundry</a> and <a href="https://docs.pivotal.io/pivotalcf/2-2/customizing/backup-restore/disaster-recovery.html">Disaster Recovery in Pivotal Cloud Foundry</a></p></td>
<td><p>Pivotal recommends backing up your installation settings before upgrading or otherwise changing your PCF deployment.</p></td>
<td>PCF</td>
</tr>
<tr>
<td>Review <a href="https://docs.pivotal.io/pivotalcf/2-2/customizing/upgrading-pcf.html">Upgrading Pivotal Cloud Foundry</a> in the PCF v2.2 documentation and complete all relevant actions</td>
<td><p>Review all of the upgrade notes.</p>
<ul><li>If you disabled BOSH DNS, reenable it.</li>
<li>Ensure that lifecycle errands are enabled for your installed tiles.</li>
<li>Ensure that your persistent disk usage is < 50%.</li>
<li>Ensure that you know your decryption passphrase and have it stored safely.</li></ul></td>
<td>PCF</td>
</tr>
<tr>
<td>Review the <a href="http://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/runtime-rn.html">Pivotal Application Service v2.2 Release Notes</a>.</td>
<td>See the PAS release notes for details.</td>
<td>PAS</td>
</tr>
<tr>
<td>Review the <a href="http://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/opsmanager-rn.html">PCF Ops Manager Release v2.2 Release Notes</a></td>
<td><p>See the Ops Manager release notes for any feature changes that may affect you during or after upgrade.</p>
<p>BOSH CLI output formatting has changed. If your deployment uses scripts that rely on BOSH output, you must update them to interpret <a href="https://bosh.io/docs/cli-v2/">BOSH CLI v2</a> command output</p></td>
<td>Ops Manager</td>
</tr>
<tr>
<td>Review the <a href="https://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/runtime-rn.html">Known Issues</a> section of the PAS v2.2 release notes</td>
<td>This section covers recommended actions, new issues, and existing issues for PAS v2.2.</td>
<td>PAS</td>
</tr>
<tr>
<td>Review the <a href="http://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/opsmanager-rn.html#known-issues">Known Issues</a> section of the Ops Manager v2.2 release notes.</td>
<td>This section covers recommended actions, new issues, and existing issues for Ops Man v2.2</td>
<td>Ops Manager</td>
</tr>
<tr>
<td>Review <a href="http://docs.pivotal.io/pivotalcf/2-2/pcf-release-notes/breaking-changes.html">PCF v2.2 Breaking Changes</a>.</td>
<td>This topic describes the breaking changes you need to be aware of when upgrading PCF v2.2.</td>
<td>PCF</td>
</tr>
<tr>
<td>Review <a href="https://docs.pivotal.io/pivotalcf/2-2/security/networking/diego-network-paths.html">Diego Network Communications</a>.</td>
<td>This topic describes the internal network communication paths between components of <a href="https://docs.pivotal.io/pivotalcf/2-2/concepts/diego/diego-architecture.html">Diego</a> workload management system in PAS.</td>
<td>PAS</td>
</tr>
<tr>
<td><p>Run and collect foundation health status in at least one of the following ways:</p>
<ul>
<li>If your PCF deployment has external metrics monitoring set up, verify that VM CPU, RAM, and disk use levels are within reasonable levels.</li>
<li>Run BOSH CLI commands to check system status:
<ul><li><code>bosh -e ALIAS -d DEPLOYMENT_NAME instances --ps</code></li>
<li><code>bosh -e ALIAS vms --vitals</code></li>
<li><code>bosh -e ALIAS -d DEPLOYMENT_NAME cck --report</code></li>
</ul></li>
<li>Check Ops Manager GUI each ERT/Tiles the status page for CPU/RAM/DISK utilization</li>
<li>Validate Ops Manager persistent disk usage is below 50%. If not, follow Prep 6 in the <a href="https://docs.pivotal.io/pivotalcf/2-2/customizing/upgrading-pcf.html#prepare-env">Ops Manager upgrade docs</a>.</li>
</ul>
</td>
<td><p><code>bosh vms --vitals</code> reveals VMs with high CPU, high memory, high disk utilization, and with <code>state</code> != <code>running</code>.</p>
<p><code>bosh instances</code> with the flags <code>--ps</code>, <code>--vitals</code>, or <code>--failing</code>highlights individual job failure.</p></li>
</ul>
</td>
<td>PCF</td>
</tr>
<tr>
<td><p>Review <a href="https://docs.pivotal.io/pivotalcf/2-2/monitoring/index.html#new">KPI Changes from PCF v2.1 to v2.2</a>.</p>
<p>For complete descriptions of each KPI, see <a href="https://docs.pivotal.io/pivotalcf/2-2/monitoring/kpi.html">Key Performance Indicators</a>.</p></td>
<td>This document highlights new and changed Key Performance Indicators (KPIs) that operators may want to monitor with their PCF deployment to help ensure it is in a good operational state.</td>
<td>PAS</td>
</tr>
<tr>
<td><p>Check that Diego cells have sufficient available RAM and disk capacity to support app containers.</p></td>
<td><p>The KPIs that monitor these these resources are are:</p>
<ul><li><code>rep.CapacityRemainingMemory</code></li>
<li><code>rep.CapacityRemainingDisk</code></li></ul>
<p>Review the <a href="https://docs.pivotal.io/pivotalcf/2-2/monitoring/kpi.html#cell">Diego Cell Metrics</a> section of the KPI topic for more information about these KPIs.</p></td>
<td>PAS</td>
</tr>
<tr>
<td>Check that a test app can be pushed and scaled horizontally, manually or via automated testing.</td>
<td>This check ensures that the platform supports apps as expected before the upgrade.</td>
<td>PCF</td>
</tr>
<tr>
<td>If needed, adjust the maximum number of Diego cells that the platform can upgrade simultaneously, to avoid overloading the other cells. See <a href="https://docs.pivotal.io/pivotalcf/2-2/adminguide/diego-cell-upgrade.html">Managing Diego Cell Limits During Upgrade</a>.</td>
<td>For PCF v1.10 and later, the maximum number of cells that can update at once, <code>max_in_flight</code> is 4%. This setting is configured in the BOSH manifest's Diego cell definition. See the <a href="https://docs.pivotal.io/pivotalcf/2-2/adminguide/diego-cell-upgrade.html#limitations-considerations">Preventing Overload</a> section for details.</td>
<td>PAS</td>
</tr>
<tr>
<td><p>To save upgrade time, you can disable unused PAS post-deploy errands.</p></td>
<td><p>See the <a href="https://docs.pivotal.io/tiledev/2-2/tile-errands.html#post-deploy">Post-Deploy Errands</a> section of the Errands topic for details.</p>
<p>Only disable these errands if your environment does not need them.</p></td>
<td>PAS</td>
</tr>
<tr>
<td><p>If you are running Elastic Runtime MySQL as a cluster, run the <code>mysql-diag</code> tool to validate health of the cluster.</p>
</td>
<td><p>See the BOSH CLI v2 instructions in the <a href="http://docs.pivotal.io/p-mysql/1-10/mysql-diag.html">Running mysql-diag</a> topic.</p></td>
<td>PAS</td>
</tr>
<tr>
<td>Run <code>bosh -e ALIAS clean-up --all</code> to clean up old stemcells, releases, orphaned disks, and other resources before upgrade.</td>
<td>This cleanup helps prevent the product and stemcell upload process from exceeding the BOSH Director's available persistent disk space.</td>
<td>BOSH</td>
</tr>
</tbody>
</table>
## <a id="during"></a>During Upgrade
<table>
<tbody>
<tr>
<th><strong>Step</strong></th>
<th><strong>Note/Reason </strong></th>
<th><strong>Product/component</strong></th>
</tr>
<tr>
<td>(Optional) Periodically take snapshots of storage metrics</td>
<td>Pivotal recommends this if you have a large foundation and have experienced storage issues in the past.</td>
<td>PCF</td>
</tr>
<tr>
<td><p>Monitor upgrade progress with methods such as:</p>
<ul><li>Run BOSH CLI status checks:
<ul><li><code>bosh -e ALIAS task TASK_NUMBER</code></li>
<li><code>bosh -e ALIAS vms --vitals, bosh -e ALIAS instances --ps</code></li></ul></li>
<li>Check app availability</li>
<li>Run cf CLI Commands</li>
<li>Check the availability of the Ops Manager GUI</li>
<li>Check NAS performance (if using NAS)</li>
<li>Check vSphere performance (if on vSphere)</li></ul></td>
<td>Monitor the progress of the upgrade, checking the status of the foundation at various locations.</td>
<td>PCF</td>
</tr>
<tr>
<td><p>Use the CF Diego Operator Toolkit (cfdot) to check Diego component instance count by current state.</p></td>
<td>See the <a href="https://github.com/cloudfoundry/cfdot">cfdot documentation</a> on github for details.</td>
<td>PAS</td>
</tr>
<tr>
<td><p>(Optional) If you encounter problems during upgrade, collect the following information:</p>
<ul><li>All job logs</li>
<li>Task debug logs for VM upgrade tasks</li>
<li>Installation log from Ops Manager</li></ul></td>
<td>This information helps determine the cause of upgrade issues.</td>
<td>PCF</td>
</tr>
</tbody>
</table>
## <a id="after"></a>After Upgrade
<table>
<tbody>
<tr>
<th><strong>Step</strong></th>
<th><strong>Note/Reason </strong></th>
<th><strong>Product/component</strong></th>
</tr>
<tr>
<td>Complete the steps in the <a href="https://docs.pivotal.io/pivotalcf/2-2/customizing/upgrading-pcf.html#after-upgrade">After You Upgrade</a> section of the Upgrading Pivotal Cloud Foundry topic.</td>
<td>This ensures that you have the latest version of the Cloud Foundry Command Line Interface (cf CLI).</td>
<td>ALL</td>
</tr>
<tr>
<td>Push and horizontally scale a test application.</td>
<td>This is a performance test for PAS.</td>
<td>PCF</td>
</tr>
<tr>
<td><p>Re-create your alias using BOSH:</p>
<p><code>bosh alias-env ALIAS -e DIRECTOR_IP</code></p></td>
<td>To log into BOSH after upgrading PCF, you need to recreate your alias.</td>
<td>BOSH</td>
</tr>
<tr>
<td>Run BOSH CLI commands to check system status:
<ul><li><code>bosh -e ALIAS -d DEPLOYMENT_NAME instances --ps</code></li>
<li><code>bosh -e ALIAS vms --vitals</code></li>
<li><code>bosh -e ALIAS -d DEPLOYMENT_NAME cck --report</code></li></ul>
</td>
<td>To ensure that all jobs and process are running as expected.</td>
<td>PCF and all tiles</td>
</tr>
<tr>
<td>If you added custom <strong>VM Type</strong> or <strong>Persistent Disk Type</strong> options, ensure that these values are correctly set and were not overwritten.</td>
<td>Verify that the Ops Manager <strong>Resource Config</strong> pane still lists your custom options.</td>
<td>PCF</td>
</tr>
<tr>
<td><p>If you are running PAS MySQL as a cluster, run the <code>mysql-diag</code> tool to validate health of the cluster.</p>
</td>
<td><p>See the BOSH CLI v2 instructions in the <a href="http://docs.pivotal.io/p-mysql/1-10/mysql-diag.html">Running mysql-diag</a> topic.</p></td>
<td>PAS</td>
</tr>
<tr>
<td>Run <code>bosh -e ALIAS clean-up --all</code> to clean up old stemcells, releases, orphaned disks, and other unused resources.</td>
<td></td>
<td>Tiles</td>
</tr>
</tbody>
</table>
## <a id="survey"></a>Survey
Please take some time to help us improve this document by completing the [Upgrade Checklist Survey](https://docs.google.com/forms/d/e/1FAIpQLScf9KuTbF3B9CjldMGqQxF-GA72xMR0DGzaGmhX4uWMrD8pOA/viewform?c=0&w=1).