docs: add baremetal-gpu topology description and setup steps
Document the new GPU topology and provide setup instructions.
Clarify that non-GPU systems should use the baremetal topology.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
README.md (13 additions, 0 deletions)
@@ -16,6 +16,8 @@ The pattern provides three deployment topologies:
3. **Bare metal** (`baremetal` clusterGroup) — deploys all components on bare metal hardware with Intel TDX or AMD SEV-SNP support. NFD (Node Feature Discovery) auto-detects the CPU architecture and configures the appropriate runtime. Supports SNO (Single Node OpenShift) and multi-node clusters.
+4. **Bare metal with GPU** (`baremetal-gpu` clusterGroup) — extends the bare metal topology with NVIDIA H100 confidential GPU support. Adds the NVIDIA GPU Operator, IOMMU kernel configuration, and a sample CUDA workload for CC GPU verification. Requires NVIDIA H100 GPUs with confidential computing firmware.
+
The topology is controlled by the `main.clusterGroupName` field in `values-global.yaml`.
Azure deployments use peer-pods, which provision confidential VMs (`Standard_DCas_v5` family) directly on the Azure hypervisor. Bare metal deployments use layered images and hardware TEE features directly.
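
Because the topology is selected by a single key, it can help to inspect or switch it from the command line. The following is a minimal sketch, not part of the pattern's own tooling: `get_cluster_group` and `set_cluster_group` are hypothetical helper names, and the sketch assumes `clusterGroupName` appears once in the file as an unquoted `key: value` line with no trailing comment.

```shell
# Hypothetical helper: print the clusterGroupName value found in a values file.
# Assumes an unquoted value on a line such as "  clusterGroupName: baremetal".
get_cluster_group() {
  sed -n 's/^[[:space:]]*clusterGroupName:[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: rewrite the value in place, keeping the original
# indentation. A backup copy is written next to the file with a .bak suffix.
set_cluster_group() {
  sed -i.bak "s/^\([[:space:]]*clusterGroupName:\).*/\1 $2/" "$1"
}
```

For example, `set_cluster_group values-global.yaml baremetal-gpu` followed by `get_cluster_group values-global.yaml` should print `baremetal-gpu` if the file matches the assumed layout. Editing the file by hand is equally valid.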
@@ -109,6 +111,17 @@ The system auto-detects your hardware:
Optional: pin PCCS to a specific node with `bash scripts/get-pccs-node.sh` and set `baremetal.pccs.nodeSelector` in the baremetal chart values.
+### Bare metal GPU deployment
+
+1. Set `main.clusterGroupName: baremetal-gpu` in `values-global.yaml`
+2. Run `bash scripts/gen-secrets.sh` to generate KBS keys and PCCS secrets
+3. For Intel TDX: uncomment the PCCS secrets in `~/values-secret-coco-pattern.yaml` and provide your Intel PCS API key
+4. `./pattern.sh make install`
+5. Wait for the cluster to reboot nodes (MachineConfig updates for TDX/SEV-SNP kernel parameters, vsock, and IOMMU)
+6. Approve the GPU Operator install plan when it appears (uses `installPlanApproval: Manual`)
+
+> **Note:** The `baremetal-gpu` topology deploys IOMMU MachineConfig on all nodes and will trigger reboots. For clusters without GPUs, use the `baremetal` topology instead. The GPU workload deployment will remain Pending on non-GPU systems but is otherwise harmless.
+
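
The last two deployment steps (waiting for node reboots and approving the install plan) could be scripted roughly as follows. This is a sketch under assumptions rather than the pattern's documented procedure: the namespace `nvidia-gpu-operator`, the `GPU_OPERATOR_NS` variable, the 60-minute default timeout, and the three function names are all illustrative and may not match your cluster.

```shell
# Assumption: the GPU Operator subscription lives in this namespace.
# Override by exporting GPU_OPERATOR_NS before sourcing this file.
NS="${GPU_OPERATOR_NS:-nvidia-gpu-operator}"

# Block until every MachineConfigPool reports the Updated condition,
# i.e. the MachineConfig-triggered reboots have finished.
# $1 is an optional timeout (the default of 60m is an arbitrary choice).
wait_for_node_updates() {
  oc wait machineconfigpool --all --for=condition=Updated --timeout="${1:-60m}"
}

# Print the names of install plans in $NS that have not been approved yet.
pending_install_plans() {
  oc get installplan -n "$NS" \
    -o jsonpath='{range .items[?(@.spec.approved==false)]}{.metadata.name}{"\n"}{end}'
}

# Approve one install plan by name (needed with installPlanApproval: Manual).
approve_install_plan() {
  oc patch installplan "$1" -n "$NS" --type merge -p '{"spec":{"approved":true}}'
}
```

A typical session would call `wait_for_node_updates`, then `pending_install_plans`, then `approve_install_plan <name>` for the plan that belongs to the GPU Operator. Review the plan before approving it, since other operators in the same namespace may also have pending plans.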
## Sample applications
Two sample applications are deployed on the cluster running confidential workloads (the single cluster in `simple` mode, or the spoke in multi-cluster mode):