Commit b71c9db

butler54 and claude committed
docs: add baremetal-gpu topology description and setup steps
Document the new GPU topology and provide setup instructions. Clarify that non-GPU systems should use the baremetal topology.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent c2bc4e8 commit b71c9db

1 file changed

Lines changed: 13 additions & 0 deletions

README.md

@@ -16,6 +16,8 @@ The pattern provides three deployment topologies:
 
 3. **Bare metal** (`baremetal` clusterGroup) — deploys all components on bare metal hardware with Intel TDX or AMD SEV-SNP support. NFD (Node Feature Discovery) auto-detects the CPU architecture and configures the appropriate runtime. Supports SNO (Single Node OpenShift) and multi-node clusters.
 
+4. **Bare metal with GPU** (`baremetal-gpu` clusterGroup) — extends the bare metal topology with NVIDIA H100 confidential GPU support. Adds the NVIDIA GPU Operator, IOMMU kernel configuration, and a sample CUDA workload for CC GPU verification. Requires NVIDIA H100 GPUs with confidential computing firmware.
+
 The topology is controlled by the `main.clusterGroupName` field in `values-global.yaml`.
 
 Azure deployments use peer-pods, which provision confidential VMs (`Standard_DCas_v5` family) directly on the Azure hypervisor. Bare metal deployments use layered images and hardware TEE features directly.
@@ -109,6 +111,17 @@ The system auto-detects your hardware:
 
 Optional: pin PCCS to a specific node with `bash scripts/get-pccs-node.sh` and set `baremetal.pccs.nodeSelector` in the baremetal chart values.
 
+### Bare metal GPU deployment
+
+1. Set `main.clusterGroupName: baremetal-gpu` in `values-global.yaml`
+2. Run `bash scripts/gen-secrets.sh` to generate KBS keys and PCCS secrets
+3. For Intel TDX: uncomment the PCCS secrets in `~/values-secret-coco-pattern.yaml` and provide your Intel PCS API key
+4. `./pattern.sh make install`
+5. Wait for the cluster to reboot nodes (MachineConfig updates for TDX/SEV-SNP kernel parameters, vsock, and IOMMU)
+6. Approve the GPU Operator install plan when it appears (uses `installPlanApproval: Manual`)
+
+> **Note:** The `baremetal-gpu` topology deploys IOMMU MachineConfig on all nodes and will trigger reboots. For clusters without GPUs, use the `baremetal` topology instead. The GPU workload deployment will remain Pending on non-GPU systems but is otherwise harmless.
+
 ## Sample applications
 
 Two sample applications are deployed on the cluster running confidential workloads (the single cluster in `simple` mode, or the spoke in multi-cluster mode):
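
For reviewers who want to try the "Bare metal GPU deployment" steps added in this commit, they could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper names (`set_cluster_group`, `wait_for_node_rollout`, `approve_pending_installplans`) are hypothetical, the `nvidia-gpu-operator` namespace and the `60m` timeout are assumptions, and the `sed` edit assumes `values-global.yaml` contains a single `clusterGroupName:` key. The `oc` helpers require a logged-in cluster session.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the bare metal GPU steps; not part of the repository.

# Step 1: rewrite the value of the clusterGroupName key in a values file.
# Assumption: the file has exactly one "clusterGroupName:" line.
set_cluster_group() {
  local file="$1"
  local group="$2"
  local tmp
  tmp="$(mktemp)"
  # Keep the original indentation and key, replace only the value.
  sed -E "s/^([[:space:]]*clusterGroupName:[[:space:]]*).*/\1${group}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Step 5: block until every MachineConfigPool reports Updated.
# Caveat: if the rollout has not started yet, this may return immediately.
wait_for_node_rollout() {
  local timeout="${1:-60m}"   # assumed default, adjust for your hardware
  oc wait machineconfigpool --all --for=condition=Updated=True --timeout="$timeout"
}

# Step 6: approve install plans that are still waiting for manual approval.
approve_pending_installplans() {
  local ns="${1:-nvidia-gpu-operator}"   # assumed namespace
  oc get installplan -n "$ns" \
    -o jsonpath='{range .items[?(@.spec.approved==false)]}{.metadata.name}{"\n"}{end}' |
  while read -r name; do
    [ -n "$name" ] || continue
    oc patch installplan "$name" -n "$ns" --type merge \
      --patch '{"spec":{"approved":true}}'
  done
}

# Example order, run from the pattern repository root:
#   set_cluster_group values-global.yaml baremetal-gpu
#   bash scripts/gen-secrets.sh
#   ./pattern.sh make install
#   wait_for_node_rollout 60m
#   approve_pending_installplans nvidia-gpu-operator
```

The functions only define behavior and run nothing when sourced, so the file can be loaded into a shell and each step invoked by hand. Step 3 (uncommenting the PCCS secrets for Intel TDX) is left manual because it involves an API key.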
