feat: add bare metal support for Intel TDX and AMD SEV-SNP (#73)
* feat: add bare metal support for Intel TDX and AMD SEV-SNP
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: update baremetal values to use released charts
Replace git branch references (repoURL/targetRevision/path) with
released Helm chart references (chart/chartVersion) for trustee,
sandboxed-containers, and sandboxed-policies in values-baremetal.yaml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add TDX kernel flag and enable intel-dcap for baremetal
Add tdx.enabled flag (default true) to baremetal chart to conditionally
set kvm_intel.tdx=1 kernel argument. Without this, the kvm_intel module
does not activate TDX and NFD cannot detect it.
Enable intel-dcap application in values-baremetal.yaml for PCCS/QGS
attestation services.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove unused runtime class, kernel params, and commented-out templates
Address PR review feedback:
- Remove detect-runtime-class.yaml (OSC operator manages RuntimeClass)
- Remove bm-kernel-params.yaml and kernel-params-mco.yaml (config should
be provided via initdata or pod annotations to avoid inconsistencies)
- Remove commented-out runtimeclass templates for AMD SNP and Intel TDX
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: update to OSC 1.12 / Trustee 1.1.0
Signed-off-by: Chris Butler <chris.butler@redhat.com>
* feat: integrate Kyverno and update trustee config for baremetal
- Add Kyverno chart and coco-kyverno-policies to baremetal values
- Update trustee chart to 0.3.* with kbs.admin.format v1.1
- Remove bypassAttestation (proper attestation via init_data)
- Remove explicit runtimeClassName overrides (auto-detected by platform)
- Add syncPolicy prune to hello-openshift and kbs-access
- Reset default clusterGroupName to simple
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: set clusterGroupName to baremetal for deployment testing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add UPDATE operation to initdata injection policy
The policy only fired on Pod/Deployment CREATE, so pods created before
the initdata ConfigMap existed never got the cc_init_data annotation.
Adding UPDATE allows Kyverno to inject the annotation when a Deployment
is updated (e.g. by ArgoCD sync), triggering a rolling restart with
the correct initdata.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add intel-device-plugins-operator subscription for SGX/TDX quote generation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: enable TDX config in trustee to point QCNL at local PCCS service
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: store raw SHA-256 hash alongside PCR8 hash in initdata ConfigMaps
Adds RAW_HASH field to both initdata and debug-initdata ConfigMaps.
PCR8_HASH = SHA256(zeros || SHA256(toml)) — used by Azure vTPM attestation
RAW_HASH = SHA256(toml) — used by baremetal TDX/SNP attestation
Both are needed because Azure and baremetal present initdata differently
in their attestation evidence. A single Trustee attestation server must
accept both formats to support multi-platform deployments.
Future: integrate veritas for comprehensive reference value generation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
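The two digests defined in the initdata commit above can be sketched in Python. This is a minimal sketch, not the pattern's own script. It assumes that `zeros` is the 32-byte all-zero initial value of a SHA-256 PCR and that the inner digest is concatenated as raw bytes rather than hex text. The function names and the sample TOML are hypothetical.

```python
import hashlib


def raw_hash(initdata_toml: bytes) -> str:
    """RAW_HASH = SHA256(toml), as described for bare metal TDX/SNP."""
    return hashlib.sha256(initdata_toml).hexdigest()


def pcr8_hash(initdata_toml: bytes) -> str:
    """PCR8_HASH = SHA256(zeros || SHA256(toml)), as described for Azure vTPM.

    This is one PCR extend operation starting from an all-zero register.
    """
    initial_pcr = bytes(32)  # assumption: 32 zero bytes (SHA-256 PCR bank)
    measurement = hashlib.sha256(initdata_toml).digest()  # raw bytes, not hex
    return hashlib.sha256(initial_pcr + measurement).hexdigest()


if __name__ == "__main__":
    # Hypothetical initdata document, for illustration only.
    sample = b'algorithm = "sha256"\nversion = "0.1.0"\n'
    print("RAW_HASH ", raw_hash(sample))
    print("PCR8_HASH", pcr8_hash(sample))
```

Both values come from the same TOML bytes, so a ConfigMap can carry them side by side and let the attestation policy match whichever form the platform's evidence presents.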
* fix: point trustee at feature branch for baremetal attestation testing
Temporarily uses butler54/trustee-chart feature/baremetal-attestation
branch instead of released chart. This branch includes:
- Baremetal TDX and SNP attestation rules
- Conditional pcr-stash (no error on baremetal without vTPM)
- Raw init_data hash (zero-padded) for baremetal attestation
- TDX QCNL config with use_secure_cert: false for local PCCS
Revert to chartVersion after merging and releasing trustee chart.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: increase kata VM memory for kbs-access to 8192MB
The kbs-access-app container image is ~1GB which causes container
creation timeouts with the default 2GB kata VM memory.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: target Pods only for cc_init_data injection, disable autogen
The autogen Deployment rule causes admission failures when the initdata
ConfigMap hasn't been propagated to the workload namespace yet. By
targeting Pods only (autogen-controllers: none), Deployments are admitted
without ConfigMap resolution. Pods get cc_init_data injected at creation
time when the ConfigMap is available. A rollout restart picks up new
initdata values.
Also removes UPDATE operation — only CREATE is needed since a rollout
restart creates new Pods.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
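The Pods-only policy described in the commit above might look roughly like the following, written here as a Python dict that serialises to a Kyverno `ClusterPolicy`. This is a sketch under stated assumptions, not the policy shipped in the chart: the policy name, the ConfigMap name and key, and the exact annotation key are hypothetical placeholders.

```python
import json

# Assumption: the annotation key that carries initdata into the Kata VM.
INITDATA_ANNOTATION = "io.katacontainers.config.hypervisor.cc_init_data"


def build_initdata_policy(configmap_name: str = "initdata",
                          configmap_key: str = "INITDATA") -> dict:
    """Build a ClusterPolicy that mutates Pods only, on CREATE only."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {
            "name": "inject-cc-init-data",  # hypothetical name
            "annotations": {
                # Disable autogen so no Deployment rule is derived from this one.
                "pod-policies.kyverno.io/autogen-controllers": "none",
            },
        },
        "spec": {
            "rules": [{
                "name": "inject-cc-init-data",
                "match": {"any": [{"resources": {
                    "kinds": ["Pod"],
                    "operations": ["CREATE"],
                }}]},
                # The ConfigMap is resolved at Pod admission time, in the
                # Pod's own namespace.
                "context": [{
                    "name": "initdata",
                    "configMap": {
                        "name": configmap_name,
                        "namespace": "{{request.namespace}}",
                    },
                }],
                "mutate": {"patchStrategicMerge": {"metadata": {"annotations": {
                    INITDATA_ANNOTATION: "{{ initdata.data." + configmap_key + " }}",
                }}}},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_initdata_policy(), indent=2))
```

Because the rule matches only Pod CREATE, a Deployment is admitted even when the ConfigMap is missing, and a rollout restart is what picks up new initdata values.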
* fix: use ${initial_pcr} braces in PCR8 hash computation
Without braces, bash treats $initial_pcr followed by the hex hash
as a single undefined variable name, producing SHA-256 of empty
string instead of the correct PCR extend value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
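The failure mode fixed in the commit above can be reproduced with a toy model of shell parameter expansion. The code below is Python, not bash, and `expand` is a hypothetical helper written only for this illustration. It assumes the hex digest is pasted directly after the variable reference and that the hex string is decoded to bytes before hashing.

```python
import hashlib
import re

# Toy model of bash parameter expansion: "$name" greedily consumes every
# following name character (letters, digits, underscore), while "${name}"
# stops at the closing brace. Unset variables expand to the empty string,
# as bash does when `set -u` is not active.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def expand(template: str, env: dict) -> str:
    return _VAR.sub(lambda m: env.get(m.group(1) or m.group(2), ""), template)


env = {"initial_pcr": "00" * 32}  # 32 zero bytes as hex
digest_hex = hashlib.sha256(b"example initdata").hexdigest()

# Every hex character is a valid name character, so without braces the whole
# run "initial_pcr<digest>" is read as one (unset) variable name.
buggy_input = expand("$initial_pcr" + digest_hex, env)
fixed_input = expand("${initial_pcr}" + digest_hex, env)

buggy_hash = hashlib.sha256(bytes.fromhex(buggy_input)).hexdigest()
fixed_hash = hashlib.sha256(bytes.fromhex(fixed_input)).hexdigest()

if __name__ == "__main__":
    print(repr(buggy_input))  # ''
    print(buggy_hash)  # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    print(fixed_hash)
```

The unbraced form hashes zero bytes of input, which is why the result was the well-known SHA-256 of the empty string rather than a PCR extend value.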
* docs: address PR #73 review comments and merge PR #75 documentation
This commit addresses all review comments from bpradipt and pawelpros on
PR #73, merges documentation from PR #75, and updates container images.
Documentation changes:
- README: Replace "peer-pod infrastructure" wording to clarify Azure vs bare metal
- README: Update OCP version requirements from 4.17+ to 4.19.28+ (OSC 1.12 requirement)
- README: Clarify PCR collection differs for Azure (get-pcr.sh) vs bare metal (manual)
- README: Distinguish Azure (kata-remote) from bare metal (kata-cc) runtime classes
- values-secret.yaml.template: Add missing kbsPrivateKey secret
- values-secret.yaml.template: Reorganize with clear section headers and improved docs
- gen-secrets.sh: Add prominent alert when values-secret file is created
- Merge docs/nfd-matchall-bug.md from PR #75 (NFD matchAll bug report)
- Merge docs/pcr-reference-values-bare-metal.md from PR #75 (PCR collection guide)
Code cleanup:
- Delete obsolete qgs-config-cm.yaml (QGS args now inline)
- Delete obsolete qgs-sgx-cm.yaml (QCNL config via downwardAPI)
- Remove commented-out detect-runtime-class reference in values-baremetal.yaml
Image updates:
- intel-dpo-sgx.yaml: Update intel-sgx-plugin to sha256:4ac8769c (v0.35.0)
- pccs-deployment.yaml: Update osc-pccs to sha256:edf57087 (v1.12)
- qgs-ds.yaml: Update osc-tdx-qgs to sha256:308d66da (v1.12)
Resolves review comments from:
- bpradipt: peer-pod wording, OCP versions, PCR clarification
- pawelpros: obsolete ConfigMaps, image digests, PCR requirements
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: revert clusterGroupName to simple for main branch merge
The clusterGroupName was changed to 'baremetal' in commit a601af0 for
deployment testing. Reverting to 'simple' as the default so existing
users are not affected when this PR merges to main.
The baremetal clusterGroup remains available by setting
clusterGroupName: baremetal in user overrides or CI.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: update trustee chart to use upstream 0.3.3 release
Replace butler54/trustee-chart.git fork reference with upstream
chart reference now that validatedpatterns/trustee-chart#21 has
merged and released as v0.3.3.
The 0.3.3 release includes baremetal TDX/SNP attestation support
and NVIDIA GPU attestation via NRAS remote verifier.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Signed-off-by: Chris Butler <chris.butler@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
README.md: 46 additions & 13 deletions
@@ -2,27 +2,29 @@

 Validated pattern for deploying confidential containers on OpenShift using the [Validated Patterns](https://validatedpatterns.io/) framework.

-Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service), and peer-pod infrastructure — on Azure.
+Confidential containers use hardware-backed Trusted Execution Environments (TEEs) to isolate workloads from cluster and hypervisor administrators. This pattern deploys and configures the Red Hat CoCo stack — including the sandboxed containers operator, Trustee (Key Broker Service) operator, and Kata infrastructure — on Azure cloud instances and bare metal.

 ## Topologies

-The pattern provides two deployment topologies:
+The pattern provides three deployment topologies:

-1. **Single cluster** (`simple` clusterGroup) — deploys all components (Trustee, Vault, ACM, sandboxed containers, workloads) in one cluster. This breaks the RACI separation expected in a remote attestation architecture but simplifies testing and demonstrations.
+1. **Single cluster** (`simple` clusterGroup) — deploys all components (Trustee, Vault, ACM, sandboxed containers, workloads) in one cluster on Azure. This breaks the RACI separation expected in a remote attestation architecture but simplifies testing and demonstrations.

 2. **Multi-cluster** (`trusted-hub` + `spoke` clusterGroups) — separates the trusted zone from the untrusted workload zone:
    - **Hub** (`trusted-hub`): Runs Trustee (KBS + attestation service), HashiCorp Vault, ACM, and cert-manager. This cluster is the trust anchor.
    - **Spoke** (`spoke`): Runs the sandboxed containers operator and confidential workloads. The spoke is imported into ACM and managed from the hub.

+3. **Bare metal** (`baremetal` clusterGroup) — deploys all components on bare metal hardware with Intel TDX or AMD SEV-SNP support. NFD (Node Feature Discovery) auto-detects the CPU architecture and configures the appropriate runtime. Supports SNO (Single Node OpenShift) and multi-node clusters.
+
 The topology is controlled by the `main.clusterGroupName` field in `values-global.yaml`.

-Currently supports Azure via peer-pods. Peer-pods provision confidential VMs (`Standard_DCas_v5` family) directly on the Azure hypervisor rather than nesting VMs inside worker nodes.
+Azure deployments use peer-pods, which provision confidential VMs (`Standard_DCas_v5` family) directly on the Azure hypervisor. Bare metal deployments use layered images and hardware TEE features directly.

 ## Current version (4.*)

 Breaking change from v3. This is the first version using GA (Generally Available) releases of the CoCo stack:
 - **Red Hat Build of Trustee 1.1** (GA release; all versions prior to 1.0 were Technology Preview)
 - External chart repositories for [Trustee](https://github.com/validatedpatterns/trustee-chart), [sandboxed-containers](https://github.com/validatedpatterns/sandboxed-containers-chart), and [sandboxed-policies](https://github.com/validatedpatterns/sandboxed-policies-chart)
 - Self-signed certificates via cert-manager (Let's Encrypt no longer required)
@@ -42,9 +44,21 @@ All previous versions used pre-GA (Technology Preview) releases of Trustee:

 ### Prerequisites

-- OpenShift 4.17+ cluster on Azure (self-managed via `openshift-install` or ARO)
+**Azure deployments:**
+
+- OpenShift 4.19.28+ cluster on Azure (self-managed via `openshift-install` or ARO)
 - Azure `Standard_DCas_v5` VM quota in your target region (these are confidential computing VMs and are not available in all regions). See the note below for more details.
 - Azure DNS hosting the cluster's DNS zone
+
+**Bare metal deployments:**
+
+- OpenShift 4.19.28+ cluster on bare metal with Intel TDX or AMD SEV-SNP hardware
+- BIOS/firmware configured to enable TDX or SEV-SNP
+- Available block devices for LVMS storage (auto-discovered)
+- For Intel TDX: an Intel PCS API key from [api.portal.trustedservices.intel.com](https://api.portal.trustedservices.intel.com/)
+
+**Common:**
+
 - Tools on your workstation: `podman`, `yq`, `jq`, `skopeo`
 - OpenShift pull secret saved at `~/pull-secret.json` (download from [console.redhat.com](https://console.redhat.com/openshift/downloads))
 - Fork the repository — ArgoCD reconciles cluster state against your fork, so changes must be pushed to your remote
@@ -53,29 +67,48 @@ All previous versions used pre-GA (Technology Preview) releases of Trustee:

 These scripts generate the cryptographic material and attestation measurements needed by Trustee and the peer-pod VMs. Run them once before your first deployment.

-1. `bash scripts/gen-secrets.sh` — generates KBS key pairs, attestation policy seeds, and copies `values-secret.yaml.template` to `~/values-secret-coco-pattern.yaml`
-2. `bash scripts/get-pcr.sh` — retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`)
-3. Review and customise `~/values-secret-coco-pattern.yaml` — this file is loaded into Vault and provides secrets to the pattern
+1. `bash scripts/gen-secrets.sh` — generates KBS key pairs, PCCS certificates/tokens (for bare metal), and copies `values-secret.yaml.template` to `~/values-secret-coco-pattern.yaml`
+2. `bash scripts/get-pcr.sh` — retrieves PCR measurements from the peer-pod VM image and stores them at `~/.coco-pattern/measurements.json` (requires `podman`, `skopeo`, and `~/pull-secret.json`). **Azure only.** Bare metal uses manual PCR collection — see [docs/pcr-reference-values-bare-metal.md](docs/pcr-reference-values-bare-metal.md) for the procedure. Store the measurements at `~/.coco-pattern/measurements.json`.
+3. Review and customise `~/values-secret-coco-pattern.yaml` — this file is loaded into Vault and provides secrets to the pattern. For bare metal, uncomment the PCCS secrets section and provide your Intel PCS API key.

 > **Note:** `gen-secrets.sh` will not overwrite existing secrets. Delete `~/.coco-pattern/` if you need to regenerate.

-### Single cluster deployment
+### Single cluster deployment (Azure)

 1. Set `main.clusterGroupName: simple` in `values-global.yaml`
 2. Ensure your Azure configuration is populated in `values-global.yaml` (see `global.azure.*` fields)
 3. `./pattern.sh make install`
 4. Wait for the cluster to reboot all nodes (the sandboxed containers operator triggers a MachineConfig update). Monitor progress in the ArgoCD UI.

-### Multi-cluster deployment
+### Multi-cluster deployment (Azure)

 1. Set `main.clusterGroupName: trusted-hub` in `values-global.yaml`
 2. Deploy the hub cluster: `./pattern.sh make install`
 3. Wait for ACM (`MultiClusterHub`) to reach `Running` state on the hub
-4. Provision a second OpenShift 4.17+ cluster on Azure for the spoke
+4. Provision a second OpenShift 4.19.28+ cluster on Azure for the spoke
 5. Import the spoke into ACM with label `clusterGroup=spoke`
    (see [importing a cluster](https://validatedpatterns.io/learn/importing-a-cluster/))
 6. ACM will automatically deploy the `spoke` clusterGroup applications (sandboxed containers, workloads) to the imported cluster

+### Bare metal deployment
+
+1. Set `main.clusterGroupName: baremetal` in `values-global.yaml`
+2. Run `bash scripts/gen-secrets.sh` to generate KBS keys and PCCS secrets
+3. For Intel TDX: uncomment the PCCS secrets in `~/values-secret-coco-pattern.yaml` and provide your Intel PCS API key
+4. `./pattern.sh make install`
+5. Wait for the cluster to reboot nodes (MachineConfig updates for TDX kernel parameters and vsock)
+
+The system auto-detects your hardware:
+
+- **NFD** discovers Intel TDX or AMD SEV-SNP capabilities and labels nodes
+- **LVMS** auto-discovers available block devices for storage
+- **RuntimeClass** `kata-cc` is created automatically pointing to the correct handler (`kata-tdx` or `kata-snp`)
+- Both `kata-tdx` and `kata-snp` RuntimeClasses are deployed; only the one matching your hardware has schedulable nodes
+- MachineConfigs are deployed for both `master` and `worker` roles (safe on SNO where only master exists)
+- PCCS and QGS services deploy unconditionally; DaemonSets only schedule on Intel nodes via NFD labels
+
+Optional: pin PCCS to a specific node with `bash scripts/get-pccs-node.sh` and set `baremetal.pccs.nodeSelector` in the baremetal chart values.
+
 ## Sample applications

 Two sample applications are deployed on the cluster running confidential workloads (the single cluster in `simple` mode, or the spoke in multi-cluster mode):
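The hardware auto-detection described in the bare metal section of the hunk above (NFD labels a node, and `kata-cc` points at `kata-tdx` or `kata-snp`) can be sketched as a small lookup. This is an illustration only: the label keys are the upstream NFD CPU security labels and are an assumption here, since the pattern may rely on different or custom labels, and `select_handler` is a hypothetical helper.

```python
from typing import Mapping, Optional

# Assumption: upstream NFD label keys. The pattern's charts may use others.
TDX_LABEL = "feature.node.kubernetes.io/cpu-security.tdx.enabled"
SNP_LABEL = "feature.node.kubernetes.io/cpu-security.sev.snp.enabled"


def select_handler(node_labels: Mapping[str, str]) -> Optional[str]:
    """Return the Kata handler that `kata-cc` should point to for a node.

    Returns None when the node advertises neither TEE, in which case no
    confidential workload should be scheduled there.
    """
    if node_labels.get(TDX_LABEL) == "true":
        return "kata-tdx"
    if node_labels.get(SNP_LABEL) == "true":
        return "kata-snp"
    return None


if __name__ == "__main__":
    print(select_handler({TDX_LABEL: "true"}))  # kata-tdx
    print(select_handler({SNP_LABEL: "true"}))  # kata-snp
    print(select_handler({}))                   # None
```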
@@ -85,7 +118,7 @@ Two sample applications are deployed on the cluster running confidential workloa
 - `secure` — a confidential container with a strict policy; `oc exec` is denied even for `kubeadmin`
 - `insecure-policy` — a confidential container with a relaxed policy allowing `oc exec` (useful for testing the Confidential Data Hub)

-Each confidential pod runs on its own `Standard_DC2as_v5` Azure VM (visible in the Azure portal). Pods use `runtimeClassName: kata-remote`.
+On Azure, each confidential pod runs on its own `Standard_DC2as_v5` Azure VM (visible in the Azure portal) using `runtimeClassName: kata-remote`. On bare metal, pods use `runtimeClassName: kata-cc` and run directly on the underlying TDX or SEV-SNP hardware.

 - **kbs-access**: A web service that retrieves and presents secrets obtained from the Trustee Key Broker Service (KBS) via the Confidential Data Hub (CDH). Useful for verifying end-to-end attestation and secret delivery in locked-down environments.
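The runtime class difference stated in the hunk above (`kata-remote` on Azure, `kata-cc` on bare metal) is the only field a workload manifest needs to vary by platform. A minimal sketch follows, assuming a hypothetical `confidential_pod` helper and a placeholder image name. It builds a plain Pod manifest as a Python dict and is not taken from the pattern's charts.

```python
import json

# Runtime classes as stated in the README diff.
RUNTIME_CLASS = {
    "azure": "kata-remote",   # peer-pod VM per pod
    "baremetal": "kata-cc",   # TDX or SEV-SNP on the host
}


def confidential_pod(name: str, image: str, platform: str) -> dict:
    """Build a minimal Pod manifest with the platform's runtime class."""
    if platform not in RUNTIME_CLASS:
        raise ValueError(f"unknown platform: {platform!r}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": RUNTIME_CLASS[platform],
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # The image reference is a placeholder, not one used by the pattern.
    pod = confidential_pod("hello", "registry.example.com/hello:latest", "baremetal")
    print(json.dumps(pod, indent=2))
```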