Commit 0930763

cluster-sizing: add detailed GPU and CPU sizing recommendations
- Add specific requirements for CPU-only vs GPU-accelerated spokes
- Detail minimum vs recommended worker node sizes
- Add GPU operator prerequisites (NFD, NVIDIA GPU Operator)
- Include cloud instance type examples for GPU nodes

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 33dc8fd commit 0930763

1 file changed

Lines changed: 47 additions & 0 deletions

content/patterns/hybrid-mesh-platform/cluster-sizing.adoc

@@ -22,6 +22,53 @@ include::modules/cluster-sizing-template.adoc[]

The Hybrid Mesh Platform pattern deploys a **hub** cluster plus **two spoke** clusters (east and west). Plan for three OpenShift clusters at the recommended sizes above.

=== Hub (CPU-only)

* **Recommended (workshop, 30–50 users):** 4 workers, 16 vCPU/worker, 64 GiB/worker (total: 64 vCPU / 256 GiB). Runs the full stack: GitLab, Developer Hub, OpenShift AI 3.4, Kubecost, Quay, CNV, Kuadrant, Lightspeed, ACS.
* **Minimum demo (1–5 users):** 3 workers, 8 vCPU/worker, 32 GiB/worker (total: 24 vCPU / 96 GiB). Capacity is tight: disable Kubecost/CNV or expect evicted pods under load.

[NOTE]
====
On RHDP, control-plane nodes are schedulable and carry ~50% of memory (kube-apiserver, etcd, monitoring). GitLab alone uses ~20 GiB at steady state. Kubecost adds ~14 GiB, so disable it on minimum-demo tiers.
====

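The hub totals and the note's memory figures can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the pattern: the `Tier` class and its field names are hypothetical, and the GitLab and Kubecost figures are the approximate steady-state values quoted in the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Worker-node sizing for one cluster tier (hypothetical helper type)."""
    name: str
    workers: int
    vcpu_per_worker: int
    gib_per_worker: int

    @property
    def total_vcpu(self) -> int:
        return self.workers * self.vcpu_per_worker

    @property
    def total_gib(self) -> int:
        return self.workers * self.gib_per_worker


# Hub tiers as listed in this section.
HUB_RECOMMENDED = Tier("hub-recommended", workers=4, vcpu_per_worker=16, gib_per_worker=64)
HUB_MINIMUM = Tier("hub-minimum", workers=3, vcpu_per_worker=8, gib_per_worker=32)

# Approximate steady-state memory figures quoted in the note.
GITLAB_GIB = 20
KUBECOST_GIB = 14


def share_of_worker_memory(tier: Tier, consumed_gib: int) -> float:
    """Fraction of total worker memory taken by the given consumers."""
    return consumed_gib / tier.total_gib


if __name__ == "__main__":
    for tier in (HUB_RECOMMENDED, HUB_MINIMUM):
        share = share_of_worker_memory(tier, GITLAB_GIB + KUBECOST_GIB)
        print(f"{tier.name}: {tier.total_vcpu} vCPU / {tier.total_gib} GiB; "
              f"GitLab + Kubecost take {share:.0%} of worker memory")
```

Under these figures GitLab plus Kubecost account for about 13% of worker memory on the recommended tier and about 35% on the minimum tier, which is consistent with the advice to disable Kubecost on small hubs.
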
=== Spokes: CPU-only (default)

All inference runs on CPU: OVMS `openvino_ir` face detection and KServe YOLO PPE `best.pt`.

* **Recommended (AI CV + DevSpaces):** 3 workers, 8 vCPU/worker, 32 GiB/worker (total: 24 vCPU / 96 GiB). Runs NeuroFace, OVMS ModelMesh, YOLO PPE, DevSpaces, Kafka, and Skupper.
* **Minimum demo (AI CV only):** 2 workers, 4 vCPU/worker, 16 GiB/worker (total: 8 vCPU / 32 GiB). Runs NeuroFace and OVMS only, without DevSpaces; capacity is tight during KServe cold starts.

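Because the pattern deploys one hub and two spokes, the overall worker footprint is the hub total plus twice the spoke total. The sketch below adds up the CPU-only tiers listed in this document; the dictionary and function names are hypothetical, and control-plane nodes are not counted.

```python
# (workers, vCPU per worker, GiB per worker), taken from the tiers in this document.
HUB = {"recommended": (4, 16, 64), "minimum": (3, 8, 32)}
SPOKE_CPU = {"recommended": (3, 8, 32), "minimum": (2, 4, 16)}
SPOKE_COUNT = 2  # east and west


def cluster_totals(workers: int, vcpu: int, gib: int) -> tuple[int, int]:
    """Return (total vCPU, total GiB) across the workers of one cluster."""
    return workers * vcpu, workers * gib


def pattern_totals(tier: str) -> tuple[int, int]:
    """Return (vCPU, GiB) for one hub plus SPOKE_COUNT CPU-only spokes."""
    hub_vcpu, hub_gib = cluster_totals(*HUB[tier])
    spoke_vcpu, spoke_gib = cluster_totals(*SPOKE_CPU[tier])
    return hub_vcpu + SPOKE_COUNT * spoke_vcpu, hub_gib + SPOKE_COUNT * spoke_gib


if __name__ == "__main__":
    for tier in ("recommended", "minimum"):
        vcpu, gib = pattern_totals(tier)
        print(f"{tier}: {vcpu} vCPU / {gib} GiB across three clusters")
```

With the numbers above this gives 112 vCPU / 448 GiB for the recommended tiers and 40 vCPU / 160 GiB for the minimum tiers, counting worker nodes only.
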
=== Spokes: GPU-accelerated (optional)

Use GPU spokes for faster OVMS/YOLO throughput or a self-hosted LLM (vLLM). They require **NFD** and the **NVIDIA GPU Operator** on the spoke; neither is installed by the pattern by default.

* **Recommended (inference + DevSpaces):** 3 workers, 8 vCPU/worker, 32 GiB/worker, 1x T4 or A10G GPU per worker (total: 3 GPUs). Runs OVMS on GPU, YOLO on GPU, a local vLLM 7B model, and DevSpaces.
* **Production-like (multi-model):** 3 workers, 16 vCPU/worker, 64 GiB/worker, 1x A100 or A10G GPU (24+ GB VRAM) per worker (total: 3 GPUs). Runs multiple InferenceServices and a self-hosted 14–70B LLM.

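As a rough guide to matching model size to GPU memory, weight footprint is approximately parameter count times bytes per parameter. The sketch below is a rule-of-thumb estimate, not a measurement: it covers weights only (no KV cache or activations), the 80% headroom factor is an assumption, and the function names are hypothetical. The VRAM figures used in the example are the commonly published ones (T4 16 GB, A10G 24 GB, A100 40 or 80 GB).

```python
# Bytes per parameter for common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str = "fp16") -> float:
    """Approximate weight size in GB (10^9 bytes); excludes KV cache and activations."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, vram_gb: float,
         precision: str = "fp16", headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the card's VRAM (assumed margin)."""
    return weight_gb(params_billion, precision) <= vram_gb * headroom


if __name__ == "__main__":
    for label, params, vram, precision in [
        ("7B on T4 (16 GB)", 7, 16, "fp16"),
        ("7B on A10G (24 GB)", 7, 24, "fp16"),
        ("14B on A10G (24 GB)", 14, 24, "fp16"),
        ("70B on A100 (80 GB)", 70, 80, "int4"),
    ]:
        print(f"{label}, {precision}: {weight_gb(params, precision):.0f} GB weights, "
              f"fits={fits(params, vram, precision)}")
```

Under these assumptions, the upper end of the 14–70B range does not fit on a single 24 GB card at FP16, so it would likely need quantization, a larger card, or more than one GPU per model.
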
When GPU nodes are available, set `nvidia.com/gpu: "1"` under `resources.limits` on InferenceService predictors.

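To make the placement of the GPU limit concrete, the sketch below builds a KServe `InferenceService` manifest as a Python dictionary and prints it as JSON, which `oc apply -f -` accepts. The service name, namespace, model format, and storage URI are hypothetical placeholders, not values taken from the pattern. Note that `nvidia.com/gpu` is a single key under `limits`.

```python
import json


def gpu_inference_service(name: str, namespace: str, model_format: str,
                          storage_uri: str, gpus: int = 1) -> dict:
    """Build a KServe v1beta1 InferenceService that requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                    "resources": {
                        # "nvidia.com/gpu" is one extended-resource key, not a nested path.
                        "limits": {"nvidia.com/gpu": str(gpus)},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # All argument values below are illustrative placeholders.
    manifest = gpu_inference_service(
        name="yolo-ppe-gpu",
        namespace="ai-cv",
        model_format="pytorch",
        storage_uri="pvc://models/yolo-ppe",
    )
    print(json.dumps(manifest, indent=2))
```

The pod stays `Pending` if no node advertises `nvidia.com/gpu`, so this limit only makes sense once the GPU operators listed in this document are installed.
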
==== GPU operators (install before AI workloads on GPU nodes)

[cols="<,<,<,<"]
|===
| Operator | Channel | Catalog | Install on

| **Node Feature Discovery (NFD)**
| stable
| redhat-operators
| Spokes (or hub) with GPU nodes

| **NVIDIA GPU Operator**
| stable
| certified-operators
| Spokes with GPU nodes
|===

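The table rows map onto OLM `Subscription` objects. The sketch below generates one per operator as JSON; the channel and catalog values come from the table, while the package names (`nfd`, `gpu-operator-certified`) and namespaces (`openshift-nfd`, `nvidia-gpu-operator`) are assumptions based on the commonly documented defaults and should be checked against the catalog on the target cluster.

```python
import json

# (package, channel, catalog source, namespace)
# Channel and catalog are from the table; package and namespace are assumed defaults.
OPERATORS = [
    ("nfd", "stable", "redhat-operators", "openshift-nfd"),
    ("gpu-operator-certified", "stable", "certified-operators", "nvidia-gpu-operator"),
]


def subscription(package: str, channel: str, source: str, namespace: str) -> dict:
    """Build an OLM Subscription for one operator package."""
    return {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": package, "namespace": namespace},
        "spec": {
            "name": package,
            "channel": channel,
            "source": source,
            "sourceNamespace": "openshift-marketplace",
            "installPlanApproval": "Automatic",
        },
    }


if __name__ == "__main__":
    for row in OPERATORS:
        print(json.dumps(subscription(*row), indent=2))
```

A Subscription alone is not sufficient: each namespace also needs to exist with an `OperatorGroup`, and after installation NFD typically needs a `NodeFeatureDiscovery` instance and the GPU Operator a `ClusterPolicy` before nodes advertise `nvidia.com/gpu`.
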
Optional: **NVIDIA Network Operator** (distributed training), **OpenShift Serverless** (Knative KServe scale-to-zero), **NVIDIA NIM Operator** (enterprise LLM).

**Cloud examples:** AWS `g4dn.xlarge` (T4), `g5.2xlarge` (A10G, sized for vLLM 7B); Azure `Standard_NC4as_T4_v3`; GCP `a2-highgpu-1g` (A100).

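The instance types above differ in vCPU and memory as well as GPU, so it is worth checking them against the per-worker sizes in this document. The sketch below does that for the recommended GPU spoke worker (8 vCPU, 32 GiB). The vCPU and memory figures are the providers' published shapes as best recalled here, not values from the pattern; treat them as assumptions and verify them against current provider documentation.

```python
# name: (provider, GPU, vCPU, memory in GiB). Assumed published shapes; verify before use.
INSTANCES = {
    "g4dn.xlarge": ("AWS", "T4", 4, 16),
    "g5.2xlarge": ("AWS", "A10G", 8, 32),
    "Standard_NC4as_T4_v3": ("Azure", "T4", 4, 28),
    "a2-highgpu-1g": ("GCP", "A100", 12, 85),
}


def meets_worker_size(instance: str, min_vcpu: int = 8, min_gib: int = 32) -> bool:
    """True if the instance has at least the requested vCPU and memory."""
    _, _, vcpu, gib = INSTANCES[instance]
    return vcpu >= min_vcpu and gib >= min_gib


if __name__ == "__main__":
    for name, (provider, gpu, vcpu, gib) in INSTANCES.items():
        verdict = "meets" if meets_worker_size(name) else "is below"
        print(f"{provider} {name} ({gpu}, {vcpu} vCPU, {gib} GiB) "
              f"{verdict} the 8 vCPU / 32 GiB worker size")
```

If these shapes are accurate, the smallest T4 instances fall below the recommended per-worker size, so a larger size in the same family may be the closer match. Allocatable memory on a node is also lower than the instance's nominal memory.
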
Optional features such as OpenShift AI workbenches and MaaS-backed inference require additional hub capacity. Workshop Showroom content is maintained in a link:https://github.com/maximilianoPizarro/showroom-hybrid-mesh-ai[separate repository] and does not change minimum cluster sizing for the core pattern.

For RHDP catalog provisioning details and validation steps, see the link:https://maximilianopizarro.github.io/hybrid-mesh-platform/[pattern documentation site].
