[quality] pkg/k8s GPU health subsystem has zero test coverage — 5 pure helpers untested #19555

## Finding

The GPU health and discovery subsystem in `pkg/k8s` contains **~32KB of untested code** across 4 files:

| File | Size | Description |
|------|------|-------------|
| `client_gpu_discovery.go` | 10,864 B | Multi-vendor GPU node discovery (NVIDIA, AMD, Intel, Google TPU, IBM AIU) |
| `client_gpu_health.go` | 9,210 B | GPU node health monitoring with 7 health checks |
| `client_gpu_types.go` | 6,718 B | Type definitions for GPU health structures |
| `client_gpu_nvidia.go` | 5,564 B | NVIDIA operator status inspection |

### Pure helper functions with zero coverage:

1. **`checkOperatorPod()`** (client_gpu_health.go) — inspects pod status for GPU operator DaemonSet pods. Checks for CrashLoopBackOff, non-Running state, and pod-not-found scenarios.
2. **`isStuckPod()`** (client_gpu_health.go) — flags a pod as stuck if any of 3 conditions holds: a container in `ContainerStatusUnknown`, terminating for more than 5 min, or Pending for more than 10 min.
3. **`deriveGPUNodeStatus()`** (client_gpu_health.go) — derives overall health (healthy/degraded/unhealthy) from check results with critical vs non-critical classification.
4. **`unstructuredNestedMap()`** (client_gpu_nvidia.go) — walks a key path through a nested unstructured K8s object and returns the map at the end of it.
5. **`unstructuredNestedSlice()`** (client_gpu_nvidia.go) — same traversal, but returns the slice at the end of the path.

### Why this matters:

1. The GPU health endpoint drives the console UI's GPU dashboard — bugs in `deriveGPUNodeStatus` or `isStuckPod` silently misclassify node health
2. `checkOperatorPod` handles edge cases (CrashLoopBackOff, missing pods) that determine whether operators see alerts
3. `unstructuredNestedMap/Slice` parse arbitrary CRD structures — nil panics here crash the handler
4. These are all **pure functions** that can be unit-tested with constructed inputs (no K8s client mocks needed)
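Because the nested-traversal helpers receive arbitrary CRD data, the main property to test is that bad input yields "not found" rather than a panic. The sketch below shows one nil-safe way to walk a `map[string]interface{}` (the representation used by `unstructured.Unstructured.Object`) with checked type assertions. The names `nestedMapSketch` and `nestedSliceSketch` are hypothetical; the issue does not show the real signatures. Upstream `k8s.io/apimachinery` also ships `unstructured.NestedMap` and `unstructured.NestedSlice` with similar semantics.

```go
package main

// nestedMapSketch walks fields through obj and returns the map found at the
// end of the path. HYPOTHETICAL helper: ok is false (never a panic) when obj
// is nil, a key is missing, or an intermediate value is not a map.
func nestedMapSketch(obj map[string]interface{}, fields ...string) (map[string]interface{}, bool) {
	cur := obj
	for _, f := range fields {
		if cur == nil {
			return nil, false
		}
		// Comma-ok assertion: a missing key or wrong type gives ok == false.
		next, ok := cur[f].(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur = next
	}
	return cur, cur != nil
}

// nestedSliceSketch resolves every field but the last as a map, then expects
// a []interface{} under the final key. HYPOTHETICAL helper.
func nestedSliceSketch(obj map[string]interface{}, fields ...string) ([]interface{}, bool) {
	if len(fields) == 0 {
		return nil, false
	}
	parent, ok := nestedMapSketch(obj, fields[:len(fields)-1]...)
	if !ok {
		return nil, false
	}
	s, ok := parent[fields[len(fields)-1]].([]interface{})
	return s, ok
}
```

The recommended test cases map directly onto these branches: a nil object, a missing key, a key whose value has the wrong type, and a valid path. Upstream `unstructured.NestedMap` differs in that it returns an error for type mismatches and deep-copies the result.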

## Recommendation

1. Add table-driven tests for `isStuckPod` covering all 3 stuck conditions + happy path
2. Add tests for `deriveGPUNodeStatus` covering healthy/degraded/unhealthy transitions
3. Add tests for `checkOperatorPod` covering Running, CrashLoopBackOff, Pending, not-found
4. Add tests for `unstructuredNestedMap`/`unstructuredNestedSlice` covering nil, missing keys, valid paths
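A sketch of the table-driven shape for the `deriveGPUNodeStatus` tests follows, run against a stand-in. Everything here is hypothetical: `checkResult` and `deriveStatusSketch` are not the types in `client_gpu_types.go`, the check names `driver` and `dcgm` are illustrative, and the precedence rule (any failed critical check gives unhealthy, otherwise any failure gives degraded) is an assumption based on the critical vs non-critical classification described above.

```go
package main

// checkResult is a HYPOTHETICAL simplified health-check outcome; the real
// structures live in client_gpu_types.go and may differ.
type checkResult struct {
	Name     string
	Passed   bool
	Critical bool
}

// deriveStatusSketch applies an ASSUMED precedence: a failed critical check
// makes the node unhealthy; otherwise any failure makes it degraded.
func deriveStatusSketch(checks []checkResult) string {
	status := "healthy"
	for _, c := range checks {
		if c.Passed {
			continue
		}
		if c.Critical {
			return "unhealthy"
		}
		status = "degraded"
	}
	return status
}

// statusCases is the table a real test would range over with t.Run.
var statusCases = []struct {
	name   string
	checks []checkResult
	want   string
}{
	{"all pass", []checkResult{{"driver", true, true}, {"dcgm", true, false}}, "healthy"},
	{"non-critical fails", []checkResult{{"driver", true, true}, {"dcgm", false, false}}, "degraded"},
	{"critical fails", []checkResult{{"driver", false, true}, {"dcgm", true, false}}, "unhealthy"},
	// Non-critical failure listed first: degraded must not mask unhealthy.
	{"both fail", []checkResult{{"dcgm", false, false}, {"driver", false, true}}, "unhealthy"},
}

// runStatusCases returns the names of cases whose result differs from want.
func runStatusCases() []string {
	var failed []string
	for _, tc := range statusCases {
		if got := deriveStatusSketch(tc.checks); got != tc.want {
			failed = append(failed, tc.name)
		}
	}
	return failed
}
```

In an actual `_test.go` file, the loop body would be `t.Run(tc.name, ...)` with `t.Errorf` on mismatch. The "both fail" row is the one most likely to catch a precedence bug, since it checks that an earlier non-critical failure does not hide a later critical one.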

## Priority
- Impact: high (GPU dashboard correctness, silent health misclassification)
- Effort: low (all pure functions, no mocks needed)

---
*Filed by quality agent (ACMM L4/L6 — full mode)*
