Commit b3f769a (parent c931f96)

Add ConfigMap/Secret, multi-container labs, diagnose script, exam notes, CI

8 files changed: 341 additions, 1 deletion

.github/workflows/validate.yml (+42, new file)

```yaml
name: Validate manifests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate YAML syntax
        run: |
          echo "Checking YAML files..."
          ERRORS=0
          for f in $(find labs -name '*.yml' -o -name '*.yaml'); do
            # list() forces the lazy generator so parse errors actually raise
            if ! python3 -c "import yaml; list(yaml.safe_load_all(open('$f')))" 2>/dev/null; then
              echo "FAIL: $f"
              ERRORS=$((ERRORS + 1))
            else
              echo "  ok: $f"
            fi
          done
          if [ "$ERRORS" -gt 0 ]; then
            echo "$ERRORS file(s) failed validation"
            exit 1
          fi
          echo "All YAML files valid"

      - name: Lint with kubeval
        uses: instrumenta/kubeval-action@master
        with:
          files: labs
        continue-on-error: true

      - name: shellcheck scripts
        run: |
          sudo apt-get update && sudo apt-get install -y shellcheck
          shellcheck scripts/*.sh || true
```
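The YAML-syntax step is worth sanity-checking locally before pushing. One subtlety: `yaml.safe_load_all` returns a lazy generator, so it must be consumed (e.g. via `list(...)`) before parse errors surface. A standalone sketch of the same loop, run against a throwaway directory with one good and one deliberately broken file (both file names are made up for the demo; assumes `python3` with PyYAML available, as on the `ubuntu-latest` runner):

```shell
# Recreate the CI validation loop against hypothetical test files.
DIR=$(mktemp -d)
printf 'a: 1\nb: [2, 3]\n' > "$DIR/good.yaml"
printf 'a: [unclosed\n' > "$DIR/bad.yaml"

ERRORS=0
for f in $(find "$DIR" -name '*.yml' -o -name '*.yaml'); do
  # list() forces the generator so malformed YAML actually raises
  if ! python3 -c "import yaml; list(yaml.safe_load_all(open('$f')))" 2>/dev/null; then
    echo "FAIL: $f"
    ERRORS=$((ERRORS + 1))
  else
    echo "  ok: $f"
  fi
done
echo "$ERRORS file(s) failed"
```

Running this reports one `FAIL` (the unclosed flow sequence) and one `ok`.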

README.md (+6, −1)

```diff
@@ -9,7 +9,7 @@ Built on a bare-metal cluster (3x Raspberry Pi 4, kubeadm, Calico CNI).
 | Domain | Weight | Labs |
 |--------|--------|------|
 | [Cluster Architecture & Installation](labs/cluster-setup/) | 25% | kubeadm init/join, etcd backup/restore, upgrade |
-| [Workloads & Scheduling](labs/workloads/) | 15% | Deployments, DaemonSets, resource limits, scheduling |
+| [Workloads & Scheduling](labs/workloads/) | 15% | Deployments, DaemonSets, ConfigMaps/Secrets, multi-container pods, scheduling |
 | [Services & Networking](labs/networking/) | 20% | Services, Ingress, NetworkPolicy, DNS |
 | [Storage](labs/storage/) | 10% | PV, PVC, StorageClasses |
 | [Troubleshooting](labs/troubleshooting/) | 30% | Broken nodes, CrashLoopBackOff, DNS failures |
@@ -54,6 +54,11 @@ k expose deploy web --port=80 --type=NodePort $do > svc.yml
 |--------|---------|
 | [`scripts/etcd-backup.sh`](scripts/etcd-backup.sh) | Snapshot etcd and verify restore |
 | [`scripts/cluster-upgrade.sh`](scripts/cluster-upgrade.sh) | Step-by-step kubeadm upgrade |
+| [`labs/troubleshooting/diagnose.sh`](labs/troubleshooting/diagnose.sh) | Quick cluster health check |
+
+## Notes
+
+- [`notes/exam-tips.md`](notes/exam-tips.md) — Shortcuts, jsonpath examples, common mistakes
 
 ## Resources
 
```

labs/troubleshooting/README.md (+4)

```diff
@@ -2,6 +2,10 @@
 
 Largest exam domain. Practice diagnosing issues without looking at solutions first.
 
+## Scripts
+
+- [`diagnose.sh`](diagnose.sh) — Quick cluster health check (nodes, control plane, DNS, problem pods, warnings)
+
 ## Scenarios
 
 ### 1. Node NotReady
```

labs/troubleshooting/diagnose.sh (+82, new file)

```bash
#!/bin/bash
# Quick cluster health check — useful when things look wrong
set -euo pipefail

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

pass() { echo -e "${GREEN}[OK]${NC} $1"; }
warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
fail() { echo -e "${RED}[FAIL]${NC} $1"; }

echo "=== Cluster Health Check ==="
echo ""

# 1. Node status
echo "--- Nodes ---"
NOT_READY=$(kubectl get nodes --no-headers | grep -v " Ready" || true)
if [ -z "$NOT_READY" ]; then
  pass "All nodes are Ready"
else
  fail "Nodes not ready:"
  echo "$NOT_READY"
fi
kubectl get nodes -o wide --no-headers
echo ""

# 2. Control plane pods
echo "--- Control Plane ---"
for component in kube-apiserver kube-controller-manager kube-scheduler etcd; do
  STATUS=$(kubectl get pods -n kube-system -l component="$component" --no-headers 2>/dev/null | awk '{print $3}' | head -1 || true)
  if [ "$STATUS" = "Running" ]; then
    pass "$component"
  else
    fail "$component — status: ${STATUS:-not found}"
  fi
done
echo ""

# 3. CoreDNS
echo "--- CoreDNS ---"
DNS_PODS=$(kubectl get pods -n kube-system -l k8s-app=kube-dns --no-headers 2>/dev/null || true)
DNS_RUNNING=$(echo "$DNS_PODS" | grep -c "Running" || true)
# grep -c . counts non-empty lines, so an empty result is 0 rather than 1
DNS_TOTAL=$(echo "$DNS_PODS" | grep -c . || true)
if [ "$DNS_TOTAL" -gt 0 ] && [ "$DNS_RUNNING" -eq "$DNS_TOTAL" ]; then
  pass "CoreDNS ($DNS_RUNNING/$DNS_TOTAL running)"
else
  warn "CoreDNS ($DNS_RUNNING/$DNS_TOTAL running)"
fi
echo ""

# 4. Pods not running
echo "--- Problem Pods ---"
BAD_PODS=$(kubectl get pods -A --no-headers --field-selector=status.phase!=Running,status.phase!=Succeeded 2>/dev/null || true)
if [ -z "$BAD_PODS" ]; then
  pass "No pods in bad state"
else
  warn "Pods not Running/Succeeded:"
  echo "$BAD_PODS"
fi
echo ""

# 5. Recent events (warnings only)
echo "--- Recent Warnings (last 10) ---"
WARNINGS=$(kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp 2>/dev/null | tail -10 || true)
if [ -z "$WARNINGS" ]; then
  pass "No recent warnings"
else
  echo "$WARNINGS"
fi
echo ""

# 6. Resource pressure
echo "--- Resource Usage ---"
if kubectl top nodes &>/dev/null; then
  kubectl top nodes
else
  warn "Metrics server not available (kubectl top won't work)"
fi
echo ""
echo "=== Done ==="
```
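One pitfall the CoreDNS tally has to work around: `echo "" | wc -l` prints 1, so an empty pod list would look like one pod. A small sketch of a counter that treats empty input as zero (the function name and sample data are made up for the demo):

```shell
# Count non-empty lines; grep -c . exits 1 on zero matches, hence || true.
count_lines() {
  printf '%s' "$1" | grep -c . || true
}

PODS="coredns-abc 1/1 Running
coredns-def 1/1 Running"

echo "pods: $(count_lines "$PODS")"   # pods: 2
echo "empty: $(count_lines "")"       # empty: 0
```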

labs/workloads/README.md (+2)

```diff
@@ -8,6 +8,8 @@
 - [`resource-limits.yml`](resource-limits.yml) — Pod with requests/limits and LimitRange
 - [`node-affinity.yml`](node-affinity.yml) — Schedule pods to specific nodes
 - [`taint-toleration.yml`](taint-toleration.yml) — Taint a node, schedule with toleration
+- [`configmap-secret.yml`](configmap-secret.yml) — ConfigMap + Secret as env vars and volume mounts
+- [`multi-container.yml`](multi-container.yml) — Init container + sidecar logging pattern
 
 ## Key Concepts
 
```

labs/workloads/configmap-secret.yml (+67, new file)

```yaml
# App config via ConfigMap + database credentials via Secret
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: default
data:
  APP_ENV: "production"
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"
  config.yaml: |
    server:
      port: 8080
      read_timeout: 30s
      write_timeout: 30s
    cache:
      ttl: 300
---
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: default
type: Opaque
stringData:
  DB_HOST: "postgres.database.svc.cluster.local"
  DB_USER: "app"
  DB_PASSWORD: "changeme"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    app: backend
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo \"DB=$DB_HOST user=$DB_USER\" && cat /etc/app/config.yaml && sleep 3600"]
      # All Secret keys as env vars
      envFrom:
        - secretRef:
            name: db-credentials
      # Individual ConfigMap keys as env vars
      env:
        - name: APP_ENV
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: APP_ENV
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: LOG_LEVEL
      # File as volume mount
      volumeMounts:
        - name: config-volume
          mountPath: /etc/app
          readOnly: true
  volumes:
    - name: config-volume
      configMap:
        name: app-config
        items:
          - key: config.yaml
            path: config.yaml
```
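A note on the Secret: `stringData` is a write-time convenience, and the API server stores the values base64-encoded under `data`. That is encoding, not encryption, and it is trivially reversible. The round trip, using the password from the manifest:

```shell
# base64 is reversible — anyone who can read the Secret can read the password.
PLAIN='changeme'
ENCODED=$(printf '%s' "$PLAIN" | base64)
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
echo "stored as: $ENCODED"   # stored as: Y2hhbmdlbWU=
echo "decoded:   $DECODED"   # decoded:   changeme
```

This is why `kubectl get secret db-credentials -o yaml` effectively exposes the plaintext to anyone with read access.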

labs/workloads/multi-container.yml (+58, new file)

```yaml
# Init container waits for a dependency, sidecar ships logs
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  labels:
    app: web
spec:
  initContainers:
    # Wait until the database service is resolvable via DNS
    - name: wait-for-db
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          echo "Waiting for postgres to become available..."
          until nslookup postgres.default.svc.cluster.local; do
            echo "  ...not ready, retrying in 2s"
            sleep 2
          done
          echo "postgres is up"
  containers:
    # Main application container
    - name: app
      image: nginx:1.27
      ports:
        - containerPort: 80
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 250m
          memory: 256Mi

    # Sidecar: tails nginx access log to stdout (collected by cluster logging)
    - name: log-shipper
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /var/log/nginx/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
          readOnly: true
      resources:
        requests:
          cpu: 25m
          memory: 32Mi
        limits:
          cpu: 50m
          memory: 64Mi

  volumes:
    - name: logs
      emptyDir: {}
```
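The `until nslookup` loop in the init container retries forever; if the dependency never appears, the pod just sits in `Init:0/1`. The same pattern with a bounded retry count can be sketched as follows (the helper name and attempt counts are made up for illustration):

```shell
# Retry a probe command up to a fixed number of attempts.
wait_for() {
  tries=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 0.1
  done
}

wait_for 5 true && echo "dependency up"           # dependency up
wait_for 3 false || echo "gave up after 3 tries"  # gave up after 3 tries
```

In an init container, exiting nonzero after the cap lets the kubelet restart the probe with backoff instead of spinning silently.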

notes/exam-tips.md (+80, new file)

# Exam Notes

Things I keep forgetting or that cost me time in practice runs.

## Time Management

- 2 hours, 17 questions — roughly 7 min per question
- Flag hard ones and come back; don't get stuck on a single task
- Some questions are worth 4%, others 7-8% — prioritize the high-value ones

## Shortcuts That Save Time

```bash
# Set these up FIRST, before touching any question
alias k='kubectl'
alias kgp='kubectl get pods -A'
alias kgn='kubectl get nodes'
alias kd='kubectl describe'
export do='--dry-run=client -o yaml'
export now='--grace-period=0 --force'

# vim settings (add to ~/.vimrc)
set tabstop=2
set shiftwidth=2
set expandtab
```
## jsonpath

Comes up all the time. I always forget the syntax.

```bash
# Get internal IPs of all nodes
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'

# List all container images running in a namespace
kubectl get pods -n kube-system -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}'

# Get PVs sorted by capacity
kubectl get pv --sort-by=.spec.capacity.storage

# Custom columns
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
```

## etcd

Always need the certs. Check the etcd pod manifest if unsure:

```bash
grep -E 'cert|key|cacert' /etc/kubernetes/manifests/etcd.yaml
```
## Common Mistakes

- Forgetting `--namespace` — always double-check which namespace the question asks for
- NetworkPolicy: once ANY policy selects a pod, traffic of that policy's type (ingress/egress) is denied unless explicitly allowed
- PV/PVC: accessModes and capacity must match, otherwise the PVC stays Pending
- `kubeadm upgrade apply` only on the control plane, `kubeadm upgrade node` on workers
- Static pod manifests go in `/etc/kubernetes/manifests/`, not applied via kubectl
- After editing a static pod manifest, the kubelet picks it up automatically — no restart needed
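The NetworkPolicy bullet is the one that bites hardest in practice. A minimal default-deny that triggers that behavior looks like this (the namespace name is hypothetical, just for illustration):

```yaml
# Selects every pod in the namespace; from this point on, only ingress
# explicitly allowed by some other policy gets through.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: staging   # hypothetical namespace
spec:
  podSelector: {}      # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
```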
## kubectl Tricks

```bash
# Generate YAML without applying
kubectl run tmp --image=nginx $do > pod.yml

# Quick debug pod
kubectl run debug --image=busybox:1.36 --rm -it -- sh

# Check if RBAC allows something
kubectl auth can-i create deployments --as=dev -n staging

# See why a pod isn't scheduled
kubectl describe pod <name> | grep -A5 Events

# Diff before applying
kubectl diff -f manifest.yml
```
