docs: local hardware-in-the-loop FT8/WSPR bench plan (#171) #172

Draft: ringof wants to merge 24 commits into main from claude/171-local-hitl


ringof (Owner) commented Jun 9, 2026

Adds docs/local-hwil-plan.md: Plan 1 for a local, closed-RF
hardware-in-the-loop test bench, drivable by a local Claude Code /goal.

Closes #171 (plan deliverable; implementation tracked there).

What this is

A plan (docs-only, no code) for a bench that exercises the full real signal
path of this firmware end to end:

QDX (TX, known message) -> attenuators -> RX888 -> FX3 firmware (this repo)
  -> radiod (rx888.so) -> decoder -> decoded message -> single pass/fail line

The orchestrator emits one grep-stable verdict line, which is what makes a
/goal honest — the goal evaluator judges only what's in the transcript, and
that line is the output of a real RF decode.

Two phases, one shared bench

  • Phase A: FT8 (ft8_lib encode+decode, ~15–30 s/run), the fast iteration gate.
  • Phase B: WSPR via the actual wsprdaemon → wsprd → spot pipeline, the
    deployment-representative sign-off.

Only the TX mode and decoder tail differ between phases; the orchestrator is
parameterized by MODE={ft8,wspr}. Builds on the existing
docker/ka9q-radio/ harness (already builds radiod + rx888.so, flashes
SDDC_FX3.img).

Explicitly deferred to Plan 2

Remote self-hosted GitHub Actions runner, Actions/cron wiring, and the runner
security model (untrusted-fork code on a machine wired to a transmitter).
That surface is large enough to warrant its own plan.

Scope of this PR

Documentation only: docs/local-hwil-plan.md. No source/build/config changes;
firmware build and host tests are unaffected (build.yml lists docs/** under
paths-ignore). Implementation lands in follow-up PRs against #171.

Open operator questions to resolve before implementation: initial FT8 band,
reserved test callsign/grid, availability of a USB-switchable hub for
self-reset, and single- vs sibling-container split.

https://claude.ai/code/session_014Q2buDF3FceHaXdJBovDCT


Generated by Claude Code

claude added 4 commits June 9, 2026 01:04
Plan 1: a local, closed-RF test bench (QDX -> attenuators -> RX888 ->
FX3 firmware -> radiod -> decoder) drivable by a local Claude Code
/goal. Two phases sharing one bench: FT8 (ft8_lib) as the fast
iteration loop, WSPR via real wsprdaemon as the deployment-
representative sign-off. Documents the operator interface contract
(RF safety / attenuation power rating, QDX/RX888 config, pass
criteria), unattended-loop hygiene, and closed-system etiquette
(no public spots). Remote self-hosted-runner CI and its security
model are explicitly deferred to a separate plan (Plan 2).

Docs-only; no code, build, or behavior changes.

https://claude.ai/code/session_014Q2buDF3FceHaXdJBovDCT
Add the staged goal ladder (G0 software encode/decode + fixtures, then
rungs 1–7) with a hardware-dependency map; record decisions: HITL image
built FROM the ka9q-radio image; rung 2 split into 2a (CAT) / 2b
(soundcard); rung 3 CW carrier ~10 MHz with numeric PASS (peak within
300 Hz, >=20 dB over noise); red DANGER!! pre-gate banner on every rung
that keys the QDX into the RX888; and a generated known-content audio
fixtures section (reference-tool authored, encoder doubles as TX
stimulus, same-tool self-loopback caveat). First implementation step is
now G0.

Docs-only; no code, build, or behavior changes.

https://claude.ai/code/session_014Q2buDF3FceHaXdJBovDCT
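
The rung 3 numeric PASS criterion (peak within 300 Hz of the expected carrier and at least 20 dB over noise) can be sketched as a small predicate. This is an illustrative sketch only: the function name, argument names, and the example values are hypothetical and are not taken from the bench scripts.

```python
def tone_passes(peak_hz, peak_db, noise_db, expected_hz,
                max_offset_hz=300.0, min_margin_db=20.0):
    """Return True when the measured peak satisfies both rung 3 thresholds.

    All names here are hypothetical; only the two thresholds come from the plan.
    """
    offset_ok = abs(peak_hz - expected_hz) <= max_offset_hz
    margin_ok = (peak_db - noise_db) >= min_margin_db
    return offset_ok and margin_ok


# Made-up measurements around a 10 MHz carrier.
print(tone_passes(10_000_120.0, -40.0, -75.0, 10_000_000.0))  # 120 Hz off, 35 dB margin
print(tone_passes(10_000_450.0, -40.0, -75.0, 10_000_000.0))  # 450 Hz off
print(tone_passes(10_000_120.0, -60.0, -75.0, 10_000_000.0))  # only 15 dB margin
```

Keeping both thresholds as parameters would let a later rung reuse the same check with a tighter tolerance.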
First implementation rung (G0) of the local HITL bench. Hardware-free
audio encode->decode self-tests that print one grep-stable verdict line
and exit 0/1 (the contract a local /goal reads):

  - run_g0_ft8.sh : ft8_lib gen_ft8 -> wav -> decode_ft8
  - run_g0_wspr.sh: wspr-cui wsprsimwav -> wav -> (sox 12k) -> wsprd
  - gen_ft8_wav.sh / gen_wspr_wav.sh: known-content audio generators
    (also the TX stimulus source for the on-bench rungs)

Real tools own protocol encode/decode; sox only converts sample rate
(48k<->12k). External sources are cloned+built under .build/ (gitignored),
pinned: ft8_lib@9fec6ca, wspr-cui@839b86f (mirrors the Dockerfile SHA-pin
convention). Each rung decodes an authoritative fixture if present, else a
self-loop. Committed FT8 fixture (12 kHz, 352K); WSPR runs self-loop
(audio generated on the fly) to avoid a multi-MB blob.

Host deps: gcc gfortran libfftw3-dev sox libgfortran5 libfftw3-single3.
No FX3 SDK or hardware. Validated: both rungs OK/exit 0; wrong .expected
-> FAIL/exit 1.

https://claude.ai/code/session_014Q2buDF3FceHaXdJBovDCT
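
The G0 contract (one grep-stable verdict line plus exit status 0 or 1) could look roughly like the sketch below. The G0 rungs themselves are shell scripts, so this Python version is a stand-in; the verdict wording, the `verdict` helper, and the sample decoder line are assumptions for illustration. Only the OK/FAIL and 0/1 pairing, and the T1ABC/FN20 test identity, come from this PR.

```python
def verdict(rung, decoded_lines, expected):
    """Return (verdict_line, exit_code) for one rung.

    The rung passes when the expected message text appears in any decoded line.
    The line format is hypothetical; the real scripts define their own.
    """
    ok = any(expected in line for line in decoded_lines)
    status = "OK" if ok else "FAIL"
    return f"{rung}: {status} expected={expected!r}", (0 if ok else 1)


# Hypothetical decoder output; the real decode_ft8 format may differ.
decoded = ["000000 -05 +0.2 1500 ~  CQ T1ABC FN20"]

line, code = verdict("G0_FT8", decoded, "CQ T1ABC FN20")
print(line, code)

line, code = verdict("G0_FT8", decoded, "CQ T9ZZZ AA00")
print(line, code)
```

A caller would print the line once and pass the code to `sys.exit`, so that both the transcript and the process status carry the same result.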
Each run now writes its self-loop audio to a visible out/ dir (gitignored)
and prints the path + a copy-pastable decode command, so the operator has a
real audio file to verify with their own tools (jt9 / wsprd) instead of
trusting the harness verdict. WSPR previously surfaced no file at all.

  out/g0_ft8_selfloop.wav        (12 kHz)
  out/g0_wspr_selfloop_48k.wav   (48 kHz, QDX TX rate)
  out/g0_wspr_selfloop_12k.wav   (12 kHz, wsprd input)

Honors the earlier "FT8 fixture only" choice: out/ is gitignored, no
multi-MB blob committed.

https://claude.ai/code/session_014Q2buDF3FceHaXdJBovDCT
claude and others added 20 commits June 9, 2026 02:35
Plan drifted from what G0 became; bring it in line:
- Status: approved/in-progress, G0 implemented (PR #172).
- Name the real tooling: gen_ft8 (ft8_lib), wsprsimwav (wspr-cui) for
  WSPR audio, sox for rate conversion only (not synthesis).
- Rewrite the fixtures/independence section to the operator-as-verifier
  model: encoder independence waived; each run emits audio to out/ for the
  operator to decode with their own jt9/wsprd. Document committed-FT8-
  fixture vs on-the-fly-WSPR.
- Next step -> rung 1; mark test callsign/grid resolved (T1ABC/FN20).
- tests/bench/README: add a copy-paste Prerequisites apt line so a clean
  checkout reproduces the run for independent verification.

Docs-only.

https://claude.ai/code/session_014Q2buDF3FceHaXdJBovDCT
First QDX-touching rung in the HWIL ladder. Proves the bench host can
set frequency, key/unkey PTT, and read CAT responses from the QDX over
USB CDC serial. The reusable qdx_cat.py module is used by all later
QDX-touching rungs (2b, 3, 5, 7).

New files:
- tests/bench/qdx_cat.py — QdxCat class (context manager, safety RX on exit)
- tests/bench/rung2a_cat_test.py — 6-check test (ID, freq, PTT, IF)
- tests/bench/run_rung2a.sh — shell wrapper with preflight checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kenwood CAT set commands (FA<freq>;, TX;, RX;) do not produce a serial
response — only query commands do. Added send_set() for these and
switched set_freq, tx_on, tx_off to use it. Validated against real
QDX hardware (fw 1_09): all 6 rung 2a checks pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
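
The set-versus-query distinction described above can be sketched against a stand-in port. This is not the code in qdx_cat.py: `FakePort`, `CatSession`, and the 11-digit `FA` frequency field (the usual Kenwood convention) are assumptions, and the port methods mirror pyserial's `write`, `read_until`, and `reset_input_buffer` so that a real `serial.Serial` could be substituted.

```python
class FakePort:
    """Hypothetical in-memory port: replies to queries, stays silent on sets."""

    def __init__(self):
        self.freq = 14_074_000
        self.written = []
        self._rx = b""

    def write(self, data):
        self.written.append(data)
        cmd = data.decode("ascii")
        if cmd == "FA;":                       # query: a reply is queued
            self._rx += f"FA{self.freq:011d};".encode("ascii")
        elif cmd.startswith("FA"):             # set: state changes, no reply
            self.freq = int(cmd[2:-1])
        return len(data)

    def read_until(self, expected=b";"):
        idx = self._rx.find(expected)
        if idx < 0:                            # models a read timeout
            out, self._rx = self._rx, b""
            return out
        out, self._rx = self._rx[:idx + 1], self._rx[idx + 1:]
        return out

    def reset_input_buffer(self):
        self._rx = b""


def send_set(port, cmd):
    """Send a set command (FA<freq>;, TX;, RX;) without waiting for a reply."""
    port.reset_input_buffer()
    port.write(cmd.encode("ascii"))


def query(port, cmd):
    """Send a query command and return the ';'-terminated reply."""
    port.reset_input_buffer()
    port.write(cmd.encode("ascii"))
    reply = port.read_until(b";")
    if not reply.endswith(b";"):
        raise TimeoutError(f"no reply to {cmd!r}")
    return reply.decode("ascii")


class CatSession:
    """Context manager that always returns the radio to RX on exit."""

    def __init__(self, port):
        self.port = port

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.port.write(b"RX;")                # direct write, no buffer reset
        return False


port = FakePort()
with CatSession(port):
    send_set(port, "FA00007074000;")
    print(query(port, "FA;"))
    send_set(port, "TX;")
print(port.written[-1])
```

Waiting for a reply after a set command would simply time out, which is why the two paths are kept separate.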
Proves the bench host can play audio into and capture audio from the
QDX over its USB Audio Class device. Validates device discovery,
capture, playback, and PTT + playback integration — the audio
plumbing reused by every TX rung (3, 5, 7).

New files:
- qdx_audio.py: reusable audio discovery + playback helpers
- rung2b_audio_test.py: test sequence (4 checks)
- run_rung2b.sh: shell wrapper with preflight checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qdx_cat.py:
- Fix __exit__ crash: write RX; directly to port, bypassing send_set()
  whose reset_input_buffer() throws I/O errors on a wedged port.

qdx_audio.py:
- generate_tone: produce S24 stereo 48 kHz (QDX native format) instead
  of S16 mono that hw: rejects.
- play_to_qdx: use hw: directly instead of plughw: so format mismatches
  fail loud.

rung2a_cat_test.py:
- Lead with FA; (proven on hardware) instead of gating on ID;.
- ID/VN are informational, not gating.

rung2b_audio_test.py:
- Capture check verifies actual audio energy, not just file size > 0.
- Honest verdict: explicitly states RF output is NOT verified by this
  test and requires manual confirmation with a separate receiver.

All checks validated on real QDX hardware with 12V DC supply.
RF output confirmed by operator using independent HF receiver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
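
Two of the fixes above, producing S24 stereo 48 kHz audio and checking for actual audio energy, can be illustrated with the standard-library `wave` module. The real qdx_audio.py may well use sox or ALSA tools instead; the function names, the 1500 Hz default, and the 0.5 amplitude are assumptions chosen for the example.

```python
import math
import os
import tempfile
import wave


def generate_tone(path, freq_hz=1500.0, seconds=1.0, rate=48000, amplitude=0.5):
    """Write a sine tone as 24-bit signed little-endian stereo PCM."""
    full_scale = 2 ** 23 - 1
    frames = bytearray()
    for n in range(int(seconds * rate)):
        value = int(round(amplitude * full_scale
                          * math.sin(2 * math.pi * freq_hz * n / rate)))
        sample = value.to_bytes(3, "little", signed=True)
        frames += sample + sample              # same sample on left and right
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(3)                      # 3 bytes per sample = S24
        w.setframerate(rate)
        w.writeframes(bytes(frames))


def rms_dbfs(path):
    """Return the RMS level in dBFS (assumes 16/24/32-bit signed PCM)."""
    with wave.open(path, "rb") as w:
        width = w.getsampwidth()
        raw = w.readframes(w.getnframes())
    full_scale = float(2 ** (8 * width - 1))
    total, count = 0.0, 0
    for i in range(0, len(raw), width):
        value = int.from_bytes(raw[i:i + width], "little", signed=True)
        total += (value / full_scale) ** 2
        count += 1
    if count == 0 or total == 0.0:
        return float("-inf")                   # empty or silent file
    return 10.0 * math.log10(total / count)


with tempfile.TemporaryDirectory() as tmp:
    tone_path = os.path.join(tmp, "tone_s24_stereo_48k.wav")
    generate_tone(tone_path)
    level = rms_dbfs(tone_path)
    print(f"{level:.2f} dBFS")
```

An energy threshold on the dBFS value catches the case a file-size check misses: a capture that is the right length but contains only silence.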
- bench_rf_test.py: new manual RF verification tool (--port, --freq,
  --tone, --duration). Reads current QDX freq by default.
- rung2a_cat_test.py: env vars → argparse. Wrap ID/VN and IF queries
  in try/except so they're non-fatal.
- rung2b_audio_test.py: env vars → argparse. Print dial freq and
  expected carrier before PTT test.
- README.md: add rungs 2a/2b, bench_rf_test, helper modules, usage.
- local-hwil-plan.md: update status through rung 3 (confirmed manually).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…to image

Move the Docker build context from docker/ka9q-radio/ to the project
root so the Dockerfile can COPY tests/bench/ tools directly. This makes
the container self-contained for rung 2 testing and future CI — no
volume mounts needed for the QDX bench scripts.

- Prefix existing COPY paths (patches/, rx888-test.conf, entrypoint.sh)
  with docker/ka9q-radio/ for the new context
- Add python3, python3-serial, sox, alsa-utils to runtime image
- COPY 5 bench tools into /usr/local/lib/bench/
- Add ka9q.sh build subcommand with the new -f Dockerfile syntax
- Add conditional /dev/ttyACM0 passthrough in ka9q.sh start
- Update build command in 8 docs/scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The QDX transmits a known tone (1500 Hz at 14,095,600 Hz dial), the powers
utility captures the spectrum via ka9q-radio/RX888, and the script validates
that the tone appears at the expected carrier (14,097,100 Hz) above the
noise floor. This is the first fully closed-loop automated RF test in the
bench ladder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The powers output format is a single line per snapshot:
  timestamp, low_freq, high_freq, binwidth, num_bins, p0, p1, ...
not one freq,power pair per line. Fixed the parser to compute bin
frequencies from low_freq + i * binwidth.

Also added a one-shot retry when the first powers invocation returns
no parseable bins — common on fresh container start.

Verified on real hardware via docker exec (43.1 dB margin, peak dead
on 14,097,100 Hz).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
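
A minimal sketch of the corrected parsing, assuming the snapshot line is comma-separated exactly as listed above, that frequencies are in Hz, and that bin values are in dB. The function names and the sample line are hypothetical and were not captured from real `powers` output.

```python
def parse_powers_line(line):
    """Parse one snapshot: timestamp, low, high, binwidth, num_bins, p0, p1, ..."""
    fields = [f.strip() for f in line.split(",")]
    timestamp = fields[0]                      # kept as an opaque string
    low_hz = float(fields[1])
    bin_hz = float(fields[3])
    num_bins = int(fields[4])
    powers = [float(x) for x in fields[5:5 + num_bins]]
    if len(powers) != num_bins:
        raise ValueError(f"expected {num_bins} bins, got {len(powers)}")
    # Bin frequencies are computed, not read: low_freq + i * binwidth.
    freqs = [low_hz + i * bin_hz for i in range(num_bins)]
    return timestamp, freqs, powers


def find_peak(freqs, powers):
    """Return (frequency, power) of the strongest bin."""
    i = max(range(len(powers)), key=powers.__getitem__)
    return freqs[i], powers[i]


# Made-up snapshot around the expected 14,097,100 Hz carrier.
sample = "2026-06-09T02:00:00, 14096900, 14097300, 100, 5, -90.0, -88.5, -45.2, -89.1, -90.3"
_, freqs, powers = parse_powers_line(sample)
peak_hz, peak_db = find_peak(freqs, powers)
print(peak_hz, peak_db)
```

Raising on a bin-count mismatch gives a retry wrapper a clear signal that the snapshot was not parseable.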
Routine pin bump — no patches, no config changes. Key upstream
deltas since the previous pin (87567fa): USB watchdog resets only
after a successful transfer, rx888 globals moved into struct
sdrstate, isfinite() float-exception guard, TESTFX3 query failure
now non-fatal, -march=native off by default.

Also fixes stale "active patch 04" wording in README (patch 04 was
upstreamed at 87567fa).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ple rate

Add [FT8] and [WSPR] channel sections to rx888-test.conf covering
the four QDX bands (80/40/30/20m) at standard dial frequencies.
Data groups ft8-pcm.local and wspr-pcm.local give bench scripts a
capture target for decode rungs 4-7.

Replace FFTW wisdom generation (impractical in containers) with a
runtime ADC_SAMPRATE env var (default 64m8, optional 129m6 for
full-rate). Entrypoint sed-substitutes the sample rate into the
config at startup.

Existing smoke test (ka9q_smoke.sh) and unit test (ka9q_test.sh)
are unaffected — they use ad-hoc powers queries via SSRC 30303,
independent of named channel sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pcmrecord is ka9q-radio's native RTP-to-WAV recorder with built-in
FT8 (-8) and WSPR (-w) slot alignment. Needed for rung 4+ bench
tests to capture demodulated audio from the ft8-pcm/wspr-pcm
channel groups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add FT8 roundtrip test: QDX transmits a randomly-generated FT8 message
(unique callsign + grid per run) on each of four bands, RX888/radiod
demodulates, pcmrecord captures slot-aligned WAV, decode_ft8 asserts the
message decodes. Verified on live hardware — all 4 bands pass.

Dockerfile: build ft8_lib (gen_ft8 + decode_ft8) pinned to 9fec6ca,
copy binaries + rung4 script into runtime image.

Key fixes found during bring-up:
- pcmrecord writes WAVE_FORMAT_EXTENSIBLE WAV that ft8_lib can't parse;
  normalize through sox before decode_ft8.
- SIGTERM on pcmrecord leaves a short partial for the next slot; try all
  captured WAVs instead of just the last one.
- Clean capture directory per band to avoid stale files from prior runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…script

Build wsprsimwav + wsprd (jj1bdx/wspr-cui, pinned 839b86f) in the Docker
image alongside ft8_lib.  Add gfortran to builder, libgfortran5 to runtime.

New wspr_roundtrip_test.py orchestrates a single-band (40m, 7.0386 MHz)
WSPR encode→TX→capture→decode loop using pcmrecord -w (120s slot-aligned
captures) and wsprd.  Reports SNR so the operator can calibrate attenuation
to the -10 to -15 dB target range.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ware

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two bugs found during first hardware validation of the WSPR roundtrip:

1. SSRC mismatch: radiod rounds freq-in-kHz (7038.6 → 7039) but the
   script truncated (7038600 // 1000 = 7038).  pcmrecord found no
   matching stream → no captures.  Fix: use round() instead of //.

2. TX audio too quiet: wsprsimwav's -6 dB output was below the QDX's
   modulation threshold — no RF despite confirmed PTT.  Fix: normalize
   via sox gain -n during the mono→stereo conversion.  Exposed as
   --drive (default -1 dB) so the operator can tune the level.

Hardware-validated: WSPR roundtrip decode confirmed at SNR +29 dB on
40m (7.038600 MHz) with QDX → attenuator → RX888 → radiod → wsprd.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
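
The SSRC mismatch in bug 1 comes down to truncation versus rounding of the dial frequency in kHz. The helper names below are hypothetical. Note also that Python's `round()` rounds exact halves to the nearest even integer, so whether it matches radiod for a frequency ending in exactly 500 Hz is an open assumption.

```python
def ssrc_truncated(dial_hz):
    """Buggy derivation: integer division drops the fractional kHz."""
    return dial_hz // 1000


def ssrc_rounded(dial_hz):
    """Fixed derivation: round to the nearest kHz."""
    return round(dial_hz / 1000)


for dial_hz in (7_038_600, 14_095_600):
    print(dial_hz, ssrc_truncated(dial_hz), ssrc_rounded(dial_hz))
```

For 7,038,600 Hz the two derivations give 7038 and 7039, which is the mismatch that left pcmrecord with no matching stream.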
Set AD8370 VGA gain to 0 dB (was 10) and PE4304 attenuator to 31.5 dB
(was 0) to reduce signal level for the QDX→RX888 bench loopback.
WSPR roundtrip SNR dropped from +29 to +6 dB; an additional inline
20 dB pad is needed to reach the -10 to -15 dB target.

Update HWIL plan to reflect WSPR hardware validation complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
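
The pad estimate can be checked with simple dB arithmetic, under the assumption that an inline pad lowers the reported SNR dB-for-dB. That assumption does not hold in general: the gain and attenuator changes above total 41.5 dB yet moved the SNR by only 23 dB. The helper names are hypothetical.

```python
TARGET_LOW_DB = -15.0
TARGET_HIGH_DB = -10.0


def snr_after_pad(snr_db, pad_db):
    """Predicted SNR, assuming the pad reduces SNR dB-for-dB (an idealization)."""
    return snr_db - pad_db


def in_target(snr_db):
    """True when the SNR falls inside the -10 to -15 dB target range."""
    return TARGET_LOW_DB <= snr_db <= TARGET_HIGH_DB


predicted = snr_after_pad(6.0, 20.0)
print(predicted, in_target(predicted))

# Observed so far: 41.5 dB less gain produced a 23 dB SNR drop.
print((10.0 - 0.0) + (31.5 - 0.0), 29.0 - 6.0)
```

Because the observed drop was smaller than the nominal gain change, the actual SNR with the pad in line should be measured rather than assumed.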
Drop rung2a_/rung2b_/rung3_/rung4_/wspr_roundtrip_ prefixes from bench
test scripts in favor of descriptive names (cat_test, audio_test,
loopback_test, ft8_test, wspr_test, rf_test).  Update all verdict
strings, log prefixes, temp dirs, output filenames, docstrings,
Dockerfile COPY lines, and docs to match.

Add bench.sh — a unified dispatcher that runs host tests directly and
container tests via docker exec, with `bench.sh all` for full runs and
`bench.sh list` for discovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…utput

- bench.sh: SKIP_AUDIO=1 skips audio test in `bench.sh all`; all
  python3 invocations now use -u for unbuffered stdout/stderr so
  output streams live through `docker exec` without a TTY.

- ft8_test.py: add --passes arg (default 3). Each band runs N passes
  with a fresh random message per pass; all must decode for a band
  to pass. Fail-fast on first decode failure. Verdict shows pass
  counts (e.g. "20m(1/3)").

- wspr_test.py: expand from 40m-only to 80/40/30/20m matching the
  radiod [WSPR] config. Loop over bands with per-band message
  generation. Remove --freq arg (replaced by built-in bands list).
  SSRC derived automatically from dial frequency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
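
The multi-pass, fail-fast behaviour and the pass-count verdict could be structured as in the sketch below. The `decode_once` callback, the verdict prefix, and the choice to stop at the first failing band are assumptions; the commit only specifies N passes per band, fail-fast on the first decode failure, and counts such as "20m(1/3)".

```python
def run_band(band, passes, decode_once):
    """Run up to `passes` decodes on one band, stopping at the first failure.

    decode_once(band, pass_index) -> bool is a hypothetical callback that
    transmits a fresh message and reports whether it decoded.
    """
    passed = 0
    for index in range(passes):
        if not decode_once(band, index):
            return False, passed
        passed += 1
    return True, passed


def run_all(bands, passes, decode_once):
    """Return (all_ok, verdict_line) with per-band pass counts."""
    parts = []
    all_ok = True
    for band in bands:
        ok, passed = run_band(band, passes, decode_once)
        parts.append(f"{band}({passed}/{passes})")
        if not ok:
            all_ok = False
            break                              # assumed: later bands are skipped
    status = "PASS" if all_ok else "FAIL"
    return all_ok, f"ft8_test: {status} " + " ".join(parts)


def fake_decode(band, index):
    """Stand-in decoder that fails on the second pass of 20m."""
    return not (band == "20m" and index == 1)


ok, line = run_all(["80m", "40m", "30m", "20m"], 3, fake_decode)
print(ok, line)
```

Reporting the count reached before the failure shows at a glance whether a band fails consistently or only intermittently.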
…00 samples

pcmrecord occasionally emits slot captures with a few extra samples
beyond 180000 (e.g. 180005 — ~0.4 ms of timing jitter).  decode_ft8
(ft8_lib) crashes with exit 255 ("cannot load wave file") when the
sample count exceeds 180000.

This caused the FT8 bench test to fail nondeterministically on 20m
while 80m/40m/30m passed — the 20m capture happened to land 5 samples
over the limit.  The signal was fine (+15.5 dB SNR, decoded after trim).

Add "trim 0 15.0" to the sox normalization in decode_and_check(),
capping output at exactly 180000 samples.  No-op on files already at
or under that count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
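
The cap works out to 15.0 s × 12,000 samples/s = 180,000 samples, and the 5 extra samples correspond to 5 / 12,000 s ≈ 0.42 ms. The sketch below does the equivalent trim with Python's standard `wave` module in place of sox, purely for illustration. It assumes a plain PCM WAV at 12 kHz (the raw WAVE_FORMAT_EXTENSIBLE captures mentioned earlier would still need normalizing first), and `trim_wav` is a hypothetical helper.

```python
import os
import tempfile
import wave


def trim_wav(src, dst, max_seconds=15.0):
    """Copy src to dst, keeping at most max_seconds of audio. Returns frames kept."""
    with wave.open(src, "rb") as reader:
        channels = reader.getnchannels()
        width = reader.getsampwidth()
        rate = reader.getframerate()
        max_frames = int(round(max_seconds * rate))
        # No-op for files already at or under the limit.
        frames = reader.readframes(min(reader.getnframes(), max_frames))
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(width)
        writer.setframerate(rate)
        writer.writeframes(frames)
    return len(frames) // (channels * width)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "capture.wav")
    dst = os.path.join(tmp, "trimmed.wav")
    # Silent 12 kHz mono capture that is 5 samples too long.
    with wave.open(src, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(12000)
        w.writeframes(b"\x00\x00" * 180005)
    kept = trim_wav(src, dst)
    print(kept)
```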

Development

Successfully merging this pull request may close these issues.

Local hardware-in-the-loop (HITL) test bench: closed-RF FT8/WSPR loopback drivable by a local /goal

2 participants