test(metrics): LatencyAssertions helpers + budget E2E guard by yuga-hashimoto · Pull Request #464 · yuga-hashimoto/open-dash

yuga-hashimoto · 2026-04-19T23:48:53Z

Priority 5: Refactor / Quality (testing the priority-1 latency contract)

Adds latency-budget regression guards so that future pipeline edits stop silently breaking the P8.2 500 ms / 200 ms targets.

What changed

Production helper (`app/src/main/.../voice/metrics/LatencyAssertions.kt`)

Three pure-Kotlin helpers callable from both unit and instrumented tests:

`checkSpanRecorded`: guards against "the timing path stopped recording"
`checkNoBudgetViolations`: hard budget assertion (use sparingly)
`checkBudgetViolationRateAtMost`: soft assertion suited to CI, where emulator cold starts inflate one-shot timings

Each helper returns `null` on success and a failure message otherwise. Callers wrap the call in `assertThat(...).isNull()`, so when an assertion fails, the message renders inline in the test output.
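The null-or-message contract can be sketched roughly as follows. This sketch assumes the samples are held in a simple in-memory list. `SpanSample`, its fields, the `LatencyAssertionsSketch` object, and the helper signatures are all hypothetical. The PR supplies only the three helper names; the real `LatencyAssertions.kt` and its recorder type are not shown.

```kotlin
// Hypothetical sample model: one completed span with its measured duration and budget.
// Span names are plain strings here; the real code may use an enum or a recorder type.
data class SpanSample(val span: String, val durationMs: Long, val budgetMs: Long) {
    val overBudget: Boolean get() = durationMs > budgetMs
}

// Sketch of the three helpers: each returns null on success, or a readable failure message.
object LatencyAssertionsSketch {
    // Fails when no sample for `span` exists (e.g. a path that never reached endSpan).
    fun checkSpanRecorded(samples: List<SpanSample>, span: String): String? =
        if (samples.any { it.span == span }) null
        else "expected at least one '$span' sample, but none were recorded"

    // Hard check: any over-budget sample for `span` fails.
    fun checkNoBudgetViolations(samples: List<SpanSample>, span: String): String? {
        val over = samples.filter { it.span == span && it.overBudget }
        return if (over.isEmpty()) null
        else "'$span' exceeded its budget ${over.size} time(s): " +
            over.joinToString { "${it.durationMs} ms > ${it.budgetMs} ms" }
    }

    // Soft check: fails only if the fraction of over-budget samples exceeds maxRate.
    // Assumption: an empty sample set is reported as a failure rather than a pass.
    fun checkBudgetViolationRateAtMost(
        samples: List<SpanSample>,
        span: String,
        maxRate: Double,
    ): String? {
        require(maxRate in 0.0..1.0) { "maxRate must be within [0, 1], was $maxRate" }
        val relevant = samples.filter { it.span == span }
        if (relevant.isEmpty()) return "no '$span' samples recorded; cannot compute a violation rate"
        val rate = relevant.count { it.overBudget }.toDouble() / relevant.size
        return if (rate <= maxRate) null
        else "'$span' violation rate $rate exceeds ceiling $maxRate"
    }
}
```

In a test, each call would be wrapped as `assertThat(LatencyAssertionsSketch.checkSpanRecorded(samples, "FAST_PATH_TO_RESPONSE")).isNull()`. Returning a message instead of throwing keeps this sketch free of any test-framework dependency. That is one plausible way helpers like these could live in `main` and still be shared by unit and instrumented tests.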

Tests

`LatencyAssertionsTest` (8 unit cases): exercises each helper's pass, fail, and edge cases using virtual time
`LatencyBudgetE2ETest` (instrumented): runs the same fast-path utterance as `VoicePipelineFastPathE2ETest`, then asserts that:
1. `FAST_PATH_TO_RESPONSE` was actually recorded (catches a regression where a code path bypasses `endSpan`)
2. the violation rate stays within a soft ceiling
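A fake clock can illustrate both the virtual-time approach and the `endSpan` regression the E2E test watches for. This is a hypothetical sketch. `FakeClock`, `SketchLatencyRecorder`, `startSpan`, and `durationsOf` are assumed names; only `endSpan` appears in the PR. Durations come only from explicit clock advances, so the assertions are deterministic. A span that is started but never ended leaves no sample behind.

```kotlin
// Hypothetical virtual clock: time moves only when the test advances it.
class FakeClock(var nowMs: Long = 0L) {
    fun advanceBy(deltaMs: Long) { nowMs += deltaMs }
}

// Hypothetical recorder; the real recorder API in open-dash is not shown in this PR.
class SketchLatencyRecorder(private val clock: FakeClock) {
    private val openSpans = mutableMapOf<String, Long>()
    private val completed = mutableListOf<Pair<String, Long>>()

    // Remember when the span started, in virtual milliseconds.
    fun startSpan(name: String) { openSpans[name] = clock.nowMs }

    // Only endSpan turns an open span into a recorded duration. A code path that
    // skips it leaves nothing for a "span recorded" check to find.
    fun endSpan(name: String) {
        val start = openSpans.remove(name) ?: return
        completed += name to (clock.nowMs - start)
    }

    // All completed durations recorded for one span name, in recording order.
    fun durationsOf(name: String): List<Long> =
        completed.filter { it.first == name }.map { it.second }
}
```

A unit test would call `startSpan`, advance the clock by a chosen amount, call `endSpan`, and then assert on exact durations. No real sleeping is needed, and no wall-clock jitter is involved.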

Why a soft check (and not a hard 200 ms guard in CI)

Hard-asserting <200 ms on an AVD cold start would flake even on healthy code paths (Hilt cold start, ToolExecutor first touch, AlarmManager binder). The ceiling is currently permissive (1.0) because a single fast-path call is either 0 % or 100 % over budget. The goal is to confirm that "the recorder still counts", not to enforce the 200 ms target itself. As the suite grows and each run produces multiple warm samples, the ceiling can be tightened.
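The arithmetic behind the permissive ceiling is sketched below. It assumes violation rate means over-budget samples divided by total samples, and that the soft check passes when rate ≤ ceiling. The PR does not show the helper's actual formula, and `violationRate` and `passesSoftCheck` are hypothetical names.

```kotlin
// Hypothetical: violation rate = (samples over budget) / (total samples).
fun violationRate(durationsMs: List<Long>, budgetMs: Long): Double {
    // A rate over zero samples is undefined, so refuse instead of returning NaN.
    require(durationsMs.isNotEmpty()) { "violation rate is undefined without samples" }
    val overBudget = durationsMs.count { it > budgetMs }
    return overBudget.toDouble() / durationsMs.size
}

// Assumed pass condition for the soft check: rate must not exceed the ceiling.
fun passesSoftCheck(durationsMs: List<Long>, budgetMs: Long, ceiling: Double): Boolean =
    violationRate(durationsMs, budgetMs) <= ceiling
```

With one sample the rate can only be 0.0 or 1.0, so any ceiling below 1.0 behaves exactly like a hard assertion. With five warm samples, a ceiling of 0.2 would tolerate one cold-start spike but fail on two.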

Real-device budget validation lives in `docs/real-device-smoke-test.md` step 3.

Test plan

`./gradlew testStandardDebugUnitTest assembleStandardDebug` — green
`./gradlew assembleStandardDebugAndroidTest` — green
Once #461 (ci: add instrumented test workflow on macOS AVD) merges, `connectedStandardDebugAndroidTest` will run `LatencyBudgetE2ETest` in CI


yuga-hashimoto merged commit 409be8f into main Apr 19, 2026
2 checks passed

yuga-hashimoto merged 1 commit into main from feat/e2e-budget-assert
