test(metrics): LatencyAssertions helpers + budget E2E guard #464

Merged: yuga-hashimoto merged 1 commit into main from feat/e2e-budget-assert on Apr 19, 2026

Priority 5: Refactor / Quality (testing the priority-1 latency contract)

Adds latency-budget regression guards so that future pipeline edits cannot silently break the P8.2 500 ms / 200 ms targets.

What changed

Production helper (`app/src/main/.../voice/metrics/LatencyAssertions.kt`)

Three pure-Kotlin helpers callable from both unit and instrumented tests:

  • `checkSpanRecorded` — guards "the timing path stopped recording"
  • `checkNoBudgetViolations` — hard budget assertion (use sparingly)
  • `checkBudgetViolationRateAtMost` — soft assertion suitable for CI where emulator cold-start spikes one-shot timings

Helpers return `null` on success / message on failure so callers wrap with `assertThat(...).isNull()` and the failure renders inline.
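
The null-on-success contract can be sketched roughly as follows. This is a minimal illustration, not the code in `LatencyAssertions.kt`: the `Span` data class, its fields, and both signatures are hypothetical, and the real helpers may take different parameters.

```kotlin
// Hypothetical stand-in for a recorded latency span; not the project's real type.
data class Span(val name: String, val durationMs: Long, val budgetMs: Long)

// Returns null when at least one span with the given name was recorded,
// otherwise a human-readable failure message.
fun checkSpanRecorded(spans: List<Span>, name: String): String? =
    if (spans.any { it.name == name }) null
    else "expected span '$name' to be recorded, found: ${spans.map { it.name }}"

// Hard check: returns null only when every span stayed within its budget.
fun checkNoBudgetViolations(spans: List<Span>): String? {
    val over = spans.filter { it.durationMs > it.budgetMs }
    return if (over.isEmpty()) null
    else "budget violations: " +
        over.joinToString { "${it.name}=${it.durationMs}ms (budget ${it.budgetMs}ms)" }
}
```

Under these assumptions a test would call `assertThat(checkSpanRecorded(spans, "FAST_PATH_TO_RESPONSE")).isNull()`, so a failing run prints the returned message instead of a bare `false`.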

Tests

  • `LatencyAssertionsTest` (8 unit cases) — exercises each helper for pass/fail/edge cases with virtual time
  • `LatencyBudgetE2ETest` (instrumented) — runs the same fast-path utterance as `VoicePipelineFastPathE2ETest`, then asserts:
    1. `FAST_PATH_TO_RESPONSE` was actually recorded (catches a regression where a code path bypasses `endSpan`)
    2. violation rate stays inside a soft ceiling

Why a soft check (and not a hard 200 ms guard in CI)

Hard-asserting <200 ms on AVD cold start would flake on healthy code paths (Hilt cold start, ToolExecutor first-touch, AlarmManager binder). The current ceiling is permissive (1.0) because a single fast-path call yields a violation rate of either 0 % or 100 %; the goal is to confirm that the recorder still counts, not to enforce the 200 ms target itself. As the suite grows and each run collects multiple warm samples, the ceiling can be tightened.
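
The rate arithmetic behind the soft check can be sketched as below. The signature is a hypothetical simplification (one Boolean per sample, `true` meaning over budget), and failing on an empty sample list is an assumption of this sketch, not confirmed behaviour of the real helper.

```kotlin
// Hypothetical simplified signature: one Boolean per sample, true = over budget.
// Returns null when the observed violation rate is at or below maxRate.
fun checkBudgetViolationRateAtMost(violations: List<Boolean>, maxRate: Double): String? {
    // Assumption: no samples is reported as a failure rather than a vacuous pass.
    if (violations.isEmpty()) return "no samples recorded"
    val rate = violations.count { it }.toDouble() / violations.size
    return if (rate <= maxRate) null
    else "violation rate $rate exceeds ceiling $maxRate"
}
```

With one sample the rate can only be 0.0 or 1.0, so any ceiling below 1.0 would turn a single cold-start spike into a failure. With four warm samples and one spike the rate is 0.25, which is the point at which a tighter ceiling starts to carry information.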

Real-device budget validation lives in `docs/real-device-smoke-test.md` step 3.

Test plan

  • `./gradlew testStandardDebugUnitTest assembleStandardDebug` — green
  • `./gradlew assembleStandardDebugAndroidTest` — green
  • Once #461 ("ci: add instrumented test workflow on macOS AVD", the CI emulator workflow) merges, `connectedStandardDebugAndroidTest` will run `LatencyBudgetE2ETest` in CI

## Priority 5: Refactor / Quality (testing the priority-1 latency contract)

Adds:
- LatencyAssertions (production code, in main) — three pure-Kotlin
  helpers callable from both unit and instrumented tests:
    * checkSpanRecorded — guards "the timing path stopped recording"
    * checkNoBudgetViolations — hard budget assertion (use sparingly)
    * checkBudgetViolationRateAtMost — soft assertion suitable for CI
      where emulator cold-start spikes one-shot timings.
  Helpers return null on success / message on failure so callers wrap
  with assertThat(...).isNull() and the failure renders inline.
- LatencyAssertionsTest (8 unit cases) — exercises each helper for
  pass/fail/edge cases with virtual time.
- LatencyBudgetE2ETest (instrumented) — runs the same fast-path
  utterance as VoicePipelineFastPathE2ETest, then asserts:
    1) FAST_PATH_TO_RESPONSE was actually recorded (regression catches
       a code path that bypasses endSpan)
    2) violation rate stays inside a soft ceiling (currently permissive
       at 1.0 since a single fast-path call is 0 % or 100 %; the goal
       is "the recorder still counts" not the 200 ms target itself —
       real-device budget validation lives in
       docs/real-device-smoke-test.md)
  As the suite grows we can tighten the ceiling.

Why a soft check: hard-asserting <200 ms on AVD cold start would flake
on healthy code paths (Hilt cold start, ToolExecutor first-touch).

Verification:
- ./gradlew testStandardDebugUnitTest assembleStandardDebug — green
- ./gradlew assembleStandardDebugAndroidTest — green
yuga-hashimoto merged commit 409be8f into main on Apr 19, 2026
2 checks passed