test(metrics): LatencyAssertions helpers + budget E2E guard #464
Merged
## Priority 5: Refactor / Quality (testing the priority-1 latency contract)
Adds:
- LatencyAssertions (production code, in main): three pure-Kotlin
  helpers callable from both unit and instrumented tests:
    * checkSpanRecorded: guards against the timing path silently
      ceasing to record
    * checkNoBudgetViolations: hard budget assertion (use sparingly)
    * checkBudgetViolationRateAtMost: soft assertion suitable for CI,
      where emulator cold start inflates one-shot timings
  Helpers return null on success and a message on failure, so callers
  wrap them with assertThat(...).isNull() and the failure message
  renders inline.
- LatencyAssertionsTest (8 unit cases): exercises each helper for
  pass, fail, and edge cases with virtual time.
- LatencyBudgetE2ETest (instrumented): runs the same fast-path
  utterance as VoicePipelineFastPathE2ETest, then asserts:
    1) FAST_PATH_TO_RESPONSE was actually recorded (catches a
       regression where a code path bypasses endSpan)
    2) the violation rate stays within a soft ceiling. The ceiling is
       currently permissive at 1.0, since a single fast-path call
       yields a rate of either 0 % or 100 %; the goal is to confirm
       that the recorder still counts, not to enforce the 200 ms
       target itself. Real-device budget validation lives in
       docs/real-device-smoke-test.md.
  As the suite grows we can tighten the ceiling.
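
To illustrate what "virtual time" buys the unit cases, here is a minimal sketch assuming a recorder that takes its clock as a parameter. `FakeClock`, `SpanRecorder`, `startSpan`, `durationsFor`, and `recordFastPath` are hypothetical names invented for this example; only `endSpan` and `FAST_PATH_TO_RESPONSE` appear in the PR text, and the project's real recorder API may differ. A hand-rolled fake clock is one way to get virtual time; the actual tests may use a different mechanism.

```kotlin
// Hypothetical fake clock: time only moves when the test says so.
class FakeClock(var nowMs: Long = 0L) {
    fun advanceBy(ms: Long) { nowMs += ms }
}

// Hypothetical recorder with an injectable clock (not the project's real class).
class SpanRecorder(private val clock: () -> Long) {
    private val starts = mutableMapOf<String, Long>()
    private val durations = mutableMapOf<String, MutableList<Long>>()

    fun startSpan(name: String) { starts[name] = clock() }

    fun endSpan(name: String) {
        // An endSpan without a matching startSpan is ignored in this sketch.
        val start = starts.remove(name) ?: return
        durations.getOrPut(name) { mutableListOf() }.add(clock() - start)
    }

    fun durationsFor(name: String): List<Long> = durations[name].orEmpty()
}

// Records one span whose length is exactly elapsedMs of virtual time.
fun recordFastPath(elapsedMs: Long): List<Long> {
    val clock = FakeClock()
    val recorder = SpanRecorder { clock.nowMs }
    recorder.startSpan("FAST_PATH_TO_RESPONSE")
    clock.advanceBy(elapsedMs) // no sleeping, so the result is deterministic
    recorder.endSpan("FAST_PATH_TO_RESPONSE")
    return recorder.durationsFor("FAST_PATH_TO_RESPONSE")
}
```

Because the clock is injected, a test can produce an exactly 150 ms or 320 ms span on demand and feed the result to the assertion helpers without depending on wall-clock timing.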
Why a soft check: hard-asserting <200 ms on AVD cold start would flake
on healthy code paths (Hilt cold start, ToolExecutor first-touch).
Verification:
- ./gradlew testStandardDebugUnitTest assembleStandardDebug — green
- ./gradlew assembleStandardDebugAndroidTest — green
Priority 5: Refactor / Quality (testing the priority-1 latency contract)
Adds latency-budget regression guards so the P8.2 500 ms / 200 ms targets stop being silently broken by future pipeline edits.
What changed
Production helper (`app/src/main/.../voice/metrics/LatencyAssertions.kt`)
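Three pure-Kotlin helpers callable from both unit and instrumented tests:
- `checkSpanRecorded`: guards against the timing path silently ceasing to record.
- `checkNoBudgetViolations`: hard budget assertion (use sparingly).
- `checkBudgetViolationRateAtMost`: soft assertion suitable for CI, where emulator cold start inflates one-shot timings.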
Helpers return `null` on success / message on failure so callers wrap with `assertThat(...).isNull()` and the failure renders inline.
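
The null-on-success contract can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `LatencyAssertions.kt`: the `SpanSample` type, the parameter lists, the message wording, and the choice to fail when there are no samples are all invented here. Only the three helper names and the "null on success, message on failure" behavior come from the PR text.

```kotlin
// Hypothetical sample type; the real recorder's data model is not shown in this PR.
data class SpanSample(val name: String, val durationMs: Long, val budgetMs: Long) {
    val violatesBudget: Boolean get() = durationMs > budgetMs
}

/** Returns null if at least one sample named [name] exists, else a failure message. */
fun checkSpanRecorded(samples: List<SpanSample>, name: String): String? =
    if (samples.any { it.name == name }) null
    else "No span named $name was recorded; the timing path may have stopped recording"

/** Hard assertion: any over-budget sample named [name] is a failure. */
fun checkNoBudgetViolations(samples: List<SpanSample>, name: String): String? {
    val violations = samples.filter { it.name == name && it.violatesBudget }
    return if (violations.isEmpty()) null
    else "$name exceeded its budget in ${violations.size} sample(s): " +
        violations.joinToString { "${it.durationMs} ms > ${it.budgetMs} ms" }
}

/** Soft assertion: the fraction of over-budget samples must not exceed [maxRate]. */
fun checkBudgetViolationRateAtMost(
    samples: List<SpanSample>,
    name: String,
    maxRate: Double,
): String? {
    val matching = samples.filter { it.name == name }
    // Assumption: an empty sample set is reported as a failure rather than a pass.
    if (matching.isEmpty()) return "No samples for $name; cannot compute a violation rate"
    val rate = matching.count { it.violatesBudget }.toDouble() / matching.size
    return if (rate <= maxRate) null
    else "$name violation rate $rate exceeds ceiling $maxRate"
}
```

A caller would then write something like `assertThat(checkSpanRecorded(samples, "FAST_PATH_TO_RESPONSE")).isNull()`, so that a failing run prints the helper's message as the actual value instead of a bare boolean mismatch.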
Tests
Why a soft check (and not a hard 200 ms guard in CI)
Hard-asserting <200 ms on AVD cold start would flake on healthy code paths (Hilt cold start, ToolExecutor first-touch, AlarmManager binder). The current ceiling is permissive (1.0) because a single fast-path call yields a violation rate of either 0 % or 100 %; the goal is to confirm that the recorder still counts, not to enforce the 200 ms target itself. As the suite grows and we have multiple warm samples per run, we can tighten the ceiling.
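
The arithmetic behind the permissive ceiling can be made concrete with a small sketch. `violationRate` is a hypothetical helper and the durations are invented illustrative numbers, not measurements from this project.

```kotlin
// Hypothetical helper: fraction of samples that exceed the budget.
fun violationRate(durationsMs: List<Long>, budgetMs: Long): Double {
    require(durationsMs.isNotEmpty()) { "need at least one sample" }
    return durationsMs.count { it > budgetMs }.toDouble() / durationsMs.size
}

// One sample per run: the rate can only be 0.0 or 1.0, so any ceiling
// below 1.0 behaves exactly like a hard 200 ms assertion.
val singleColdSample = violationRate(listOf(340L), budgetMs = 200L) // 1.0

// Several samples per run: one cold-start outlier is diluted, so a
// ceiling below 1.0 starts to be a meaningful soft check.
val oneColdFourWarm =
    violationRate(listOf(340L, 120L, 95L, 110L, 130L), budgetMs = 200L) // 0.2
```

With five samples, a ceiling of, say, 0.2 would tolerate one cold-start spike while still failing if slow responses become the norm; the exact value to adopt is a judgment call that the PR leaves open.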
Real-device budget validation lives in `docs/real-device-smoke-test.md` step 3.
Test plan