ADR-0020: On-Device LLM Testing and Validation Approach
Date: February 12, 2026
Category: Infrastructure
Context
Upcoming features rely on on-device LLMs. We need a repeatable, offline testing approach to validate model quality, latency, memory usage, and determinism across devices without telemetry.
Decision
We will implement a local, telemetry-free evaluation harness that:
- Runs a fixed prompt suite with expected schema validation
- Captures latency, token throughput, and memory usage
- Validates output constraints (length, structure, safety rules)
- Supports device-tier baselines (mid-range target first)
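The harness loop described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `generate` callable, the JSON-object schema check, and the whitespace token proxy are all assumptions, and `tracemalloc` only tracks Python-side allocations (a real on-device harness would read native memory counters instead).

```python
import json
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    prompt_id: str
    latency_s: float
    tokens_per_s: float
    peak_mem_bytes: int
    schema_ok: bool
    constraints_ok: bool

def validate_schema(output: str, required_keys: set) -> bool:
    """Check that the output parses as a JSON object with the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def run_case(generate: Callable[[str], str], prompt_id: str, prompt: str,
             required_keys: set, max_chars: int) -> EvalResult:
    """Run one fixed prompt and capture latency, throughput, memory, and checks."""
    tracemalloc.start()
    start = time.perf_counter()
    output = generate(prompt)          # model call under test
    latency = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    token_count = len(output.split())  # crude whitespace proxy for tokens
    return EvalResult(
        prompt_id=prompt_id,
        latency_s=latency,
        tokens_per_s=token_count / latency if latency > 0 else 0.0,
        peak_mem_bytes=peak,
        schema_ok=validate_schema(output, required_keys),
        constraints_ok=len(output) <= max_chars,  # simple length constraint
    )
```

A device-tier baseline would then be a set of thresholds (e.g. maximum `latency_s`) that `EvalResult` values are compared against per tier, with the mid-range tier defined first.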
Rationale
- Enables fast iteration without cloud dependencies.
- Ensures regressions are caught when swapping models or prompts.
- Aligns with privacy and offline-only constraints.
Consequences
- Requires ongoing maintenance of prompt suites and expected schemas.
- Device lab or emulator profiling is needed for performance baselines.
- Adds build-time and CI effort.
Alternatives Considered
- Manual spot checks only — unreliable and non-repeatable.
- Cloud-based evaluation — violates offline-only constraints.
Notes
- This harness will be referenced by all Phase 5+ feature ADRs.