ADR-0020: On-Device LLM Testing and Validation Approach

Date: February 12, 2026
Category: Infrastructure
Tags: llm-inference, testing

Context

Upcoming features rely on on-device LLMs. We need a repeatable, offline testing approach to validate model quality, latency, memory usage, and determinism across devices without telemetry.
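Determinism can be checked without any cloud dependency by invoking the model repeatedly with identical settings and comparing digests of the outputs. A minimal sketch, assuming a `generate(prompt)` callable that wraps the on-device model with greedy decoding and a fixed seed (both names are illustrative, not part of any existing API):

```python
import hashlib


def output_digest(text: str) -> str:
    """Stable digest of a model response, used to compare runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_determinism(generate, prompt: str, runs: int = 3) -> bool:
    """Call the model `runs` times with identical settings.

    Returns True only if every run produced a byte-identical response.
    """
    digests = {output_digest(generate(prompt)) for _ in range(runs)}
    return len(digests) == 1
```

In practice `generate` would pin the sampling seed and temperature; any nondeterminism (e.g. from non-reproducible kernels) shows up as more than one digest in the set.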

Decision

We will implement a local, telemetry-free evaluation harness that:

  • Runs a fixed prompt suite with expected schema validation
  • Captures latency, token throughput, and memory usage
  • Validates output constraints (length, structure, safety rules)
  • Supports device-tier baselines (mid-range target first)
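The core loop of such a harness can be sketched as follows. This is a minimal illustration, not the final implementation: `generate` is an assumed callable wrapping the on-device model, `REQUIRED_KEYS` stands in for a real per-case expected schema, and `tracemalloc` only measures Python-level allocations (native model memory would need platform-specific profiling):

```python
import json
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    latency_s: float
    tokens_per_s: float
    peak_mem_kb: float
    schema_ok: bool


# Illustrative expected schema for one prompt case (assumption, not a real contract).
REQUIRED_KEYS = {"answer": str, "confidence": float}


def validate_schema(raw: str) -> bool:
    """Check that the raw model output is JSON with the required typed keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return all(isinstance(obj.get(k), t) for k, t in REQUIRED_KEYS.items())


def run_case(generate, name: str, prompt: str) -> CaseResult:
    """Run one prompt case, capturing latency, throughput, and peak memory."""
    tracemalloc.start()
    t0 = time.perf_counter()
    out = generate(prompt)
    latency = max(time.perf_counter() - t0, 1e-9)  # guard against zero division
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    tokens = max(len(out.split()), 1)  # crude whitespace token count
    return CaseResult(name, latency, tokens / latency, peak / 1024,
                      validate_schema(out))
```

Device-tier baselines would then be simple threshold checks over a list of `CaseResult`s (e.g. p95 latency and peak memory per tier), compared in CI against recorded mid-range numbers.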

Rationale

  • Enables fast iteration without cloud dependencies.
  • Ensures regressions are caught when swapping models or prompts.
  • Aligns with privacy and offline-only constraints.

Consequences

  • Requires ongoing maintenance of prompt suites and expected schemas.
  • Device lab or emulator profiling is needed for performance baselines.
  • Adds build-time and CI effort.

Alternatives Considered

  • Manual spot checks only — unreliable and non-repeatable.
  • Cloud-based evaluation — violates offline-only constraints.

Notes

  • This harness will be referenced by all Phase 5+ feature ADRs.