ADR-0020: On-Device LLM Testing and Validation Approach

Date: February 12, 2026
Category: Infrastructure
Tags: llm-inference, testing

Context

Upcoming features rely on on-device LLMs. We need a repeatable, offline testing approach to validate model quality, latency, memory usage, and determinism across devices without telemetry.
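Determinism can be checked without any cloud dependency by invoking the model repeatedly with identical settings and comparing digests of the outputs. A minimal sketch, assuming a `generate(prompt)` callable that wraps the on-device model with greedy decoding and a fixed seed (both names are illustrative, not part of any existing API):

```python
import hashlib


def output_digest(text: str) -> str:
    """Stable digest of a model response, used to compare runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_determinism(generate, prompt: str, runs: int = 3) -> bool:
    """Call the model `runs` times with identical settings.

    Returns True only if every run produced a byte-identical response.
    """
    digests = {output_digest(generate(prompt)) for _ in range(runs)}
    return len(digests) == 1
```

In practice `generate` would pin the sampling seed and temperature; any nondeterminism (e.g. from non-reproducible kernels) shows up as more than one digest in the set.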

Decision

We will implement a local, telemetry-free evaluation harness that:

  • Runs a fixed prompt suite with expected schema validation
  • Captures latency, token throughput, and memory usage
  • Validates output constraints (length, structure, safety rules)
  • Supports device-tier baselines (mid-range target first)
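The core loop of such a harness can be sketched as follows. This is a minimal illustration, not the final implementation: `generate` is an assumed callable wrapping the on-device model, `REQUIRED_KEYS` stands in for a real per-case expected schema, and `tracemalloc` only measures Python-level allocations (native model memory would need platform-specific profiling):

```python
import json
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    latency_s: float
    tokens_per_s: float
    peak_mem_kb: float
    schema_ok: bool


# Illustrative expected schema for one prompt case (assumption, not a real contract).
REQUIRED_KEYS = {"answer": str, "confidence": float}


def validate_schema(raw: str) -> bool:
    """Check that the raw model output is JSON with the required typed keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return all(isinstance(obj.get(k), t) for k, t in REQUIRED_KEYS.items())


def run_case(generate, name: str, prompt: str) -> CaseResult:
    """Run one prompt case, capturing latency, throughput, and peak memory."""
    tracemalloc.start()
    t0 = time.perf_counter()
    out = generate(prompt)
    latency = max(time.perf_counter() - t0, 1e-9)  # guard against zero division
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    tokens = max(len(out.split()), 1)  # crude whitespace token count
    return CaseResult(name, latency, tokens / latency, peak / 1024,
                      validate_schema(out))
```

Device-tier baselines would then be simple threshold checks over a list of `CaseResult`s (e.g. p95 latency and peak memory per tier), compared in CI against recorded mid-range numbers.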

Rationale

  • Enables fast iteration without cloud dependencies.
  • Ensures regressions are caught when swapping models or prompts.
  • Aligns with privacy and offline-only constraints.

Consequences

  • Requires ongoing maintenance of prompt suites and expected schemas.
  • Device lab or emulator profiling is needed for performance baselines.
  • Adds build-time and CI effort.

Alternatives Considered

  • Manual spot checks only — unreliable and non-repeatable.
  • Cloud-based evaluation — violates offline-only constraints.

Notes

  • This harness will be referenced by all Phase 5+ feature ADRs.