ADR-0018: On-Device LLM Inference Engine

Date: February 12, 2026
Category: LLM Integration
Tags: llm-inference

Context

We need a reliable on-device inference stack to support upcoming LLM-assisted features. The app is offline-only, must run on mid-range Android phones, and should have predictable performance. We are leaning toward models from Liquid AI, but need to choose an inference engine and model format that can run those models locally with acceptable latency and memory use.

Decision

We will adopt an on-device inference engine that supports Liquid AI models and provides:

  • Stable Android integration
  • Quantized model support
  • Streaming token generation
  • Predictable memory usage

We will prioritize a lightweight native runtime (C/C++ with JNI) and ship a single model format to minimize complexity.
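The engine-facing API implied by the requirements above could look like the following Java sketch. All names here (`NativeLlmEngine`, the method signatures) are hypothetical, not a committed design; in production `generate()` would delegate over JNI to the C/C++ runtime, while this sketch substitutes a pure-Java stub so the API shape is clear.

```java
import java.util.function.Consumer;

// Hypothetical Java-side wrapper around the native runtime. In production,
// generate() would be a JNI call into the C/C++ engine; here a pure-Java
// stub stands in so the streaming API shape is visible.
final class NativeLlmEngine implements AutoCloseable {
    private boolean closed = false;

    // Streaming token generation: tokens are pushed to the caller as they
    // are produced instead of blocking until the full completion is ready.
    public void generate(String prompt, Consumer<String> onToken) {
        if (closed) throw new IllegalStateException("engine closed");
        // Stub: echo the prompt back word by word as stand-in "tokens".
        for (String token : prompt.split(" ")) {
            onToken.accept(token);
        }
    }

    // Predictable memory usage: the caller releases (hypothetical) native
    // buffers deterministically rather than relying on garbage collection.
    @Override
    public void close() { closed = true; }
}
```

A caller would typically wrap the engine in try-with-resources so native memory is reclaimed at a known point, which matters on mid-range devices.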

Rationale

  • Optimizes for offline reliability and predictable performance.
  • Reduces integration risk by standardizing on one engine and format.
  • Keeps memory and binary size under control for mid-range devices.

Consequences

  • Engine choice constrains model format and tooling.
  • Any future engine switch will require model re-export and retesting.
  • We must build a thin abstraction layer to isolate engine specifics.
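The abstraction layer mentioned above might be as small as one interface; the names below (`InferenceEngine`, `StubEngine`) are illustrative assumptions, not a committed design. The point of the sketch is the isolation property: a future engine switch means adding one new implementation and re-exporting the model, while feature code that depends only on the interface stays untouched.

```java
import java.util.function.Consumer;

// Illustrative engine-agnostic boundary: feature code depends only on this
// interface, never on a concrete runtime or model format.
interface InferenceEngine extends AutoCloseable {
    void loadModel(String modelPath);
    void generate(String prompt, Consumer<String> onToken);
    @Override void close();
}

// One concrete binding per engine. An engine switch adds a new
// implementation here; callers are unchanged.
final class StubEngine implements InferenceEngine {
    private String modelPath;

    @Override public void loadModel(String modelPath) {
        this.modelPath = modelPath;
    }

    @Override public void generate(String prompt, Consumer<String> onToken) {
        // Stand-in for a real JNI-backed runtime: tag output with the
        // loaded model so the wiring is observable.
        onToken.accept("[" + modelPath + "] ");
        onToken.accept(prompt);
    }

    @Override public void close() {}
}
```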

Alternatives Considered

  • Multiple engines per device class — increases complexity and QA surface.
  • Cloud fallback — violates offline-only constraint.

Notes

  • Revisit if Liquid AI releases an official Android runtime with better performance or tooling.