ADR-0018: On-Device LLM Inference Engine
Date: February 12, 2026
Category: LLM Integration
Context
We need a reliable on-device inference stack to support upcoming LLM-assisted features. The app is offline-only, must run on mid-range Android phones, and should have predictable performance. We are leaning toward models from Liquid AI, but need to choose an inference engine and model format that can run those models locally with acceptable latency and memory use.
Decision
We will adopt an on-device inference engine that supports Liquid AI models and provides:
- Stable Android integration
- Quantized model support
- Streaming token generation
- Predictable memory usage
We will prioritize a lightweight native runtime (C/C++ with JNI) and ship a single model format to minimize complexity.
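A minimal sketch of the intended integration surface, assuming a hypothetical JNI binding (the names LlmEngine, NativeLlmEngine, TokenCallback, and "llm_jni" are illustrative, not an existing Liquid AI or vendor API):

```kotlin
import kotlinx.coroutines.channels.awaitClose
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.callbackFlow

// Engine-agnostic contract the app codes against; engine specifics stay behind it.
interface LlmEngine {
    fun loadModel(modelPath: String): Boolean
    fun generate(prompt: String, maxTokens: Int): Flow<String>
    fun unloadModel()
}

// Callback the native layer invokes once per decoded token (streaming generation).
fun interface TokenCallback {
    fun onToken(token: String)
}

// Hypothetical JNI-backed implementation wrapping the C/C++ runtime.
class NativeLlmEngine : LlmEngine {
    override fun loadModel(modelPath: String): Boolean = nativeLoad(modelPath)

    override fun generate(prompt: String, maxTokens: Int): Flow<String> = callbackFlow {
        // Tokens are pushed into the flow as the native decoder produces them;
        // completion and error signalling from the native side are omitted in this sketch.
        val handle = nativeGenerate(prompt, maxTokens) { token -> trySend(token) }
        awaitClose { nativeCancel(handle) }
    }

    override fun unloadModel() = nativeUnload()

    // Illustrative JNI declarations; real symbol names depend on the engine we pick.
    private external fun nativeLoad(modelPath: String): Boolean
    private external fun nativeGenerate(prompt: String, maxTokens: Int, callback: TokenCallback): Long
    private external fun nativeCancel(handle: Long)
    private external fun nativeUnload()

    companion object {
        init {
            // Single native runtime shipped with the app; library name is a placeholder.
            System.loadLibrary("llm_jni")
        }
    }
}
```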
Rationale
- Optimizes for offline reliability and predictable performance.
- Reduces integration risk by standardizing on one engine and format.
- Keeps memory and binary size under control for mid-range devices (a rough budget is sketched below).
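For scale, a deliberately hedged back-of-envelope estimate (the 1B-parameter size and 4-bit quantization are illustrative assumptions, not a chosen model):

```kotlin
// Back-of-envelope weight footprint: params * bitsPerWeight / 8 bytes.
// The example values below are assumptions, not a selected model.
fun weightBytes(params: Long, bitsPerWeight: Int): Long = params * bitsPerWeight / 8

fun main() {
    val bytes = weightBytes(params = 1_000_000_000L, bitsPerWeight = 4)
    // ~0.50 GB of weights, before KV cache, activations, and runtime overhead.
    println("~%.2f GB".format(bytes / 1e9))
}
```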
Consequences
- Engine choice constrains model format and tooling.
- Any future engine switch will require model re-export and retesting.
- We must build a thin abstraction layer to isolate engine specifics (see the usage sketch after this list).
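A minimal usage sketch, assuming the hypothetical LlmEngine/NativeLlmEngine from the Decision sketch above: call sites depend only on the interface, so a future engine swap touches one construction site plus the native implementation and the re-exported model, not feature code.

```kotlin
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.runBlocking

// Feature code receives the interface; it never references NativeLlmEngine or JNI directly.
suspend fun summarize(engine: LlmEngine, text: String): String {
    val out = StringBuilder()
    engine.generate(prompt = "Summarize: $text", maxTokens = 128)
        .collect { token -> out.append(token) }
    return out.toString()
}

fun main() = runBlocking {
    val engine: LlmEngine = NativeLlmEngine()                  // the only engine-specific line
    check(engine.loadModel("/data/local/models/model.q4.bin")) // path and format are placeholders
    println(summarize(engine, "ADR-0018 decision text"))
    engine.unloadModel()
}
```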
Alternatives Considered
- Multiple engines per device class — increases complexity and QA surface.
- Cloud fallback — violates offline-only constraint.
Notes
- Revisit if Liquid AI releases an official Android runtime with better performance or tooling.