ADR-0019: Model Size, Quantization, and Memory Budget

Date: February 12, 2026
Category: LLM Integration
Tags: llm-inference

Context

We need on-device LLMs that run comfortably on mid-range phones without thermal throttling or excessive battery drain. Liquid AI models are the preferred direction, but we must pick a model size and a quantization strategy that fit our memory and latency constraints.

Decision

We will target a small, quantized Liquid AI model with a strict memory budget and per-feature token limits; an illustrative budget sketch follows the scope boundaries below.

Scope boundaries:

  • This ADR defines model size and quantization targets for Phase 4–6 features.
  • It does not define prompt templates or feature-specific schemas.
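
For illustration, such a budget could be represented in the app layer as below. This is a minimal sketch assuming a Kotlin app layer; `ModelBudget`, the model identifier, and every numeric value are hypothetical placeholders, not targets set by this ADR.

```kotlin
// Hypothetical per-feature budget record; all values below are
// placeholders for illustration, not targets fixed by this ADR.
data class ModelBudget(
    val modelId: String,          // e.g. a small Liquid AI checkpoint
    val quantization: String,     // e.g. "int4" or "int8"
    val maxModelRamBytes: Long,   // ceiling for weights plus KV cache
    val maxTokensPerRequest: Int  // per-feature token limit
)

// One budget per feature, enforced by the app layer.
val briefingBudget = ModelBudget(
    modelId = "liquid-small",                 // placeholder identifier
    quantization = "int4",
    maxModelRamBytes = 1_500L * 1024 * 1024,  // placeholder ceiling (~1.5 GiB)
    maxTokensPerRequest = 512                 // placeholder limit
)
```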

Rationale

  • Mid-range devices require conservative RAM and thermal limits.
  • Quantization (e.g., 4-bit weights in place of FP16) provides large memory savings with acceptable quality loss for summarization and briefing tasks; a back-of-the-envelope calculation follows this list.
  • Smaller models reduce latency and improve user experience.
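
To make the memory-savings claim concrete: weight footprint is roughly parameters × bits per parameter ÷ 8, so dropping from FP16 to 4-bit weights cuts weight memory about 4×. The sketch below is generic arithmetic, not a measurement of any specific Liquid AI model.

```kotlin
// Back-of-the-envelope weight footprint: parameters * bits / 8.
// Generic arithmetic only; ignores KV cache, activations, and runtime overhead.
fun weightFootprintBytes(params: Long, bitsPerParam: Int): Long =
    params * bitsPerParam / 8

fun main() {
    val params = 1_000_000_000L                  // a 1B-parameter model
    val fp16 = weightFootprintBytes(params, 16)  // ~2.0 GB
    val int4 = weightFootprintBytes(params, 4)   // ~0.5 GB
    println("fp16: ${fp16 / (1 shl 20)} MiB, int4: ${int4 / (1 shl 20)} MiB")
}
```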

Consequences

  • Output quality may be lower than that of larger models; prompts must be carefully tuned.
  • Some advanced synthesis may need a future “larger model” option.
  • Token limits must be enforced in the app layer, as sketched after this list.
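
A minimal sketch of that enforcement, assuming a hypothetical `Tokenizer` interface exposed by the inference runtime; all names here are illustrative, not a real Liquid AI API.

```kotlin
// Illustrative app-layer guard; Tokenizer is an assumed interface, not a
// real runtime API. Fails fast instead of letting the model truncate silently.
class TokenLimitExceeded(message: String) : Exception(message)

interface Tokenizer {
    fun countTokens(text: String): Int
}

fun enforceTokenLimit(tokenizer: Tokenizer, prompt: String, maxTokens: Int): String {
    val count = tokenizer.countTokens(prompt)
    if (count > maxTokens) {
        throw TokenLimitExceeded("prompt uses $count tokens, limit is $maxTokens")
    }
    return prompt
}
```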

Alternatives Considered

  • Larger model with aggressive pruning — higher risk of latency spikes.
  • Multiple model tiers — increases packaging size and testing load.

Notes

  • Final memory and latency targets will be validated by the local test harness ADR.