ADR-0019: Model Size, Quantization, and Memory Budget

Date: February 12, 2026
Category: LLM Integration
Tags: llm-inference

Context

We need on-device LLMs that run comfortably on mid-range phones without thermal throttling or excessive battery drain. Liquid AI models are the preferred direction, but we must pick a model size and a quantization strategy that fit our memory and latency constraints.

Decision

We will target a small, quantized Liquid AI model with a strict memory budget and per-feature token limits; an illustrative budget sketch follows the scope boundaries below.

Scope boundaries:

  • This ADR defines model size and quantization targets for Phase 4–6 features.
  • It does not define prompt templates or feature-specific schemas.
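
For illustration, such a budget could be represented in the app layer as below. This is a minimal sketch assuming a Kotlin app layer; `ModelBudget`, the model identifier, and every numeric value are hypothetical placeholders, not targets set by this ADR.

```kotlin
// Hypothetical per-feature budget record; all values below are
// placeholders for illustration, not targets fixed by this ADR.
data class ModelBudget(
    val modelId: String,          // e.g. a small Liquid AI checkpoint
    val quantization: String,     // e.g. "int4" or "int8"
    val maxModelRamBytes: Long,   // ceiling for weights plus KV cache
    val maxTokensPerRequest: Int  // per-feature token limit
)

// One budget per feature, enforced by the app layer.
val briefingBudget = ModelBudget(
    modelId = "liquid-small",                 // placeholder identifier
    quantization = "int4",
    maxModelRamBytes = 1_500L * 1024 * 1024,  // placeholder ceiling (~1.5 GiB)
    maxTokensPerRequest = 512                 // placeholder limit
)
```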

Rationale

  • Mid-range devices require conservative RAM and thermal limits.
  • Quantization (e.g., 4-bit weights in place of FP16) provides large memory savings with acceptable quality loss for summarization and briefing tasks; a back-of-the-envelope calculation follows this list.
  • Smaller models reduce latency and improve user experience.
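
To make the memory-savings claim concrete: weight footprint is roughly parameters × bits per parameter ÷ 8, so dropping from FP16 to 4-bit weights cuts weight memory about 4×. The sketch below is generic arithmetic, not a measurement of any specific Liquid AI model.

```kotlin
// Back-of-the-envelope weight footprint: parameters * bits / 8.
// Generic arithmetic only; ignores KV cache, activations, and runtime overhead.
fun weightFootprintBytes(params: Long, bitsPerParam: Int): Long =
    params * bitsPerParam / 8

fun main() {
    val params = 1_000_000_000L                  // a 1B-parameter model
    val fp16 = weightFootprintBytes(params, 16)  // ~2.0 GB
    val int4 = weightFootprintBytes(params, 4)   // ~0.5 GB
    println("fp16: ${fp16 / (1 shl 20)} MiB, int4: ${int4 / (1 shl 20)} MiB")
}
```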

Consequences

  • Output quality may be lower than that of larger models; prompts must be carefully tuned.
  • Some advanced synthesis may need a future “larger model” option.
  • Token limits must be enforced in the app layer, as sketched after this list.
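
A minimal sketch of that enforcement, assuming a hypothetical `Tokenizer` interface exposed by the inference runtime; all names here are illustrative, not a real Liquid AI API.

```kotlin
// Illustrative app-layer guard; Tokenizer is an assumed interface, not a
// real runtime API. Fails fast instead of letting the model truncate silently.
class TokenLimitExceeded(message: String) : Exception(message)

interface Tokenizer {
    fun countTokens(text: String): Int
}

fun enforceTokenLimit(tokenizer: Tokenizer, prompt: String, maxTokens: Int): String {
    val count = tokenizer.countTokens(prompt)
    if (count > maxTokens) {
        throw TokenLimitExceeded("prompt uses $count tokens, limit is $maxTokens")
    }
    return prompt
}
```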

Alternatives Considered

  • Larger model with aggressive pruning — higher risk of latency spikes.
  • Multiple model tiers — increases packaging size and testing load.

Notes

  • Final memory and latency targets will be validated by the local test harness ADR.