ADR-0019: Model Size, Quantization, and Memory Budget
Date: February 12, 2026
Category: LLM Integration
Context
We need on-device LLMs that run comfortably on mid-range phones without thermal throttling or excessive battery drain. Liquid AI models are the preferred direction, but we must pick a model size and quantization strategy that fits memory and latency constraints.
Decision
We will target a small, quantized Liquid AI model with a strict memory budget and per-feature token limits (a configuration sketch follows the scope boundaries below).
Scope boundaries:
- This ADR defines model size and quantization targets for Phase 4–6 features.
- It does not define prompt templates or feature-specific schemas.
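
As a concrete illustration, the per-feature budget could be carried as plain app configuration. This is a minimal Kotlin sketch; the feature names, field names, and every number are illustrative assumptions, since final targets are deferred to the test harness ADR noted below.

```kotlin
// Hypothetical per-feature inference budgets carried as app configuration.
// Feature names, fields, and values are placeholders; final numbers are
// deferred to the local test harness ADR.
data class InferenceBudget(
    val maxInputTokens: Int,   // hard cap applied before the prompt is built
    val maxOutputTokens: Int,  // generation limit passed to the runtime
    val peakRamMb: Int         // weights + KV cache + runtime overhead
)

object ModelBudgets {
    val SUMMARIZATION = InferenceBudget(maxInputTokens = 2048, maxOutputTokens = 256, peakRamMb = 1200)
    val BRIEFING = InferenceBudget(maxInputTokens = 4096, maxOutputTokens = 512, peakRamMb = 1200)
}
```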
Rationale
- Mid-range devices require conservative RAM and thermal limits.
- Quantization provides large memory savings with acceptable quality loss for summarization and briefing tasks; a rough estimate of the savings is sketched after this list.
- Smaller models reduce latency and improve user experience.
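
To make the savings concrete: weight memory scales roughly linearly with bits per parameter, so moving from fp16 to 4-bit quantization cuts weight memory by roughly 4x. The sketch below runs that arithmetic for an assumed ~1.3B-parameter model; the parameter count is a placeholder, and the estimate ignores KV cache and runtime overhead.

```kotlin
// Back-of-the-envelope weight-memory estimate. The parameter count and the
// bit widths below are assumptions for illustration, not chosen targets.
fun estimateWeightMemoryMb(paramsBillions: Double, bitsPerParam: Double): Double {
    val bytes = paramsBillions * 1e9 * bitsPerParam / 8.0
    return bytes / (1024.0 * 1024.0)
}

fun main() {
    val params = 1.3 // assumed ~1.3B-parameter model (placeholder)
    for (bits in listOf(16.0, 8.0, 4.0)) {
        println("%.0f-bit: ~%.0f MB of weights".format(bits, estimateWeightMemoryMb(params, bits)))
    }
    // Prints roughly: 16-bit ~2480 MB, 8-bit ~1240 MB, 4-bit ~620 MB.
    // Weights alone; KV cache and runtime overhead come on top.
}
```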
Consequences
- Output quality may be lower than that of larger models; prompts must be carefully tuned.
- Some advanced synthesis may need a future “larger model” option.
- Token limits must be enforced in the app layer; one possible approach is sketched after this list.
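
One possible shape of that app-layer enforcement follows. The enforceInputBudget helper is hypothetical, and the 4-characters-per-token heuristic is a deliberate simplification; a real implementation would count tokens with the model's own tokenizer.

```kotlin
// Hypothetical app-layer guard that trims input before it reaches the model.
// The ~4 chars/token heuristic is a crude assumption; a real implementation
// would count tokens with the model's own tokenizer.
fun enforceInputBudget(prompt: String, maxInputTokens: Int, approxCharsPerToken: Int = 4): String {
    val maxChars = maxInputTokens * approxCharsPerToken
    if (prompt.length <= maxChars) return prompt
    // Keep the tail so the most recent context survives; the right truncation
    // strategy is feature-specific and not decided by this ADR.
    return prompt.takeLast(maxChars)
}
```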
Alternatives Considered (Optional)
- Larger model with aggressive pruning — higher risk of latency spikes.
- Multiple model tiers — increases packaging size and testing load.
Notes (Optional)
- Final memory and latency targets will be validated by the local test harness ADR.