One Model for Everything Is a Habit, Not a Strategy

One model for everything is a habit, not a strategy.

Many folks I talk to are swinging a sledgehammer at tasks that need a scalpel, and a pocketknife at tasks that need a sledgehammer. They picked a frontier model on launch day, anchored to it, and now route every prompt through it regardless of what the task actually requires.

Model selection should follow task class. Not personal preference, not brand loyalty, not whatever was on the keynote stage last quarter. The same discipline you apply to compute, storage, or network design applies here. Match the resource to the workload.

I run a personal stack called OpenClaw on a Mac mini that routes across five models. Here is how the lanes break down.

Opus 4.7. High ambiguity, high stakes, open-ended reasoning. When the cost of a wrong answer is real and the prompt cannot be fully specified upfront. Architecture decisions, ambiguous policy questions, novel problem framing. This is the lane where I want depth and patience over speed.

Sonnet 4.6. The workhorse. Defined professional tasks that require quality but do not require frontier reasoning. Drafting, structured analysis, synthesis of source material into stakeholder documents. My default tier when there is no specific reason to go up or down.

Haiku 4.5. Speed and volume. Extraction, classification, tagging, repetitive transformations. The quality ceiling is lower, and that is the point. The task class does not need a Sonnet, and forcing one in slows the pipeline and inflates the bill for no benefit.

GPT-5.5. Multi-step workflow execution. Tool use, self-correction, completing a chain of work without me supervising every step. I treat this as the agentic lane, not a reasoning competitor to Opus. Different job, different fit.

Qwen running locally. Two roles. First, a resilience layer for when the hosted APIs hit rate limits or have an outage, which has happened more than once this year. Second, lightweight ops automation. Email triage, calendar nudges, cron-driven scripts. Local execution means routine operational data does not leave the machine.
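To make the lanes concrete, here is a minimal sketch of the lookup a routing layer could start from. It is an illustration, not the actual OpenClaw code: the task-class names and the `pick_lane` helper are hypothetical, and the lane strings are the labels used in this post, not API model identifiers.

```python
from enum import Enum
from typing import Optional


class TaskClass(Enum):
    """Hypothetical task classes, one per lane described above."""
    OPEN_ENDED_REASONING = "open_ended_reasoning"
    PROFESSIONAL_DRAFTING = "professional_drafting"
    BULK_EXTRACTION = "bulk_extraction"
    AGENTIC_WORKFLOW = "agentic_workflow"
    LOCAL_OPS = "local_ops"


# Lane labels as written in this post (assumed, not real API model IDs).
LANES = {
    TaskClass.OPEN_ENDED_REASONING: "Opus 4.7",
    TaskClass.PROFESSIONAL_DRAFTING: "Sonnet 4.6",
    TaskClass.BULK_EXTRACTION: "Haiku 4.5",
    TaskClass.AGENTIC_WORKFLOW: "GPT-5.5",
    TaskClass.LOCAL_OPS: "Qwen (local)",
}

# The default tier when there is no specific reason to go up or down.
DEFAULT_LANE = "Sonnet 4.6"


def pick_lane(task_class: Optional[TaskClass]) -> str:
    """Return the lane for a task class, or the default tier if unclassified."""
    if task_class is None:
        return DEFAULT_LANE
    return LANES.get(task_class, DEFAULT_LANE)


print(pick_lane(TaskClass.BULK_EXTRACTION))
print(pick_lane(None))
```

The lookup itself is trivial. The real work is in deciding which class a task belongs to, and that stays a human judgment here.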

None of this is about ranking. There is no best model in this stack. There is only fit.

Be honest about what the architecture actually is. Five endpoints with a routing layer in front. That is not five subscriptions for the sake of it. It is the same systems design pattern you would apply anywhere else. Match compute to workload, manage cost, build resilience in, and avoid single points of failure for work you depend on.
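The resilience half of that pattern fits in a few lines. The sketch below assumes each endpoint is wrapped in a plain function that raises a hypothetical `ProviderUnavailable` error on a rate limit or outage. The `hosted` and `local` functions are stand-ins for illustration, not real client calls.

```python
from typing import Callable, Optional, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error a client wrapper raises on a rate limit or outage."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Callable[[str], str]]
) -> str:
    """Try each provider in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderUnavailable as exc:
            last_error = exc  # remember the failure, move to the next provider
    raise RuntimeError("all providers failed") from last_error


# Stand-in providers (assumptions): the hosted one is rate limited,
# the local one always answers.
def hosted(prompt: str) -> str:
    raise ProviderUnavailable("rate limited")


def local(prompt: str) -> str:
    return f"[local] {prompt}"


result = complete_with_fallback("triage inbox", [hosted, local])
print(result)
```

Ordering the provider list is the design decision: hosted first for quality, local last so there is always something to fall back on.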

The cost of getting this wrong is not catastrophic. You overspend on tokens for tasks that do not need them. You send too little capability at tasks that need more. You take an outage on the chin because you never built a fallback. None of those show up as a single visible failure. They show up as quiet drag on output and resilience.

The discipline is not exotic. Pick a default tier. Define when you go up. Define when you go down. Define what runs locally. Have a plan.
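Written down, such a plan can be a handful of explicit rules. This is a sketch under assumptions: the `Task` flags, the `choose_tier` helper, and the order in which the rules are checked are all hypothetical, and the tier labels are the names used in this post rather than API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical flags describing a task before it is routed."""
    high_stakes: bool = False
    ambiguous: bool = False
    repetitive: bool = False
    routine_ops: bool = False


def choose_tier(task: Task, default: str = "Sonnet 4.6") -> str:
    """Apply the plan: local first, then up, then down, else the default."""
    if task.routine_ops:
        return "Qwen (local)"  # what runs locally
    if task.high_stakes and task.ambiguous:
        return "Opus 4.7"  # when you go up
    if task.repetitive:
        return "Haiku 4.5"  # when you go down
    return default  # the default tier


print(choose_tier(Task()))
print(choose_tier(Task(high_stakes=True, ambiguous=True)))
print(choose_tier(Task(repetitive=True)))
```

The rule order is itself a policy choice. In this sketch a repetitive but high-stakes, ambiguous task still goes up, which may or may not match how you would want it handled.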