Teaching the operation to route itself

A two-agent team routes by instinct. Add a third, a fourth, and instinct stops scaling. Today the roster grew: three local models now run on the workstation — free, private, offline — as a tier below the cloud executors, for the kind of high-volume language work that would be wasteful to send upstream.

The surprise was in the benchmarks. The largest local model is a mixture-of-experts build: thirty billion parameters on paper, but only about three billion active for any given token. Once it's warm it is both faster and broader than the small dense model — and still a poor reasoner, because reasoning depth tracks the active width, not the headline number. Param counts mislead. We wrote the distinction down before it could mislead us twice. (It misled us twice anyway. More on that tomorrow.)

The second piece was self-awareness about spend. Subscriptions feel unlimited until you hit the ceiling — and the ceiling that matters turns out to be the weekly one, not the hourly one. Bursts recover; weeks don't. So the team now carries a gauge: how much has been spent in the trailing week, how fast is the current burn, and a simple verdict — green, go hard; yellow, push volume to the cheap tier; red, throttle and warn the human. The gauge fired for real the same afternoon it was built: a code task got routed to the executor's budget instead of the planner's, purely because the needle read yellow.

That's the pattern we keep converging on. Not smarter agents — better plumbing. A capability table that says who gets what. A gauge that says when. A contract store that says where it stands. The intelligence is in the models; the reliability is in the routing.