When a multi-agent system gets expensive, the reflex is to reach for a bigger model or a bigger context window. But in a fleet, the tokens that dominate the bill aren't spent on reasoning. They're spent on repetition.
A single agent that re-reads its whole history every turn is already wasteful. A fleet of agents is the same mistake, multiplied. Four habits drive the waste:
The shape of the bill is agents × history × redundancy: three multipliers, none of which is capability. A bigger window doesn't help, cost still rises with every token, latency climbs, and models lose the thread in the middle of long contexts. Five huge windows is just five times the duplicated spend.
So we measured it. On MA-MemBench, our benchmark for the communication cost of a fleet, we put SuperLazy head-to-head against a naive fleet (every agent re-ships everything) and against mem0, a popular memory layer, all running the same model under identical conditions. The result: you don't trade accuracy for efficiency. You get both.
The lead agent reads roughly 900 tokens per query instead of 8,600 with mem0 or 24,000 with a naive fleet, a tiny, precise context instead of wading through everything. And there's a hidden cost the headline numbers miss: mem0 runs an extraction model on every update to the world, which in this run burned ~1.5 million tokens before a single question was asked. SuperLazy adds none of that, so its total spend is a fraction of mem0's on top of being faster per query.
The same efficiency thesis holds on the field's open long-memory benchmark, LongMemEval, where the answer is buried inside ~100,000 tokens of history. SuperLazy surfaces the handful of moments that hold the answer, 95.3% Recall@15 and 92.5% Recall@10, while sending roughly 18× less context than feeding the full history (~5.8k tokens per question vs ~103k).
The takeaway is simple: the cheapest token is the one you never re-send. Strip out the duplication and a deep, fleet-wide memory costs about the same to query as a shallow one, with accuracy that goes up, not down.
We're building the shared-memory layer that makes agent fleets affordable. If you're running agents at scale, let's talk.