Our mission at SageOx is to turn humans and AI agents into real teams by providing the shared memory and collaboration tools that keep them coherent.

Beyond the tooling, we know the underlying operating model of work is changing. As a team, we've had to rethink from first principles how we organize ourselves, how we collaborate, what we pay attention to, and what we deliberately ignore.

Because the shift feels so fundamental, we've decided to write a book about it and share what we're learning with other teams navigating the same transition.

We believe in working in the open, so I'm sharing a draft of Chapter 1. It's my attempt to name the change we're experiencing: scale is becoming a function of throughput relative to human attention, not just headcount.

We're writing this book in public. Each chapter will be published as it's written. The ideas are evolving and your feedback is welcome.

Chapter 1 — The New Shape of Scale

Many mornings, one of us opens the repository to find work that didn't exist the night before: a feature, fully built, that never came up in conversation. The work is often solid. The feeling is vertigo: who decided this, what plan and backstory led to it, and what else is already in motion that no one has seen whole?

The sharpest version we've felt happened over a single weekend. Two team members disappeared into the same problem — no assignment, no planning session, no decision that two parallel efforts should exist. Each had independently concluded the problem was worth solving, and each teammate spent the weekend solving it. By Monday they surfaced with solutions, and neither of them had known the other was working on the same thing.

The two approaches turned out to complement each other and merged into an outcome better than either would have been on its own. This was one of several instances that left us with a strange question: how does a team small enough to sit within earshot of one another start exhibiting the coordination challenges of a much larger organization? And in an AI-powered era, is that misalignment a problem or actually an opportunity?

SageOx currently operates as a two-pizza team startup. As founders with early roots at Amazon, we embrace the idea: a tight group of fewer than ten people, operating with minimal overhead, and with high trust. On paper, a team like ours should not have scaling problems. Scaling problems are what you earn later — after growth, after layers of management, after the org chart grows enough that the top can no longer clearly see the bottom. Scaling friction is the tax paid for getting big, and we are supposed to be on the other side of that tax: small enough to fit in one conversation, fast enough to change direction in a single meeting, close enough that context can be shared within earshot.

On top of being small, the entire team is seasoned. Each of us carries extensive and varied experience. We are used to working autonomously, used to unblocking ourselves, used to making decisions swiftly. We share physical space. We like one another. The communication lines for a team like us are as short and efficient as they get. There should be no scaling problems. There should be nothing lost in translation.

And yet we have them. They are a different flavor from the scaling problems we experienced in Big Tech and lived through at earlier startups, but unmistakably scaling problems all the same. Drift. Incomplete context. Lack of visibility. The recurring, low-grade sense that no single person is holding the full picture anymore. The nagging fear of dropping something important, or being blindsided by a decision made and executed outside our view. Decisions made in one session quietly contradicting decisions made in another. Work getting done that no one else knew was underway.

None of this makes the scaling challenges unwelcome. The explosive creativity across the product is thrilling, and it catches us by surprise daily, in the best way. What bites is not the scale itself — it's the friction that rides in with it.

The temptation is to diagnose this as a communication failure. A team this small should simply talk more, document more, hold a better standup, tighten the process. It is an alluring explanation.

But it never quite rang true, and the remedy it implies would be physically impossible to sustain at this level of momentum. Piling more energy onto human coordination is not the right fix, and we don't have a surplus of human energy to spend in the first place.

The more we watched it, the less the communication-failure theory held up. The problem was not that information was failing to move through the team. The root cause was that the volume of activity had exceeded what we could track, and it was arriving faster than we could comfortably keep up with. Eventually the conclusion became unavoidable: scale is no longer primarily about people.

Organizational scale is no longer about headcount

For the entire history of software teams, organizational scale has meant one thing: people. You scaled by hiring. Throughput was a function of headcount — imperfectly, with diminishing returns, but reliably enough that every coordination tool we have was built around that assumption. Org charts, roadmaps, sprint planning, OKRs: all of them are instruments for getting more humans to row in the same direction. The problem they solve is the problem of many hands. Many minds. Many time zones. Many personal agendas that have little to do with the needs of the organization. For decades, throughput and headcount moved together closely enough that no one needed to tell them apart.

That relationship is now coming apart.

It starts at the level of the individual. Each of us, paired with AI, now moves at a speed we never imagined possible — a single person can do in a day what used to take a team a quarter. That part is pure exhilaration. The strangeness begins when you put a handful of newly superhuman individuals on one team and ask them to stay coordinated.

At SageOx, we do not have a headcount problem. We have a throughput problem: a good problem to have, but a problem nonetheless. Because we lean heavily on AI agents and assistants, this team's capacity to produce (not only code, but ideas, content, even the operational processes we run on) is far above our weight class. And it isn't only that we move faster through familiar work. We are producing, at remarkable speed, things we had no specialized skill or knowledge to tackle. Partnering closely with AI, we venture into domains that used to be off-limits to a team like ours: they demanded skills we didn't have and more time than we could spare, and they carried a cost of failure too high to risk. Now we play, we try, and every so often we stumble onto something that ends up woven into the company, something we could never have built by hand. Ideas become prototypes in an afternoon. Weekend experiments become real products. Work that once required a team becomes something one person can pursue between conversations.

Throughput no longer tracks headcount at all: we are producing in line with a very large organization while remaining a very small one.

This sounds, at first, like a story about productivity. It is actually a story about scale. Large organizations spent years growing into their output, and along the way they accumulated the connective tissue that makes high throughput survivable: institutional memory, operating principles, communication structures, specialized roles. Their coordination machinery evolved alongside their capacity. We have not had that luxury. The output arrived before the machinery did, and the coordination burden arrived with the output. And there is a second twist. When a large organization produces this much, no single person has to hold the entire view; it is distributed across many people and the structures between them. When a team of fewer than ten produces the same volume, the few of us carry that cognitive load directly, because we are the ones operating the system as a whole.

So the throughput of work moving across our team is now larger than any one of us can absorb. That is the scaling problem. Not too many people, but too much work per person — arriving too fast, in too many places at once, for any individual to keep in their head. It is a scaling axis nobody wrote a playbook for, because for fifty years the axis was headcount and throughput more or less followed. Now throughput has detached from headcount and shot ahead of it, and all our instruments are still calibrated to the old relationship.

This is why the disorientation feels familiar and foreign at the same time. The symptoms are the symptoms of bigness — drift, lost context, no one holding the whole. But the cause is the opposite of bigness. We are not too many. We are producing as if we were many, while remaining few. The org-shaped problems have arrived without the org.

Two fair objections

The first: haven't we heard this before? Every generation believes its tools change everything. The compiler changed software development. So did the integrated development environment (IDE), the cloud, the open-source library, the answer waiting on Stack Overflow. Each one dramatically increased what a team could produce; organizations adapted; the work kept its essential shape. Why should this be any different?

Because every one of those tools accelerated production while leaving the fundamental bottleneck intact. The coding got faster, the deployment got cheaper, the lookup got instant — but the limiting factor was still the amount of work humans could personally perform. Each revolution enlarged a bottleneck we already knew how to manage.

The second objection cuts the other way. Most of what we produce at SageOx gets thrown away — so perhaps the real problem is simpler: we are building too much. Slow down. Be more selective about which ideas we chase. But this is exactly where we have found the most value, and exactly what AI makes newly possible. The point is to explore without bounds — to tinker, to chase the half-formed idea, to build the throwaway version and learn from it, and to let the best ideas surface from the volume rather than be chosen in advance. Yes, it is messy. It is also where the breakthroughs come from; the discard pile is not waste but the cost of discovery, and that cost has never been lower. Slowing down would forfeit the very upside AI exists to give us. And yet the overload is real. So if the answer is not to build less, what is it?

Both objections miss the same thing. The change AI brings is not that it enlarged an old bottleneck, nor that it tempts us to overbuild. It is that the binding constraint has moved. For the first time, a team can generate work faster than it can absorb the consequences of that work. An agent can produce a feature faster than a person can carefully review it. It can refactor a subsystem faster than anyone can understand what changed. Several people, each running multiple agents, can set more in motion than a small team can hold in view. The bottleneck used to sit at production; it now sits at comprehension.

The constraint is attention, not time

For decades the scarce resource in software was time — engineer-hours. You could draw a fairly honest line from hours spent to features shipped, which is why every planning ritual we have is, underneath, a way of budgeting time. Estimate the hours, divide by the people, fill the calendar. The whole apparatus assumes time is the thing you run out of.

Time is no longer the thing we run out of. Time collapses when agents do the typing. The hours that used to stand between an idea and a working version of it have largely evaporated, and they keep evaporating as the tools get better. If time were still the binding constraint, a team our size producing at this rate would feel effortless. It does not feel effortless. It feels overwhelming. That mismatch is the tell that we are running out of something else.

That something is attention — the finite, human capacity to understand, judge, and trust what the agents produce, and to stay oriented to one another while they produce it. It is the one input the agents cannot supply for us. They generate; they cannot decide that what they generated is right, or worth keeping, or consistent with what someone else shipped an hour ago. That judgment is ours alone, and there is only so much of it. And unlike production, it does not scale: every agent we add multiplies what there is to comprehend without adding any capacity to comprehend it. Output rises without limit; attention stays fixed. That asymmetry is the defining arithmetic of working this way.
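To make that arithmetic concrete, here is a minimal sketch in Python. Every number and name in it is a hypothetical assumption chosen for illustration (items per agent, minutes per careful review, minutes of focused attention per person); none of it is measured data from our team.

```python
def review_coverage(people: int, agents_per_person: int,
                    items_per_agent: int = 5,
                    minutes_per_review: int = 30,
                    attention_minutes: int = 240) -> float:
    """Fraction of one day's agent output the team can carefully review.

    All defaults are illustrative assumptions, not measurements.
    """
    produced = people * agents_per_person * items_per_agent
    if produced == 0:
        return 1.0
    # Each person can complete only a fixed number of careful reviews per day.
    reviewable = people * (attention_minutes // minutes_per_review)
    return min(1.0, reviewable / produced)


# Output grows with every agent added; review capacity does not.
for agents in (1, 2, 4, 8):
    print(agents, round(review_coverage(people=8, agents_per_person=agents), 2))
```

Under these assumptions, coverage falls from full review at one agent per person to a fifth of the output at eight. The head count cancels out of the ratio, so in this toy model hiring does not change coverage: the pressure is work per person, not the number of people.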

And our own output is not the only thing competing for that attention. The ecosystem itself is moving at dazzling speed. You can spend an entire day just reading the distilled summaries, or the summaries of those summaries, of new models and technical breakthroughs from an endless and growing roster of players, big and small, legacy and startup, all innovating at once and flooding the field with updates. An executive at a frontier lab recently admitted she can't keep up with what her own company is releasing. The natural human instinct, as the bar keeps rising, is to keep doing what we've been doing and simply do it harder: work longer hours, adopt unnatural routines to squeeze out every last ounce of attention, let other priorities slide. But working harder plainly does not scale against an explosion of information and progress of this magnitude. It's also not sustainable.

The throughput is the impressive part — but throughput is not the driver of value. The scarce thing is the attention to decide what all that capacity should point at: to orient, to judge what's next, to articulate it clearly enough to aim the agents and each other. Direction is not a second constraint competing with attention; it is what attention buys. Spent well, our attention sets direction; spent badly — scattered across output we can't keep up with — direction is the first thing we lose. The capacity to build has increased. The capacity to spend scarce human attention on judgment rather than mere keeping-up has not, and it is now the resource in shortest supply.

Humans are not wired to operate at this speed and volume. We did not evolve to supervise a dozen parallel streams of competent work, each moving faster than we can read. Pretending otherwise — pretending we can simply will ourselves to keep up if we try a little harder — is precisely how the disorientation sets in. The honest design stance is to treat human attention as the precious, hard-capped resource it is, and to build everything downstream of that fact. The environment is producing faster than we can perceive it; the job is to make the environment perceivable, and to enlist the agents themselves in translating what they produce into something a human can actually process. An agent that can write a feature can also tell you, in a sentence, what it changed and why it matters. The same throughput that creates the overload can be turned toward relieving it — but only if you have correctly named what you are short on.

Why naming the constraint changes the response

When organizations scaled by adding people, the answer was clarity. The problem of many hands is a problem of alignment, and alignment is bought with crispness: concrete goals, key results the whole org can see and rally around, roadmaps that promise a destination far enough out that everyone can steer toward it. We have all worked inside that playbook. Some of us helped run it. It is a good playbook — for its problem. Faced with a hundred people who might each drift in a slightly different direction, you reduce the drift by making the target unmistakable, shared, and measurable.

But our drift does not come from a hundred people pulling apart. It comes from fewer than ten people, plus their agents, producing faster than the team can collectively perceive. Pouring more crispness onto that problem does not fix it. It slows us down and can make things worse, because the instruments of clarity quietly assume the old constraint — they assume time is scarce and attention is roughly free, when in fact the reverse is now true. So the artifacts we reach for by reflex — the goals, the roadmaps, the tidy units of planning — start to misfire in specific, diagnosable ways.

What the old artifacts get wrong

Goals have become a floor, not a ceiling

A goal was always supposed to be a stretch — the ambitious thing you aim at and might fall short of. You set it above where you think you'll land, so that reaching for it pulls you higher than you'd otherwise go. Some teams make the philosophy explicit: hit seventy percent of the goal and you're happy with the outcome.

In this environment, goals are starting to behave like a floor instead. Anything we can write down crisply enough to measure is, almost by definition, smaller than what our throughput could plausibly produce. The act of making a goal legible — concrete, bounded, trackable — caps it at the scale of things we can already imagine and articulate. But the most valuable work right now is routinely the work we did not predict, the thing that became possible only once we started building and saw what the agents could actually do. A crisp, measurable goal may feel satisfying but excludes exactly that kind of unimagined possibility. The harder we lean on it, the more tightly we anchor ourselves to a horizon that throughput has already overtaken. We aim at the measurable thing, hit it early, and mistake hitting it for success — when the real opportunity was the larger, less legible thing the goal was too small to contain.

This is not an argument against ambition or against ever measuring anything. It is an argument that in a throughput-rich, attention-poor environment, the measurable target is the conservative target, and treating it as the stretch goal systematically aims too low.

Roadmaps promise a future that arrives too fast

A roadmap is a promise about the future: here is the sequence, here is roughly when each piece lands, here is the destination. It works when the future is far enough off and stable enough that describing it in advance pays — when planning costs little relative to the execution it guides.

Today, the future arrives faster than humans can describe it. By the time a roadmap is written and circulated, the ground beneath it has shifted; an idea that didn't exist when the quarter began is now a live product, and it has changed what's worth doing next. The most interesting work is the work we didn't predict — so the roadmap is most confident about the things least likely to matter, and silent about the things that will. A promise about the future is only as good as the future's willingness to hold still, and ours will not.

There is a deeper version of this problem. Writing the roadmap, keeping it current, reconciling it with what actually happened — all of that is itself attention, the scarcest resource we have. The traditional tools don't just track the work; they generate it, paid for in the one currency we cannot afford to waste.

The unit of conversation has moved

The most concrete shift is in what we even talk about, and at what grain.

Conversations that used to live at the bug or feature level now belong at the epic level. When fixing a bug or building a feature is a matter of describing it to an agent and reviewing the result, individual bugs and features stop being worth a meeting; the meaningful decision is about the epic they belong to. By the same logic, things we used to test as features now have to be protected as end-to-end scenarios. A feature is cheap enough to regenerate that guarding it in isolation is no longer the point. What's worth protecting is the whole user-facing path: the scenario that has to keep working no matter how much churns underneath it. The unit of attention rises because the units below it have become cheap enough to stop deserving our scarce attention individually.

At the same time, our planning horizon contracts. When work that once filled a quarter now lands in an afternoon, the future reshapes itself faster than we can plan against it. Planning a year out is unthinkable; even a quarter out is mostly guesswork. The window we can actually see into and steer toward has shifted closer — from quarters, to weeks, to days.

So two pressures act at once, in opposite directions. The granularity of our conversations rises: we talk about bigger and bigger units. The horizon we can plan them against shrinks. The coordination artifacts we inherited were built for small units on long horizons. We now work in the opposite corner — large units, short horizons.

What follows from this

If you accept the diagnosis, a few things stop being matters of taste and start being matters of design.

The constraint we build against is attention, so everything we build should be measured by whether it spends human attention well or wastes it. An artifact that demands attention to maintain (a roadmap that has to be reconciled, a goal that has to be tracked, meeting takeaways that a scribe has to record) has to earn its keep against the attention it consumes, and many mechanisms inherited from existing management practice no longer do. An artifact that returns attention, one that compresses a flood of agent output into something a person can grasp in a sentence, is worth more now than it would have been when attention was cheap.

And the agents themselves are part of the answer, not just the source of the problem. The same systems generating more than we can absorb can be turned toward making what they generate absorbable: summarizing what changed, surfacing what needs a human decision, signaling to each other and to us what they're working on so the picture stays whole without a person having to assemble it by hand. The throughput that created the overload is also the only thing fast enough to relieve it.
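As one way to picture an attention-returning artifact, here is a small hedged sketch in Python. The names `AgentReport` and `build_digest` are hypothetical, invented for this illustration; this is not a description of SageOx tooling. It assumes each agent can attach a one-sentence summary and a flag saying whether a human decision is required.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    """Hypothetical record an agent emits alongside its work."""
    agent: str
    summary: str          # one sentence: what changed and why it matters
    needs_decision: bool  # True when a human has to make a call


def build_digest(reports: list[AgentReport]) -> str:
    """Compress many reports into a short digest, decisions first."""
    decisions = [r for r in reports if r.needs_decision]
    routine = [r for r in reports if not r.needs_decision]
    lines = [f"{len(decisions)} need a decision, {len(routine)} are informational"]
    lines += [f"DECIDE [{r.agent}] {r.summary}" for r in decisions]
    lines += [f"FYI    [{r.agent}] {r.summary}" for r in routine]
    return "\n".join(lines)


reports = [
    AgentReport("agent-a", "Refactored the billing module; no behavior change.", False),
    AgentReport("agent-b", "Two auth designs are viable; pick one before merge.", True),
]
print(build_digest(reports))
```

The design choice worth noticing is the ordering: items that need human judgment come first, so a person with limited attention spends it on decisions before status.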

We are a two-pizza team with the coordination problems of an organization an order of magnitude or two larger, caused not by being too many but by producing as if we were. The old playbook was written for the opposite condition and reaches for the opposite remedy. The rest of this book is about what the new playbook looks like. It starts, in the next chapter, with how we got here: the genealogy that runs from waterfall through agile and DevOps and platform engineering to the agentic moment we're standing in, what each era taught us about coordination, and what specifically is breaking now.

If there is one claim to carry out of this chapter, it's worth stating plainly before we go on. Scale is no longer about how many people you have. It is about how much throughput moves across your team relative to the attention you can bring to bear on it. Attention, not time, is the constraint we are designing against — and pretending otherwise is how the disorientation sets in.

We're publishing this book chapter by chapter as it's written. If this resonated — or if you disagree with something — we'd love to hear from you: feedback@sageox.ai