What has Claude NOT changed in 2026

The first 10% still needs phronesis, and the last 1% of polish still needs elbow grease

Ajit Banerjee

In 2012, Jeff Bezos told a room that he kept getting asked the wrong question:

"I very frequently get the question: 'What's going to change in the next ten years?' I almost never get the question: 'What's NOT going to change in the next ten years?' And I submit to you that that second question is actually the more important of the two — because you can build a business strategy around the things that are stable in time."

For Amazon the stable things were low prices, fast delivery, vast selection: "it's impossible to imagine a future where a customer says, 'I love Amazon, I just wish the prices were a little higher.'" You pour energy into the things that don't move, because the dividend keeps paying.

The reason that question is hard is that the things which do move never announce themselves. In Building Evolutionary Architectures, the authors make the point that nobody decrees the Next Big Thing:

"Many developers suspect that a cabal of architects retreat to an ivory tower to decide what the Next Big Thing will be. The process is much more organic. New capabilities constantly arise within our ecosystem, providing new ways to combine with existing features to enable new capabilities."

Their example is microservices, and they make it with a time machine. Pitch microservices to a head of operations in the year 2000 ("I'll need about 50 new computers, 50 operating-system licenses, another 20 for isolated databases…"), and the answer is "Please leave my office." The idea was fine. The ecosystem to support it, cheap Linux and Continuous Delivery, wasn't there yet. Microservices didn't arrive because someone was clever; they arrived because the substrate underneath finally made the combination affordable.

In 2026 the new piece of substrate is AI coding, and it is every bit as ecosystem-shifting. Claude made a whole stretch of the work cheap the way Linux and CD once did, and the organic churn the authors describe is roaring: new combinations, new things suddenly possible. That's the part every essay this year is about, and it deserves the attention.

But the more useful question is still Bezos's. With the substrate heaving under us, what didn't move? Two stories from a single week of work convinced me there are two parts of building that are the same as they ever were, and that they sit at opposite ends of the same job.

The barbell

Here is the claim I'll defend. The work of shipping real software has a shape, and AI coding bent the middle of it almost flat. The stretch from "I roughly know what I'm building" to "it mostly works" (call it 10% to 99%) used to be the grind. Now it's the part I'd happily quote a 40× on. Claude is astonishing there.

What it didn't touch is the two ends.

The first 10% — deciding what the thing even is, the foundational call that every later line of code inherits — still takes phronesis, Aristotle's word for practical wisdom: knowing the right thing to do in the particular case, which you can only earn by having been burned in particular cases before. And the last 1% — the polish, the one bug that won't die, the thing that has to actually work in the real world and stubbornly doesn't — still takes elbow grease. Foundational thinking at one end, bloody-minded grinding at the other, and a melted middle between them. A barbell.

SageOx is the hivemind for human-agent teams, and I spend my days at both ends of that barbell. This week each end handed me a perfect example. I'll start at the far end: the elbow grease.

The last 1%: a bug I can only catch by hand

We've been moving SageOx's recording onto Media-over-QUIC, the conversation live on the wire instead of uploaded after the fact. The transport works. What's left is the 1% that decides whether a customer ever trusts it: does it survive the real world, where Wi-Fi drops mid-sentence and a device sits in a conference room for an hour?

So I've been running our own standups through the Ox Dots, and I set up an SSID called OxChaos, a network I flip on and off from time to time to see what breaks. This is the cheap, mean version of the chaos testing Ryan has been building into our gremlins: unleash the adversary on your own front door before a customer does. Mostly the device is robust now. Long recordings run for hours. Then, every so often, minute 40 hands me something new.

One of them was a beauty. The device's TLS handshake to our relay would die with -0x7200 (MBEDTLS_ERR_SSL_INVALID_RECORD), and here's the cruel part:

"Once it gets parked, it stays parked till I power cycle it. It's happened three times this month. That's why this is the month's Heisen Bug."

It took weeks to corner, and the cause was nowhere near where I was looking. The device couldn't reliably get a contiguous block of heap much past 12 KB. mbedTLS preallocates a 4 KB input buffer and, when an inbound record won't fit (a fat certificate chain pushes the handshake past 4 KB), it grows that buffer toward its 16 KB ceiling. On a fragmented heap the larger contiguous block sometimes isn't there, the allocation fails, and the handshake dies with -0x7200. Which is exactly why it never surfaced in my tcpdumps: the bytes on the wire were fine. The failure wasn't on the network at all: it was memory fragmentation on the device, and you cannot see a fragmented heap in a packet capture.
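A toy model makes the point that total free memory and the largest contiguous block are different numbers. This is only a sketch: the free-block sizes below are hypothetical, chosen to match the "not much past 12 KB" observation, and `can_allocate` is an illustrative helper, not anything from mbedTLS or our firmware.

```python
KB = 1024

def can_allocate(free_blocks, request):
    """A contiguous allocation succeeds only if one single free block is big enough."""
    return any(block >= request for block in free_blocks)

# Hypothetical fragmented heap: lots of free memory overall,
# but no single contiguous block larger than 12 KB.
free_blocks = [12 * KB, 8 * KB, 6 * KB, 4 * KB, 4 * KB]

total_free = sum(free_blocks)
largest_block = max(free_blocks)

initial_buffer = 4 * KB   # the preallocated input buffer size from the text
grown_buffer = 16 * KB    # the ceiling the buffer can grow toward

print(f"total free: {total_free // KB} KB, largest contiguous: {largest_block // KB} KB")
print("4 KB buffer fits: ", can_allocate(free_blocks, initial_buffer))
print("16 KB buffer fits:", can_allocate(free_blocks, grown_buffer))
```

The heap in this model has more than twice the memory the grown buffer needs, and the allocation still fails. None of that is visible from the network side.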

When I described this to Rupak Majumdar, he pushed the hard objection, Rich Sutton's bitter lesson: general methods that throw compute at a problem keep beating clever hand-built rules. So why not point a fuzzer at the firmware and walk away? Because random chaos mostly finds random crashes. The bug that matters hides at a particular seam: a buffer wrapping, two clocks drifting out of sync, a TLS record landing one byte over a 4 KB cap at minute 40. Finding it is an act of aim, not volume. You have to suspect where the system is weak before you torture it there. That suspicion is the engineering sensibility, and it turns out to be the same faculty as the design taste at the other end of the barbell.

That's the shape of the last 1%. The two tools that scale (a packet capture, and a model that reasons fluently about TLS record sizes) were both staring at the wrong layer, because the bug lived in the collision of this firmware, this heap, and this certificate chain. Catching it took a human on the device, suspicious of the heap, patient and a little obsessive. Physical, particular, unglamorous. Elbow grease, same as it ever was.

The first 10%: hashing it out over 18-year-old whiskey

The other end of the barbell showed up the same week, when Lore dropped.

Lore is Epic's new open-source version control for code-plus-large-binary-assets: "next-generation open source version control," MIT-licensed, content-addressed Merkle trees, chunked storage, hydrate-on-demand. If that sounds familiar, it should. It's doing for game assets what Xet, the content-addressed storage now carrying Hugging Face toward an exabyte by the end of the year, did for ML data. So when Lore was released, two former XetHub engineers, Yucheng and Hoyt, came by SageOx HQ, and we opened the 18-year-old Aberlour that Galex had brought in for his birthday and discussed design tradeoffs for a couple of hours.

Yucheng, Hoyt, and Ajit at SageOx HQ over a bottle of Aberlour.

Let me be clear about what this is and isn't. I am not a game-dev person, and I'm not poking holes in a storage system built for a problem that isn't mine. Lore is built to let a studio of artists coordinate on shared, unmergeable assets, and it's well-shaped for that. I read its design the way you'd read a colleague's: looking for the one axis where our experience would have pushed the other way.

Racing driver Jackie Stewart's line is the one that fits:

"You don't have to be an engineer to be a racing driver, but you do have to have mechanical sympathy."

Martin Thompson dragged that into software for a reason: you can know every fact about a system and still not feel what it does to the machine at speed. For XetHub, as we were designing a system that could scale to exabytes, one of the core design rules was that the server would never carry metadata at the cardinality of the chunk, and the chunk was ~64 KiB. What I noticed in Lore is that it has no xorbs, Xet's aggregate of those chunks packed into a 64 MiB object. Lore writes one object, and one metadata row, per chunk. Cleaner, to a reader. A missing gear, to us. The whiskey was about whether that's real intuition or just nostalgia for the thing we built.

That night's argument was really an effort to articulate the intuition behind that rule. We landed on two reasons, both only visible six orders of magnitude out from where the design feels fine:

Bookkeeping at scale. Store data this way and you keep one bookkeeping entry per chunk, and the chunk is ~64 KiB. The database that holds those entries, DynamoDB, is huge: trillions of entries, no problem, and we know people pushing it that far. But an exabyte in ~64 KiB chunks is 1 EB ÷ 64 KiB ≈ 15 trillion chunks. So 15 trillion entries, right up against that ceiling with no margin. Bundle the chunks into xorbs first and you track a thousand times fewer entries: clear of the ceiling, with room to spare for the extra bookkeeping a real filesystem brings. Big enough, and the chunk simply can't be the unit you count. The xorb has to sit on top.
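The arithmetic is easy to check. A rough sketch, assuming a decimal exabyte (10^18 bytes) and treating every chunk as exactly 64 KiB and every xorb as exactly 64 MiB; real chunks are content-defined and vary in size, so these are order-of-magnitude figures only.

```python
KIB = 2**10
MIB = 2**20
EB = 10**18  # decimal exabyte (assumption; a binary EiB would give about 17.6 trillion chunks)

CHUNK_SIZE = 64 * KIB  # nominal chunk size from the text
XORB_SIZE = 64 * MIB   # nominal xorb size from the text

chunk_entries = EB // CHUNK_SIZE            # one entry per chunk
xorb_entries = EB // XORB_SIZE              # one entry per xorb
chunks_per_xorb = XORB_SIZE // CHUNK_SIZE   # how much the bundling shrinks the count

print(f"entries at chunk granularity: {chunk_entries:,}")
print(f"entries at xorb granularity:  {xorb_entries:,}")
print(f"reduction factor:             {chunks_per_xorb}x")
```

That is roughly 15 trillion entries when counting chunks, against roughly 15 billion when counting xorbs.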

Speed at the client. One does not want an expensive machine with GPUs sitting idle waiting to download a model or a dataset. If the client pulls ~64 KiB at a time, it's throttled by request rate, not bandwidth, even with a CDN edge 100 ms away. Do the back-of-envelope: ~64 KiB chunks at 100 connections a second gets you ~6 MiB/s, and then you're slamming into connection limits on both ends. To saturate a client's pipe you need each connection pulling a big payload. That's not a nice-to-have. It's the whole game. Reconstructing a 5 GiB file from ~80 ranged reads into large objects is a different universe from ~80,000 individual fetches.
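The back-of-envelope numbers above, written out as a sketch. The 10 Gbit/s link is a hypothetical figure I picked only to show how many requests per second each payload size would need to fill a pipe; the chunk and xorb sizes are the nominal ones from the text.

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30

CHUNK = 64 * KIB
XORB = 64 * MIB

def throughput_mib_per_s(payload_bytes, requests_per_second):
    """Throughput when the request rate, not bandwidth, is the limiting factor."""
    return payload_bytes * requests_per_second / MIB

def requests_to_fetch(file_bytes, payload_bytes):
    """Number of requests to fetch a file, rounding up for a partial last payload."""
    return -(-file_bytes // payload_bytes)

chunk_rate_mib = throughput_mib_per_s(CHUNK, 100)  # 100 requests a second
chunk_fetches = requests_to_fetch(5 * GIB, CHUNK)
xorb_reads = requests_to_fetch(5 * GIB, XORB)

# Hypothetical 10 Gbit/s client link: request rate needed to keep it full.
LINK_BYTES_PER_S = 10 * 10**9 // 8
chunk_rps_to_fill = LINK_BYTES_PER_S / CHUNK
xorb_rps_to_fill = LINK_BYTES_PER_S / XORB

print(f"64 KiB at 100 req/s: {chunk_rate_mib} MiB/s")
print(f"5 GiB file: {chunk_fetches:,} chunk fetches vs {xorb_reads} xorb-sized reads")
print(f"req/s to fill the link: {chunk_rps_to_fill:,.0f} (chunks) vs {xorb_rps_to_fill:.1f} (xorbs)")
```

On that assumed link, chunk-sized requests would have to arrive at around nineteen thousand a second to keep the pipe full, while xorb-sized reads need fewer than twenty.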

The reason we can trust our call to build the additional layer of xorbs is that it has already paid out at full scale. Back in December 2022 we wrote about scaling git-xet to terabyte-sized repos; as a small startup, a bench is all you ever get to test on. A terabyte to an exabyte is a million-fold jump (10⁶), and nobody hands you a dress rehearsal. The bet was that the principles in that write-up would hold a million times past anything we could afford to measure. They did. Watching Xet carry Hugging Face to an exabyte without a glitch (a call made on a benchmark, vindicated a million-fold on the infrastructure the ML world now runs on) is one of the highlights of my career.

That's the first 10%. Not the code — the call. And it's the kind of call that comes from building systems at scale before, not from having read about it. As Yucheng put it yesterday: even in the world of Claude, performance is still hard. The model has read every paper on content-defined chunking. It hasn't internalized the battle scars yet.

Same as it ever was

Werner Vogels argued this week that the rules are being rewritten, that "writing is no longer the only way to make an idea tangible," that the old write-then-discuss-then-build sequence should invert into build-then-test-then-write, because "you will learn more in one evening of building than in two weeks of writing about what you think will happen." He's right, and we've been living it at SageOx since December 2025. We prototype first; the doc comes after.

But under all the reinvention, two things didn't move, and they're the two ends of the barbell. The first 10% still wants phronesis: the foundational judgment that the whole build inherits, earned the slow way, in particular failures. The last 1% still wants elbow grease: a human in the room with a stubborn device and a chaos SSID, refusing to quit at minute 40.

The middle melted, and that's the headline everyone's writing. The quieter truth is that when the middle gets cheap, the ends don't get cheaper too: they get more valuable. The ends are still ours.

Appendix: Lore and Xet, side by side

A word on where this came from, and a caveat. I went deep on Lore because I'm going to use it. We're adopting it inside SageOx. Everything above is me revisiting choices we made in the early days of XetHub in the light of Lore's recently open-sourced implementation. Not a scorecard. Lore is a genuinely good piece of work, well-shaped for the problem it set out to solve. Any mistake in my understanding of its internals is mine, and I'd welcome the correction.

It helped me to line the two vocabularies up. Same ideas, different names:

| Concept | Xet | Lore |
| --- | --- | --- |
| Content-defined dedup unit | chunk (8 / 64 / 128 KiB) | fragment (32 / 64 / 256 KiB) |
| A file's upper tree / identity | CDMT (content-defined Merkle tree) | fragment list (fixed-threshold, size-split) |
| Reconstruction + dedup index | shard (in S3, ~48 B/chunk; a pointer row in DynamoDB) | DynamoDB rows (fragments + metadata) |

Lore keeps its index in DynamoDB, one row per fragment. Because Lore is now open source, you can read it straight from the source. Each fragment is a single row: the FragmentsEntry item carries the fragment hash and its repository-context, keyed by partition key hash, sort key repository_context, one entry per fragment, per repository-context. The payload bytes never land in that table: the write path puts them in S3 under the same hash, so the DynamoDB table is a pure index. That's the "Bookkeeping at scale" section with receipts: one index row for every fragment you store.


If you're building infrastructure at an incredible pace, I'd love to compare notes on the joy and the headaches of working in this new mode: the design calls you can only make on a bench, and the 3am bugs you can only catch by hand. Come see how we're reinventing product development at our Friday 10am standup at SageOx HQ, AI House, Pier 70; a few of those sessions are up at sageox.ai/blog. I hope I'll have the pleasure of chatting in person real soon: hi@sageox.ai.