MoQ: SageOx's invisible infra
How my friend Alan's work unlocks real-time presence for human-agent teams

Ajit Banerjee
On August 27th, 1783, a crowd of Parisians gathered on the Champ de Mars to watch the first hydrogen balloon climb majestically into the sky. Among them was our beloved Benjamin Franklin. The story goes that someone in the crowd, unmoved by the sight of an unmanned balloon drifting off over Paris, asked what seemed at the time like a perfectly reasonable question: "What is it good for?" Franklin's reply has outlived the question:
"What good is a newborn babe?"
It has always been easy to be skeptical of a new discovery. The rarer and more valuable instinct is the other one: to look at an unformed idea and sense that it opens onto an especially fertile stretch of ground.
Media-over-QUIC is one of those newborns
The most important technology in any product is usually the part the customer never sees — built, years earlier, by someone with a completely different problem in mind. You don't get to plan for it. You get to notice it, in a conversation, and then move fast enough to do something with it before the moment passes.
On a weekend in late April, Alan Frindell spent an hour walking me through Media-over-QUIC — MoQ — the transport he'd helped shepherd through years of standardization. He wasn't pitching me a feature. He was just explaining a piece of infrastructure he found beautiful.
That newborn had a long gestation. When Alan told the story to our standup, he condensed years into a few minutes:
"My journey with media started with live video ingest protocols at Facebook. Live was very big at the time, and the internet protocols for ingesting live media were ancient, outdated — putting unnecessary requirements on the CDN. So we wrote our own media-over-QUIC-like protocol and deployed it between our first-party apps and our CDN."
That solved Facebook's problem but not the industry's. The high-end ingest market — "ABC wants to stream a presidential debate" — was locked to official standards. Alan's fix for that was characteristically unbothered:
"I tried to tell them they should make their protocol a standard. And they said, 'Oh, that'll take years.' And I said, 'Hold my beer.'"
In 2021 he took what they'd built on mobile to the standards community — and the interesting part wasn't that it got adopted, it was what it collided with:
"It touched a nerve with two other completely distinct groups interested in a very similar structure, but for different use cases."
One was the playback side — the Twitch crowd trying to dramatically lower live-streaming latency without HTTP Live Streaming. The other was the Ciscos of the world, who wanted an interoperable, standards-based protocol so they could get off running their own custom CDNs. Three use cases, three different worlds, converging on the same shape. That convergence is why it took so long — and why the result is better than what any one of them set out to build:
"It's pushed the group toward this multi-layer solution — one completely agnostic to the media you're transferring, much like HTTP itself, and then media-specific protocols layered on top. That's been really interesting for what kinds of new things can emerge using it."
"What kinds of new things can emerge." He was describing a transport. We heard a permission slip.
Oxy has something to say — and he wants to say it now
Getting it working was a joyful weekend project. Three very different clients ended up speaking to a single relay in real time: the SageOx browser, the Ox Dot, and Oxy himself, joining the meeting as an attendee. Each published Opus to one relay that fanned the audio out to whoever was subscribed. That a browser, an embedded device, and a bot can all share one transport is exactly the media-agnostic, runs-over-anything quality Alan kept pointing at. Set against our old batch world (audio uploaded after the meeting, handed to Amazon Transcribe, a transcript landing minutes or hours later), this was a different category of thing. The conversation was now live on the wire.
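To make the fan-out concrete, here is a minimal in-memory sketch in Python. It is not our relay and not the MoQ wire protocol: the names Relay, subscribe, publish, and the track name "standup/audio" are invented for illustration, and the payload is treated as opaque bytes, which is the media-agnostic part.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout: this is an illustration of fan-out,
# not the MoQ protocol and not any real library's API.
Subscriber = Callable[[str, bytes], None]


class Relay:
    """In-memory stand-in for a relay that fans objects out per track."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, track: str, on_object: Subscriber) -> None:
        """Register a callback that receives every object on a track."""
        self._subscribers[track].append(on_object)

    def publish(self, track: str, publisher: str, payload: bytes) -> int:
        """Deliver payload to every subscriber; return how many received it."""
        for on_object in self._subscribers[track]:
            on_object(publisher, payload)
        return len(self._subscribers[track])


relay = Relay()
received: List[Tuple[str, str, bytes]] = []

# Two listeners on the same track: a bot and a browser (both assumed).
relay.subscribe("standup/audio", lambda who, data: received.append(("oxy", who, data)))
relay.subscribe("standup/audio", lambda who, data: received.append(("browser", who, data)))

# An embedded device publishes one opaque audio payload.
delivered = relay.publish("standup/audio", "ox-dot", b"opus-frame-0")
```

Every subscriber on the track gets the same bytes, whoever published them. A real relay would add sessions, ordering, and backpressure on top of this.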
The moment those bytes started streaming into our SageOx Knowledge Bubbles, we realized Oxy didn't have to wait for the meeting to end to contribute. Ryan built a visual processing layer that turns the live conversation into real-time murals and graphical facilitations — so participants grasp the flow and the key points at a glance, without reading walls of text. Oxy's value had never really been the batch summary that shows up way after the fact; we'd just never had a transport fast enough to let him keep up. Now he could, drawing the topic straight back into the room as it was being discussed.
And drawing it, rather than writing it, was the point. Teresa Torres puts it bluntly: "Drawing is more specific than writing. Language is vague." When a cross-functional team — a product manager, a designer, an engineer — outlines a problem in words alone, they walk away certain they agree, while their mental models are actually vastly different. A shared picture is where that hidden disagreement surfaces.
The first mural started sparse — a question and a couple of anchors:

By the end it had composed itself into a full visual summary — Room-as-Document, Graphical Murals, and, at the dead center, Alan's Invisible Infrastructure:

Real-time presence is an infrastructure problem
The word for what changed is one we've been circling for a while. When Jessica Hagy and Apurva Luty sat in on a standup, Apurva — from her years at Discord watching communities live and die — named the substrate underneath all of it: "this hivemind you're describing is actually the function of what we call real-time presence." Insight that arrives while you're in the conversation — the decision as it's made, the action item as it's assigned, the mural redrawing as the topic turns — is a different thing from a transcript that shows up thirty minutes, or three days, later. It changes what a meeting is.
The frontier everyone's racing toward is multimodal — the work coming out of Thinking Machines and the broader research crowd, models that see and hear and respond across modalities. That's the glamorous half, and it's moving fast. It deserves the attention it gets.
But the mural got the deeper thing right: the models only get to be present if the infrastructure underneath them is present too. Multimodal intelligence at the speed of a conversation is a transport problem, a buffering problem, a durability problem, a fan-out problem — long before it is a model problem. The next unlock will come from some other piece of unglamorous, hard-won infrastructure that someone is quietly getting right today.
There's a deeper wrinkle, and Alan put his finger on it. The thing the MoQ working group keeps face-planting on, he told us, is how you join a live stream that's already running:
"Audio is easy — every segment is independent, so I can join at any time and, boom, I'm getting audio. Video is different, because of the compression scheme: you send an I-frame, then you delta-encode against it for a while, then you whack it and make a new I-frame. So the question is — I want to join, but when was the last I-frame?"
Every bit of that machinery — the jitter buffer, the keyframe cadence, the wait for a clean join — exists to hand a human a smooth picture they can only watch at 1×. But what if the one joining is Oxy? He doesn't need the jitter hidden. He can speed-read the whole conversation from the start at 40×. If a significant portion of attendees are going to be like Oxy, many of the foundational assumptions about removing jitter are suddenly invalid. These are exactly the problems infrastructure architects will be forced to revisit as they build transport layers for products like SageOx — the hivemind for human-agent teams.
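The contrast between a human join and an agent join can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Frame type, the function names, the keyframe-every-four-frames stream, and the idea that an agent's replay is limited only by its reading speed (network and model throughput are ignored). If a meeting has run for T seconds and the agent replays from the start at s times real time while the meeting keeps going, then s * t = T + t, so it reaches the live edge after t = T / (s - 1) seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    index: int
    is_keyframe: bool  # True for an I-frame, False for a delta frame


def video_join_index(frames: List[Frame]) -> Optional[int]:
    """Return the index of the most recent I-frame, or None if none was sent."""
    for frame in reversed(frames):
        if frame.is_keyframe:
            return frame.index
    return None


def audio_join_index(segment_count: int) -> Optional[int]:
    """Audio segments are independent, so a listener can join at the newest one."""
    return segment_count - 1 if segment_count > 0 else None


def catch_up_seconds(elapsed_seconds: float, speed: float) -> float:
    """Time for a replay at `speed` times real time to reach the live edge.

    Derived from speed * t = elapsed + t, which gives t = elapsed / (speed - 1).
    """
    if speed <= 1.0:
        raise ValueError("speed must exceed 1x to ever reach the live edge")
    return elapsed_seconds / (speed - 1.0)


# Hypothetical stream: an I-frame every 4 frames, 10 frames sent so far.
frames = [Frame(i, i % 4 == 0) for i in range(10)]

video_start = video_join_index(frames)      # 8: one frame behind the newest
audio_start = audio_join_index(10)          # 9: the newest segment
agent_wait = catch_up_seconds(39 * 60, 40)  # 60.0 seconds to absorb 39 minutes
```

Under these assumptions, an agent that joins 39 minutes late is fully caught up a minute later, without ever needing the smooth 1× playback the human machinery is built to protect.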
So our bet is to keep investing in the layer most people step over: the transport, the relay, the receipt plane. Not because it's the exciting part — because it's the part that makes the exciting part possible. The intelligence will keep arriving on schedule. Our job is to make sure the infrastructure is already there to carry it — live.
Humans stay in flow. Agents stay helpful. Teams stay coherent.
Alan at SageOx HQ.
To Alan — thank you for the invisible infrastructure. It unlocked something you didn't plan for, which is exactly what the best infrastructure does. Seattle is dense right now with people who'll tell you the thing they can't fully explain yet — over champagne, at a dinner, in our office on a Friday morning. If that's you, come say the strange thing out loud with us: hi@sageox.ai.

