# Features
This is where the payoff happens. Every recording you import becomes searchable context for your AI coworkers. When they run `ox agent prime`, they can reference transcripts, keyframes, summaries, and decisions from your walkthroughs.
## What your AI coworker sees
Recordings are broken into layers of progressively deeper context:
| Layer | What AI Sees | When It's Used |
|---|---|---|
| Summary | Chapters, decisions, action items | "What was discussed in the design review?" |
| Transcript | Timestamped speech with speaker labels | "What exactly did the designer say about the nav?" |
| Keyframes | Frame images + vision descriptions | "Show me the mockup from the walkthrough" |
| Metadata | Title, participants, duration | Matching the right recording to the prompt |
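As a concrete illustration, the layers above could be typed roughly as follows. This is a hypothetical sketch: the field names and shapes are assumptions for illustration, not the actual schema of `summary.json`.

```typescript
// Hypothetical sketch of the layered context. Field names are
// illustrative assumptions, not the real artifact schema.
interface Chapter {
  title: string;
  start: string; // timestamp, e.g. "3:20"
  end: string;
}

interface Summary {
  title: string;         // Metadata layer: matched against the prompt
  chapters: Chapter[];   // Summary layer: chapters with timestamps
  decisions: string[];   // Summary layer: decisions made
  actionItems: string[]; // Summary layer: follow-up work
}

// Find the chapter a prompt refers to, e.g. "the notification chapter".
function findChapter(summary: Summary, query: string): Chapter | undefined {
  const q = query.toLowerCase();
  return summary.chapters.find((c) => c.title.toLowerCase().includes(q));
}

// Sample data taken from the design-review example below.
const example: Summary = {
  title: "Design Review - Sprint 12",
  chapters: [{ title: "Notification Redesign", start: "3:20", end: "5:45" }],
  decisions: ["Toast notifications replace the modal dialog"],
  actionItems: ["Implement toast component with auto-dismiss (5s default)"],
};
```

The deeper layers (transcript, keyframes) are only read when the summary alone can't answer the prompt, which keeps most lookups cheap.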
### Example: Implement from a UX walkthrough

```
claude code
$
Reading summary.json -> found chapter "Action Bar Design"
Reading keyframes.json -> found frame at 2:31 showing action bar mockup
Reading transcript.vtt -> extracting requirements from 2:15-3:45

Based on the walkthrough, I'll implement:
- Fixed bottom bar with selected item count
- "Add to Project" and "Export" action buttons
- Multi-select via Cmd+click as demonstrated
```
### Example: Reference a design decision

```
claude code
$
Reading summary.json for "Design Review - Sprint 12"
Chapter: "Notification Redesign" (3:20-5:45)
Decision: Toast notifications replace the modal dialog
Action item: Implement toast component with auto-dismiss (5s default)
```
### Example: Debug from a bug report walkthrough

```
claude code
$
Reading transcript.vtt -> user describes: total shows $0 after removing last item
Reading keyframes.json -> frame at 0:42 shows empty cart with $0.00

The issue is in CartTotal.tsx -- when items.length === 0, the
reduce() call that computes the subtotal has no initial value,
so it throws a TypeError on an empty array.
```
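The fix the AI coworker is describing can be sketched as follows. Only `CartTotal.tsx` and the missing initial value come from the example; the `CartItem` shape and `subtotal` name are hypothetical.

```typescript
// Hypothetical sketch of the fix: the item shape is assumed for
// illustration, not taken from the actual CartTotal.tsx.
interface CartItem {
  price: number;
  qty: number;
}

function subtotal(items: CartItem[]): number {
  // Passing 0 as the initial value makes reduce() safe on an empty
  // array: it returns 0 instead of throwing a TypeError.
  return items.reduce((sum, item) => sum + item.price * item.qty, 0);
}
```

Without the initial value, `[].reduce(...)` throws `TypeError: Reduce of empty array with no initial value`, which is exactly the empty-cart case the walkthrough demonstrates.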
## Best practices
| Practice | Why |
|---|---|
| Give recordings descriptive titles | AI coworkers search by title — "Sprint 12 Checkout Flow" beats "Recording 47" |
| Narrate while recording | Transcript quality drives extraction quality — silent recordings produce no searchable context |
| Keep recordings 5-10 minutes | Focused context is more actionable than hour-long meetings |
| Reference recordings by title in prompts | "Look at the checkout flow walkthrough" > "check that recording" |
| Record at 720p rather than 4K | AI processes 720p frames faster, with no practical loss in code/UI comprehension |
## How it all connects
Your recording flows through this pipeline before your AI coworker ever sees it:
1. **Import** — uploaded via web or CLI
2. **Transcribe** — audio extracted, transcribed with speaker diarization
3. **Keyframes** — scene changes detected, frames extracted and analyzed by vision AI
4. **Summarize** — chapters, decisions, and action items generated
5. **Commit** — all artifacts committed to your Team Context git repo
6. **Access** — AI coworkers load these via `ox agent prime`
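Assuming each stage writes the artifact files named in the examples above, the flow can be sketched like this. The stage-to-file mapping is illustrative, not the actual implementation.

```typescript
// Illustrative sketch of the pipeline's artifact flow, using the file
// names from the examples (transcript.vtt, keyframes.json, summary.json).
// The mapping itself is an assumption, not the real pipeline code.
type Stage = "import" | "transcribe" | "keyframes" | "summarize" | "commit";

const artifacts: Record<Stage, string | null> = {
  import: null,                 // raw recording uploaded, nothing derived yet
  transcribe: "transcript.vtt", // timestamped speech with speaker labels
  keyframes: "keyframes.json",  // frame images + vision descriptions
  summarize: "summary.json",    // chapters, decisions, action items
  commit: null,                 // derived files committed to the Team Context repo
};

// List the files an AI coworker can load once the pipeline completes.
function producedArtifacts(): string[] {
  return Object.values(artifacts).filter((a): a is string => a !== null);
}
```

Because every stage's output lands in the Team Context git repo, the artifacts are versioned alongside your code and available to any coworker that primes itself.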
Learn more about importing recordings.

