A few weeks ago we wanted a tool that could chew through long YouTube videos and tell us if they were any good. Not "good" in a numeric way — good against a rubric we wrote: is the host actually answering the question, are the claims sourced, is the pacing alive at minute thirty.
We didn't want to spend a month on it. So we gave the brief to Randal — the in-house AI agent we've been building — and asked it to come back with something we could click on by morning.
The brief
The whole thing fit in a paragraph. We wanted:
- A web app that takes a YouTube URL (channel or video).
- A user-editable rubric. Free-text criteria; weight per row.
- A scored report, with timestamps and quotes that back up the score.
- Friendly. No dashboards-of-dashboards.
If we can't describe what it does in one sentence, it isn't finished.
— Hassion Studio, manifesto, principle 01
We handed the brief to Randal around 11pm with one extra instruction: ship a working URL by 9am, and write up the parts you weren't sure about.
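For readers who think in code, here is a minimal sketch of the rubric and report shape the brief describes: free-text criteria with a weight per row, and scores backed by timestamps and quotes. This is not Eaglet's actual schema. Every name here (`RubricRow`, `Evidence`, `RowScore`, `overall_score`, `format_timestamp`) is hypothetical, and both the 0-to-1 score scale and the weighted mean are our assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RubricRow:
    criterion: str  # free-text criterion, as in the brief
    weight: float   # per-row weight


@dataclass
class Evidence:
    timestamp_s: int  # position in the video, in seconds
    quote: str        # transcript excerpt that backs up the score


@dataclass
class RowScore:
    row: RubricRow
    score: float  # assumed scale: 0.0 (miss) to 1.0 (nailed it)
    evidence: list[Evidence] = field(default_factory=list)


def overall_score(row_scores: list[RowScore]) -> float:
    """Weighted mean of the per-row scores (an assumed aggregation)."""
    total_weight = sum(rs.row.weight for rs in row_scores)
    if total_weight <= 0:
        raise ValueError("rubric needs at least one row with positive weight")
    return sum(rs.score * rs.row.weight for rs in row_scores) / total_weight


def format_timestamp(seconds: int) -> str:
    """Render seconds as M:SS, or H:MM:SS once the video passes an hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"
```

Keeping the evidence attached to each row, rather than to the overall score, is what lets a report say why a row scored the way it did and point at the minute where it happened.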
The overnight run
By morning the inbox had a Loom, a deploy URL, and a markdown file titled RANDAL_NOTES.md. The first version did this:
$ eaglet ingest "https://www.youtube.com/@some-channel"
→ pulling video list [ok] 24 videos
→ fetching transcripts [ok] 24/24
→ scoring against ./default.rubric [warn] 3 timeouts (long videos)
→ building report [ok]
→ deployed to https://eaglet.video [ok]
build #047 · 6h 12m · cost: ~$2.40
It wasn't pretty — the rubric editor was a textarea, the report was one long page, and the navigation was, generously, "scroll." But it worked. We pasted in a channel we already had strong opinions about and the scores roughly matched our opinions, which felt suspicious enough to be promising.
What worked
- Giving Randal a clear "done." The brief said "ship a working URL by 9am", not "design the perfect tool." The constraint did most of the work.
- Letting it choose the stack. We didn't pre-decide framework, hosting, or schema. Randal picked tools it had used before and could keep iterating on.
- Notes-as-you-go. The RANDAL_NOTES.md was honestly the most valuable artifact. It flagged every place it had guessed, which is also exactly the list of places we wanted to look at first.
A small table of the first runs
| Run | Length | Issues | Cost |
|---|---|---|---|
| 001 | 6h 12m | 3 timeouts | $2.40 |
| 002 | 4h 03m | rubric drift | $1.80 |
| 003 | 2h 18m | none worth flagging | $0.95 |
What broke
Plenty. Long videos timed out. The rubric language was too loose — two reasonable people could read the same row and weight it differently. The first design pass leaned harder into "dashboard" than we wanted. We re-briefed, re-ran, and the second build came in friendlier and smaller.
The worst thing the tool can be is a dashboard. The best thing it can be is a friend who watched the video for you and remembered the good bits.
What's next
Eaglet is in beta at eaglet.video. We're writing more rubrics, smoothing the long-video case, and adding shareable reports. Randal is rebuilding the underlying extraction pipeline this week.
The bigger thing this experiment is teaching us is how to brief Randal. The studio's shape is changing because of it — and we'll write that one up next.