
AI Explained Official Podcast
Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.
Episodes
30 episodes
When Will AI Models Blackmail You, and Why?
In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings, and other preventions. But do these models *want* this? Thanks to Storyblo...
Season 2 • Episode 21 • 26:19

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know
What to make of those headlines that AI can’t reason, seen by tens of millions? I cover the paper in layman’s terms, what it means and doesn’t mean, and what’s next. Thanks to Storyblocks for sponsoring this video! Download unlimit...
Season 2 • Episode 20 • 14:00
AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed
There’s a new best language model, so let’s go through the ups and downs of Gemini 2.5 Pro 06-05. Record-breaking common-sense, but dumb mistakes remain. And it’s not even their best model, which remains behind the scenes - Gemini 2.5 Ultra. Plu...
Season 2 • Episode 19 • 16:41
Claude 4: Full 120 Page Breakdown … Is it the Best New Model?
Not only did I get early access and run my own tests, as per the title I read both the 120 page Claude 4 Opus and Claude 4 Sonnet System Card, and the 25 page report on ASL-3 being triggered, plus the 2 hour launch video, and surrounding coverag...
Season 2 • Episode 18 • 19:04
Google Takes No Prisoners Amid Torrent of AI Announcements
Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being useable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is p...
Season 2 • Episode 17 • 17:07
AI Improves at Self-improving
AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two pr...
Season 2 • Episode 16 • 17:41
o3 breaks (some) records, but AI becomes pay-to-win
A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough. https://app....
Season 2 • Episode 15 • 14:33
o3 and o4-mini - they’re great, but easy to over-hype
Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but ...
Season 2 • Episode 14 • 14:24
‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed
This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are i...
Season 2 • Episode 13 • 20:09
AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’...
The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and...
Season 2 • Episode 12 • 23:51

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)
Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘con...
Season 2 • Episode 11 • 21:21
Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI
Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power DeepSeek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indi...
Season 2 • Episode 10 • 13:47
Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)
Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs...
Season 2 • Episode 9 • 12:58

GPT 4.5 - not so much wow
GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4...
Season 2 • Episode 8 • 25:05
Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)
Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple...
Season 2 • Episode 7 • 27:39

AGI: (gets close), Humans: ‘Who Gets the Money?’
A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, Rand and a warning from Amodei. Here are 7-8 developments you may have missed but which I would argue help us understand how the next few years will pl...
Season 2 • Episode 6 • 22:17

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs DeepSeek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark...
Season 2 • Episode 5 • 18:32

o3-mini and the “AI War”
o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a ...
Season 2 • Episode 4 • 15:21
Nothing Much Happens in AI, Then Everything Does All At Once
When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deeps...
Season 2 • Episode 3 • 23:09
Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out
OpenAI looks set to debut their Operator system, and some leaks are out. At the same time DeepSeek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to gi...
Season 2 • Episode 2 • 13:11
OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward
Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models t...
Season 2 • Episode 1 • 23:41

o3 - wow
o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the...
Season 1 • Episode 9 • 22:20
Never Browse Alone? - Gemini 2 Live and ChatGPT Vision
The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Vo...
Season 1 • Episode 8 • 13:40

Sora is Out, But is it a Distraction?
After a 10 month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system ...
Season 1 • Episode 7 • 15:34

o1 Pro Mode – Full Analysis (plus o1 paper highlights)
Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. ...
Season 1 • Episode 6 • 16:43