AI Explained Official Podcast

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)

Philip - Host of AI Explained YT

Season 2 Episode 11

Gemini 2.5 Pro sets a new record on Simple Bench and on several other benchmarks. I’ll go deep on its nuances, including how it deceptively reverse engineers answers, does better on some coding benchmarks than others, may have a universal ‘conceptual language’ …

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

… and more. Plus practical tips, a note on security, and a Kling vs Veo 2 guest appearance.


AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube URLs + Security - cut-off date
03:42 - Coding 
06:22 - WeirdML Bench
07:01 - Simple Bench Record High 
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

Simple Bench: https://simple-bench.com/

WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542

Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

Google AI Studio: https://aistudio.google.com/prompts/new_chat

Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

LiveBench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314

LiveCodeBench: https://livecodebench.github.io/

SWE-bench Verified: https://arxiv.org/pdf/2310.06770


Non-hype Newsletter: https://signaltonoise.beehiiv.com/