
Dwarkesh Podcast

Richard Sutton – Father of RL thinks LLMs are a dead-end

26 Sep 2025

Description

Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson. And he thinks LLMs are a dead end.

After interviewing him, my steel man of Richard's position is this: LLMs aren't capable of learning on-the-job, so no matter how much we scale, we'll need some new architecture to enable continual learning. And once we have it, we won't need a special training phase: the agent will just learn on-the-fly, like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.

In our interview, I did my best to represent the view that LLMs might function as the foundation on which experiential learning can happen… Some sparks flew.

A big thanks to the Alberta Machine Intelligence Institute for inviting me up to Edmonton and for letting me use their studio and equipment. Enjoy!

Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

* Labelbox makes it possible to train AI agents in hyperrealistic RL environments. With an experienced team of applied researchers and a massive network of subject-matter experts, Labelbox ensures your training reflects important, real-world nuance. Turn your demo projects into working systems at labelbox.com/dwarkesh

* Gemini Deep Research is designed for thorough exploration of hard topics. For this episode, it helped me trace reinforcement learning from early policy gradients up to current-day methods, combining clear explanations with curated examples. Try it out yourself at gemini.google.com

* Hudson River Trading doesn't silo their teams. Instead, HRT researchers openly trade ideas and share strategy code in a mono-repo. This means you're able to learn at incredible speed and your contributions have impact across the entire firm. Find open roles at hudsonrivertrading.com/dwarkesh

Timestamps

(00:00:00) – Are LLMs a dead-end?
(00:13:04) – Do humans do imitation learning?
(00:23:10) – The Era of Experience
(00:33:39) – Current architectures generalize poorly out of distribution
(00:41:29) – Surprises in the AI field
(00:46:41) – Will The Bitter Lesson still apply post AGI?
(00:53:48) – Succession to AIs

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcription

Full Episode

0.031 - 18.725 Dwarkesh Patel

Today, I'm chatting with Richard Sutton, who is one of the founding fathers of reinforcement learning and inventor of many of the main techniques used there, like TD learning and policy gradient methods. And for that, he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for Computer Science. Richard, congratulations. Thank you, Dwarkesh.


19.126 - 33.086 Dwarkesh Patel

And thanks for coming on the podcast. It's my pleasure. Okay, so first question: my audience and I are familiar with the LLM way of thinking about AI. Conceptually, what are we missing in terms of thinking about AI from the RL perspective?


34.048 - 62.099 Richard Sutton

Well, yes, I think it's really quite a different point of view, and the two camps can easily get separated and lose the ability to talk to each other. And yeah, large language models have become such a big thing, generative AI in general a big thing, and our field is subject to bandwagons and fashions. So we lose track of the basic, basic things. Because I consider reinforcement learning to be basic AI.


62.299 - 79.195 Richard Sutton

And what is intelligence? The problem is to understand your world. And reinforcement learning is about understanding your world. Whereas large language models are about mimicking people, doing what people say you should do. They're not about figuring out what to do.


79.394 - 97.862 Dwarkesh Patel

Huh. I guess you would think that to emulate the trillions of tokens in the corpus of internet text, you would have to build a world model. In fact, these models do seem to have very robust world models, and they're the best world models we've made to date in AI, right? So what do you think is missing?

98.664 - 100.887 Richard Sutton

I would disagree with most of the things you just said.

100.907 - 102.87 Dwarkesh Patel

Great.

102.85 - 125.038 Richard Sutton

Just to mimic what people say is not really to build a model of the world at all, I don't think. You know, you're mimicking things that have a model of the world, the people. But I don't want to approach the question in an adversarial way. But I would question the idea that they have a world model. So a world model would enable you to predict what would happen.

126.059 - 148.416 Richard Sutton

They have the ability to predict what a person would say. They don't have the ability to predict what will happen. What we want, I think, to quote Alan Turing, what we want is a machine that can learn from experience. Right. Where experience is the things that actually happen in your life. You do things, you see what happens, and that's what you learn from.
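To make Sutton's "learn from experience" loop concrete, here is a minimal illustrative sketch (my own toy example, not code from the episode): a tabular Q-learning agent in a hypothetical five-state corridor. The agent does things (epsilon-greedy actions), sees what actually happens (next state and reward), and learns only from that experience; no human-generated text is involved anywhere.

```python
import random

# Toy "world": a 5-state corridor. Action 0 moves left, action 1 moves
# right. Reaching the rightmost state yields reward +1 and ends the episode.
# (Hypothetical example environment, not anything discussed in the episode.)
N_STATES = 5

def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reached_goal = next_state == N_STATES - 1
    return next_state, (1.0 if reached_goal else 0.0), reached_goal

# Tabular action values, learned purely from interaction.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # step size, discount, exploration rate

for episode in range(500):
    state, done = 0, False
    while not done:
        # Do things: epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice([0, 1])
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        # See what happens.
        next_state, reward, done = step(state, action)
        # Learn from it: one-step Q-learning (temporal-difference) update.
        target = reward if done else reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# State values grow toward the goal, learned from experience alone.
print([round(max(q), 2) for q in Q])
```

Every update here is driven by consequences the agent itself observed, which is the distinction Sutton is drawing: the knowledge comes from interaction with the world, not from mimicking a corpus.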
