Richard Sutton

👤 Person
505 appearances

Podcast Appearances

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, yes, I think it's really quite a different point of view.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And it can easily get separated and lose the ability to talk to each other.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And yeah, large language models have become such a big thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Generative AI in general, a big thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And our field is subject to bandwagons and fashions.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So we lose track of the basic, basic things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Because I consider reinforcement learning to be basic AI.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And what is intelligence?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The problem is to understand your world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And reinforcement learning is about understanding your world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Whereas large language models are about mimicking people, doing what people say you should do.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They're not about figuring out what to do.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I would disagree with most of the things you just said.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Just to mimic what people say is not really to build a model of the world at all, I don't think.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You know, you're mimicking things that have a model of the world, the people.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But I don't want to approach the question in an adversarial way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But I would question the idea that they have a world model.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So a world model would enable you to predict what would happen.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They have the ability to predict what a person would say.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They don't have the ability to predict what will happen.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What we want, I think, to quote Alan Turing, what we want is a machine that can learn from experience.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Where experience is the things that actually happen in your life.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You do things, you see what happens, and that's what you learn from.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The large language models learn from something else.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They learn from here's a situation and here's what a person did.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And implicitly, the suggestion is you should do what the person did.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

No, I agree that it's the large language model perspective.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I don't think it's a good perspective.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, curious why.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So, to be a prior for something, there has to be a real thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I mean, a prior bit of knowledge should be the basis for actual knowledge.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What is actual knowledge?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's no definition of actual knowledge in that large language framework.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What makes an action a good action to take?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You recognize the value, the need for continual learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So if you need to learn continually, continually means learning during normal interaction with the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so then there must be some way during the normal interaction to tell what's right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yep.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay, so...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Is there any way, in the large language model setup, for it to tell what's the right thing to say?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You will say something and you will not get feedback about what the right thing to say is because there's no definition of what the right thing to say is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's no goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And if there's no goal, then there's one thing to say, another thing to say.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's no right thing to say.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So there's no ground truth.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You can't have prior knowledge if you don't have ground truth.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Because the prior knowledge is supposed to be a hint or an initial belief about what the truth is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But there isn't any truth.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's no right thing to say.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Now, in reinforcement learning, there is a right thing to say or a right thing to do because the right thing to do is the thing that gets you reward.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So we have a definition of what the right thing to do is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so we can have prior knowledge or knowledge provided by people about what the right thing to do is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then we can check it to see because we have a definition of what the actual right thing to do is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Now, an even simpler case is when you're trying to make a model of the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

When you predict what will happen, you predict and then you see what happens.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay, so there's ground truth.
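
The predict-then-observe loop described here can be sketched in a few lines. This is a minimal illustration and not something from the episode: the function name `online_predict`, the step size `alpha`, and the observation sequence are all hypothetical. The learner guesses the next observation, sees what actually happens, and moves its guess toward the observed value in proportion to its surprise.

```python
def online_predict(observations, alpha=0.5, initial_guess=0.0):
    """Predict each observation before seeing it, then correct the guess.

    Returns the prediction error ("surprise") recorded at each step.
    alpha is a hypothetical step size chosen only for illustration.
    """
    guess = initial_guess
    errors = []
    for actual in observations:
        error = actual - guess   # ground truth minus prediction
        errors.append(error)
        guess += alpha * error   # adjust because the outcome was unexpected
    return errors


if __name__ == "__main__":
    # A toy world that keeps producing the same value: surprise halves each step.
    print(online_predict([1.0, 1.0, 1.0, 1.0]))
```

The point of the sketch is only that the observed outcome serves as ground truth for the prediction, so the error is defined at every step without anyone supplying a label.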

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's no ground truth in large language models because you don't have a prediction about what will happen next.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

If you say something in your conversation, the large language models have no prediction about what the person will say in response to that or what the response will be.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Oh, no, they will respond to that question right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But they have no prediction in the substantive sense that they won't be surprised by what happens.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And if something happens that isn't what you might say they predicted, they will not change because an unexpected thing has happened.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And to learn that, they'd have to make an adjustment.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I'm just saying they don't have, in any meaningful sense, they don't have a prediction of what will happen next.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They will not be surprised by what happens next.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They'll not make any changes if something happens based on what happens.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's not what the world will give them in response to what they do.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Let's go back to their lack of goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

For me, having a goal is the essence of intelligence.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Something is intelligent if it can achieve goals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I like John McCarthy's definition that intelligence is the computational part of the ability to achieve goals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So you have to have goals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You're just a behaving system.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You're not anything special.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You're not intelligent.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And you agree that large language models don't have goals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think they have a goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What's the goal?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Next token prediction.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That's not a goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It doesn't change the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You know, tokens come at you, and if you predict them, you don't influence them.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, it's not a goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's not a substantive goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You can't look at a system and say, oh, it has a goal if it's just sitting there predicting and being happy with itself that it's predicting accurately.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, the math problems are different.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Making a model of the physical world and carrying out the consequences of mathematical assumptions or operations, those are very different things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The empirical world has to be learned.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You have to learn the consequences.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Whereas the math is more just computational.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's more like standard planning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So there they can have a goal to find the proof.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And they are in some way given that goal to find the proof.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's an interesting question whether large language models are a case of the bitter lesson.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Because they are clearly a way of using massive computation, things that will scale with computation up to the limits of the internet.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But they're also a way of putting in lots of knowledge.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so this is an interesting question.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's a sociological or industry question.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Will they reach the limits of the data and be superseded by things that can get more data just from experience rather than from people?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

In some ways, it's a classic case of the bitter lesson.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The more human knowledge we put into the large language models, the better they can do.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so it feels good.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And yet, one, well, I in particular expect there to be systems that can learn from experience, which could well perform much, much better and be much more scalable, in which case it will be another instance of the bitter lesson that the things that used human knowledge were eventually superseded by things that just trained from experience and computation.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, in every case of the bitter lesson, you know, you could start with human knowledge.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then do the scalable things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That's always the case.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And there's never any reason why that has to be bad.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But in fact, and in practice, it has always turned out to be bad.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Because people get locked into the human knowledge approach and they psychologically, or, you know, now I'm speculating why it is, but this is what has always happened.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, their lunch gets eaten by the methods that are truly scalable.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, give me a sense of what the scalable method is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The scalable method is you learn from experience.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You try things, you see what works.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

No one has to tell you.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

First of all, you have a goal.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So without a goal, there's no sense of right or wrong or better or worse.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So large language models are trying to get by without having a goal or a sense of better or worse.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That's just, you know, it's exactly starting in the wrong place.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

How old are these kids?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's surprising.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You can have such a different point of view.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

When I see kids, I see kids just trying things and waving their hands around and moving their eyes around.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And no one tells them... There's no imitation for how they move their eyes around or even the sounds they make.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They may want to create the same sounds, but the actions, the thing that the...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The large language model is learning from training data.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's not learning from experience.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's learning from something that will never be available during its normal life.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's never any training data that says you should do this action in normal life.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay, I shouldn't have said never.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But I don't know.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think I would even say it about school.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But formal schooling is the exception.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Don't be difficult.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I mean, this is obvious.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So I don't think learning is really about training.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think learning is about learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's about an active process.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The child tries things and sees what happens.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, it does not.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We don't think about training when we think of an infant growing up.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

These things are actually rather well understood.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

If you look at how psychologists think about learning, there's nothing like imitation.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe there are some extreme cases where humans might do that or appear to do that, but there is no basic animal learning process called imitation.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The basic animal learning process is for prediction and for trial and error control.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I mean, it's really interesting how sometimes the hardest things to see are the obvious ones.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's obvious, if you just look at animals and how they learn, and you look at psychology and our theories of them, that supervised learning is not part of the way animals learn.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We don't have examples of desired behavior.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What we have is examples of things that happened, one thing that followed another.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And we have examples of we did something and there were consequences.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But there are no examples of supervised learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Supervised learning is not something that happens in nature.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And, you know...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

School, even if that was the case, we should forget about it because that's some special thing that happens in people.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It doesn't happen broadly in nature.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Squirrels don't go to school.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Squirrels can learn all about the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's absolutely obvious, I would say, that supervised learning doesn't happen in animals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Why are you trying to distinguish humans?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Humans are animals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What we have in common is more interesting.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What distinguishes us, we should be paying less attention to.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So I like the way you consider that obvious because I consider the opposite obvious.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, I think we have to understand how we are animals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And if we understood a squirrel, I think we'd be almost all the way there.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's understanding human intelligence.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The language part is just a small veneer on the surface.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay, so this is great.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You know, we're finding out the very different ways that we're thinking.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We're not arguing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We're trying to share our different ways of thinking with each other.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

No, I think about it the same way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But still, it's a small thing on top of basic trial and error learning, prediction learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And that's what distinguishes us, perhaps, from many animals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But we're an animal first.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And we were an animal before we had language and all those other things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Morphics.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Let's lay out a little bit about what it is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It says that experience, well, sensation, action, reward, and then this happens on and on and on, makes up life.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It says that this is the foundation and the focus of intelligence.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Intelligence is about taking that stream and altering the actions to increase the rewards in the stream.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So learning then is from the stream, and learning is about the stream.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So that second part is particularly telling.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What you learn, your knowledge, your knowledge is about the stream.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Your knowledge is about if you do some action, what will happen?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Or it's about which events will follow other events.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's about the stream.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The content of the knowledge is statements about the stream.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so because it's a statement about the stream, you can test it by comparing it to the stream and you can learn it continually.
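
The stream of sensation, action, and reward described in these quotes can be sketched as a small interaction loop. This is a toy illustration under stated assumptions, not a method from the episode: `run_stream`, the two action names, the fixed rewards, the step size `alpha`, and the optimistic starting estimate are all hypothetical. The agent's knowledge is a reward estimate per action, and each estimate is tested against the stream every time that action is taken.

```python
def run_stream(rewards_by_action, steps=20, alpha=0.5, optimism=1.0):
    """Interact with a toy world and log the (sensation, action, reward) stream.

    rewards_by_action maps each action to the fixed reward the toy world returns.
    Estimates start optimistic so that a purely greedy agent still tries everything.
    """
    estimates = {action: optimism for action in rewards_by_action}
    stream = []
    sensation = 0  # this toy world has a single, unchanging sensation
    for _ in range(steps):
        # Act on current knowledge: take the action predicted to pay best.
        action = max(estimates, key=estimates.get)
        reward = rewards_by_action[action]
        # Compare the prediction to what the stream delivered and correct it.
        estimates[action] += alpha * (reward - estimates[action])
        stream.append((sensation, action, reward))
    return estimates, stream


if __name__ == "__main__":
    estimates, stream = run_stream({"left": 0.2, "right": 0.8})
    print(stream[0], stream[-1])
    print(estimates)
```

In this run the agent tries "left" once, finds it disappointing, and settles on "right", so its actions shift toward higher reward using nothing but the stream itself.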

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So when you're imagining this future continual learning agent.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They're not future.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Of course, they exist all the time.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

This is what the reinforcement learning paradigm is, learning from experience.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The reward function is arbitrary.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so if you're playing chess, it's to win the game of chess.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

If you're a squirrel, maybe the reward has to do with getting nuts.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

In general, for an animal, you would say the reward is to avoid pain and to acquire pleasure.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And there also would be a component, I think there should be a component, having to do with your increasing understanding of your environment.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That would be sort of an intrinsic motivation.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I don't like the word model when used the way you just did.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think a better word would be the network.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So I think you mean the network.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe there's many networks.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So anyway, things would be learned and then you'd have copies and many instances.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And sure, you'd want to share knowledge across all the instances.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And there would be lots of possibilities for doing that.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Like there is not today.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You can't have one child grow up and learn about the world, so every new child has to repeat that process.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Whereas with AIs, with the digital intelligence, you could hope to do it once and then copy it into the next one as a starting place.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So this would be a huge savings and I think actually it would be much more important than trying to learn from people.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So this is something we know very well, and the basis of it is temporal difference learning, where the same thing happens on a less grandiose scale, like when you learn to play chess.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The long-term goal is winning the game, and yet you want to be able to learn from shorter-term things, like taking your opponent's pieces.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so you do that by having a value function, which predicts the long-term outcome.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then if you take the guy's pieces, well, your prediction about the long-term outcome is changed.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It goes up.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You think you're going to win.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then that increase in your belief immediately, quote, reinforces the move that led to taking the piece. Okay, so we have this long-term, 10-year goal of making a startup and making a lot of money. And so when we make progress, we say, oh, I'm more likely to achieve the long-term goal, and that rewards the steps along the way.
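
The value-function idea in the chess example can be sketched with tabular TD(0), the simplest temporal difference update. This is an illustrative sketch only: the state names `even` and `up_a_piece`, the reward of 1.0 for winning, the step size `alpha`, and the discount `gamma` are hypothetical choices, not details from the episode. Each state's value is nudged toward the reward plus the predicted value of the state that follows it.

```python
def td0(episodes, alpha=0.5, gamma=1.0):
    """Tabular TD(0): V(s) += alpha * (r + gamma * V(s') - V(s)).

    Each episode is a list of (state, reward, next_state) transitions,
    with next_state set to None when the game ends.
    """
    values = {}
    for episode in episodes:
        for state, reward, next_state in episode:
            v = values.get(state, 0.0)
            v_next = 0.0 if next_state is None else values.get(next_state, 0.0)
            td_error = reward + gamma * v_next - v  # change in predicted outcome
            values[state] = v + alpha * td_error
    return values


if __name__ == "__main__":
    # Toy game: start even, capture a piece, then win (reward 1.0 only at the end).
    game = [("even", 0.0, "up_a_piece"), ("up_a_piece", 1.0, None)]
    print(td0([game] * 2))
```

After the first game only `up_a_piece` has gained value. In the second game the `even` state is credited too, even though no reward arrives at that step, because the predicted long-term outcome went up when the piece was taken.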

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think the crux of this, and I'm not sure, but...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The big world hypothesis seems very relevant, and the reason why humans become useful on their job is because they are encountering the particular part of the world, and it can't have been anticipated, and it can't all have been put in in advance.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The world is so huge that you can't... The dream, as I see it, the dream of large language models is you can teach the agent everything and it will know everything and it won't have to learn anything online.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right, during its life. Okay. And your examples are all, well, really you have to, because there's a lot you can teach it, but there are all the little idiosyncrasies of the particular life they're leading and the particular people they're working with and what they like, as opposed to what average people like. Right. And so that's just saying the world is really big, and so you're going to have to learn it along the way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And I'm- So I would say you're just doing regular learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe using context, because in large language models, all that information has to go into the context window.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But in a continual learning setup, it just goes into the weights.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe, yeah, so maybe context is the wrong word to use, because I mean a more general thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You learn a policy that's specific to the environment that you're finding yourself in.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So maybe we're trying to ask the question of, it seems like the reward is too small of a thing to do all the learning that we need to do.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But, of course, we have the sensations, right?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We have all the other information we can learn from.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We don't just learn from the reward.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We learn from all the data.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So now I want to talk about the base common model of the agent with the four parts.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So we need a policy.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The policy says...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

In the situation I'm in, what should I do?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We need a value function.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The value function is the thing that is learned with TD learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And the value function produces a number.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The number says, how well is it going?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then you watch if that's going up and down and use that to adjust your policy.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay, so those two things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then there's also the perception component, which is the construction of your state representation, your sense of where you are now.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And the fourth one is what we're really getting at, most transparently anyway.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The fourth one is the transition model of the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That's why I am uncomfortable just calling everything models, because I want to talk about the model of the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The transition model of the world: your belief that if you do this, what will happen? What will be the consequences of what you do? So, your physics of the world. But it's not just physics, it's also abstract models, like, you know, your model of how you traveled from California up to Edmonton for this podcast. That was a model, and that's a transition model, and that would be learned. And it's not learned from reward. It's learned from: you did things, you saw what happened, you made that model of the world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That will be learned very richly from all the sensation that you receive, not just from the reward.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It has to include the reward as well, but that's a small part of the whole model, small crucial part of the whole model.
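
The four parts described above (policy, value function, perception, transition model) can be laid out as a small tabular skeleton. This is a sketch under stated assumptions, not an implementation from the episode: the `TabularAgent` class, its method names, and the toy states are hypothetical, perception is an identity placeholder, and the transition model here records only next states (a fuller model would also predict reward, as the speaker notes).

```python
from collections import Counter, defaultdict

class TabularAgent:
    """Hypothetical four-part agent: perception, policy, value function, transition model."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.value = defaultdict(float)    # value function: state -> predicted long-term outcome
        self.prefs = defaultdict(float)    # policy: (state, action) -> preference
        self.model = defaultdict(Counter)  # transition model: (state, action) -> next-state counts

    def perceive(self, observation):
        # Perception: construct the state representation (identity placeholder).
        return observation

    def act(self, state):
        # Policy: "in the situation I'm in, what should I do?" Ties go to the first action.
        return max(self.actions, key=lambda a: self.prefs[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Transition model: learned from what happened, with or without reward.
        self.model[(state, action)][next_state] += 1
        # Value function: TD(0) update.
        td_error = reward + self.gamma * self.value[next_state] - self.value[state]
        self.value[state] += self.alpha * td_error
        # Policy: watch the value go up or down and adjust the action preference.
        self.prefs[(state, action)] += self.alpha * td_error
        return td_error

    def predict(self, state, action):
        counts = self.model[(state, action)]
        return counts.most_common(1)[0][0] if counts else None

agent = TabularAgent(actions=["stay", "travel"])
state = agent.perceive("california")
agent.learn(state, "travel", 0.0, "edmonton")        # no reward, yet the model learns
agent.learn("edmonton", "stay", 1.0, "podcast_done")
print(agent.predict("california", "travel"), agent.act("edmonton"))
```

Note that the first transition carries zero reward and zero TD error, so the value function and policy learn nothing from it, while the transition model still records what happened.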

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The idea is totally general.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I do use all the time, as my canonical example, the idea that an AI agent is like a person.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And people, in some sense, they have just one world they live in.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And that world may involve chess and it may involve Atari games.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But those are not a different task or a different world.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Those are different states that they encounter.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so the general idea is not limited at all.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They just set it up.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It was not their ambition to have one agent across those games.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

If we want to talk about transfer, we should talk about transfer, not across games or across tasks, but transfer between states.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We're not seeing transfer anywhere.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We're not seeing general... Critical to good performance is that you can generalize well from one state to another state.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We don't have any methods that are good at that.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What we have is people trying different things until they settle on something, a representation that transfers well or that generalizes well.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But we don't have any automated techniques to promote.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We have very few automated techniques to promote transfer.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And none of them are used in modern deep learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The researchers did it.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Because there's no other explanation.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Gradient descent will not make you generalize well.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It will make you solve the problem.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It will not make you, if you get new data, generalize in a good way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Generalization means training on one thing affects what you do on the other things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So we know deep learning is really bad at this.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

For example, we know that if you train on some new thing, it will often catastrophically interfere with all the old things that you knew.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So this is exactly bad generalization.
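
The interference described above can be reproduced in a toy setting. This is a minimal sketch, assuming a two-weight linear model rather than a deep network, with two made-up states that share one feature. All names and numbers are hypothetical.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd(w, x, y, lr=0.1, steps=200):
    """Plain SGD on squared error for a single (x, y) example; returns new weights."""
    w = list(w)
    for _ in range(steps):
        err = y - predict(w, x)
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Two hypothetical states that share the first feature.
x_old, y_old = [1.0, 1.0], 1.0
x_new, y_new = [1.0, 0.0], -1.0

w = sgd([0.0, 0.0], x_old, y_old)            # learn the old state
err_before = abs(y_old - predict(w, x_old))  # close to zero
w = sgd(w, x_new, y_new)                     # then train only on the new state
err_after = abs(y_old - predict(w, x_old))   # the old prediction has been overwritten
print(err_before, err_after)
```

Training on the new state changes the shared weight, so the prediction for the old state moves even though the old state was never revisited. That influence of one state's training on another is generalization in the speaker's sense, and here it is the bad kind.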

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Generalization, as I said, is some kind of

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

influence of training on one state on other states.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And generalization is not necessarily good or bad.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Just the fact that you generalize is not necessarily good or bad.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You can generalize poorly, you can generalize well.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So generalization always will happen, but we need algorithms that will cause the generalization to be good rather than bad.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, large language models are so complex.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We don't really know what information they had prior.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We have to guess because they've been fed so much.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

This is one reason why they're not a good way to do science.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's just so uncontrolled, so unknown.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But if you come up with an entirely new... They're getting a bunch of things right, perhaps.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so the question is why?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, it may be that they don't need to generalize to get them right because the only way to get some of them right is to form something which gets all of them right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So if there's only one answer and you find it, that's not called generalization.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's the only way to solve it, and so they find the only way to solve it.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Generalization is when it could be this way, it could be that way, and they do it the good way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, there's nothing in them which will cause it to generalize well.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Gradient descent will cause them to find a solution to the problems they've seen.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And if there's only one way to solve them, they'll do that.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But there are many ways to solve it, some which generalize well, some which generalize poorly.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's nothing in them, in the algorithms, that will cause them to generalize well.
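
The point about many solutions can be illustrated with an underdetermined problem. This is a sketch under assumptions, not anything from the episode: one training example, two weights, and two arbitrary starting points chosen for the example. Gradient descent fits the training example from both starts, but the two solutions disagree on an unseen input.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def gradient_descent(w, x, y, lr=0.1, steps=200):
    """Gradient descent on squared error for one example; returns new weights."""
    w = list(w)
    for _ in range(steps):
        err = y - predict(w, x)
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

x_train, y_train = [1.0, 1.0], 2.0   # one training example, two weights
x_test = [1.0, 0.0]                  # a hypothetical state never seen in training

w_a = gradient_descent([0.0, 0.0], x_train, y_train)
w_b = gradient_descent([2.0, -2.0], x_train, y_train)

for w in (w_a, w_b):
    # Same training prediction, different prediction on the unseen input.
    print(round(predict(w, x_train), 6), round(predict(w, x_test), 6))
```

Both runs solve the problem they were given. Which one generalizes "well" depends on the unseen state, and nothing in the update rule itself chooses between them.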

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But people, of course, are involved.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And if it's not working out, they fiddle with it.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

until they find a way, perhaps until they find a way which generalizes well.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So, yeah, I...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

thought a little bit about this.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There are many things, or a handful of things.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

First, the large language models are surprising.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's surprising how effective artificial neural networks are at language tasks.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That was a surprise.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It wasn't expected.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Language seemed different.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So that's impressive.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's a long-standing controversy in AI about simple basic principle methods, the general-purpose methods like search and learning, compared to human-enabled systems like symbolic methods.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So in the old days, it was interesting because things like search and learning were called weak methods because they just use general principles.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They're not using the power that comes from imbuing a system with human knowledge.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So those were called strong and weak.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so I think the weak methods have just totally won.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That's the biggest question from the old days of AI.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What would happen?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Learning and search have just won the day.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But there's a sense in which that was not surprising to me because I was always voting for or hoping or rooting for the simple basic principles.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so even with the large language models, it's surprising how well it worked, but it was all good and gratifying.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And things like AlphaGo, it's sort of surprising how well that was able to work.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And AlphaZero in particular, how well it was able to work.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But it's all very gratifying because, again, simple basic principles are winning the day.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So the whole AlphaGo thing has a precursor, which is TD-Gammon.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Gerry Tesauro did exactly that.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

reinforcement learning, temporal difference learning methods to play backgammon.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And it beat the world's best players.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And it worked really well.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so in some sense, AlphaGo was merely a scaling up of that process.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It was quite a bit of scaling up, and there was also an additional innovation in how the search was done.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But it made sense.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It wasn't surprising in that sense.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

AlphaGo actually didn't use TD learning.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It waited to see the final outcomes.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But AlphaZero used TD and AlphaZero was applied to all the other games and did extremely well.
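
The distinction drawn above, between waiting for the final outcome and learning from temporal differences, comes down to which target each state's value is trained toward. The sketch below shows only that difference on one made-up episode. It is not a description of how either system was actually trained, and the rewards and value estimates are hypothetical.

```python
def monte_carlo_targets(rewards, gamma=1.0):
    """Target for each step is the actual return observed to the end of the episode."""
    targets, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return targets[::-1]

def td_targets(rewards, next_values, gamma=1.0):
    """Target for each step is the reward plus the current estimate of the next state."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]

rewards = [0.0, 0.0, 1.0]      # hypothetical game: a win (+1) arrives on the last move
next_values = [0.6, 0.9, 0.0]  # hypothetical estimates for the states that follow each move

print(monte_carlo_targets(rewards))      # every step waits for the final outcome
print(td_targets(rewards, next_values))  # each step bootstraps from the next prediction
```

With outcome-based targets, nothing can be learned until the game ends. With TD targets, each move can be updated as soon as the next position's estimate is available.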

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I've always been very impressed by the way AlphaZero plays chess because I'm a chess player and it just

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It just sacrifices material for sort of positional advantages.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And it's just content and patient to sacrifice that material for a long period of time.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so that was surprising that it worked so well, but also gratifying and fitting into my worldview.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So this has led me where I am.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Where I am is I'm in some sense a contrarian, or thinking differently than the field is.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And I am personally just kind of content being out of sync with my field for a long period of time, perhaps decades, because occasionally I have been proved right in the past.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And the other thing I do to help me not feel I'm out of sync and thinking in a strange way is to look not at my local environment or my local field, but to look back in time and into history and to see what people have thought classically about the mind in many different fields.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And I don't feel I'm out of sync with the larger traditions.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I really view myself as a classicist rather than as a contrarian.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I go to what the larger community of thinkers about the mind have always thought.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You want to presume that it's been done.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, but you're using it to get AGI again.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So these AGIs, if they're not superhuman already, then the knowledge that they might impart would be not superhuman.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I'm not sure your idea makes sense because it seems to presume the existence of AGI.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then we've already worked that out.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And the way AlphaZero was an improvement was it did not use the human knowledge, but just went from experience.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So why do you say bring in other agents' expertise to teach it when it's worked so well from experience and not by help from another agent?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think more interesting is just think about that case.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Which when you have many AIs, will they help each other?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

the way cultural evolution works in people.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Let's just, maybe we should talk about that.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The bitter lesson, oh, who cares about that?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

That's an empirical observation about a particular period in history.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

70 years in history, no longer. It doesn't necessarily have to apply to the next 70 years.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So the interesting question is, you're an AI, you get some more computer power.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Should you use it to make yourself more computationally capable?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Or should you use it to spawn off a copy of yourself to go learn something interesting on the other side of the planet or on some other topic and then report back to you?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yep.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think that's a really interesting question that will only arise in the age of digital intelligences.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I'm not sure what the answer is, but I think it will... More questions.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Will it be possible to really spawn it off, send it out, learn something new, something perhaps very new, and then will it be able to be reincorporated into the original?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Or will it have changed so much that...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It can't really be done.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Is that possible or is it not?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And you can carry this to its limit, as I saw one of your videos the other night that suggested that it could, where you spawn off many, many copies, do different things, highly decentralized, but report back to the central master.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And that this will be such a powerful thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, I think one thing that, so this is my attempt to add something to this view, is that a big question, a big issue will become corruption.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You know, if you really could just get information from anywhere and bring it into your central mind, you could become more and more powerful.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And it's all digital and they all speak some internal digital language.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe it'll be easy and possible, but...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

it will not be that easy, as easy as you're imagining, because you can lose your mind this way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

If you pull in something from the outside and build it into your inner thinking, it could take over you.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It could change you.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It could be your destruction rather than your increment in knowledge.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think this will become a big concern, particularly when you're like, "Oh, he's figured out all about how to play some new game," or "he's studied Indonesia," and you want to incorporate that into your mind.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So you think, oh, just read it all in.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And that'll be fine.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But no, you've just read a whole bunch of bits into your mind.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And they could have viruses in them.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They could have hidden goals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They can warp you and change you.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And this will become a big thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

How do you have cybersecurity in the age of digital spawning and reforming again?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, so I do think succession to digital or digital intelligence or augmented humans is inevitable.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So the argument, I have a four-part argument.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Step one is,

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

there's no government or organization that gives humanity a unified point of view that dominates and that can arrange.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

There's no consensus about how the world should be run.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And number two, we will figure out how intelligence works.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Researchers will figure it out eventually.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And number three, we won't stop

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

just with human-level intelligence. We will reach superintelligence.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And number four is that it's inevitable over time that the most intelligent things around would gain resources and power.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so, put all that together, and it's sort of inevitable that you're going to have succession to AI or to AI-enabled augmented humans.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So those four things seem clear and sure to happen.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

But within that set of possibilities, there can be good outcomes as well as less good outcomes, bad outcomes.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And, um,

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So I'm just trying to be realistic about where we are and ask how we should feel about it.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so then I do encourage people to think positively about it, first of all, because it's something we humans have always tried to do for thousands of years: tried to understand ourselves, tried to make ourselves think better.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

you know, just understand themselves.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So this is a great success for science, for the humanities.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We're finding out what this essential part of humanness is, what it means to be intelligent.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And then what I usually say is that this is all kind of human-centric.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

What if you step aside from being a human and just take the point of view of the universe?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And this is, I think, a major stage in the universe, a major transition, a transition from replicators, humans and animals,

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

plants. We're all replicators, and that gives us some strengths and some limitations. And then we're entering the age of design, because our AIs are designed. All of our physical objects are designed, our buildings are designed, our technology is designed, and we're designing now AIs, things that can be intelligent themselves and that are themselves capable of design. And so this is a key step in the world and in the universe, and I think

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So it's the transition from the world in which most of the interesting things that are, are replicated.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Replicated means you can make copies of them, but you don't really understand them.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Like right now we can make more intelligent beings, more children, but we don't really understand how intelligence works.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Whereas we're now reaching the point of having designed intelligence, intelligence that we do understand how it works, and therefore we can change it in different ways and at different speeds than otherwise.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And in the future, they might not be replicated at all.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We may just design AIs, and those AIs will design other AIs, and everything will be done by design and construction rather than by replication.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, I mark this as one of the four great stages of the universe.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

First there's dust, which ends in stars, and then stars make planets, and the planets give rise to life, and now we're giving life to designed entities.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so I think we should be proud that we are giving rise to this great transition in the universe.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Yeah, so it's an interesting thing.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Should we consider them part of humanity or different from humanity?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's our choice.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's our choice whether we say, oh, they are our offspring and we should be proud of them and we should celebrate their achievements.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Or we could say, oh, no, they're not us and we should be horrified.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

It's interesting that it feels to me like a choice, and yet it's such a strongly held thing that, how can it be a choice?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I like these sort of contradictory implications of thought.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So are you thinking that maybe we are like the Neanderthals who gave rise to Homo sapiens?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe Homo sapiens will give rise to a new group of people.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Well, I think it's relevant to point out that for most of humanity, they don't have much influence on what happens.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Most of humanity doesn't influence who can control the atom bomb.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Or who controls the nation states.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Even as a citizen, I often feel that we don't control the nation states very much.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

They're out of control.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

A lot of it has to do with just how you feel about change.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And if you think the current situation is really, really good, then you're more likely to be suspicious of change and averse to change than if you think...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

it's imperfect.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And I think it's imperfect.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

In fact, I think it's pretty bad.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So I'm open to change.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And I think humanity has not had a super good track record.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And maybe it's the best thing that there's been, but it's far from perfect.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We should be concerned about our future, the future.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We should try to make it good.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We also, though, should recognize the limits, our limits.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And I think we want to avoid the feeling of entitlement, avoid the feeling, oh, we're here first.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We should always have it in a good way.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

How should we think about the future and how much control a particular species on a particular planet should have over it?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And how much control do we have?

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

You know, a counterbalance to our limited control over the long-term future of humanity should be how much control we have over our own lives.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Like, we have our own goals and we have our families, and those things are much more controllable than trying to control the whole universe.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So I think it's appropriate for us to really work towards our own local goals.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And it's kind of aggressive for us to say, oh, the future has to evolve the way that I want it to.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Sure.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Because then we'll have arguments.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Different people think the future, the global future, should evolve in different ways.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And they have conflict.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

So you're saying we're trying to design the future and the principles by which it will evolve and come into being.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Right.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so the first thing you're saying is, well, we try to teach our children general principles which will promote

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

more likely evolutions.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Maybe we should also seek for things to be voluntary.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

If there is change, we want it to be voluntary rather than imposed on people.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think that's a very important point.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And yeah, that's all good.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

I think this is like a big, you know, the big...

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

the big, or one of the really big, human enterprises: to design society.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And that's been ongoing for thousands of years again.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

And so it's like, the more things change, the more they stay the same.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

We still have to figure out how to be.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

The children will still come up with different values that seem strange to their parents and their grandparents and things will evolve.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Okay.

Dwarkesh Podcast
Richard Sutton – Father of RL thinks LLMs are a dead-end

Thank you very much.