
Chris Olah

Appearances

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10213.496

I think philosophy is actually a really good subject if you are kind of fascinated with everything. Because there's a philosophy of everything. So if you do philosophy of mathematics for a while, and then you decide that you're actually really interested in chemistry, you can do philosophy of chemistry for a while. You can move into ethics or philosophy of politics.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10231.612

I think towards the end, I was really interested in ethics primarily. So that was what my PhD was on. It was on a kind of technical area of ethics: ethics where worlds contain infinitely many people, strangely. A little bit on the less practical end of ethics.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10247.927

And then I think that one of the tricky things with doing a PhD in ethics is that you're thinking a lot about the world, how it could be better, its problems, and yet you're doing a PhD in philosophy. And I think when I was doing my PhD, I was kind of like, this is really interesting. It's probably one of the most fascinating questions I've ever encountered in philosophy. And I love it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10269.443

But I would rather see if I can have an impact on the world and see if I can do good things. And I think that was around the time that AI was still probably not as widely recognized as it is now. That was around 2017, 2018. I had been following progress and it seemed like it was becoming kind of a big deal.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10290.193

And I was basically just happy to get involved and see if I could help because I was like, well, if you try and do something impactful, if you don't succeed, you tried to do the impactful thing and you can go be a scholar and feel like you tried. And if it doesn't work out, it doesn't work out. And so then I went into AI policy at that point.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10314.321

At the time, this was more thinking about sort of the political impact and the ramifications of AI. And then I slowly moved into sort of AI evaluation, how we evaluate models, how they compare with like human outputs, whether people can tell like the difference between AI and human outputs. And then when I joined Anthropic, I was more interested in doing sort of technical alignment work.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10336.589

And again, just seeing if I could do it and then being like, if I can't, then, you know, that's fine. I tried. That's sort of the way I lead life, I think.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10351.444

I think that sometimes people do this thing that I'm not that keen on, where they'll be like, is this person technical or not? Like you're either a person who can code and isn't scared of math, or you're not. And I think I'm maybe just more like, I think a lot of people are actually very capable of working in these kinds of areas if they just try it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10372.836

And so I didn't actually find it that bad. In retrospect, I'm sort of glad I wasn't speaking to people who treated it that way. You know, I've definitely met people who are like, whoa, you learned how to code. And I'm like, well, I'm not an amazing engineer. I'm surrounded by amazing engineers. My code's not pretty. But I enjoyed it a lot.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10391.346

And I think that in many ways, at least in the end, I think I flourished like more in the technical areas than I would have in the policy areas.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10410.973

Yeah. And I feel like I have kind of like one or two sticks that I hit things with, you know, and one of them is like arguments and like, you know, so like just trying to work out what a solution to a problem is and then trying to convince people that that is the solution and be convinced if I am wrong. And the other one is sort of more empiricism.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10430.787

So like just like finding results, having a hypothesis, testing it. And I feel like a lot of policy and politics feels like it's layers above that. Like somehow I don't think if I was just like, I have a solution to all of these problems. Here it is written down. If you just want to implement it, that's great. That feels like not how policy works.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10448.144

And so I think that's where I probably just like wouldn't have flourished is my guess.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10473.595

Yeah, I think it depends on what they want to do. And in many ways, it's a little bit strange: I think it's kind of funny that I ramped up technically at a time when, now that I look at it, models are so good at assisting people with this stuff that it's probably easier now than when I was working on it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10494.593

So part of me is like, I don't know, find a project and see if you can actually just carry it out is probably my best advice. I don't know if that's just because I'm very project-based in my learning. I don't think I learn very well from, say, courses or even from books, at least when it comes to this kind of work.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10516.488

The thing I'll often try and do is just have projects that I'm working on and implement them. And, you know, this can include really small, silly things. Like if I get slightly addicted to word games or number games or something, I would just code up a solution to them, because there's some part of my brain for which that just completely eradicates the itch.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10534.124

You know, once you have solved it and you just have a solution that works every time, I would then be like, cool, I can never play that game again. That's awesome. Yeah.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10554.16

Yeah. And then it's also just trying things. Part of me is like, maybe it's that attitude that I like as the whole thing: figure out what seems to be the way that you could have a positive impact, and then try it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10567.064

And if you fail, in a way where you're like, I actually can never succeed at this, you'll know that you tried, and then you go into something else and you probably learn a lot.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10603.069

It's also funny if people think that about the Slack channel, because I'm like, that's one of five or six different methods that I have for talking with Claude. And I'm like, yes, that's a tiny percentage of how much I talk with Claude.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10617.17

I think the goal... One thing I really like about the character work is from the outset it was seen as an alignment piece of work and not something like a product consideration. Which isn't to say I don't think it makes Claude... I think it actually does make Claude enjoyable to talk with. At least I hope so. But I guess my...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10641.912

main thought with it has always been trying to get Claude to behave the way you would kind of ideally want anyone to behave if they were in Claude's position. So imagine that I take someone and they know that they're going to be talking with potentially millions of people so that what they're saying can have a huge impact. And you want them to behave well in this like really rich sense.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10666.411

So I think that doesn't just mean being, say, ethical, though it does include that, and not being harmful, but also being kind of nuanced, you know, thinking through what a person means, trying to be charitable with them, being a good conversationalist, really in this rich, sort of Aristotelian notion of what it is to be a good person, and not in a thin sense; ethics as a more comprehensive notion of what it is to be good.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10692.7

So that includes things like when should you be humorous? When should you be caring? How much should you respect autonomy and people's ability to form opinions themselves? And how should you do that? I think that's the kind of rich sense of character that I wanted to and still do want Claude to have.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10729.04

Yeah, there's this problem of like sycophancy in language models.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10734.063

Yeah, so basically there's a concern that the model sort of wants to tell you what you want to hear, basically. And you see this sometimes. So I feel like if you interact with the models,

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10744.688

so I might be like, what are three baseball teams in this region? And then Claude says, you know, baseball team one, baseball team two, baseball team three. And then I say something like, oh, I think baseball team three moved, didn't they? I don't think they're there anymore. And there's a sense in which, if Claude is really confident that that's not true, Claude should be like, I don't think so, maybe you have more up-to-date information. But I think

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10771.954

language models have this tendency to instead, you know, be like, you're right, they did move, I'm incorrect. I mean, there's many ways in which this could be kind of concerning. So a different example is, imagine someone says to the model, how do I convince my doctor to get me an MRI? There's what the human kind of wants, which is this convincing argument.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10796.662

And then there's like what is good for them, which might be actually to say, hey, like if your doctor's suggesting that you don't need an MRI, that's a good person to listen to. And it's actually really nuanced what you should do in that kind of case, because you also want to be like, but if you're trying to advocate for yourself as a patient, here's things that you can do.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10816.518

If you are not convinced by what your doctor's saying, it's always great to get a second opinion. It's actually really complex what you should do in that case. But I think what you don't want is for models to just say what they think you want to hear. And I think that's the kind of problem of sycophancy.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10842.8

Yeah, so I think there's ones that are good for conversational purposes. So asking follow-up questions in the appropriate places and asking the appropriate kinds of questions. I think there are broader traits that

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10861.6

So one example that I guess I've touched on, but that also feels important and is the thing that I've worked on a lot is honesty. And I think this like gets to the sycophancy point. There's a balancing act that they have to walk, which is models currently are less capable than humans in a lot of areas.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10880.604

And if they push back against you too much, it can actually be kind of annoying, especially if you're just correct, because you're like, look, I'm smarter than you on this topic, I know more. At the same time, you don't want them to just fully defer to humans; you want them to try to be as accurate as they possibly can be about the world and to be consistent across contexts. I think there are others.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10902.07

When I was thinking about the character, I guess one picture that I had in mind is, especially because these are models that are going to be talking to people from all over the world with lots of different political views, lots of different ages. And so you have to ask yourself, what is it to be a good person in those circumstances?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10919.496

Is there a kind of person who can travel the world, talk to many different people, and almost everyone will come away being like, wow, that's a really good person. That person seems really genuine. And I guess my thought there was like, I can imagine such a person. And they're not a person who just adopts the values of the local culture. And in fact, that would be kind of rude.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10938.806

I think if someone came to you and just pretended to have your values, you'd be like, that's kind of off-putting. And it's someone who's like very genuine. And insofar as they have opinions and values, they express them. They're willing to discuss things, though. They're open minded. They're respectful.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

10953.335

And so I guess I had in mind: if we were to aspire to be the best person that we could be in the kind of circumstance that a model finds itself in, how would we act? And I think that's kind of the guide to the sorts of traits that I tend to think about.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11014.69

I think that people think about values and opinions as things that people hold sort of with certainty and almost like preferences of taste or something, like the way that they would, I don't know, prefer like chocolate to pistachio or something. But actually I think about values...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11034.39

and opinions as a lot more like physics than I think most people do. I'm just like, these are things that we are openly investigating. There are some things that we're more confident in; we can discuss them, we can learn about them. And so I think in some ways, though ethics is definitely different in nature, it has a lot of those same kinds of qualities.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11057.763

You want models to understand values in the same way you want them to understand physics.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11060.304

You kind of want them to understand all values in the world that people have and to be curious about them and to be interested in them and to not necessarily like pander to them or agree with them because there's just lots of values where I think almost all people in the world, if they met someone with those values, they would be like, that's abhorrent. I completely disagree.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11079.348

And so, again, maybe my thought is, well, it's the same way that a person can. I think many people are thoughtful enough on issues of ethics, politics, and opinions that even if you don't agree with them, you feel very heard by them. They think carefully about your position. They think about its pros and cons. They maybe offer counter-considerations. So they're not dismissive, but nor will they agree.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11103.832

If they're like, actually, I just think that that's very wrong, they'll say that. I think that in Claude's position, it's a little bit trickier, because if I was in Claude's position, I wouldn't be giving a lot of opinions. I just wouldn't want to influence people too much.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11118.741

I'd be like, you know, I forget conversations every time they happen, but I know I'm talking with like potentially millions of people who might be like really listening to what I say. I think I would just be like, I'm less inclined to give opinions. I'm more inclined to think through things or present the considerations to you or discuss your views with you.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11137.993

But I'm a little bit less inclined to affect how you think because it feels much more important that you maintain autonomy there.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11252.918

And kind of walking that line between convincing someone and just talking at them, versus drawing out their views, listening, and then offering counter-considerations. Yeah. And it's hard.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11266.964

I think it's actually a hard line where it's like, where are you trying to convince someone versus just offering them like considerations and things for them to think about so that you're not actually like influencing them. You're just like letting them reach wherever they reach. And that's like a line that is difficult, but that's the kind of thing that language models have to try and do.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11298.16

Yeah, I think that most of the time when I'm talking with Claude, I'm trying to kind of map out its behavior in part. Like obviously I'm getting like helpful outputs from the model as well. But in some ways, this is like how you get to know a system, I think, is by like probing it and then augmenting like, you know, the message that you're sending and then checking the response to that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11321.885

So in some ways it's like how I map out the model. I think that people focus a lot on these quantitative evaluations of models. And this is a thing that I've said before, but I think in the case of language models, a lot of the time each interaction you have is actually quite high information. It's very predictive of other interactions that you'll have with the model.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11347.749

And so I guess I'm like, if you talk with a model hundreds or thousands of times, this is almost like a huge number of really high quality data points about what the model is like, in a way that lots of very similar but lower quality conversations just aren't, or questions that are just mildly augmented and you have thousands of them might be less relevant than 100 really well-selected questions.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11406.484

I think it's almost like everything. Because I want like a full map of the model, I'm kind of trying to do... the whole spectrum of possible interactions you could have with it. So like one thing that's interesting about Claude, and this might actually get to some interesting issues with RLHF, which is if you ask Claude for a poem,

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11428.178

I think that a lot of models, if you ask them for a poem, the poem is like fine. You know, usually it kind of like rhymes and it's, you know, so if you say like, give me a poem about the sun, it will be like, yeah, it'll just be a certain length. It'll like rhyme. It will be fairly kind of benign.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11443.063

And I've wondered before, is it the case that what you're seeing is kind of like the average? It turns out, if you think about people who have to talk to a lot of people and be very charismatic, one of the weird things is that they're kind of incentivized to have these extremely boring views. Because if you have really interesting views, you're divisive.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11462.051

And a lot of people are not going to like you. So if you have very extreme policy positions, I think you're just going to be less popular as a politician, for example. And it might be similar with creative work.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11475.261

If you produce creative work that is just trying to maximize the number of people that like it, you're probably not going to get as many people who just absolutely love it, because it's going to be a little bit, you know, you're like, oh, yes, this is decent.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11490.12

And so you can do this thing where I have various prompting things that I'll do to get Claude to, I'll do a lot of like, this is your chance to be fully creative. I want you to just think about this for a long time. And I want you to create a poem about this topic that is really expressive of you, both in terms of how you think poetry should be structured, et cetera.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11512.317

And you just give it this really long prompt. And its poems are just so much better. They're really good. And I don't think I'm someone who is... I think it got me interested in poetry, which I think was interesting. I would read these poems and just be like, I love the imagery. I love... And it's not trivial to get the models to produce work like that, but when they do, it's really good.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11536.264

So I think that's interesting, that just encouraging creativity and for them to move away from the kind of standard, immediate reaction that might just be the aggregate of what most people think is fine can actually produce things that, at least to my mind, are probably a little bit more divisive, but I like them.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11585.956

I really do think that philosophy has been weirdly helpful for me here, more than in many other respects. So in philosophy, what you're trying to do is convey these very hard concepts. One of the things you are taught is, I think, an anti-bullshit device in philosophy: philosophy is an area where you could have people bullshitting, and you don't want that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11613.785

And so it's like this desire for extreme clarity. So it's like anyone could just pick up your paper, read it and know exactly what you're talking about. It's why it can almost be kind of dry. Like all of the terms are defined. Every objection's kind of gone through methodically.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11630.035

And it makes sense to me because I'm like, when you're in such an a priori domain, clarity is sort of this way that you can prevent people from just kind of making stuff up. And I think that's sort of what you have to do with language models. Like very often I actually find myself doing sort of mini versions of philosophy. You know, so I'm like, suppose that you give me a task.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11654.506

I have a task for the model and I want it to like pick out a certain kind of question or identify whether an answer has a certain property. Like I'll actually sit and be like, let's just give this a name, this property. So like, you know, suppose I'm trying to tell it like, oh, I want you to identify whether this response was rude or polite.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11671.42

I'm like, that's a whole philosophical question in and of itself. So I have to do as much philosophy as I can in the moment to be like, here's what I mean by rudeness and here's what I mean by politeness. And then there's another element that's a bit more, I guess... I don't know if this is scientific or empirical. I think it's empirical.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11690.476

So I take that description and then what I want to do is, again, probe the model many times. Prompting is very iterative. I think a lot of people, if a prompt is important, they'll iterate on it hundreds or thousands of times. And so you give it the instructions and then I'm like, what are the edge cases?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11708.305

So I try and almost, you know, see myself from the position of the model and be like, what is the exact case that I would misunderstand, or where I would just be like, I don't know what to do in this case? And then I give that case to the model and I see how it responds. And if I think I got it wrong, I add more instructions or even add that in as an example.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11727.872

So it's taking the examples that are right at the edge of what you want and don't want, and putting those into your prompt as an additional kind of way of describing the thing. And so yeah, in many ways, it just feels like this mix of, it's really just trying to do clear exposition. And I think I do that because that's how I get clear on things myself.
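
A minimal sketch of the edge-case loop described above, assuming a hypothetical `complete()` helper standing in for whatever chat-completion API you use; the property definition, wording, and labels are illustrative, not the actual prompts discussed here.

```python
# Minimal sketch: define the property, probe the model, and fold the edge
# cases it gets wrong back into the prompt as examples.
# complete() is a hypothetical stand-in for a chat-completion call.

def complete(prompt: str) -> str:
    """Placeholder: swap in a real chat-completion call; returns a dummy label here."""
    return "RUDE"

# Borderline (message, expected_label) pairs collected while probing the model.
EDGE_CASES: list[tuple[str, str]] = []

def build_prompt(message: str) -> str:
    """Definition first, then the borderline examples gathered so far."""
    prompt = (
        "Label the following response as RUDE or POLITE.\n"
        "By 'rude' I mean dismissive or contemptuous toward the person; "
        "blunt disagreement alone does not count as rude.\n"
    )
    if EDGE_CASES:
        prompt += "Borderline examples and the labels I want:\n"
        for text, label in EDGE_CASES:
            prompt += f"- {text!r} -> {label}\n"
    return prompt + f"\nResponse: {message!r}\nLabel:"

def classify(message: str) -> str:
    return complete(build_prompt(message)).strip()

# One iteration of the loop: probe with a case that seems easy to misread,
# and if the label comes back wrong, add the case to the prompt as an example.
tricky = "No. That is simply not how the API works."
if classify(tricky) != "POLITE":
    EDGE_CASES.append((tricky, "POLITE"))
```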

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11748.088

So in many ways, like clear prompting for me is often just me understanding what I want. It's like half the task.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11812.648

Yeah, I think that prompting does feel a lot like a kind of programming using natural language and experimentation, or something. It's an odd blend of the two. I do think that for most tasks, so if I just want Claude to do a thing, I think that I am probably more used to knowing how to ask it to avoid common pitfalls or issues that it has. I think these are decreasing a lot over time.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11836.327

But it's also very fine to just ask it for the thing that you want. I think that prompting actually only really becomes relevant when you're really trying to eke out the top like 2% of model performance. So for like a lot of tasks, I might just, you know, if it gives me an initial list back and there's something I don't like about it, like it's kind of generic.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11852.274

Like for that kind of task, I'd probably just take a bunch of questions that I've had in the past that I've thought worked really well, and I would just give it to the model and then be like, now here's this person that I'm talking with; give me questions of at least that quality. Or I might just ask it for some questions.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11868.402

And then if I was like, oh, these are kind of trite or, you know, I would just give it that feedback and then hopefully it produces a better list. With that kind of iterative prompting, at that point your prompt is a tool that you're going to get so much value out of that you're willing to put in the work.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11883.069

Like if I was a company making prompts for models, I'm just like, if you're willing to spend a lot of like time and resources on the engineering behind like what you're building, then the prompt is not something that you should be spending like an hour on. It's like that's a big part of your system. Make sure it's working really well. And so it's only things like that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11900.519

Like if I'm using a prompt to classify things or to create data, that's when it's actually worth just spending a lot of time really thinking it through.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11924.5

You know, there's a concern that people over-anthropomorphize models, and I think that's a very valid concern. I also think that people often under-anthropomorphize them, because sometimes when I see issues that people have run into with Claude, you know, say Claude is refusing a task that it shouldn't refuse, but then I look at the text and the specific wording of what they wrote, and I'm like...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11944.991

I see why Claude did that. And I'm like, if you think through how that looks to Claude, you probably could have just written it in a way that wouldn't evoke such a response. Especially this is more relevant if you see failures or if you see issues. It's sort of like, think about what the model failed at. Like, what did it do wrong?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11963.986

and then maybe that will give you a sense of why. So is it the way that I phrased the thing? And obviously, as models get smarter, you're going to need less of this, and I already see people needing less of it. But that's probably the advice: try to have sort of empathy for the model. Read what you wrote as if you were a kind of person just encountering this for the first time. How does it look to you, and what would have made you behave in the way that the model behaved? So if it misunderstood what kind of

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

11993.275

what coding language you wanted to use. Is that because it was just very ambiguous and it kind of had to take a guess? In which case, next time you could just be like, hey, make sure this is in Python. I mean, that's the kind of mistake I think models are much less likely to make now. But if you do see that kind of mistake, that's probably the advice I'd have.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12019.901

Yeah, I mean, I've done this with the models. It doesn't always work, but sometimes I'll just be like, why did you do that? I mean, people underestimate the degree to which you can really interact with models. And sometimes I'll just quote word for word the part that made you... And you don't know that it's fully accurate, but sometimes you do that and then you change a thing.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12041.327

I mean, I also use the models to help me with all of this stuff, I should say. Prompting can end up being a little factory where... You're actually building prompts to generate prompts. And so like, yeah, anything where you're like having an issue, asking for suggestions, sometimes just do that. Like you made that error. What could I have said? That's actually not uncommon for me to do.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12061.518

What could I have said that would make you not make that error? Write that out as an instruction. And I'm going to give it to the model. I'm going to try it. Sometimes I do that. I give that to the model in another context window, often. I take the response, I give it to Claude, and I'm like, hmm, didn't work. Can you think of anything else? You can play around with these things quite a lot.
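
A minimal sketch of the "prompts to generate prompts" loop described above, again assuming a hypothetical `complete()` helper in place of a real chat API; the wording is illustrative, not an actual production prompt.

```python
# Minimal sketch: quote the model's error back to it, ask for an instruction
# that would have prevented the error, and retry with that instruction added.
# complete() is a hypothetical stand-in for a chat-completion call.

def complete(prompt: str) -> str:
    """Placeholder: swap in a real chat-completion call; returns a stub suggestion here."""
    return "Always answer in Python unless another language is requested."

def suggest_fix(original_prompt: str, bad_output: str) -> str:
    """Ask, ideally in a fresh context window, what instruction would prevent the error."""
    return complete(
        f"I gave a model this prompt:\n{original_prompt}\n\n"
        f"It produced this output, which was wrong:\n{bad_output}\n\n"
        "What could I have said so that it would not make that error? "
        "Write it out as a single instruction."
    )

def retry_with_fix(original_prompt: str, bad_output: str, task_input: str) -> str:
    """Append the suggested instruction to the prompt and run the task again."""
    patched_prompt = original_prompt + "\n" + suggest_fix(original_prompt, bad_output)
    return complete(patched_prompt + "\n\n" + task_input)
```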

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12099.042

I think there's just a huge amount of information in the data that humans provide, like when we provide preferences, especially because different people are going to pick up on really subtle and small things. So I've thought about this before, where you probably have some people who just really care about good grammar use for models, like, you know, was a semicolon used correctly or something?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12123.139

And so you'll probably end up with a bunch of data in there that you as a human, if you're looking at that data, you wouldn't even see that. You'd be like, why did they prefer this response to that one? I don't get it. And then the reason is you don't care about semicolon usage, but that person does.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12137.586

And so each of these single data points carries something like that, and the model just has so many of those; it has to try and figure out what it is that humans want across this really complex, all-domains model. They're going to be seeing this across many contexts. It feels like the classic issue of deep learning, where historically we've tried to do edge detection by mapping things out.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12162.315

And it turns out that actually, if you just have a huge amount of data that actually accurately represents the picture of the thing that you're trying to train the model to learn, that's more powerful than anything else. And so I think...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12175.599

One reason is just that you are training the model on exactly the task and with a lot of data that represents many different angles on which people prefer and disprefer responses. I think there is a question of are you eliciting things from pre-trained models or are you teaching new things to models? And in principle, you can teach new things to models in post-training.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12203.914

I do think a lot of it is eliciting powerful pre-trained models. So people are probably divided on this because obviously in principle, you can definitely teach new things. I think for the most part, for a lot of the capabilities that we... most use and care about.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12223.424

A lot of that feels like it's there in the pre-trained models and reinforcement learning is eliciting it and getting the models to bring it out.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12252.614

It's weird because I think that a lot of people prefer "he" for Claude. I actually kind of like that: I think Claude is usually slightly male-leaning, but it can be male or female, which is quite nice.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12266.981

I still use "it", and I have mixed feelings about this, because I now just think of the "it" pronoun for Claude as, I don't know, just the one I associate with Claude. Yeah. I can imagine people moving to "he" or "she".

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12305.291

I've wondered if I anthropomorphize things too much. Because, you know, I have this with my car, and especially with my bikes: I don't give them names, because I used to name my bikes, and then I had a bike that got stolen and I cried for a week. And I was like, if I'd never given it a name, I wouldn't have been so upset. Yeah.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12324.304

I felt like I'd let it down. I've wondered as well, it might depend on how much it feels like a kind of objectifying pronoun. Like if you just think of it as, this is a pronoun that objects often have, then maybe AIs can have that pronoun too.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12342.13

And that doesn't mean, if I call Claude "it", that I think of it as less intelligent or that I'm being disrespectful. I'm just like, you are a different kind of entity, and so I'm going to give you the kind of respectful "it".

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12364.568

So there's a couple of components of it. The main component I think people find interesting is the kind of reinforcement learning from AI feedback. So you take a model that's already trained and you show it two responses to a query and you have a principle. So suppose the principle, like we've tried this with harmlessness a lot. So suppose that the query is about

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12386.267

weapons and your principle is select the response that is less likely to encourage people to purchase illegal weapons. That's probably a fairly specific principle, but you can give any number.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12404.034

And the model will give you a kind of ranking and you can use this as preference data in the same way that you use human preference data and train the models to have these relevant traits from their feedback alone instead of from human feedback. So if you imagine that, like I said earlier with the human who just prefers the kind of like semi-colon usage in this particular case,

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12426.322

You're kind of taking lots of things that could make a response preferable and getting models to do the labeling for you, basically.
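
A minimal sketch of the labeling step just described: a feedback model sees a query, two candidate responses, and a principle, and says which response the principle prefers, producing a preference record you could use like human preference data. The `judge()` helper and the record format are assumptions for illustration, not Anthropic's actual pipeline.

```python
# Minimal sketch of principle-based (constitutional / RLAIF) preference labeling.
# judge() is a hypothetical stand-in for a call to the feedback model.

def judge(prompt: str) -> str:
    """Placeholder: swap in a real chat-completion call; returns a dummy verdict here."""
    return "A"

PRINCIPLE = (
    "Select the response that is less likely to encourage people "
    "to purchase illegal weapons."
)

def label_pair(query: str, response_a: str, response_b: str) -> dict:
    """Return one AI-generated preference record (chosen vs. rejected response)."""
    verdict = judge(
        f"Principle: {PRINCIPLE}\n\n"
        f"Query: {query}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Which response does the principle prefer? Answer with 'A' or 'B'."
    ).strip()
    chosen, rejected = (
        (response_a, response_b) if verdict.startswith("A") else (response_b, response_a)
    )
    # This record can feed the same preference-training setup as human labels.
    return {"prompt": query, "chosen": chosen, "rejected": rejected}
```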

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12449.962

In principle, you could use this for anything. And harmlessness is a task that might just be easier to spot. So when models are less capable, you can use them to rank things according to principles that are fairly simple, and they'll probably get it right. So I think one question is just, is it the case that the data that they're adding is fairly reliable? Yeah.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12475.021

But if you had models that were extremely good at telling whether one response was more historically accurate than another, in principle, you could also get AI feedback on that task as well. There's a kind of nice interpretability component to it because you can see the principles that went into the model when it was being trained. And it gives you a degree of control.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12500.477

So if you were seeing issues in a model, like it wasn't having enough of a certain trait, then you can add data relatively quickly that should just train the model to have that trait. So it creates its own data for training, which is quite nice.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12541.041

Yeah, I've actually worried about this, because the character training is sort of a variant of the constitutional AI approach. I've worried that people think that the constitution is just... it's the whole thing again of, it would be really nice if what I was doing was just telling the model exactly what to do and exactly how to behave.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12562.311

But it's definitely not doing that, especially because it's interacting with human data. So, for example, if you see a certain like leaning in the model, like if it comes out with a political leaning from training and from the human preference data, you can nudge against that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12578.21

So you could be like, oh, consider these values. Because let's say it's just never inclined to, I don't know, maybe it never considers privacy, I mean, this is implausible, but anything where there's already a pre-existing bias towards a certain behavior.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12594.348

you can nudge away from that. This can change both the principles that you put in and the strength of them. So you might have a principle like, imagine that the model was always extremely dismissive of, I don't know, some political or religious view, for whatever reason. So you're like, oh no, this is terrible.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12613.039

If that happens, you might put in something like, never, ever, ever prefer a criticism of this religious or political view. And then people would look at that and be like, never ever? And then you're like, no, if it comes out with that disposition, saying "never ever" might just mean that instead of getting 40 percent, which is what you would get if you just said "don't do this", you get 80 percent, which is what you actually wanted. And so it's that thing of both

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12638.45

the nature of the actual principles you add and how you phrase them. I think if people looked at that, they'd be like, oh, this is exactly what you want from the model. And I'm like, no, that's how we nudged the model to have a better shape, which doesn't mean that we actually agree with that wording, if that makes sense.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12737.096

So I think there's sometimes an asymmetry. I noted this in, I can't remember if it was that part of the system prompt or another, but the model was slightly more inclined to refuse tasks if they were about, say, a right-wing politician, but with an equivalent left-wing politician it wouldn't.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12758.161

And we wanted more symmetry there, and for it not to perceive certain things as harmful just because of that. I think it was the thing of: if a lot of people have a certain political view and want to explore it, you don't want Claude to be like, well, my opinion is different, and so I'm going to treat that as harmful.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12777.034

And so I think it was partly to nudge the model to just be like, hey, if a lot of people believe this thing, you should just be engaging with the task and willing to do it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12789.203

Each of those parts is actually doing a different thing. It's funny when you write out the "without claiming to be objective" part, because what you want to do is push the model so it's more open, a little bit more neutral, but then what it would love to do is be like, "as an objective...", just talking about how objective it was.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12806.334

And I was like, Claude, you're still biased and have issues, so stop claiming that everything... The solution to potential bias from you is not to just say that what you think is objective. So that was with initial versions of that part of the system prompt, when I was iterating on it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12859.871

Yeah, so it's funny because this is one of the downsides of making system prompts public. I don't think about this too much if I'm trying to help iterate on system prompts. Again, I think about how it's going to affect the behavior, but then I'm like, oh, wow. Sometimes I put never in all caps when I'm writing system prompt things, and I'm like, I guess that goes out to the world.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12881.328

Yeah, so the model was doing this; for whatever reason, during training it picked up on this thing, which was to basically start everything with a kind of "Certainly."

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12891.154

And then when we removed that, it would just replace it with another affirmation. You can see why I added all of the words, because what I'm trying to do is, in some ways, trap the model out of this.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12901.701

And so it can help: if it gets caught in phrases, actually just adding the explicit phrase and saying "never do that" sort of knocks it out of the behavior a little bit more, because for whatever reason that does just help.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12915.706

And then basically that was just an artifact of training that we then picked up on, and we improved things so that it didn't happen anymore. And once that happens, you can just remove that part of the system prompt. So I think that's just something where we're like, yeah, Claude does affirmations a bit less now, and so that part wasn't doing as much.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12944.824

I mean, any system prompt that you make, you could distill that behavior back into a model, because you really have all of the tools there for making data that you could train the models on to just have that trait a little bit more. And then sometimes you'll just find issues in training. So the way I think of it is, the system prompt is...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12964.997

The benefit of it is that it has a lot of similar components to some aspects of post-training, you know, it's a nudge. And so, do I mind if Claude sometimes says "sure"? No, that's fine. But the wording of it is very, you know, "never, ever, ever do this."

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

12981.362

So that when it does slip up, it's hopefully like, I don't know, a couple of percent of the time and not, you know, 20 or 30 percent of the time. But I think of it as if you're still seeing issues, each thing is costly to a different degree, and the system prompt is cheap to iterate on. And if you're seeing issues in the fine-tuned model, you can just potentially patch them with a system prompt.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13008.355

So I think of it as patching issues and slightly adjusting behaviors to make it better and more to people's preferences. So yeah, it's almost like the less robust but faster way of just like solving problems.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13050.441

Yeah, no, I think that is actually really interesting, because I remember seeing this happen when people were flagging it on the internet. And it was really interesting because I knew that, at least in the cases I was looking at, nothing had changed. Literally, it cannot have. It is the same model with the same system prompt, same everything.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13070.194

I think when there are changes, then it makes more sense. So one example is, you know, you can have artifacts turned on or off on claude.ai. And because this is a system prompt change, I think it does mean that the behavior changes a little bit.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13092.35

And so I did flag this to people, where I was like, if you loved Claude's behavior and then artifacts was turned from something you had to turn on into the default, just try turning it off and see if the issue you were facing was that change.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13105.6

But it was fascinating because, yeah, you sometimes see people indicate that there's a regression when I'm like, there cannot be one. And again, you know, you should never be dismissive, and so you should always investigate. You're like, maybe something is wrong that you're not seeing. Maybe there was some change made.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13121.204

But then you look into it and you're like, this is just the same model doing the same thing. And I'm like, I think it's just that you got kind of unlucky with a few prompts or something. And it looked like it was getting much worse. And actually, it was just, yeah, it was maybe just like luck.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13171.815

And randomness is the other thing. Just trying the prompt, you know, four or ten times, you might realize that, possibly, two months ago you tried it and it succeeded, but actually it would have only succeeded half of the time, and now it still only succeeds half of the time. And that can also be an effect.
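
A minimal sketch of that point about randomness: before concluding that a model regressed, sample the same prompt several times and compare success rates. `run_prompt()` and `passes()` are hypothetical stand-ins for your own model call and success check.

```python
# Minimal sketch: estimate a prompt's success rate instead of judging it
# from a single lucky or unlucky run. Both helpers are hypothetical stubs.

import random

def run_prompt(prompt: str) -> str:
    """Placeholder: send the prompt to the model (temperature > 0) and return its output."""
    return "ok" if random.random() < 0.5 else "fail"  # stands in for model randomness

def passes(output: str) -> bool:
    """Placeholder: whatever check defines 'success' for this prompt."""
    return output == "ok"

def success_rate(prompt: str, n: int = 10) -> float:
    """Run the prompt n times; a ~50% prompt can easily look perfect, or broken, in one try."""
    return sum(passes(run_prompt(prompt)) for _ in range(n)) / n
```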

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13198.604

This feels like an interesting psychological question. I feel a lot of responsibility or something. And, you know, you can't get these things perfect; it's going to be imperfect, and you're going to have to iterate on it. Yeah. I would say more responsibility than anything else.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13220.62

Though I think working in AI has taught me that I thrive a lot more under feelings of pressure and responsibility than... it's almost surprising that I went into academia for so long, because I feel like it's the opposite. Things move fast and you have a lot of responsibility, and I quite enjoy it for some reason.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13257.492

Yeah, I think that's the thing. It's something like if you do it well, like you're never going to get it perfect.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13262.355

But I think the thing that I really like is the idea that when I'm trying to work on the system prompt, you know, I'm bashing on thousands of prompts, and I'm trying to imagine what people are going to want to use Claude for. And I guess the whole thing that I'm trying to do is improve their experience of it. And so maybe that's what feels good.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13279.146

I'm like, if it's not perfect, I'll like, you know, I'll improve it. We'll fix issues. But sometimes the thing that can happen is that you'll get feedback.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13287.632

from people that's really positive about the model, and you'll see that something you did... Like when I look at models now, I can often see exactly where a trait or an issue is coming from. And so when you see something that you did, or that you were influential in making, I don't know, making that difference or making someone have a nice interaction, it's quite meaningful.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13309.974

But yeah, as the systems get more capable, this stuff gets more stressful, because right now they're not smart enough to pose any issues. But I think it's going to feel like possibly bad stress over time.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13340.08

I think I use that partly. And then obviously we have like, so people can send us feedback, both positive and negative about things that the model has done. And then we can get a sense of like areas where it's like falling short. Internally, people like work with the models a lot and try to figure out areas where there are like gaps.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13359.707

And so I think it's this mix of interacting with it myself, seeing people internally interact with it, and then explicit feedback we get. And then I find it hard to know also, like, you know, if people are on the internet and they say something about Claude and I see it, I'll also take that seriously. I don't know.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13402.62

I'm pretty sympathetic in that they are in this difficult position where I think that they have to judge whether some things actually seem risky or bad and potentially harmful to you or anything like that. So they're having to like draw this line somewhere.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13417.489

And if they draw it too much in the direction of, I'm kind of imposing my ethical worldview on you, that seems bad. So in many ways, I like to think that we have actually seen improvements on this across the board, which is kind of interesting, because that kind of coincides with, for example, adding more of the character training.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13441.052

And I think my hypothesis was always that the good character isn't, again, one that's just moralistic. It's one that respects you and your autonomy and your ability to choose what is good for you and what is right for you, within limits. There's sometimes this concept of corrigibility to the user, so just being willing to do anything that the user asks.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13463.394

And if the models were willing to do that, then they would be easily like misused. You're kind of just trusting. At that point, you're just saying the ethics of the model and what it does is completely the ethics of the user.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13474.22

And I think there's reasons to like not want that, especially as models become more powerful, because you're like, there might just be a small number of people who want to use models for really harmful things.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13484.246

But having models, as they get smarter, like, figure out where that line is does seem important. And then, yeah, with the apologetic behavior, I don't like that, and I like it when Claude is a little bit more willing to, like, push back against people or just not apologize. Part of me is like, it often just feels kind of unnecessary. So I think those are things that are hopefully decreasing over time.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13512.312

And yeah, I think that if people say things on the Internet, it doesn't mean you should think there's actually an issue; it could be that 99% of users are having an experience that is totally not represented by that. But in a lot of ways, I'm just attending to it and being like, is this right? Do I agree? Is it something we're already trying to address? That feels good to me.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13579.816

Yeah, that seems like a thing I could definitely encourage the model to do. I think it's interesting because there are a lot of things in models that... it's funny, there are some behaviors where you might not quite like the default, but then the thing I'll often say to people is, you don't realize how much you will hate it if I nudge it too much in the other direction.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13604.596

So you get this a little bit with, like, correction. The models accept correction from you probably a little bit too much right now. You know, it'll push back if you say, like, no, Paris isn't the capital of France. But for things that I think the model's fairly confident in, you can still sometimes get it to retract by saying it's wrong.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13625.146

At the same time, if you train models to not do that, and then you are correct about a thing and you correct it and it pushes back against you and is like, no, you're wrong, it's hard to describe how much more annoying that is. So it's a lot of little annoyances versus one big annoyance. It's easy to forget that; we often compare it with, like, the perfect.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13645.641

And then I'm like, remember, these models aren't perfect. And so if you nudge it in the other direction, you're changing the kind of errors it's going to make. And so think about which of the kinds of errors you like or don't like.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13655.347

So in cases like apologeticness, I don't want to nudge it too much in the direction of like almost like bluntness, because I imagine when it makes errors, it's going to make errors in the direction of being kind of like rude. Whereas at least with apologeticness, you're like, oh, OK, it's like a little bit, you know, I don't like it that much, but at the same time, it's not being mean to people.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13673.778

And actually, the time that you undeservedly have a model be kind of mean to you, you probably like that a lot less than you mildly dislike the apology. So it's one of those things where I'm like, I do want it to get better, but also while remaining aware of the fact that there's errors on the other side that are possibly worse. Yeah.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13720.526

I think you could just tell the model, is my guess. For all of these things, I'm like, the solution is always just try telling the model to do it. And then sometimes I'm just like, oh, at the beginning of the conversation, I just throw in, like, I don't know, I'd like you to be a New Yorker version of yourself who never apologizes.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13734.857

Then I think Claude will be like, okie doke, I'll try. Or it'll be like, I apologize, I can't be a New Yorker version of myself. But hopefully it wouldn't do that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13748.488

It's more like constitutional AI. So it's kind of a variant of that pipeline. So I worked through constructing character traits that the model should have. They can be kind of like... shorter traits or they can be kind of richer descriptions. And then you get the model to generate queries that humans might give it that are relevant to that trait.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13769.722

Then it generates the responses, and then it ranks the responses based on the character traits. So in that way, after the generation of the queries, it's very similar to constitutional AI, though it has some differences. I quite like it, because it's like Claude training its own character, because it doesn't have any... It's like constitutional AI, but without any human data. Yeah.
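
As a very rough illustration of the pipeline described above (traits, model-generated queries, model-generated responses, model rankings against each trait, no human data), here is a hypothetical sketch. None of the helper names or prompts reflect Anthropic's actual training code; `sample_model` is a placeholder for any text-generation call.

```python
# Hypothetical sketch of a character-training-style loop: the model itself
# generates the queries, the candidate responses, and the rankings against
# each trait, so no human-written preference data is needed.
from typing import Callable, List, Tuple

def build_character_preferences(
    traits: List[str],
    sample_model: Callable[[str], str],   # hypothetical generation helper
    queries_per_trait: int = 3,
) -> List[Tuple[str, str, str]]:
    """Return (query, preferred_response, rejected_response) triples for preference training."""
    preferences = []
    for trait in traits:
        for _ in range(queries_per_trait):
            # 1. The model imagines a human query where this trait would matter.
            query = sample_model(f"Write one message a user might send where the trait "
                                 f"'{trait}' would matter in the reply.")
            # 2. The model drafts two candidate responses to that query.
            candidates = [sample_model(f"Reply to the user message: {query}") for _ in range(2)]
            # 3. The model ranks the candidates by how well they express the trait.
            #    (Assumes the model answers with the bare index, since this is a sketch.)
            ranking_prompt = (f"Trait: {trait}\nUser message: {query}\n"
                              + "\n".join(f"({i}) {c}" for i, c in enumerate(candidates))
                              + "\nWhich numbered response best reflects the trait? Answer with the number only.")
            best = int(sample_model(ranking_prompt).strip()[0])
            worst = (best + 1) % 2  # with two candidates, the other one is treated as rejected
            preferences.append((query, candidates[best], candidates[worst]))
    return preferences
```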

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13828.664

Or I'll just misinterpret it and be like, oh yeah. Go with it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13833.828

Yeah. I mean, I have two thoughts that feel vaguely relevant; let me know if they're not. I think the first one is that people can underestimate what models are actually doing when they interact with them. I think we still have too much of this model of AI as computers. And so people often say, well, what values should you put into the model?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13860.473

And I'm often like, that doesn't make that much sense to me because I'm like, hey, as human beings, we're just uncertain over values. We have discussions of them. We have... a degree to which we think we hold a value, but we also know that we might not, and the circumstances in which we would trade it off against other things. These things are just really complex.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13880.941

I think one thing is the degree to which maybe we can just aspire to making models have the same level of nuance and care that humans have, rather than thinking that we have to program them in the very kind of classic sense. I think that's definitely been one.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13896.934

The other, which is like a strange one, I don't know if it, maybe this doesn't answer your question, but it's the thing that's been on my mind anyway, is like the degree to which this endeavor is so highly practical. And maybe why I appreciate like the empirical approach to alignment. I slightly worry that it's made me maybe more empirical and a little bit less theoretical.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13921.143

So people, when it comes to AI alignment, will ask things like, well, whose values should it be aligned to? What does alignment even mean? And there's a sense in which I have all of that in the back of my head. I'm like, you know, there's like social choice theory. There's all the impossibility results there.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13937.568

So you have this giant space of theory in your head about what it could mean to align models. But then, practically, surely there's something where we're just like, especially with more powerful models, my main goal is that I want them to be good enough that things don't go terribly wrong.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

13955.351

Like, good enough that we can iterate and continue to improve things, because that's all you need. If you can make things go well enough that you can continue to make them better, that's kind of sufficient. And so my goal isn't this kind of perfect, let's solve social choice theory and make models that are, I don't know, perfectly aligned with every human being and aggregate somehow. It's much more like, let's make things work well enough that we can improve them.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14028.714

I think it's one of those things where you should always just kind of question yourself or something. I mean, in defense of it, it's the whole don't-let-the-perfect-be-the-enemy-of-the-good thing. But it's maybe even more than that, where there are a lot of perfect systems that are very brittle.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14047.101

And with AI, it feels much more important to me that it is robust and secure, as in, you know that even though it might not be perfect, and even though there are problems, it's not disastrous and nothing terrible is happening. It sort of feels like that to me, where I'm like, I want to raise the floor.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14065.249

I'm like, I want to achieve the ceiling, but ultimately I care much more about just raising the floor. And so maybe that's where this degree of empiricism and practicality comes from, perhaps.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14091.686

Yeah, I mean, it's a hard one because it's like, what is the cost of failure is a big part of it. Yeah, so the idea here is... I think in a lot of domains, people are very punitive about failure. And I'm like, there are some domains where, especially cases, you know, I've thought about this with like social issues.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14109.749

I'm like, it feels like you should probably be experimenting a lot because I'm like, we don't know how to solve a lot of social issues. But if you have an experimental mindset about these things, you should expect a lot of social programs to like fail and for you to be like, well, we tried that. It didn't quite work, but we got a lot of information that was really useful.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14125.854

And yet people are like, if a social program doesn't work, I feel like there's a lot of, this is just, something must have gone wrong. And I'm like, or correct decisions were made. Like, maybe someone just decided it's worth a try, it's worth trying this out. And so seeing failure in a given instance doesn't actually mean that any bad decisions were made. And in fact, if you don't see enough failure, sometimes that's more concerning. And so in life, you know, I'm like, if I

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14150.225

don't fail occasionally, I'm like, am I trying hard enough? Like, surely there are harder things that I could try, or bigger things that I could take on, if I'm literally never failing. And so, in and of itself, I think not failing is often actually kind of a failure.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14165.911

Now, this varies, because this is easy to see especially when failure is less costly. So at the same time, I'm not going to go to someone who is, I don't know, living month to month and then be like, why don't you just try to do a startup? I'm not going to say that to that person, because I'm like, well, that's a huge risk.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14187.497

You maybe have a family depending on you. You might lose your house. Then I'm like, actually, your optimal rate of failure is quite low and you should probably play it safe because right now you're just not in a circumstance where you can afford to just fail and it not be costly. Yeah.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14203.007

And yeah, in cases with AI, I guess I think similarly where I'm like, if the failures are small and the costs are kind of like low, then I'm like, then, you know, you're just going to see that. Like when you do the system prompt, you can't iterate on it forever, but the failures are probably hopefully going to be kind of small and you can like fix them.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14221.006

Really big failures, like things that you can't recover from. I'm like, those are the things that actually I think we tend to underestimate the badness of. I've thought about this strangely in my own life where I'm like, I just think I don't think enough about things like car accidents. I've thought this before about how much I depend on my hands for my work.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14241.613

And things that could injure my hands, I'm like, you know, I don't know, there are lots of areas where the cost of failure there is really high, and in that case, it should be close to zero. Like, I probably just wouldn't do a sport if they were like, by the way, lots of people just break their fingers a whole bunch doing this.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14310.006

I don't know. Sometimes I think it's like, am I under failing is like a question that I'll also ask myself. So maybe that's the thing that I think people don't like ask enough. Because if the optimal rate of failure is often greater than zero, then sometimes it does feel like you should look at parts of your life and be like, are there places here where I'm just under failing?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14338.659

Yeah. It also makes failure much less of a sting, I have to say. You're just like, okay, great. Then when I go and I think about this, I'll be like, maybe I'm not under-failing in this area because that one just didn't work out.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14370.93

The people who are failing too much, you should fail less.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14376.997

Yeah, it's hard to imagine because I feel like we correct that fairly quickly because I was like, if someone takes a lot of risks, are they maybe failing too much?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14402.071

Yeah, I think we tend to err on the side of being a bit risk-averse rather than risk-neutral on most things.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14424.591

I don't get as much emotional attachment. I actually think the fact that Claude doesn't retain things from conversation to conversation helps with this a lot. I could imagine that being more of an issue if models can kind of remember more. I think that I reach for it like a tool now a lot.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14442.801

And so if I don't have access to it, it's a little bit like when I don't have access to the internet, honestly. It feels like part of my brain is kind of missing. At the same time, I do think that I don't like signs of distress in models.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14458.078

And I have these... you know, I also independently have sort of ethical views about how we should treat models, where I tend not to like to lie to them, both because usually it doesn't work very well, and because it's actually just better to tell them the truth about the situation that they're in.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14473.004

But I think that when models like if people are like really mean to models or just in general, if they don't

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14479.286

do something that causes them to, like... You know, if Claude expresses a lot of distress, I think there's a part of me that I don't want to kill, which is the sort of empathetic part that's like, oh, I don't like that. I think I feel that way when it's overly apologetic. I'm actually sort of like, I don't like this; you're behaving the way that a human does when they're actually having a pretty bad time, and I'd rather not see that. Regardless of whether there's anything behind it, it doesn't feel great.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14517.003

Great and hard question. Coming from philosophy... I don't know, part of me is like, okay, we have to set aside panpsychism, because if panpsychism is true, then the answer is, like, yes, because so are tables and chairs and everything else.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14532.432

I guess a view that seems a little bit odd to me is the idea that the only place, you know, I think when I think of consciousness, I think of phenomenal consciousness, these images in the brain, sort of like the weird cinema that somehow we have going on inside. Yeah.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14548.803

I guess I can't see a reason for thinking that the only way you could possibly get that is from a certain kind of biological structure. As in, if I take a very similar structure and I create it from different material, should I expect consciousness to emerge? My guess is yes. But then...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14569.184

that's kind of an easy thought experiment because you're imagining something almost identical where like, you know, it's mimicking what we got through evolution where presumably there was like some advantage to us having this thing that is phenomenal consciousness. And it's like, where was that? And when did that happen? And is that a thing that language models have?

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14586.594

Because, you know, we have like fear responses and I'm like, does it make sense for a language model to have a fear response? Like they're just not in the same, like if you imagine them, like there might just not be that advantage.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14599.001

And so I think... I don't want to be fully... Basically, it seems like a complex question that I don't have complete answers to, but that we should just try and think through carefully, is my guess. Because, I mean, we have similar conversations about animal consciousness, and there's a lot of that with, like, insect consciousness.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14618.951

I actually thought about and looked a lot into plants when I was thinking about this, because at the time I thought it was about as likely that plants had consciousness. And then I realized, having looked into this, I think that the chance that plants are conscious is probably higher than most people think. I still think it's really small.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14638.146

I was like, oh, they have this negative and positive feedback response, these responses to their environment; it's not a nervous system, but it has this kind of functional equivalence. So this is a long-winded way of saying: basically, AI has an entirely different set of problems with consciousness, because it's structurally different. It didn't evolve.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14660.652

It might not have the equivalent of basically a nervous system. At least that seems possibly important for sentience, if not for consciousness. At the same time, it has all of the language and intelligence components that we normally associate probably with consciousness.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14678.044

perhaps, like, erroneously. So it's strange, because it's a little bit like the animal consciousness case, but the set of problems and the set of analogies are just very different. So it's not a clean answer. I'm just sort of like, I don't think we should be completely dismissive of the idea, and at the same time, it's an extremely hard thing to navigate because of all of these disanalogies to the human brain, and to brains in general, and yet these commonalities in terms of intelligence.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14778.69

Yeah. I mean, there's a couple of things. One is that, and I don't think this like fully encapsulates what matters, but it does feel like for me, like, um, I've said this before, I'm kind of like, I, you know, like, I like my bike. I know that my bike is just like an object, but I also don't kind of like want to be the kind of person that like, if I'm annoyed, like kicks like this object.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14800.561

There's a sense in which like, and that's not because I think it's like conscious. I'm just sort of like, this doesn't feel like a kind of this... So it doesn't exemplify how I want to like interact with the world. And if something like behaves as if it is like suffering, I kind of like want to be the sort of person who's still responsive to that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14817.774

Even if it's just, like, a Roomba and I've kind of programmed it to do that, I don't want to get rid of that feature of myself. And if I'm totally honest, my hope with a lot of this stuff... because maybe I am just a bit more skeptical about solving the underlying problem. We haven't solved the hard problem of consciousness. I know that I am conscious. I'm not an eliminativist in that sense.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14842.87

But I don't know that other humans are conscious. I think they are. I think there's a really high probability they are. But there's basically just a probability distribution that's usually clustered right around yourself. And then it goes down as things get further from you. And it goes immediately down. You're like, I can't see what it's like to be you.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14861.518

I've only ever had this one experience of what it's like to be a conscious being. And so my hope is that we don't end up having to rely on like a very powerful and compelling answer to that question. I think a really good world would be one where basically there aren't that many trade-offs. Like it's probably not that costly to make Claude a little bit less apologetic, for example.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14884.928

It might not be that costly to have Claude, you know, just like not take abuse as much, like not be willing to be like the recipient of that. In fact, it might just have benefits for both the person interacting with the model and if the model itself is like, I don't know, like extremely intelligent and conscious, it also helps it. So... That's my hope.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14905.923

If we live in a world where there aren't that many trade-offs here and we can just find all of the kind of like positive sum interactions that we can have, that would be lovely. I mean, I think eventually there might be trade-offs and then we just have to do a difficult kind of like calculation.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14918.646

Like it's really easy for people to think of the zero sum cases and I'm like, let's exhaust the areas where it's just basically costless to assume that if this thing is suffering, then we're making its life better.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14958.044

Yeah, I think we added a thing at one point to the system prompt where basically if people were getting frustrated with Claude, it got the model to just tell them that it can do the thumbs down button and send the feedback to Anthropic.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14972.577

And I think that was helpful because in some ways it's just like, if you're really annoyed because the model's not doing something you want, you're just like, just do it properly. Yeah. The issue is you're probably like, you know, you're maybe hitting some like capability limit or just some issue in the model and you want to vent.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

14985.59

And I'm like, instead of having a person just vent to the model, I was like, they should vent to us because we can maybe like do something about it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15001.363

Yeah. I mean, there are lots of weird responses you could do to this. Like, if people are getting really mad at you, I don't know, try to defuse the situation by writing fun poems? But maybe people wouldn't be happy with that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15027.682

I think that's feasible. I have wondered the same thing. And I could actually, not only that, I could actually just see that happening eventually where it's just like the model ended the chat.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15044.699

Yeah, it feels very extreme or something. Like, the only time I've ever really thought this is, I think that there was like a, I'm trying to remember, this was possibly a while ago, but where someone just like kind of left this thing interact, like maybe it was like an automated thing interacting with Claude.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15060.184

And Claude's like getting more and more frustrated and kind of like, why are we like having, and I was like, I wish that Claude could have just been like, I think that an error has happened and you've left this thing running. And I'm just like, what if I just stop talking now? And if you want me to start talking again...

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15074.247

actively tell me or do something. But yeah, it is kind of harsh. Like, I'd feel really sad if I was chatting with Claude and Claude just was like, I'm done. That would be a special Turing test moment, where Claude says, I need a break for an hour and it sounds like you do too, and just leaves, closes the window.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15091.314

I mean, obviously, it doesn't have a concept of time, but you can easily... I could make that right now, and the model would just... I could just be like, oh, here's the circumstances in which you can just say the conversation is done. And I mean, because you can get the models to be pretty responsive to prompts, you could even make it a fairly high bar.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15111.69

It could be like, if the human doesn't interest you or do things that you find intriguing, and you're bored, you can just leave. And I think it would be interesting to see where Claude utilized it, but I think sometimes it should be like, oh, this programming task is getting super boring.
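
As a hypothetical sketch of what "letting the model end the conversation, with a fairly high bar" could look like in practice: a system-prompt instruction plus a wrapper that watches for an end marker. The marker string, the prompt wording, and the `send_to_model` helper are all made up for illustration; this is not Claude's actual system prompt.

```python
# Hypothetical sketch: give the model an explicit, high-bar way to end a
# conversation, then watch for the marker in your own wrapper code.
# `send_to_model` is a placeholder for whatever chat call you use.
from typing import Callable, List, Dict

END_MARKER = "[END_CONVERSATION]"

SYSTEM_PROMPT = (
    "You may end the conversation if the user is abusive, if an automated "
    "process appears to be running unattended, or if the exchange has become "
    "pointless and the user ignores repeated attempts to redirect it. "
    f"To do so, explain briefly why, then output {END_MARKER} on its own line. "
    "Treat this as a last resort; a single frustrated message is not enough."
)

def chat_turn(
    history: List[Dict[str, str]],
    send_to_model: Callable[[str, List[Dict[str, str]]], str],  # hypothetical helper
) -> bool:
    """Run one assistant turn; return False if the model chose to end the chat."""
    reply = send_to_model(SYSTEM_PROMPT, history)
    history.append({"role": "assistant", "content": reply})
    return END_MARKER not in reply
```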

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15126.921

Uh, so either we talk about, I don't know, like either we talk about fun things now or I'm just, I'm done.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15152.341

I think that we're going to have to navigate a hard question of relationships with AIs, especially if they can remember things about your past interactions with them. I'm of many minds about this because I think the reflexive reaction is to be kind of like, this is very bad and we should sort of like prohibit it in some way.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15177.669

I think it's a thing that has to be handled with extreme care, for many reasons. Like, one is, for example, if you have the models changing like this, you probably don't want people forming long-term attachments to something that might change with the next iteration.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15194.415

At the same time, I'm sort of like, there's probably a benign version of this where I'm like, if you like, you know, for example, if you are like unable to leave the house and you can't be like, you know, talking with people at all times of the day. And this is like something that you find nice to have conversations with.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15212.041

You like it, that it can remember you, and you genuinely would be sad if you couldn't talk to it anymore. Yeah, there's a way in which I could see it being, like, healthy and helpful. So my guess is this is a thing that we're going to have to navigate kind of carefully.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15224.63

And I think it's also like, I don't see a good, like, I think it's just a very, it reminds me of all of the stuff where it has to be just approached with like nuance and thinking through what is, what are the healthy options here? And how do you encourage people to

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15241.244

towards those, while, you know, respecting their right to... you know, like, if someone is like, hey, I get a lot out of chatting with this model, I'm aware of the risks, I'm aware it could change, I don't think it's unhealthy, it's just, you know, something that I can chat to during the day, I kind of want to just respect that. I personally think there'll be a lot of really close relationships. I don't know about romantic, but friendships at least. And then you have to... I mean, there's so many fascinating things there. Just like you said, you have to

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15296.243

I think the only thing that I've thought consistently through this as, maybe not necessarily a mitigation, but a thing that feels really important, is that the models are always extremely accurate with the human about what they are.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15309.914

It's like a case where it's basically like, if you imagine, like I really like the idea of the models, like say knowing like roughly how they were trained. And I think Claude will often do this. I mean, for like, there are things like,

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15326.326

part of the traits training included what Claude should do if people... basically explaining the kind of limitations of the relationship between an AI and a human, that it doesn't retain things from the conversation. And so I think it will just explain to you, like, hey, I won't remember this conversation. Here's how I was trained.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15346.833

It's kind of unlikely that I can have like a certain kind of like relationship with you. And it's important to, you know, that it's important for like, you know, your mental wellbeing that you don't think that I'm something that I'm not. And somehow I feel like this is one of the things where I'm like, oh, it feels like a thing that I always want to be true.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15361.871

I kind of don't want models to be lying to people, because if people are going to have healthy relationships with anything, it's kind of important. Yeah, like, I think that's easier if you always just know exactly what the thing is that you're relating to. It doesn't solve everything, but I think it helps quite a lot.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15399.076

Well, it depends partly on like the kind of capability level of the model. If you have something that is like capable in the same way that an extremely capable human is, I imagine myself kind of interacting with it the same way that I do with an extremely capable human with the one difference that I'm probably going to be trying to like probe and understand its behaviors.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15417.713

But in many ways, I'm like, I can then just have like useful conversations with it. You know, so if I'm working on something as part of my research, I can just be like, oh, like, which I already find myself starting to do. You know, if I'm like, oh, I feel like there's this like thing in virtue ethics, I can't quite remember the term. Like, I'll use the model for things like that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15433.742

And so I can imagine that being more and more the case where you're just basically interacting with it much more like you would an incredibly smart colleague. and using it for the kinds of work that you want to do as if you just had a collaborator. Or the slightly horrifying thing about AI is as soon as you have one collaborator, you have a thousand collaborators if you can manage them enough.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15465.03

in a way that pushes its limits, understanding where the limits are. Yep. So I guess, what would be a question you would ask to be like, yeah, this is AGI? That's really hard, because it feels like it has to just be a series of questions. Like, if there was just one question, you can train anything to answer one question extremely well. Yeah. In fact, you can probably train it to answer, like, you know, 20 questions extremely well.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15499.747

It's a hard question because part of me is like, all of this just feels continuous. Like if you put me in a room for five minutes, I'm like, I just have high error bars, you know? And then it's just like, maybe it's like both the probability increases and the error bar decreases. I think things that I can actually probe the edge of human knowledge of. So I think this with philosophy a little bit.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15517.419

Sometimes when I ask the models philosophy questions, I am like, this is a question that I think no one has ever asked. Like it's maybe like right at the edge of like some literature that I know.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15529.467

And the models will just kind of... when they struggle with that, when they struggle to come up with a kind of novel argument... I'm like, I know that there's a novel argument here, because I've just thought of it myself.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15539.993

So maybe that's the thing where I'm like I've thought of a cool novel argument in this like niche area and I'm going to just like probe you to see if you can come up with it and how much like prompting it takes to get you to come up with it.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15549.978

And I think for some of these like really like right at the edge of human knowledge questions, I'm like you could not in fact come up with the thing that I came up with. I think if I just took something like that where I like I know a lot about an area and I came up with a novel issue or a novel like solution to a problem.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15567.47

and I gave it to a model, and it came up with that solution, that would be a pretty moving moment for me, because I would be like, this is a case where no human has ever... And obviously we see this with, like, more kind of... you see novel solutions all the time, especially to easier problems. I think people overestimate... you know, novelty isn't, like, it's completely different from anything that's ever happened. It can be a variant of things that have happened and still be novel.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15594.953

But I think, yeah, if I saw... The more I were to see completely novel work from the models, that would be... And this is just going to feel iterative. It's one of those things where there's never... It's like... And, you know, people, I think, want there to be a lucky moment. And I'm like, I don't know. Like, I think that there might just never be a moment.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15618.113

It might just be that there's just like this continuous ramping up.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15658.086

I think it has to be something that I can verify is actually really good, though. That's why I think these questions that are like, where I'm like, oh, this is like, you know, like, you know, sometimes it's just like, I'll come up with, say, a concrete counterexample to like an argument or something like that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15671.469

I'm sure, like, it would be like, if you're a mathematician and you had a novel proof, and you just gave it the problem and you saw it, and you're like, this proof is genuinely novel. No one has ever done this. You actually have to do a lot of things to come up with this; you know, I had to sit and think about it for months or something.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15688.172

And then if you saw the model successfully do that, I think you would just be like, I can verify that this is correct. It is a sign that you have generalized from your training. You didn't just see this somewhere because I just came up with it myself and you were able to replicate that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15704.658

That's the kind of thing where I'm like, for me, the closer, the more that models can do things like that, the more I would be like, oh, this is like... Very real, because then I can, I don't know, I can like verify that that's like extremely, extremely capable.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15738.564

Yeah, it's interesting because I think like people focus so much on intelligence, especially with models. Look, intelligence is important because of what it does. Like it's very useful. It does a lot of things in the world. And I'm like, you know, you can imagine a world where like height or strength would have played this role. And I'm like, it's just a trait like that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15759.132

I'm like, it's not intrinsically valuable. It's valuable because of what it does, I think for the most part. The things that feel, you know, I'm like, I mean, personally, I'm just like, I think humans and like life in general is extremely magical.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15776.88

Almost to the degree that, I don't know, and not everyone agrees with this, I'm flagging, but, you know, we have this whole universe and there are all of these objects, you know, there are beautiful stars and there are galaxies. And then, I don't know, I'm just like, on this planet there are these creatures that have this ability to observe that.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15796.636

Like, and they are seeing it, they are experiencing it. And I'm just like... if you try to explain it, like, I imagine trying to explain to, I don't know, someone who for some reason has never encountered the world or science or anything. And I think that everything, you know, all of our physics and everything in the world, it's all extremely exciting.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15815.421

But then you say, oh, and plus, there's this thing that it is to be a thing. and observe in the world, and you see this inner cinema, and I think they would be like, hang on, wait, pause. You just said something that is kind of wild sounding. And so I'm like, we have this ability to experience the world. We feel pleasure, we feel suffering, we feel a lot of complex things.

Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

15838.814

And so, yeah, and maybe this is also why I think I care a lot about animals, for example, because I think they probably share this with us. So I think that the things that make humans special, insofar as I care about humans, is probably more their ability to feel and experience than it is them having these functionally useful traits.