Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity
Dario Amodei
Yeah, I mean, of course you can hook up the mechanistic interpretability to the model itself, but then you've kind of lost it as a reliable indicator of the model's state. There are a bunch of exotic ways you can think of that it might also not be reliable. Like if the model gets smart enough that it can jump between computers and read the code where you're looking at its internal state.