Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity
Gradient descent was, the whole time you were trying to do this, actually behind the scenes searching through the space of sparse models more efficiently than you could, learning whatever sparse model was most efficient, and then figuring out how to fold it down to run conveniently on your GPU, which does, you know, nice dense matrix multiplies, and you just can't beat that.
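The "fold it down" idea can be sketched as follows: a model whose weights are mostly zero can still be stored and executed as an ordinary dense matrix multiply, which is what GPUs are fastest at. A minimal sketch in NumPy; the shapes and the 10% sparsity level are illustrative assumptions, not from the conversation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense weight matrix and a mask keeping ~10% of its entries.
W = rng.normal(size=(64, 64))
mask = rng.random((64, 64)) < 0.1

# "Sparse" in values, but represented as an ordinary dense array...
W_sparse = W * mask

# ...so inference is still one plain dense matmul, the operation GPUs
# execute most efficiently, rather than a specialized sparse kernel.
x = rng.normal(size=64)
y = W_sparse @ x
```

The point being illustrated: the effective model found by gradient descent may be sparse, but it runs as a dense computation, so there is no need for hand-built sparse machinery to realize the benefit.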