
Lex Fridman Podcast

#434 – Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet

4600.44 - 4631.095 Aravind Srinivas

And whether it's convnets or attention, I guess attention and transformers make even better use of hardware than convnets, because they apply more compute per parameter. In a transformer, the self-attention operator doesn't even have parameters. The QK-transpose softmax times V has no parameters, but it's doing a lot of FLOPs. And that's powerful. It learns higher-order dependencies.
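A minimal NumPy sketch of the point being made (shapes, names, and the FLOP estimate are illustrative, not from the episode): the core scaled dot-product attention operator softmax(QKᵀ/√d)V contains zero learnable parameters, yet its two matrix products each cost on the order of n²·d multiply-adds for sequence length n.

```python
import numpy as np

def attention(Q, K, V):
    """Parameter-free scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) pairwise interactions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n, d) mixed values

n, d = 1024, 64                                       # sequence length, head dimension (illustrative)
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)
out = attention(Q, K, V)

# Rough cost of the operator itself: Q @ K.T and weights @ V are each about
# 2 * n^2 * d multiply-adds, so ~4 * n^2 * d FLOPs -- done with 0 parameters.
flops = 4 * n * n * d
print(f"output shape: {out.shape}, approx FLOPs: {flops:,}, parameters: 0")
```

In a full transformer block, the learnable parameters live in the Q/K/V projections and the MLP; the attention operator above just spends compute mixing token representations, which is the "lots of FLOPs, no parameters" property described here.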
