Lex Fridman Podcast
#434 – Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet
Aravind Srinivas
And like, whether it's convolutions or attention, I guess attention and transformers make even better use of hardware than convolutions, because they apply more compute per parameter. Because in a transformer, the self-attention operator doesn't even have parameters. The QK-transpose, softmax, times V has no parameters, but it's doing a lot of flops. And that's powerful. It learns higher-order dependencies.
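To make the point concrete, here is a minimal sketch (not from the episode) of the attention core he is describing, softmax(QKᵀ/√d)·V: the function below has zero learnable parameters of its own, yet its flop count scales with the square of the sequence length. The shapes, the 1/√d scaling, and the flop estimate are standard assumptions for illustration, not quotes from the conversation.

```python
import numpy as np

def self_attention(Q, K, V):
    """Parameter-free attention core: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) pairwise interactions
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n, d) mixed values

n, d = 1024, 64                                    # sequence length, head dimension (assumed)
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)
out = self_attention(Q, K, V)

# Rough flop count: Q @ K^T and weights @ V are each ~2*n*n*d,
# so ~4*n^2*d flops are spent inside a function with zero parameters.
print(out.shape, f"~{4 * n * n * d:,} flops, 0 parameters")
```

The learned parameters live in the surrounding projections (the weight matrices that produce Q, K, and V), which is why the attention operator itself can burn so much compute relative to its parameter count.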