Menu
Sign In Add Podcast

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4185.616 - 4206.777 Aman Sanger

Where normally with multi-head attention, you have some number of quote-unquote attention heads and some number of query heads. Multi-query just preserves the query heads, gets rid of all the key value heads. So there's only one kind of key value head, and there's all the remaining query heads.

0
💬 0

Comments

There are no comments yet.

Log in to comment.