Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI
Aman Sanger
Normally, with multi-head attention, you have some number of, quote-unquote, attention heads, each with its own query, key, and value projections. Multi-query just preserves the query heads and gets rid of all the key-value heads except one. So there's only one key-value head, and it's shared by all the remaining query heads.
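The difference described here can be sketched in a few lines of plain Python. This is a minimal illustration rather than how any particular model implements it: the function names (`attend`, `multi_head`, `multi_query`) and the toy vectors are assumptions made for the example, and the learned projection matrices are left out so that only the head layout is visible.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention for one query vector.

    keys and values hold one vector per sequence position.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def multi_head(queries, keys_per_head, values_per_head):
    """Multi-head layout: every query head has its own key-value head."""
    return [
        attend(q, k, v)
        for q, k, v in zip(queries, keys_per_head, values_per_head)
    ]


def multi_query(queries, shared_keys, shared_values):
    """Multi-query layout: all query heads read one shared key-value head."""
    return [attend(q, shared_keys, shared_values) for q in queries]


# Hypothetical toy data: 3 query heads, head dimension 2, 2 positions.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared_keys = [[1.0, 0.0], [0.0, 1.0]]
shared_values = [[1.0, 2.0], [3.0, 4.0]]

mqa_out = multi_query(queries, shared_keys, shared_values)

# Multi-head attention stores one key-value pair per head;
# multi-query stores a single pair no matter how many query heads exist.
mha_out = multi_head(queries, [shared_keys] * 3, [shared_values] * 3)
```

In this toy setup the two layouts produce the same output only because the multi-head version was handed three copies of the same keys and values. In a real model each head would have its own, which is exactly the storage that multi-query drops.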