Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
With long context and large batch sizes, you're bottlenecked by how quickly you can read those cached keys and values. That's memory bandwidth. So how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these approaches.
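To make the size argument concrete, here is a rough back-of-the-envelope sketch of how much key/value cache has to be read per decoding step. The model dimensions (32 layers, 32 heads, head size 128, fp16) and the helper name `kv_cache_bytes` are assumptions chosen for illustration; they do not come from the conversation. The sketch compares standard multi-head attention, where every head keeps its own keys and values, with multi-query attention, where all query heads share a single key/value head.

```python
def kv_cache_bytes(batch, seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption, not a given).
    """
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical configuration, for illustration only.
cfg = dict(batch=32, seq_len=8192, n_layers=32, head_dim=128)

# Multi-head attention: one key/value head per query head (32 here).
mha_bytes = kv_cache_bytes(n_kv_heads=32, **cfg)

# Multi-query attention: a single key/value head shared by all query heads.
mqa_bytes = kv_cache_bytes(n_kv_heads=1, **cfg)

GIB = 2**30
print(f"MHA cache: {mha_bytes / GIB:.1f} GiB")
print(f"MQA cache: {mqa_bytes / GIB:.1f} GiB")
print(f"Reduction: {mha_bytes // mqa_bytes}x")
```

Under these assumed numbers the cache shrinks by a factor equal to the number of heads, which is why multi-query attention sits at the aggressive end: fewer bytes to read from memory at each step means less pressure on memory bandwidth.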