Menu
Sign In Add Podcast

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4165.302 - 4184.235 Aman Sanger

You're bottlenecked by how quickly, for long context with large batch sizes, by how quickly you can read those cache keys and values. That's memory bandwidth, and how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these.

0
💬 0

Comments

There are no comments yet.

Log in to comment.