Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI
Sualeh Asif
there's less of them, but maybe the theory is that you actually want a lot of different, like you want each of the keys and values to actually be different. So one way to reduce the size is you keep one big shared vector for all the keys and values. And then you have smaller vectors for every single token, so that you can store only the smaller thing. There's some sort of low-rank reduction.
0
💬
0
Comments
Log in to comment.
There are no comments yet.