Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI
Arvid Lundmark
And generally the way attention works is you have, at your current token, some query, and then you have all the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And by default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model.
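A minimal sketch of the step being described, in NumPy: the current token's query attends over cached keys and values from earlier tokens, and each decode step appends one new key/value pair rather than recomputing past tokens. The function name `attention_step`, the toy dimensions, and the random vectors are illustrative assumptions, not anything from the conversation.

```python
import numpy as np

def attention_step(query, key_cache, value_cache):
    """One decoding step: the current token's query attends over
    all cached keys/values from previous tokens (the KV cache)."""
    # Scaled dot-product scores between the query and every cached key.
    d_k = query.shape[-1]
    scores = key_cache @ query / np.sqrt(d_k)   # shape: (seq_len,)
    # Softmax turns scores into attention weights over previous tokens.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Output is a weighted mix of the cached values.
    return weights @ value_cache                # shape: (d_v,)

# Toy decode loop: each new token appends its key/value to the cache,
# so the representations of earlier tokens are computed only once.
d_k, d_v = 4, 4
key_cache = np.empty((0, d_k))
value_cache = np.empty((0, d_v))
rng = np.random.default_rng(0)
for step in range(3):
    q = rng.normal(size=d_k)   # query for the current token
    k = rng.normal(size=d_k)   # key for the current token
    v = rng.normal(size=d_v)   # value for the current token
    key_cache = np.vstack([key_cache, k])
    value_cache = np.vstack([value_cache, v])
    out = attention_step(q, key_cache, value_cache)
    print(f"step {step}: attended over {len(key_cache)} cached tokens")
```

This is the caching the quote alludes to: without it, every new token would require rerunning attention over the whole prompt from scratch instead of one query against stored keys and values.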