Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
Yeah. So like the basic, the size of your KV cache is both the size of all your prompts multiplied by the number of prompts being processed in parallel. So you could increase either those dimensions, right? The batch size or the size of your prompts without degrading the latency of generating tokens.
0
💬
0
Comments
Log in to comment.
There are no comments yet.