Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
Caching plays a huge role. Because you're dealing with this many input tokens, if on every single keystroke you're typing in a given line you had to rerun the model on all of those tokens, you're going to, one, significantly degrade latency, and two, kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching-aware.
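The idea behind caching-aware prompts can be sketched with a toy prefix cache: stable content (system prompt, file context) goes first so that each new keystroke only appends tokens, and the model reuses the already-computed prefix. Everything below is illustrative (the token lists, cache structure, and function names are assumptions, not Cursor's implementation):

```python
def longest_cached_prefix(cache, tokens):
    """Return the length of the longest prefix of `tokens` already cached."""
    for n in range(len(tokens), 0, -1):
        if tuple(tokens[:n]) in cache:
            return n
    return 0

def run_model(cache, tokens):
    """Toy forward pass: reuse the cached prefix, compute only the suffix.

    Returns how many tokens were actually processed on this call.
    """
    hit = longest_cached_prefix(cache, tokens)
    processed = len(tokens) - hit
    # Cache every new prefix so the next keystroke can reuse it.
    for n in range(hit + 1, len(tokens) + 1):
        cache[tuple(tokens[:n])] = True
    return processed

cache = {}
# First request pays for the full prompt...
run_model(cache, ["<system>", "def", "foo"])
# ...but the next keystroke only processes the one new token.
run_model(cache, ["<system>", "def", "foo", "("])
```

If the volatile text were placed at the front of the prompt instead, every keystroke would invalidate the whole prefix and force a full recompute, which is exactly the latency and GPU-load problem described above.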