
Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

1737.552 - 1763.046 Arvid Lundmark

Caching plays a huge role. Because you're dealing with this many input tokens, if for every single keystroke you're typing in a given line you had to rerun the model on all of those tokens passed in, you're going to, one, significantly degrade latency, and two, you're going to kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching-aware.
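A minimal sketch of what "caching-aware" prompt design could look like, assuming a server-side prefix/KV cache keyed on the unchanged start of the prompt (this is an illustration, not the speaker's actual implementation; the prompt layout and cache structure here are hypothetical):

```python
# Sketch: keep everything that does not change between keystrokes
# (instructions, file context) in a stable prefix, and put the rapidly
# changing text (the line being typed) at the very end, so cached work
# for the prefix can be reused on every keystroke.

import hashlib

# Toy stand-in for a server-side KV cache keyed by the prompt prefix.
kv_cache: dict[str, str] = {}


def build_prompt(file_context: str, current_line: str) -> tuple[str, str]:
    """Split the prompt into a stable prefix and a volatile suffix."""
    prefix = (
        "You are a code-completion model.\n"
        "### File context\n"
        f"{file_context}\n"
        "### Current line\n"
    )
    return prefix, current_line


def run_model(prefix: str, suffix: str) -> None:
    """Pretend to run the model, reusing cached work for the prefix."""
    key = hashlib.sha256(prefix.encode()).hexdigest()
    if key in kv_cache:
        print(f"cache hit: only {len(suffix)} new chars to process")
    else:
        kv_cache[key] = "kv-state"  # would hold the attention KV tensors
        print(f"cache miss: processing {len(prefix) + len(suffix)} chars")


file_context = "def add(a, b):\n    return a + b\n"

# Simulate keystrokes on a new line; the prefix stays byte-identical,
# so after the first call every subsequent call hits the cache and only
# the few newly typed characters need fresh computation.
for typed in ["d", "de", "def", "def m", "def mu", "def mul"]:
    prefix, suffix = build_prompt(file_context, typed)
    run_model(prefix, suffix)
```

The design choice being illustrated is ordering: anything that changes per keystroke must come after everything that does not, otherwise the cached prefix is invalidated on every edit.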
