Menu
Sign In Pricing Add Podcast

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

7894.55 - 7917.928 Dylan Patel

But the math is very nice so that when you're doing this repeatedly, this KV cache, this key value matrix, operation, you can keep appending the new values to it. So you keep track of what your previous values you're inferring over in this autoregressive chain. You keep it in memory the whole time. And this is a really crucial thing to manage when serving inference at scale.

0
💬 0

Comments

There are no comments yet.

Log in to comment.