Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Dylan Patel
But the math is very nice so that when you're doing this repeatedly, this KV cache, this key value matrix, operation, you can keep appending the new values to it. So you keep track of what your previous values you're inferring over in this autoregressive chain. You keep it in memory the whole time. And this is a really crucial thing to manage when serving inference at scale.
0
💬
0
Comments
Log in to comment.
There are no comments yet.