Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Nathan Lambert
Correct. But then output tokens, the reason why they're so expensive is that I can't do it in parallel, right? It's autoregressive. Every time I generate a token, I must not only read the entire model from memory and run the computation to generate the next token, I also have to read the entire KV cache.
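The point above can be sketched with some back-of-the-envelope arithmetic: every decode step streams all of the model weights plus the KV cache accumulated so far, and the cache grows with the sequence. The parameter counts and per-token KV size below are hypothetical round numbers, not figures from the episode.

```python
def decode_bytes_read(n_params, kv_bytes_per_token, tokens_so_far, dtype_bytes=2):
    """Bytes streamed from memory to produce ONE next token:
    all model weights plus the entire KV cache built up so far."""
    weight_bytes = n_params * dtype_bytes          # every weight is read each step
    kv_cache_bytes = kv_bytes_per_token * tokens_so_far  # cache grows per token
    return weight_bytes + kv_cache_bytes

# Hypothetical example: a 70B-parameter model in fp16 (2 bytes/param),
# with ~160 KB of KV cache per token of context.
at_1k = decode_bytes_read(70e9, 160_000, 1_000)      # ~140 GB weights + 0.16 GB cache
at_100k = decode_bytes_read(70e9, 160_000, 100_000)  # ~140 GB weights + 16 GB cache
```

Because each step depends on the previous token, none of this can be batched across output positions; the memory traffic above is paid serially, token by token, which is why long outputs are priced higher than inputs.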