Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Nathan Lambert
But then when you're exchanging weights, if you're not able to overlap communication and compute perfectly, there may be a time period where your GPUs are just idle: you're exchanging weights and you're like, hey, the model's updating. So you're exchanging the gradients, you do the model update, and then you start training again. So the power goes down and then back up, and it's super spiky.
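A rough sketch of the loop Nathan is describing, assuming a PyTorch data-parallel setup with torch.distributed already initialized (e.g. over NCCL); the function and variable names are illustrative, not from the episode:

```python
import torch
import torch.distributed as dist

def train_step_without_overlap(model, optimizer, loss_fn, batch):
    # 1) Local compute: forward and backward keep the GPU busy.
    optimizer.zero_grad()
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()

    # 2) Gradient exchange: every rank all-reduces its gradients.
    #    If this is not overlapped with the backward pass, the GPU sits
    #    mostly idle waiting on the network, and power draw drops.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

    # 3) Model update, then the next step's compute starts and power
    #    spikes back up.
    optimizer.step()
    return loss.item()
```

In practice, frameworks such as PyTorch's DistributedDataParallel bucket gradients and launch these all-reduces while the backward pass is still running, which hides much of the communication; when that overlap isn't perfect, the idle-then-busy pattern described here shows up as spiky power draw across the cluster.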