Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Nathan Lambert
When all of the tokens route to one part of the model, you can overload a certain set of the GPUs, and then the rest of the training network sits idle because all of the tokens are just routing to that one part. This is one of the biggest complexities with running a very sparse mixture of experts model.
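To make the overloading concrete, here is a minimal toy sketch of top-1 routing, not any real model's router. It assumes each expert lives on its own GPU, draws random router scores per token, and uses a hypothetical `bias` knob on expert 0 to mimic a router that has collapsed toward one expert. The function names `route_tokens` and `imbalance` are made up for illustration.

```python
import random
from collections import Counter


def route_tokens(num_tokens, num_experts, bias, seed=0):
    """Top-1 routing: each token goes to the expert with the highest score.

    `bias` is a hypothetical knob added to expert 0's score to mimic a
    router that favors one expert. Returns tokens received per expert.
    """
    rng = random.Random(seed)
    load = Counter()
    for _ in range(num_tokens):
        # Random scores stand in for the learned router's output.
        scores = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
        scores[0] += bias
        chosen = max(range(num_experts), key=scores.__getitem__)
        load[chosen] += 1
    return [load[e] for e in range(num_experts)]


def imbalance(load):
    """Max load divided by mean load; 1.0 means perfectly even."""
    return max(load) / (sum(load) / len(load))


balanced = route_tokens(4096, 8, bias=0.0)
skewed = route_tokens(4096, 8, bias=5.0)
print("balanced:", balanced, "imbalance:", round(imbalance(balanced), 2))
print("skewed:  ", skewed, "imbalance:", round(imbalance(skewed), 2))
```

In the skewed run nearly every token lands on expert 0, so under the one-expert-per-GPU assumption that GPU does almost all the work while the others wait, which is the idle-network situation described above.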