
Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

2299.286 - 2318.204 Dylan Patel

And where mixture of experts is applied is in the dense part of the model; the dense layers hold most of the weights if you count them in a transformer model. So you can get really big gains in parameter efficiency from mixture of experts, at training and at inference, because you get that efficiency by not activating all of those parameters.
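To make that concrete, here is a minimal sketch of a top-k mixture-of-experts layer in PyTorch. All names and sizes here (Expert, TopKMoE, num_experts, top_k, and so on) are illustrative assumptions, not DeepSeek's or any specific model's actual configuration; the sketch just shows the mechanism Patel is describing: the experts collectively hold many times the parameters of a single dense feed-forward block, but each token routes through only a few of them.

```python
# A minimal top-k mixture-of-experts (MoE) layer, assuming a standard
# transformer where the experts replace the dense feed-forward block
# (the part that holds most of the parameters). Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward expert, shaped like a dense FFN block."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_out(F.gelu(self.w_in(x)))

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            Expert(d_model, d_ff) for _ in range(num_experts))
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). The router scores every expert per token,
        # but only the top_k chosen experts are ever computed for a token.
        scores = self.router(x)                           # (tokens, experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over chosen
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            token_idx, slot = (chosen == i).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # this expert's parameters are never touched
            out[token_idx] += (weights[token_idx, slot].unsqueeze(-1)
                               * expert(x[token_idx]))
        return out

# With 8 experts the layer holds roughly 8x a dense FFN's parameters,
# but each token activates only 2 of them, so per-token compute stays
# close to the dense case. That gap is the efficiency gain described above.
moe = TopKMoE(d_model=64, d_ff=256)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```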
