The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch
20VC: Deepseek Special: Is Deepseek a Weapon of the CCP | How Should OpenAI and the US Government Respond | Why $500BN for Stargate is Not Enough | The Future of Inference, NVIDIA and Foundation Models with Jonathan Ross @ Groq
Jonathan Ross
Yeah, so MoE stands for mixture of experts. When you use Llama 70B, you actually use every single parameter in that model. When you use Mixtral's 8x7B, you only use two of the eight roughly 7B experts, so the active compute is much smaller. And while it doesn't correlate exactly, it correlates very closely: the number of active parameters effectively tells you how much compute you're performing.
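[Editor's note: a minimal sketch, with toy dimensions and randomly initialized weights rather than Groq's or Mistral's actual code, of the top-2 routing Ross describes: only two of the eight expert feed-forward blocks run for each token, so the active parameter count, and hence the compute, is a fraction of the total.]

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16      # hidden size (toy value, not a real model size)
D_FF = 64         # expert feed-forward width (toy value)
N_EXPERTS = 8     # total experts, as in Mixtral 8x7B
TOP_K = 2         # experts activated per token

# Router and expert weights (random, for illustration only).
router_w = rng.standard_normal((D_MODEL, N_EXPERTS))
experts = [
    (rng.standard_normal((D_MODEL, D_FF)), rng.standard_normal((D_FF, D_MODEL)))
    for _ in range(N_EXPERTS)
]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector x through only TOP_K of the N_EXPERTS experts."""
    logits = x @ router_w                      # routing scores, one per expert
    top = np.argsort(logits)[-TOP_K:]          # indices of the top-2 experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU FFN of one expert
    return out

token = rng.standard_normal(D_MODEL)
y = moe_layer(token)

# Parameters actually touched per token vs. total expert parameters:
total_params = N_EXPERTS * 2 * D_MODEL * D_FF
active_params = TOP_K * 2 * D_MODEL * D_FF
print(f"active fraction of expert compute: {active_params / total_params:.2f}")  # -> 0.25
```

The point of the sketch is the ratio at the end: a dense model runs every parameter for every token, while the MoE runs only the routed experts, which is why active parameters, not total parameters, are the better proxy for compute.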