
The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch

20VC: Deepseek Special: Is Deepseek a Weapon of the CCP | How Should OpenAI and the US Government Respond | Why $500BN for Stargate is Not Enough | The Future of Inference, NVIDIA and Foundation Models with Jonathan Ross @ Groq

2320.265 - 2346.316 Jonathan Ross

Yeah, so MoE stands for mixture of experts. When you use Llama 70B, you actually use every single parameter in that model. When you use Mixtral 8x7B, you only use two of the eight roughly 7B experts, so it's much smaller. And while it doesn't correlate exactly, it correlates very closely: the number of parameters you actually use effectively tells you how much compute you're performing.
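
To illustrate the point about active versus total parameters, here is a minimal sketch of top-2 routing over eight experts. All names, sizes, and the router itself are illustrative assumptions, not Groq's or Mistral's actual implementation; the takeaway is simply that only the selected experts' parameters touch a given token, so per-token compute tracks the active parameters, not the total.

```python
import numpy as np

# Illustrative sizes, loosely echoing Mixtral 8x7B-style routing:
# 8 experts, 2 active per token. All numbers here are toy values.
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN_DIM = 16  # real models use dimensions in the thousands

rng = np.random.default_rng(0)

# One tiny linear "expert" per slot, plus a router that scores experts per token.
experts = [rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((HIDDEN_DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token through the top-k experts and mix their outputs."""
    logits = x @ router                        # one score per expert
    top = np.argsort(logits)[-TOP_K:]          # indices of the 2 highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over only the chosen experts
    # Only TOP_K of NUM_EXPERTS experts run for this token, so the active
    # parameters (and the compute) are ~2/8 of the total weights that exist.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(HIDDEN_DIM)
print(moe_forward(token).shape)                # (16,)
```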
