Coder Radio
605: The Democrats Behind DeepSeek
Chris
Unlike GPT-3.5, which activates the entire model, MoE only activates the relevant parts, or experts, for a given task. GPT-4 reportedly does this with 16 experts of roughly 110 billion parameters each. DeepSeek's MoE in version 2 improved on this by introducing specialized and generalized experts along with better load balancing and routing.
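To make the "only the relevant experts run" idea concrete, here is a minimal sketch of top-k MoE routing in PyTorch. The names (TinyMoE, num_experts, top_k) and sizes are illustrative assumptions, not DeepSeek's or OpenAI's actual implementation; it only shows the general gating-plus-experts pattern the episode describes.

```python
# Minimal top-k Mixture-of-Experts routing sketch (illustrative, not DeepSeek's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Route each token to its top-k experts; the other experts stay idle."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # which (token, slot) pairs were routed to expert e
            token_ids, slot_ids = (indices == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                    # expert not activated this step
            expert_out = expert(tokens[token_ids])
            out[token_ids] += weights[token_ids, slot_ids].unsqueeze(-1) * expert_out

        return out.reshape(x.shape)


if __name__ == "__main__":
    layer = TinyMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

With top_k=2 of 8 experts, only a quarter of the expert parameters run per token, which is the efficiency argument made above; production systems add load-balancing losses and smarter routing, which is where DeepSeek's version 2 improvements come in.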