
Coder Radio

605: The Democrats Behind DeepSeek

484.188 - 506.695 Chris

Unlike GPT-3.5, which activates the entire model, MoE only activates the relevant parts, the experts, for a given task. GPT-4 does this with 16 experts, each having 110 billion parameters. DeepSeek MoE in version 2 improved on this by introducing specialized and generalized experts along with better load balancing and routing.
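
To make the routing idea concrete, here is a minimal sketch of a top-k Mixture-of-Experts layer in Python/NumPy: a gating network scores every expert for each token, only the top-k routed experts actually run, and an always-active shared expert stands in for the "generalized" experts Chris mentions. All names, sizes, and the expert count here are illustrative toy values, not DeepSeek's or GPT-4's actual configuration.

```python
# Toy sketch of top-k MoE routing with one shared ("generalist") expert.
# Sizes and expert counts are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 8      # token embedding size (toy)
N_EXPERTS = 4    # routed experts
TOP_K = 2        # experts activated per token

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
shared_expert = rng.standard_normal((D_MODEL, D_MODEL))  # always active
router = rng.standard_normal((D_MODEL, N_EXPERTS))       # gating network

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, D_MODEL) -> (n_tokens, D_MODEL)"""
    gate = softmax(tokens @ router)      # routing scores per expert
    out = tokens @ shared_expert         # shared expert sees every token
    for t in range(tokens.shape[0]):
        top = np.argsort(gate[t])[-TOP_K:]          # only top-k experts fire
        weights = gate[t][top] / gate[t][top].sum() # renormalize their scores
        for w, e in zip(weights, top):
            out[t] += w * (tokens[t] @ experts[e])
    return out

tokens = rng.standard_normal((3, D_MODEL))
print(moe_layer(tokens).shape)  # (3, 8): only 2 of 4 routed experts ran per token
```

The point of the sketch is the sparsity: every token pays for the shared expert plus TOP_K routed experts, so total parameters can grow with the number of experts while per-token compute stays roughly flat. Real systems add a load-balancing loss so the router does not send every token to the same few experts.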
