Coder Radio
605: The Democrats Behind DeepSeek
Chris
Unlike GPT-3.5, which activates the entire model, MoE only activates the relevant parts, or experts, for a given task. GPT-4 reportedly does this with 16 experts of roughly 110 billion parameters each. DeepSeek's MoE in version 2 improved on this by introducing specialized and generalized experts along with better load balancing and routing.
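To make the "only the relevant experts run" idea concrete, here is a minimal sketch of top-k MoE routing in PyTorch. The names (TinyMoE, num_experts, top_k) and sizes are illustrative assumptions, not DeepSeek's or OpenAI's actual implementation; it only shows the general gating-plus-experts pattern the episode describes.

```python
# Minimal top-k Mixture-of-Experts routing sketch (illustrative, not DeepSeek's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    """Route each token to its top-k experts; the other experts stay idle."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # which (token, slot) pairs were routed to expert e
            token_ids, slot_ids = (indices == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                    # expert not activated this step
            expert_out = expert(tokens[token_ids])
            out[token_ids] += weights[token_ids, slot_ids].unsqueeze(-1) * expert_out

        return out.reshape(x.shape)


if __name__ == "__main__":
    layer = TinyMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

With top_k=2 of 8 experts, only a quarter of the expert parameters run per token, which is the efficiency argument made above; production systems add load-balancing losses and smarter routing, which is where DeepSeek's version 2 improvements come in.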