Accidental Tech Podcast
624: Do Less Math in Computers
Casey Liss
The DeepSeek V2 model introduced two important breakthroughs: DeepSeek MoE and DeepSeek MLA. The MoE in DeepSeek MoE refers to mixture of experts. Some models, like GPT-3.5, activate the entire model during both training and inference. It turns out, however, that not every part of the model is necessary for the topic at hand.
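To make the mixture-of-experts idea concrete, here is a minimal sketch in PyTorch: a router scores a set of small expert networks and only the top-k experts run for each token, in contrast to a dense model where every parameter participates in every forward pass. This is an illustrative toy, not DeepSeek's actual architecture; the class name, sizes, and routing details are made up for the example.

```python
# Toy mixture-of-experts layer (illustrative sketch, not DeepSeek's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, picked = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only top_k of the 8 experts run per token.
layer = TinyMoE()
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64])
```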