
Accidental Tech Podcast

624: Do Less Math in Computers

4281.684 - 4301.295 Casey Liss

The DeepSeek V2 model introduced two important breakthroughs: DeepSeek MoE and DeepSeek MLA. The MoE in DeepSeek MoE refers to mixture of experts. Some models, like GPT-3.5, activate the entire model during both training and inference. It turns out, however, that not every part of the model is necessary for the topic at hand.
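To make that idea concrete, here is a minimal, hypothetical sketch of top-k mixture-of-experts routing in PyTorch. The expert count, layer sizes, and top_k value are illustrative assumptions, not DeepSeek's actual architecture; the point is only that a router picks a few experts per token, so most of the model's parameters stay idle for any given input.

```python
# Illustrative mixture-of-experts sketch (not DeepSeek's implementation).
# A small router scores each expert per token, and only the top-k experts
# are actually run for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)        # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Usage: 16 tokens with a 64-dim hidden state; only 2 of the 8 expert MLPs
# run for each token.
layer = TinyMoELayer(dim=64)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```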
