Accidental Tech Podcast
624: Do Less Math in Computers
Casey Liss
MoE splits the model into multiple quote-unquote experts and only activates the ones that are necessary. GPT-4 was an MoE model that was believed to have 16 experts with approximately 110 billion parameters each. DeepSeek MLA, multi-head latent attention is the MLA there, was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required.
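As an illustration of the routing idea Casey describes, here is a minimal sketch of a mixture-of-experts layer in PyTorch: a small gating network scores every expert, keeps only the top-k for each token, and runs just those, so only a fraction of the total parameters are active on any given forward pass. The layer sizes, expert count, and top-k value below are illustrative assumptions, not GPT-4's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts.
    Sizes are made up for illustration, not taken from any real model."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network scores every expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts ran per token
```

A real MoE implementation batches the routing instead of looping over experts, but the memory point from the transcript still holds: all 16 experts' weights have to stay resident even though only a couple of them run for any given token.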