Coder Radio
605: The Democrats Behind DeepSeek
Chris
Then you combine that with DeepSeek MLA, which tackled the memory issue during inference. Typically memory use skyrockets as the context window grows; I see this on my laptop. Each token requires a key and a value to be cached. DeepSeek MLA, or multi-head latent attention, compresses that key-value store, so it significantly reduces the memory demands during inference.
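A rough sketch of the idea in Python (hypothetical names and dimensions, not DeepSeek's actual code): instead of caching a full key and value per attention head per token, you cache one small latent vector per token and project it back up to keys and values only when attention needs them.

import numpy as np

# Toy dimensions, chosen only for illustration
n_heads, head_dim, d_latent, seq_len = 32, 128, 512, 4096

# Standard KV cache: one key and one value per head per token
standard_kv_floats = seq_len * n_heads * head_dim * 2           # keys + values
# MLA-style cache: one shared compressed latent per token
latent_floats = seq_len * d_latent

print(f"standard KV cache entries:  {standard_kv_floats:,}")    # 33,554,432
print(f"compressed latent entries:  {latent_floats:,}")         # 2,097,152
print(f"reduction: {standard_kv_floats / latent_floats:.0f}x")  # 16x

# At attention time, the cached latents are projected back up into
# per-head keys and values on the fly (weights are random stand-ins here).
W_uk = np.random.randn(d_latent, n_heads * head_dim) * 0.02     # up-projection for keys
W_uv = np.random.randn(d_latent, n_heads * head_dim) * 0.02     # up-projection for values
latent_cache = np.random.randn(seq_len, d_latent)               # this is what actually stays in memory

keys = latent_cache @ W_uk      # (seq_len, n_heads * head_dim), reconstructed when needed
values = latent_cache @ W_uv

The trade-off is a little extra compute for the up-projection in exchange for a much smaller cache, which is what keeps long context windows from blowing up memory during inference.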