Coder Radio

605: The Democrats Behind DeepSeek

507.476 - 531.113 Chris

Then you combine that with DeepSeek MLA, which tackled the memory issue during inference. So typically memory use skyrockets due to the context window. I see this on my laptop. Each token requires a key and a value. DeepSeek MLA... or I guess it's also known as multi-head latent attention, okay, it compresses the key-value store. So it significantly reduces the memory demands during inference.
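
To make the scale of that memory saving concrete, here is a rough back-of-the-envelope sketch of KV-cache size with standard multi-head attention versus an MLA-style compressed latent cache. All the dimensions below (layer count, head count, latent width, context length) are illustrative assumptions for the sketch, not DeepSeek's published configuration.

```python
# Back-of-the-envelope KV-cache memory comparison:
# standard multi-head attention vs. an MLA-style compressed latent cache.
# All model dimensions here are assumed for illustration only.

def kv_cache_bytes_mha(layers, heads, head_dim, context_len, bytes_per_elem=2):
    """Standard attention caches a key vector and a value vector
    per token, per head, per layer (factor of 2 for K and V)."""
    return layers * context_len * heads * head_dim * 2 * bytes_per_elem

def kv_cache_bytes_mla(layers, latent_dim, context_len, bytes_per_elem=2):
    """MLA-style caching stores one compressed latent vector per token,
    per layer; keys and values are reconstructed from it at attention time."""
    return layers * context_len * latent_dim * bytes_per_elem

if __name__ == "__main__":
    layers, heads, head_dim = 60, 128, 128   # assumed model sizes
    latent_dim = 512                         # assumed compressed latent width
    ctx = 32_768                             # assumed 32k-token context window

    mha = kv_cache_bytes_mha(layers, heads, head_dim, ctx)
    mla = kv_cache_bytes_mla(layers, latent_dim, ctx)
    print(f"standard KV cache: {mha / 2**30:.1f} GiB")
    print(f"MLA latent cache:  {mla / 2**30:.1f} GiB")
    print(f"reduction:         {mha / mla:.0f}x")
```

With these assumed numbers the full-precision KV cache would be on the order of a hundred gigabytes at a 32k context, while the compressed latent cache stays in the low single digits, which is the kind of inference-time memory reduction being described here.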
