Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Dylan Patel
If you get into the details with this latent attention, it's one of those things I look at and say, okay, they're doing really complex implementations, because there are other parts of language models, such as embeddings, that are used to extend the context length. The common one that DeepSeek uses is rotary positional embeddings, which is called RoPE.
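[Editor's note: RoPE encodes position by rotating pairs of dimensions in the query/key vectors by an angle proportional to the token's position, so attention scores depend only on relative distance between tokens. A minimal NumPy sketch of this idea (the function name `rope` and the split-half pairing convention are illustrative assumptions, not DeepSeek's actual implementation):]

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Hypothetical minimal sketch: dimension i in the first half is paired
    with dimension i in the second half, and each pair is rotated by
    angle position * theta_i, where theta_i = base^(-2i/dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: lower dims rotate faster
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied to each (x1, x2) pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because rotation preserves vector norms and the angle difference between a query at position m and a key at position n is (n - m) * theta_i, the dot product of the rotated vectors depends only on the relative offset n - m, which is what lets context-length extension schemes manipulate RoPE frequencies.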