Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Dylan Patel
If you get into the details with this latent attention, it's one of those things I look at and say, okay, they're doing really complex implementations, because there are other parts of language models, such as embeddings, that are used to extend the context length. The common one that DeepSeek uses is rotary positional embeddings, which is called RoPE.
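[Editor's note: RoPE encodes position by rotating pairs of dimensions in the query/key vectors by an angle proportional to the token's position, so attention scores depend only on relative distance between tokens. A minimal NumPy sketch of this idea (the function name `rope` and the split-half pairing convention are illustrative assumptions, not DeepSeek's actual implementation):]

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim).

    Hypothetical minimal sketch: dimension i in the first half is paired
    with dimension i in the second half, and each pair is rotated by
    angle position * theta_i, where theta_i = base^(-2i/dim).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies: lower dims rotate faster
    freqs = base ** (-np.arange(half) * 2.0 / dim)   # (half,)
    angles = positions[:, None] * freqs[None, :]     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied to each (x1, x2) pair
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because rotation preserves vector norms and the angle difference between a query at position m and a key at position n is (n - m) * theta_i, the dot product of the rotated vectors depends only on the relative offset n - m, which is what lets context-length extension schemes manipulate RoPE frequencies.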