Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

2448.026 - 2466.519 Dylan Patel

If you get into the details with this latent attention, it's one of those things I look at and say, okay, they're doing really complex implementations, because there are other parts of language models, such as embeddings, that are used to extend the context length. The common one that DeepSeek uses is rotary positional embeddings, which is called RoPE.
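
For readers unfamiliar with the rotary positional embeddings mentioned here, the following is a minimal sketch of the general idea, not DeepSeek's implementation. It assumes the common pairwise formulation, in which consecutive pairs of vector components are rotated by an angle that grows with token position. The function names `rope` and `dot` and the default `base` of 10000 are illustrative assumptions, not taken from the episode.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Illustrative sketch only: the pair starting at index i is rotated by
    position * base ** (-i / dim), so early pairs rotate quickly and
    later pairs rotate slowly.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# The score depends only on the distance between positions (here 2),
# not on the absolute positions themselves.
near = dot(rope(q, 3), rope(k, 1))
far = dot(rope(q, 10), rope(k, 8))
print(abs(near - far) < 1e-9)
```

Because each pair is only rotated, vector lengths are unchanged, and the query-key score is a function of relative position alone. That relative-position property is the one that context-length extension methods typically build on.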
