
Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

2238.775 - 2257.848 Nathan Lambert

So this model technically has more embedding space for information, right? To compress all of the world's knowledge that's on the internet down. But at the same time, it is only activating around 37 billion of the parameters, so only 37 billion of these parameters actually need to be computed every single time you're training on data or running inference.
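For listeners unfamiliar with this "sparse activation" idea, the sketch below illustrates the mechanism being described: a top-k mixture-of-experts (MoE) layer routes each token to only a few expert networks, so most of the model's parameters sit idle on any given forward pass. This is a toy illustration, not DeepSeek's actual code; all sizes are made up for readability (DeepSeek-V3 is reported at roughly 671B total parameters with about 37B active per token).

```python
# Toy top-k mixture-of-experts layer: large total parameter count,
# but only the selected experts' weights are computed per token.
# Sizes are illustrative, not DeepSeek's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                  # x: (tokens, d_model)
        scores = self.router(x)            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts chosen by the router run for each token;
        # the other experts' parameters are never touched on this pass.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64])
```

With 8 experts and top-2 routing, each token touches only a quarter of the expert parameters, which is the same trade-off described above: the full parameter count stores more knowledge, while the per-token compute cost tracks only the activated subset.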
