Lex Fridman Podcast
#434 – Aravind Srinivas: Perplexity CEO on Future of AI, Search & the Internet
Aravind Srinivas
You take Common Crawl and instead of 1 billion, go all the way to 175 billion. But that was done through an analysis called the scaling laws, which is: for a bigger model, you need to keep scaling the amount of tokens, and you train on 300 billion tokens. Now it feels small. These models are being trained on like tens of trillions of tokens and trillions of parameters.
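[Editor's note: GPT-3's 300-billion-token budget followed the original scaling-law analysis (Kaplan et al., 2020). As a rough illustration of why that number "now feels small", here is the later Chinchilla revision of those scaling laws (Hoffmann et al., 2022), which prescribes about 20 training tokens per parameter; the arithmetic below is an illustrative sketch, not a figure from the episode.]

```latex
% Chinchilla rule of thumb (Hoffmann et al., 2022):
% compute-optimal training tokens D scale roughly linearly
% with parameter count N, at about 20 tokens per parameter.
\[
  D_{\text{opt}} \approx 20\,N
\]
% GPT-3: N = 175 \times 10^{9} parameters
%   => D_opt ≈ 20 × (175 × 10^9) = 3.5 × 10^{12} tokens,
% yet GPT-3 was trained on only about 3 × 10^{11} tokens.
```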