The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch
20VC: AI Scaling Myths: More Compute is not the Answer | The Core Bottlenecks in AI Today: Data, Algorithms and Compute | The Future of Models: Open vs Closed, Small vs Large with Arvind Narayanan, Professor of Computer Science @ Princeton
Arvind Narayanan
These benchmarks that models are being tested on don't really capture what we would use them for in the real world. So that's one reason why LLM evaluation is a minefield. And there's also just a very simple factor of contamination. Maybe the model has already trained on the answers that it's being evaluated on in the benchmark. And so if you ask it new questions, it's going to struggle.
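As a rough illustration of the contamination point, one simple check is to measure how many word n-grams of a benchmark item also appear verbatim in a sample of training text. The sketch below is an assumption for illustration, not a method described in the episode; the function names `ngrams` and `overlap_fraction` and the default window of 5 words are hypothetical choices.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item, training_text, n=5):
    """Fraction of the item's n-grams that also occur in training_text.

    A value near 1.0 suggests the item may have been seen during training;
    a value near 0.0 suggests it is new. Items shorter than n words have
    no n-grams and return 0.0.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    train_grams = ngrams(training_text, n)
    return len(item_grams & train_grams) / len(item_grams)


# Hypothetical data, for illustration only.
training_sample = "the quick brown fox jumps over the lazy dog near the river bank"
seen_item = "quick brown fox jumps over the lazy dog"
fresh_item = "which planet has the largest number of known moons"

seen_score = overlap_fraction(seen_item, training_sample)
fresh_score = overlap_fraction(fresh_item, training_sample)
```

Exact n-gram matching only catches verbatim leakage; paraphrased or translated benchmark items would pass this check unnoticed, which is part of why evaluation is hard.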