
Michael Truell

👤 Person
225 total appearances

Podcast Appearances

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

I think this benchmark question is complicated both by what Sualeh just mentioned and by what Aman was getting into: there's this problem of the skew between what you can actually model in a benchmark versus real programming. And that can sometimes be hard to encapsulate, because real programming is very messy.

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

And sometimes it isn't well specified what's correct and what isn't. But then it's also doubly hard because of this public benchmark problem. That's both because public benchmarks are sometimes hill-climbed on, and because it's really, really hard to get the data from the public benchmarks out of the models.

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

And so, for instance, one of the most popular agent benchmarks, SWE-bench, is really, really contaminated in the training data of these foundation models. And so if you ask these foundation models to do a SWE-bench problem, but you don't actually give them the context of a codebase, they can hallucinate the right file paths, they can hallucinate the right function names.
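
The check described here (does a model name the right files when it has never been shown the repository?) can be sketched roughly as below. This is an illustrative probe, not the method used by the speakers or by SWE-bench itself. The task fields `issue` and `gold_paths`, and the `guess_paths` callable that wraps a model, are hypothetical names introduced for this example.

```python
import posixpath
from typing import Callable, Iterable, Mapping, Sequence


def normalize(path: str) -> str:
    """Canonicalize a repo-relative path so './src/a.py' matches 'src/a.py'."""
    return posixpath.normpath(path.strip())


def path_recall(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of the files touched by the reference patch that the model named."""
    gold_set = {normalize(p) for p in gold}
    if not gold_set:
        return 0.0
    predicted_set = {normalize(p) for p in predicted}
    return len(gold_set & predicted_set) / len(gold_set)


def contamination_probe(
    tasks: Sequence[Mapping[str, object]],
    guess_paths: Callable[[str], Iterable[str]],
) -> float:
    """Average path recall when the model sees only the issue text.

    `guess_paths` is a hypothetical wrapper around a model call: it receives
    the issue text, with no repository context, and returns the file paths
    the model believes need editing.
    """
    scores = [
        path_recall(guess_paths(str(task["issue"])), task["gold_paths"])
        for task in tasks
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

A high average recall without any codebase context is a hint, not proof, that the benchmark's repositories or patches appeared in the training data; a model could also guess conventional paths in well-known projects.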

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

And so the public aspect of these things is also just tricky.

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

I think that given the dearth of benchmarks, there have been a few interesting crutches that places that build systems with these models, or build these models, actually use to get a sense of whether they're going in the right direction or not. And in a lot of places, people will actually just have humans play with the things and give qualitative feedback on them.

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

At one or two of the foundation model companies, they have people for whom that's a big part of their role. And internally we also qualitatively assess these models and actually lean on that a lot, in addition to the private evals that we have. It's like the vibe. The vibe, yeah. It's like the vibe.

Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI

Don't you think this gets at some of the stuff you were talking about earlier, with the difficulty of specifying intent for what you want with software? Sometimes, because the intent is really hard to specify, it's also going to be really hard to prove that the software actually matches whatever your intent is.
