
Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

Sun, 06 Oct 2024

Description

Aman Sanger, Arvid Lunnemark, Michael Truell, and Sualeh Asif are creators of Cursor, a popular code editor that specializes in AI-assisted programming. Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep447-sc See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/cursor-team-transcript

CONTACT LEX:
Feedback - give feedback to Lex: https://lexfridman.com/survey
AMA - submit questions, videos or call-in: https://lexfridman.com/ama
Hiring - join our team: https://lexfridman.com/hiring
Other - other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:
Cursor Website: https://cursor.com
Cursor on X: https://x.com/cursor_ai
Anysphere Website: https://anysphere.inc/
Aman's X: https://x.com/amanrsanger
Aman's Website: https://amansanger.com/
Arvid's X: https://x.com/ArVID220u
Arvid's Website: https://arvid.xyz/
Michael's Website: https://mntruell.com/
Michael's LinkedIn: https://bit.ly/3zIDkPN
Sualeh's X: https://x.com/sualehasif996
Sualeh's Website: https://sualehasif.me/

SPONSORS: To support this podcast, check out our sponsors & get discounts:
Encord: AI tooling for annotation & data management. Go to https://encord.com/lex
MasterClass: Online classes from world-class experts. Go to https://masterclass.com/lexpod
Shopify: Sell stuff online. Go to https://shopify.com/lex
NetSuite: Business management software. Go to http://netsuite.com/lex
AG1: All-in-one daily nutrition drinks. Go to https://drinkag1.com/lex

OUTLINE:
(00:00) - Introduction
(09:25) - Code editor basics
(11:35) - GitHub Copilot
(18:53) - Cursor
(25:20) - Cursor Tab
(31:35) - Code diff
(39:46) - ML details
(45:20) - GPT vs Claude
(51:54) - Prompt engineering
(59:20) - AI agents
(1:13:18) - Running code in background
(1:17:57) - Debugging
(1:23:25) - Dangerous code
(1:34:35) - Branching file systems
(1:37:47) - Scaling challenges
(1:51:58) - Context
(1:57:05) - OpenAI o1
(2:08:27) - Synthetic data
(2:12:14) - RLHF vs RLAIF
(2:14:01) - Fields Medal for AI
(2:16:43) - Scaling laws
(2:25:32) - The future of programming

PODCAST LINKS:
- Podcast Website: https://lexfridman.com/podcast
- Apple Podcasts: https://apple.co/2lwqZIr
- Spotify: https://spoti.fi/2nEwCF8
- RSS: https://lexfridman.com/feed/podcast/
- Podcast Playlist: https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
- Clips Channel: https://www.youtube.com/lexclips

Transcription

0.109 - 23.161 Lex Fridman

The following is a conversation with the founding members of the Cursor team, Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger. Cursor is a code editor based on VS Code that adds a lot of powerful features for AI-assisted coding. It has captivated the attention and excitement of the programming and AI communities.


24.001 - 50.232 Lex Fridman

So I thought this is an excellent opportunity to dive deep into the role of AI in programming. This is a super technical conversation that is bigger than just about one code editor. It's about the future of programming and in general, the future of human AI collaboration in designing and engineering complicated and powerful systems. And now a quick few second mention of each sponsor.


50.432 - 66.767 Lex Fridman

Check them out in the description. It's the best way to support this podcast. We've got Encore for unifying your machine learning stack, Masterclass for learning, Shopify for selling stuff online, NetSuite for your business, and AG1 for your health. Choose wisely, my friends.


67.567 - 86.916 Lex Fridman

Also, if you want to get in touch with me for whatever reason, or take a survey or send me questions for an AMA, all of that would be great. Go to lexfridman.com/contact. And now onto the full ad reads. I try to make them interesting, but if you skip them, please still check out our sponsors. I enjoy their stuff. Maybe you will too.


88.752 - 111.111 Lex Fridman

This episode is brought to you by Encord, a platform that provides data-focused AI tooling for data annotation, curation, management, and for model evaluation. One of the things I love about these guys is they have a great blog that describes things cleanly. I mean, it's technical, but it's not too technical. It's sufficiently technical to where it's actually describing ideas, not BS.


112.032 - 135.502 Lex Fridman

Blog posts on sort of the state-of-the-art, like the OpenAI o1 model that was just released. So sometimes they integrate it into why this is a part of Encord, why this makes sense, and sometimes not. And so I love that. I recommend their blog just in general. That said, when they are looking at state-of-the-art models, they are always looking for ways to integrate them into their platform.


136.002 - 153.815 Lex Fridman

Basically, it's a place to organize your data, and data is everything. This was true before the popularity and the explosion of attention methods of transformers. And it is still very much true now. Sort of the non-synthetic, the human generated data is extremely important.


154.175 - 176.57 Lex Fridman

How you generate that data, how you organize that data, how you leverage it, how you train on it, how you fine-tune on it, the pre-training, the post-training, all of it, the whole thing. Data is extremely, extremely important. And so Encord takes data very seriously. Anyway, go try out Encord to create, annotate, and manage your AI data at encord.com slash lex. That's encord.com slash lex.


178.232 - 203.462 Lex Fridman

This episode is also brought to you by MasterClass, where you can watch over 200 classes from the best people in the world in their respective disciplines. Carlos Santana on guitar, for example. I loved that one. There's a few guitar ones, Tom Morello too. Great, great, great stuff. But Carlos Santana, his instrumental Europa. I haven't quite tried to play that, but it's on my to-do list.


203.642 - 226.462 Lex Fridman

It's sort of one of those things, you know for sure this is a thing I will play because it's too beautiful. It's too soulful. It feels like once you play, you understand something about the guitar that you didn't before. It's not blues. It's not, I don't know what it is. It's some kind of dreamlike teleportation into a psychedelic world.


228.043 - 259.354 Lex Fridman

where the tone is warmer than anything else I've ever heard. And still, the guitar can cry. I don't know. I love it. He's a genius. So it's such a gift that you can get a genius like that to teach us about his secrets. Get unlimited access to every MasterClass and get an additional 15% off an annual membership at masterclass.com slash lexpod. That's masterclass.com slash lexpod.


261.189 - 287.309 Lex Fridman

This episode is also brought to you by Shopify, a platform designed for anyone to sell anywhere with a great-looking online store, or simple-looking online store, like the one I put together at lexfridman.com/store. I have a few shirts on there in case you're interested. And speaking of shirts, I'm reminded of thrift stores, which I very much loved for a long time. I still love thrift stores.


288.51 - 310.827 Lex Fridman

They're a nice place to get stuff. Like, I don't know, kitchen stuff and clothing. And the kind of clothing you get at thrift stores is actually pretty interesting, because there's shirts there that are just unlike anything else you would get anywhere else. So if you're sort of selective and creative-minded, there's a lot of interesting fashion there.


311.207 - 330.309 Lex Fridman

And in terms of t-shirts, there's just like hilarious t-shirts. T-shirts that are very far away from the kind of trajectories you have taken in life, or are not, but you just haven't thought about it. Like a band that you love, but you never would have thought to wear their t-shirt. Anyway, a little bit, I think of Shopify as the internet's thrift store.


332.391 - 354.69 Lex Fridman

Of course, you can do super classy, you can do super fancy, or you can do super thrift. All of it is possible. Sign up for a $1 per month trial period at Shopify.com slash Lex. That's all lowercase. Go to Shopify.com slash Lex to take your business to the next level today. This episode is also brought to you by NetSuite, an all-in-one cloud business management system.


356.593 - 379.195 Lex Fridman

Sometimes I think that NetSuite is supporting this podcast because they're trolling me. They're saying, hey Lex, aren't you doing a little too much talking? Maybe you should be building more. I agree with you, NetSuite. I agree with you. And so every time I do an ad read for NetSuite, it is a chance for me to confront my Jungian shadow.


379.776 - 401.537 Lex Fridman

Some of the demons emerge from the subconscious and ask questions that I don't have answers to. Questions about one's mortality and that life is short and that one of the most fulfilling things in life is to have a family and kids and all of these things I would very much like to have. And also the reality that I love programming and I love building


402.277 - 426.339 Lex Fridman

I love creating cool things that people can use and share and that would make their life better. All of that. Of course, I also love listening to podcasts. And I kind of think of this podcast as me listening to a podcast where I can also maybe participate by asking questions. So all these things that you love, but you ask the hard question of like, okay, well, life is slipping away. It's short.


426.899 - 446.348 Lex Fridman

It really, really is short. What do you want to do with the rest of the minutes and the hours that make up your life? Yeah, so thank you for the existential crisis, NetSuite. I appreciate it. If you're running a business, if you have taken the leap into the unknown and started a company, then you should be using the right tools to manage that company.


447.108 - 466.976 Lex Fridman

In fact, over 37,000 companies have upgraded to NetSuite. Take advantage of NetSuite's flexible financing plan at netsuite.com slash lex. That's netsuite.com slash lex. This episode is also brought to you by the delicious, the delicious AG1. It's an all-in-one daily drink to support better health and peak performance.


467.016 - 479.72 Lex Fridman

It's basically a super awesome multivitamin that makes me feel like I have my life together. Even when everything else feels like it's falling apart, at least I have AG1. At least I have that nutritional foundation to my life.


480.424 - 508.794 Lex Fridman

So all the fasting I'm doing, all the carnivore diets, all the physical endurance events and the mental madness of staying up all night or just the stress of certain things I'm going through, all of that, AG1 is there. At least I have the vitamins. Also, I sometimes wonder, they used to be called Athletic Greens, and now they're called AG1. I always wonder, is AG2 coming? Why is it just one?


509.234 - 539.526 Lex Fridman

It's an interesting branding decision, like AG1. Me as an OCD kind of programmer type, it's like, okay, is this a versioning thing? Is this like AG 0.1 alpha? When's the final release? Anyway, the thing I like to say and to consume is AG1. They'll give you one month's supply of fish oil when you sign up at drinkag1.com slash lex. This is the Lex Fridman Podcast.


539.647 - 576.162 Lex Fridman

To support it, please check out our sponsors in the description. And now, dear friends, here's Michael, Sualeh, Arvid, and Aman. All right, this is awesome. We have Michael, Aman, Sualeh, Arvid here from the Cursor team. First up, big ridiculous question. What's the point of a code editor?


576.562 - 597.247 Michael Truell

So the code editor is largely the place where you build software. And today, or for a long time, that's meant the place where you text edit a formal programming language. And for people who aren't programmers, the way to think of a code editor is like a really souped-up word processor for programmers, where the reason it's souped up is code has a lot of structure.


597.887 - 608.07 Michael Truell

And so the quote-unquote word processor, the code editor, can actually do a lot for you that word processors sort of in the writing space haven't been able to do for people editing text there.


608.67 - 626.315 Michael Truell

And so that's everything from giving you visual differentiation of the actual tokens in the code so you can scan it quickly, to letting you navigate around the code base, sort of like you're navigating around the internet with hyperlinks. You're going to sort of definitions of things you're using, to error checking to catch rudimentary bugs.


628.716 - 645.163 Michael Truell

And so traditionally, that's what a code editor has meant. And I think that what a code editor is is going to change a lot over the next 10 years as what it means to build software maybe starts to look a bit different. I think also a code editor should just be fun.


646.01 - 668.494 Arvid Lunnemark

Yes, that is very important. That is very important. And it's actually sort of an underrated aspect of how we decide what to build. Like a lot of the things that we build and then we try them out, we do an experiment and then we actually throw them out because they're not fun. And so a big part of being fun is like being fast a lot of the time. Fast is fun.


669.11 - 673.031 Lex Fridman

Yeah, that should be a t-shirt.


674.612 - 695.378 Michael Truell

Like fundamentally, I think one of the things that draws a lot of people to building stuff on computers is this like insane iteration speed, where in other disciplines you might be sort of gated by resources or the ability, even the ability, you know, to get a large group together. And coding is this amazing thing where it's you and the computer, and that alone, you can build really cool stuff really quickly.


696.212 - 719.241 Lex Fridman

So for people who don't know, Cursor is this super cool new editor that's a fork of VS Code. It would be interesting to get your kind of explanation of your own journey of editors. I think all of you were big fans of VS Code with Copilot. How did you arrive to VS Code and how did that lead to your journey with Cursor?


719.561 - 725.755 Aman Sanger

Yeah, so... I think a lot of us, well, all of us were originally Vim users.


726.335 - 726.995 Sualeh Asif

Pure Vim.


727.015 - 756.428 Aman Sanger

Pure Vim, yeah. No NeoVim, just pure Vim in a terminal. And at least for myself, it was around the time that Copilot came out, so 2021, that I really wanted to try it. So I went into VS Code, the only platform, the only code editor in which it was available. And even though I really enjoyed using Vim, just the experience of Copilot with VS Code was more than good enough to convince me to switch.


757.209 - 760.352 Aman Sanger

And so that kind of was the default until we started working on Cursor.


761.098 - 782.843 Lex Fridman

And maybe we should explain what Copilot does. It's like a really nice autocomplete. It suggests, as you start writing a thing, it suggests one or two or three lines how to complete the thing. And there's a fun experience in that, you know, like when you have a close friendship and your friend completes your sentences? Like when it's done well, there's an intimate feeling.


782.863 - 801.08 Lex Fridman

There's probably a better word than intimate, but there's a cool feeling of like, holy shit. It gets me. And then there's an unpleasant feeling when it doesn't get you. And so there's that kind of friction. But I would say for a lot of people, the feeling that it gets me overpowers that it doesn't.


801.499 - 815.792 Arvid Lunnemark

And I think actually one of the underrated aspects of GitHub Copilot is that even when it's wrong, it's like a little bit annoying, but it's not that bad because you just type another character and then maybe then it gets you or you type another character and then it gets you. So even when it's wrong, it's not that bad.


815.812 - 827.622 Sualeh Asif

Yeah, you can sort of iterate and fix it. I mean, the other underrated part of Copilot for me sort of was just the first real AI product. So the first language model consumer product.


827.822 - 835.649 Lex Fridman

So Copilot was kind of like the first killer app for LLMs. Yeah. And like the beta was out in 2021. Right.


836.109 - 855.406 Michael Truell

Okay. So what's the origin story of Cursor? So around 2020, the scaling laws papers came out from OpenAI. And that was a moment where this looked like clear, predictable progress for the field, where even if we didn't have any more ideas, it looked like you could make these models a lot better if you had more compute and more data.


856.076 - 871.871 Lex Fridman

By the way, we'll probably talk for three to four hours on the topic of scaling laws. Just to summarize, it's a paper and a set of papers and a set of ideas that say bigger might be better for model size and data size in the realm of machine learning.


872.111 - 876.715 Sualeh Asif

It's bigger and better, but predictably better. That's another topic of conversation.
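To make "predictably better" concrete: the 2020 Kaplan et al. scaling-law fit expresses loss as a smooth power law in parameter count and dataset size. A minimal illustrative sketch (the constants are the paper's reported fits, used here only for flavor; nothing Cursor-specific):

```python
def scaling_law_loss(n_params: float, n_tokens: float) -> float:
    """Kaplan-style scaling law L(N, D): loss falls as a power law in
    model parameters N and training tokens D. The constants below are
    the fits reported in the 2020 paper and are illustrative."""
    n_c, d_c = 8.8e13, 5.4e13          # critical scales (params, tokens)
    alpha_n, alpha_d = 0.076, 0.095    # power-law exponents
    return ((n_c / n_params) ** (alpha_n / alpha_d) + d_c / n_tokens) ** alpha_d

# Bigger is predictably better: more parameters (at fixed data) lower the loss.
assert scaling_law_loss(1e10, 1e12) < scaling_law_loss(1e9, 1e12)
```

The point of the formula is exactly what is said above: even with no new ideas, you can forecast the gain from more compute and more data.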


877.676 - 890.485 Michael Truell

So around that time, for some of us, there were a lot of conceptual conversations about what's this going to look like? What's the story going to be for all these different knowledge worker fields about how they're going to be made better by this technology getting better?


891.519 - 910.512 Michael Truell

And then I think there were a couple of moments where the theoretical gains predicted in that paper started to feel really concrete. And it started to feel like a moment where you could actually go and not do a PhD if you wanted to work on, do useful work in AI. Actually felt like now there was this whole set of systems one could build that were really useful.


911.192 - 931.172 Michael Truell

And I think that the first moment we already talked about a little bit, which was playing with the early bit of Copilot, that was awesome and magical. I think that the next big moment where everything kind of clicked together was actually getting early access to GPT-4. So it was sort of end of 2022 was when we were tinkering with that model. And the step up in capabilities felt enormous.


932.154 - 946.764 Michael Truell

And previous to that, we had been working on a couple of different projects. We had been, because of Copilot, because of scaling laws, because of our prior interest in the technology, we had been tinkering around with tools for programmers, but things that are like very specific.


946.844 - 966.794 Michael Truell

So, you know, we were building tools for financial professionals who have to work within a Jupyter notebook, or like, you know, playing around with, can you do static analysis with these models? And then the step up in GPT-4 felt like, look, that really made concrete the theoretical gains that we had predicted before. It felt like you could build a lot more just immediately at that point in time.


967.354 - 985.664 Michael Truell

And also, if we were being consistent, it really felt like this wasn't just going to be a point solution thing. This was going to be all of programming was going to flow through these models. And it felt like that demanded a different type of programming environment, a different type of programming. And so we set off to build that sort of larger vision around that.


986.224 - 1005.156 Sualeh Asif

There's one that I distinctly remember. So my roommate is an IMO gold winner, and there's a competition in the U.S. called the Putnam, which is sort of the IMO for college people, and it's this math competition. He's exceptionally good. So Sheng Tong and Aman, I remember, it's sort of June of 2022.


1008.098 - 1017.286 Sualeh Asif

had this bet on whether by June or July of 2024, you were going to win a gold medal in the IMO with models.


1017.626 - 1019.348 Lex Fridman

IMO is International Math Olympiad.


1019.922 - 1041.861 Sualeh Asif

Yeah, IMO is International Math Olympiad. And so Arvid and I both, you know, also competed in it. So it was sort of personal. And I remember thinking, man, this is not going to happen. Even though I sort of believed in progress, I thought, you know, IMO gold, like Aman is just delusional.


1042.281 - 1051.287 Sualeh Asif

That was the bet. And to be honest, I mean, I was, to be clear, very wrong. But that was maybe the most prescient bet in the group.


1051.724 - 1055.847 Lex Fridman

So the new results from DeepMind, it turned out that you were correct.


1056.467 - 1058.949 Arvid Lunnemark

That's what the- Well, it was technically not.


1058.989 - 1071.478 Michael Truell

Technically incorrect, but one point away. Aman was very enthusiastic about this stuff. Yeah. And before, Aman had this, like, scaling laws t-shirt that he would walk around with, where it had the, like, charts and the formulas on it.


1071.618 - 1074.941 Lex Fridman

So you like felt the AGI or you felt the scaling?


1074.981 - 1095.698 Aman Sanger

Yeah, I distinctly remember there was this one conversation I had with Michael where before I hadn't thought super deeply and critically about scaling laws. And he kind of posed the question, why isn't scaling all you need or why isn't scaling going to result in massive gains in progress? And I think I went through like the stages of grief.


1095.798 - 1114.427 Aman Sanger

There is anger, denial, and then finally at the end, just thinking about it, acceptance. And I think I've been quite hopeful and optimistic about progress since. I think one thing I'll caveat is I think it also depends on like which domains you're going to see progress.


1114.487 - 1133.207 Aman Sanger

Like math is a great domain because especially like formal theorem proving because you get this fantastic signal of actually verifying if the thing was correct. And so this means something like RL can work really, really well. And I think like you could have systems that are perhaps very superhuman in math and still not technically have AGI.


1134.068 - 1163.728 Lex Fridman

Okay, so can we take it all the way to Cursor? And what is Cursor? It's a fork of VS Code. And VS Code is one of the most popular editors for a long time. Everybody fell in love with it. Everybody left Vim. I left Emacs for it. Sorry. So it unified in some fundamental way the developer community. And then you look at the space of things. You look at the scaling laws. AI is becoming amazing.


1164.809 - 1188.164 Lex Fridman

And you decided, okay, it's not enough to just write an extension for your VS Code because there's a lot of limitations to that. If AI is going to keep getting better and better and better, we need to really rethink how the AI is going to be part of the editing process. And so you decided to fork VS Code and start to build a lot of the amazing features we'll be able to talk about.


1188.484 - 1193.167 Lex Fridman

But what was that decision like? Because there's a lot of extensions, including Copilot.


1194.108 - 1219.214 Michael Truell

of VS Code that are doing sort of AI-type stuff. What was the decision like to just fork VS Code? So the decision to do an editor seemed kind of self-evident to us, for at least what we wanted to do and achieve. Because when we started working on the editor, the idea was these models are going to get much better, their capabilities are going to improve, and it's going to entirely change how you build software. Both in that you will have big productivity gains, but also radically: the act of building software is going to change a lot.


1220.253 - 1233.984 Michael Truell

And so you're very limited in the control you have over a code editor if you're a plugin to an existing coding environment. And we didn't want to get locked in by those limitations. We wanted to be able to just build the most useful stuff.


1234.504 - 1246.034 Lex Fridman

Okay, well then the natural question is, you know, VS Code is kind of with Copilot a competitor. So how do you win? Is it basically just the speed and the quality of the features?


1246.614 - 1260.019 Aman Sanger

Yeah, I mean, I think this is a space that is quite interesting, perhaps quite unique, where if you look at previous tech waves, maybe there's kind of one major thing that happened and it unlocked a new wave of companies.


1260.54 - 1282.072 Aman Sanger

But every single year, every single model capability or jump you get in model capabilities, you now unlock this new wave of features, things that are possible, especially in programming. And so I think in AI programming, being even just a few months ahead, let alone a year ahead, makes your product much, much, much more useful.


1282.092 - 1301.04 Aman Sanger

I think the Cursor a year from now will need to make the Cursor of today look obsolete. And I think, you know, Microsoft has done a number of fantastic things, but I don't think they're in a great place to really keep innovating and pushing on this in the way that a startup can. Just rapidly implementing features.


1302.185 - 1310.169 Aman Sanger

And kind of doing the research experimentation necessary to really push the ceiling.


1310.409 - 1336.462 Sualeh Asif

I don't know if I think of it in terms of features as much as I think of it in terms of capabilities for programmers. It's that as the new o1 model came out, and I'm sure there are going to be more models of different types, like longer context and maybe faster, there's all these crazy ideas that you can try. And hopefully 10% of the crazy ideas will make it into something kind of cool and useful.


1337.143 - 1360.779 Sualeh Asif

And we want people to have that sooner. To rephrase, it's like an underrated fact is we're making it for ourselves. When we started Cursor, you really felt this frustration that, you know, you could see models getting better. But the Copilot experience had not changed. It was like, man, these guys, the ceiling is getting higher. Why are they not making new things?


1361.22 - 1381.333 Sualeh Asif

They should be making new things. Where's all the alpha features? There were no alpha features. It was like... I'm sure it was selling well. I'm sure it was a great business, but it didn't feel, I'm one of these people that really want to try and use new things. And it was just, there's no new thing for like a very long while.


1381.693 - 1391.916 Lex Fridman

Yeah, it's interesting. I don't know how you put that into words, but when you compare Cursor with Copilot, Copilot pretty quickly started to feel stale for some reason.


1392.096 - 1415.543 Arvid Lunnemark

Yeah, I think one thing that helps us is that we're sort of doing it all in one, where we're developing the UX and the way you interact with the model at the same time as we're developing how we actually make the model give better answers. So it's like, how do you build up the prompt, or how do you find the context, and for Cursor Tab, how do you train the model?


1416.684 - 1423.27 Arvid Lunnemark

So I think that helps us to have all of it like sort of like the same people working on the entire experience end-to-end.


1423.755 - 1429.921 Sualeh Asif

Yeah, it's like the person making the UI and the person training the model sit 18 feet away.


1430.682 - 1431.763 Aman Sanger

Often the same person even.


1432.103 - 1440.472 Sualeh Asif

Yeah, often even the same person. You can create things that are sort of not possible if you're not talking, you're not experimenting.


1440.692 - 1458.273 Lex Fridman

And you're using, like you said, Cursor to write Cursor. Of course. Oh, yeah. Well, let's talk about some of these features. Let's talk about the all-knowing, the all-powerful, praise be to the tab. You know, autocomplete on steroids, basically. So how does tab work?


1458.313 - 1480.751 Michael Truell

What is tab? To highlight and summarize at a high level, I'd say that there are two things that Cursor is pretty good at right now. There are other things that it does. But two things that it helps programmers with. One is this idea of looking over your shoulder and being like a really fast colleague who can kind of jump ahead of you and type and figure out what you're gonna do next.


1481.511 - 1499.3 Michael Truell

And that was the original idea behind, that was kind of the kernel of the idea behind a good autocomplete was predicting what you're gonna do next. But you can make that concept even more ambitious by not just predicting the characters after your cursor, but actually predicting the next entire change you're gonna make, the next diff, next place you're gonna jump to.


1501.66 - 1520.025 Michael Truell

And the second thing Cursor is pretty good at right now, too, is helping you sometimes jump ahead of the AI and tell it what to do and go from instructions to code. And on both of those, we've done a lot of work on making the editing experience for those things ergonomic and also making those things smart and fast.


1520.936 - 1542.316 Sualeh Asif

One of the things we really wanted was we wanted the model to be able to edit code for us. That was kind of a wish, and we had multiple attempts at it before we had a good model that could edit code for you. Then after we had a good model, I think there'd been a lot of effort to make the inference fast for having a good experience.


1545.539 - 1563.711 Sualeh Asif

And we've been starting to incorporate, I mean, Michael sort of mentioned this, like, ability to jump to different places. And that jump to different places, I think, came from a feeling of, you know, once you accept an edit, it's like, man, it should be just really obvious where to go next.


1564.172 - 1574.979 Sualeh Asif

It's like, I made this change, the model should just know that, like, the next place to go to is, like, 18 lines down. Like, if you're a Vim user, you could press 18JJ or whatever.


1576.32 - 1598.532 Sualeh Asif

But why am I even doing this? The model should just know it. And so the idea was, you just press tab, it would go 18 lines down and then show you the next edit, and you would press tab. So as long as you could keep pressing tab. And so the internal competition was, how many tabs can we make someone press? Once you have the idea, more


1599.492 - 1626.163 Sualeh Asif

abstractly, the thing to think about is how are the edits zero entropy? So once you've expressed your intent and the edit is... There's no new bits of information to finish your thought, but you still have to type some characters to make the computer understand what you're actually thinking. Then maybe the model should just read your mind and all the zero entropy bits should just be tabbed away.


1626.983 - 1628.804 Sualeh Asif

Yeah, that was that was sort of the abstract.
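The "zero entropy" framing can be made concrete with a toy check: if the model's next-token distribution is nearly deterministic, the remaining characters carry no new information from the user and could in principle be tabbed away. A sketch, where the threshold is an arbitrary illustrative number, not anything from the conversation:

```python
import math

def is_tab_away(next_token_probs: list[float], threshold: float = 0.1) -> bool:
    """Return True when the next-token distribution has near-zero Shannon
    entropy, i.e. the continuation is already determined by the user's
    expressed intent and could be auto-applied with a tab."""
    entropy = -sum(p * math.log2(p) for p in next_token_probs if p > 0)
    return entropy < threshold

assert is_tab_away([1.0, 0.0, 0.0])   # fully determined: tab it away
assert not is_tab_away([0.5, 0.5])    # 1 bit of entropy: let the user decide
```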


1628.824 - 1646.796 Aman Sanger

There's this interesting thing where if you look at language model loss on different domains, I believe the bits per byte, which is kind of character normalized loss for code is lower than language, which means in general, there are a lot of tokens in code that are super predictable, a lot of characters that are super predictable.


1647.436 - 1668.153 Aman Sanger

And this is, I think, even magnified when you're not just trying to autocomplete code, but predicting what the user is going to do next in their editing of existing code. And so, you know, the goal of Cursor Tab is, let's eliminate all the low-entropy actions you take inside of the editor. When the intent is effectively determined, let's just jump you forward in time, skip you forward.


1668.733 - 1677.632 Lex Fridman

Well, what's the intuition and what's the technical details of how to do next cursor prediction? That jump. That's not so intuitive, I think, to people.


1677.853 - 1700.924 Aman Sanger

Yeah. I think I can speak to a few of the details on how to make these things work. They're incredibly low latency, so you need to train small models on this task. In particular, they're incredibly prefill-token hungry. What that means is, they have these really, really long prompts, where they see a lot of your code, and they're not actually generating that many tokens.


1701.385 - 1718.597 Aman Sanger

And so the perfect fit for that is using a sparse model, meaning an MoE model. So that was one breakthrough we made that substantially improved performance at longer context. The other being a variant of speculative decoding that we built out, called speculative edits.
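A rough sketch of why a sparse MoE fits a prefill-heavy workload: the model stores many expert FFNs but routes each token to only a few of them, so processing a long prompt costs a fraction of the compute of a dense model with the same parameter count. The shapes below are illustrative, not Cursor's actual architecture:

```python
def ffn_flops_per_token(d_model: int, d_ff: int, n_experts: int, top_k: int):
    """Approximate FLOPs for the FFN part of one transformer layer.
    A dense model with the same parameter count as the MoE would run
    all n_experts worth of weights per token; the MoE runs only top_k."""
    per_expert = 2 * 2 * d_model * d_ff        # up-projection + down-projection
    dense_equivalent = per_expert * n_experts  # all parameters active per token
    moe_active = per_expert * top_k            # only the routed experts active
    return moe_active, dense_equivalent

active, dense = ffn_flops_per_token(d_model=1024, d_ff=4096, n_experts=8, top_k=2)
assert active * 4 == dense  # 2 of 8 experts -> 1/4 of the FFN compute per prompt token
```

Since prefill dominates when prompts are long and generations short, that per-token saving is exactly where it matters for a tab model.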


1719.618 - 1726.044 Aman Sanger

These are two, I think, important pieces of what make it quite high quality and very fast.


1726.704 - 1737.352 Lex Fridman

Okay, so MoE, mixture of experts: the input is huge, the output is small. Okay, so what else can you say about how to make it work? Does caching play a role, in particular?


1737.552 - 1763.046 Aman Sanger

Caching plays a huge role. Because you're dealing with this many input tokens, if every single keystroke that you're typing in a given line, you had to rerun the model on all of those tokens passed in, you're just going to, one, significantly degrade latency, two, you're going to kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching aware.


1763.526 - 1769.889 Aman Sanger

And then, yeah, you need to reuse the KV cache across requests just so that you're spending less work, less compute.
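The caching-aware design Aman describes can be caricatured like this: keep the stable part of the prompt (file contents, instructions) as a shared prefix and put the volatile part (the text around the keystroke) at the end, so successive requests reuse the prefix's KV entries. A toy model of the bookkeeping, with block hashing standing in for a real inference server's token-prefix cache:

```python
import hashlib

class KVPrefixCache:
    """Toy sketch of KV-cache reuse across requests. Real servers key on
    token-block prefixes and store actual attention KV tensors; here we
    only count how many prompt tokens still need a forward pass."""

    def __init__(self, block_size: int = 16):
        self.block_size = block_size
        self.cache = {}  # hash of a token-prefix block -> stored "KV state"

    def _prefixes(self, tokens):
        usable = len(tokens) - len(tokens) % self.block_size
        for i in range(0, usable, self.block_size):
            yield tuple(tokens[: i + self.block_size])

    def tokens_to_recompute(self, tokens) -> int:
        reused = 0
        for prefix in self._prefixes(tokens):
            key = hashlib.sha1(repr(prefix).encode()).hexdigest()
            if key in self.cache:
                reused = len(prefix)  # this whole prefix's KV can be reused
            else:
                self.cache[key] = "kv"  # pretend we stored the KV entries
        return len(tokens) - reused

cache = KVPrefixCache(block_size=4)
first = cache.tokens_to_recompute(list(range(20)))               # cold: all 20
second = cache.tokens_to_recompute(list(range(20)) + [99, 100])  # warm prefix
assert first == 20 and second == 2
```

Because prefixes are cumulative, a single keystroke at the end of the prompt only invalidates the final partial block, which is the whole point of ordering the prompt this way.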


1771.042 - 1790.377 Lex Fridman

Again, what are the things that tab is supposed to be able to do kind of in the near term? Just to like sort of linger on that. Generate code, like fill empty space, also edit code across multiple lines. Yeah. And then jump to different locations inside the same file.


1790.657 - 1801.384 Sualeh Asif

Yeah. And then hopefully jump to different files also. So if you make an edit in one file, and maybe you have to go to another file to finish your thought, it should go to the second file also.


1801.404 - 1825.923 Arvid Lunnemark

The full generalization is, like, next-action prediction. Sometimes you need to run a command in the terminal, and it should be able to suggest the command based on the code that you wrote, too. Or sometimes it suggests something, but it's hard for you to know if it's correct, because you actually need some more information to learn.


1826.024 - 1838.956 Arvid Lunnemark

Like you need to know the type to be able to verify that it's correct. And so maybe it should actually take you to a place that's like the definition of something and then take you back so that you have all the requisite knowledge to be able to accept the next completion.


1839.577 - 1853.786 Lex Fridman

So providing the human the knowledge. Yes. Right. Can you integrate... like, I just got to know a guy named Primeagen who, I believe, has a setup where you can order coffee via SSH.


1855.046 - 1857.507 Sualeh Asif

Oh, yeah. Oh, we did that. We did that.


1857.567 - 1865.588 Lex Fridman

So can also the model do that, like feed you and provide you with caffeine? Okay, so that's the general framework.


1865.608 - 1880.255 Michael Truell

Yeah, and the magic moment would be if... Programming is this weird discipline where sometimes the next five minutes, not always, but sometimes, what you're going to do is actually predictable from the stuff you've done recently.


1880.756 - 1894.842 Michael Truell

And so can you get to a world where that next five minutes either happens by you disengaging and it taking you through, or maybe a little bit more of just you seeing next step, what it's going to do. And you're like, okay, that's good. That's good. That's good. That's good. And you can just sort of tap, tap, tap through these big changes.


1895.422 - 1915.542 Lex Fridman

As we're talking about this, I should mention that one of the really cool and noticeable things about cursor is that there's this whole diff interface situation going on. So like the model suggests with the red and the green of like, here's how we're going to modify the code. And in the chat window, you can apply and it shows you the diff and you can accept the diff.


1916.282 - 1918.485 Lex Fridman

So maybe can you speak to whatever direction of that?


1919.042 - 1939.408 Sualeh Asif

We'll probably have like four or five different kinds of diffs. So we have optimized the diff for the autocomplete. So that has a different diff interface than when you're reviewing larger blocks of code. And then we're trying to optimize another diff thing for when you're doing multiple different files.


1940.949 - 1961.726 Sualeh Asif

And sort of at a high level, the difference is, when you're doing autocomplete, it should be really, really fast to read. Actually, it should be really fast to read in all situations, but in autocomplete your eyes are really focused in one area. Humans can't look in too many different places.


1961.766 - 1963.487 Lex Fridman

So you're talking about on the interface side?


1963.527 - 1977.517 Sualeh Asif

On the interface side. So it currently has this box on the side. So we have the current box. And if it tries to delete code in some place and tries to add other code, it tries to show you a box on the side. You can maybe show it if we pull it up on cursor.com.


1977.997 - 1979.018 Aman Sanger

This is what we're talking about.


1979.858 - 2005.441 Sualeh Asif

So that box... it was like three or four different attempts at trying to make this thing work. First the attempt was these blue crossed-out lines. So before it was a box on the side, it used to show you the code to delete by showing you, like, Google Docs style, a line through it. Then you would see the new code. That was super distracting.


2006.361 - 2031.541 Sualeh Asif

And then we tried many different things, you know: there were sort of deletions, there was trying to red-highlight. Then the next iteration of it, which is sort of funny: on Mac, you would hold the Option button, and it would highlight a region of code to show you that there might be something coming. So maybe in this example, like, the input and the value would all get blue.


2033.51 - 2053.199 Sualeh Asif

And the blue was to highlight that the AI had a suggestion for you. So instead of directly showing you the thing, it would show you that the AI, it would just hint that the AI had a suggestion. And if you really wanted to see it, you would hold the option button and then you would see the new suggestion. Then if you release the option button, you would then see your original code.


2053.399 - 2058 Lex Fridman

Mm-hmm. So that's, by the way, that's pretty nice, but you have to know to hold the Option button.


2058.52 - 2058.9 Sualeh Asif

Yeah.


2058.92 - 2064.821 Lex Fridman

By the way, I'm not a Mac user, but I got it. It's a button, I guess, you people have.


2066.662 - 2069.823 Sualeh Asif

Again, it's just non-intuitive. I think that's the key thing.


2070.243 - 2072.743 Aman Sanger

And there's a chance this is also not the final version of it.


2073.383 - 2102.759 Arvid Lunnemark

I am personally very excited for... making a lot of improvements in this area. We often talk about it as the verification problem, where these diffs are great for small edits. For large edits, or when it's multiple files or something, it's actually a little bit prohibitive to review these diffs. And so there are a couple of different ideas here.


2103.14 - 2123.138 Arvid Lunnemark

One idea that we have is, okay, parts of the diffs are important. They have a lot of information. And then parts of the diff are just very low entropy. They're the same thing over and over again. And so maybe you can highlight the important pieces and then gray out the not so important pieces. Or maybe you can have a model that


2124.038 - 2137.425 Arvid Lunnemark

looks at the diff and sees, oh, there's a likely bug here, I will mark this with a little red squiggly and say, you should probably review this part of the diff. And ideas in that vein, I think, are exciting.


2137.879 - 2149.566 Lex Fridman

Yeah, that's a really fascinating space of UX design engineering. So you're basically trying to guide the human programmer through all the things they need to read and nothing more.


2150.246 - 2150.406 Arvid Lunnemark

Yeah.


2150.446 - 2151.227 Lex Fridman

Like optimally.


2151.727 - 2173.319 Arvid Lunnemark

Yeah, and you want an intelligent model to do it. Currently, diff algorithms are just, like, normal algorithms. There is no intelligence. There's intelligence that went into designing the algorithm, but then there's none in applying it: it doesn't care if the diff is about this thing or that thing, whereas you want a model to do this.


2173.66 - 2195.051 Sualeh Asif

So I think the general question is like, man, these models are going to get much smarter. As the models get much smarter, the changes they will be able to propose are much bigger. So as the changes get bigger and bigger, the humans have to do more and more verification work. It gets harder and harder. You need to help them out.


2195.551 - 2198.614 Sualeh Asif

It's sort of, I don't want to spend all my time reviewing code.


2201.878 - 2205.702 Lex Fridman

Can you say a little more about the diff across multiple files?


2206.451 - 2227.85 Aman Sanger

Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. You know, code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and...


2230.359 - 2244.927 Aman Sanger

it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvid had described, of maybe pointing you towards the regions that actually matter.


2247.762 - 2268.486 Aman Sanger

I think also, if the code is produced by these language models and not by someone else... the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, you don't have to care that much about their experience.


2268.506 - 2289.259 Aman Sanger

And you can design the entire thing around the reviewers such that the reviewer's job is as fun, as easy, as productive as possible. And I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.


2289.519 - 2308.921 Arvid Lunnemark

Just one idea there is, I think, ordering matters. Generally, when you review a PR, you have this list of files and you're reviewing them from top to bottom. But actually, you want to understand this part first, because that came logically first, and then you want to understand the next part. And you don't want to have to figure that out yourself.


2309.101 - 2311.464 Arvid Lunnemark

You want a model to guide you through the thing.


2312.714 - 2318.477 Lex Fridman

And is the step of creation going to be more and more natural language is the goal versus with actual writing?


2318.537 - 2338.866 Arvid Lunnemark

I think sometimes. I don't think it's going to be the case that all of programming will be natural language. And the reason for that is, you know, if I'm pair programming with Sualeh, and Sualeh is at the computer and the keyboard, and sometimes, if I'm driving, I want to say to Sualeh, hey, implement this function. And that works.


2339.446 - 2355.812 Arvid Lunnemark

And then sometimes it's just so annoying to explain to Sualeh what I want him to do, and so I actually take over the keyboard and I show him. I write part of the example, and then it makes sense. And that's the easiest way to communicate. And so I think that's also the case for AI.


2356.072 - 2373.817 Arvid Lunnemark

Sometimes the easiest way to communicate with the AI will be to show an example, and then it goes and does the thing everywhere else. Or sometimes if you're making a website, for example, the easiest way to show to the AI what you want is not to tell it what to do, but drag things around or draw things. And Yeah.


2373.918 - 2386.267 Arvid Lunnemark

And, like, maybe eventually we will get to, like, brain-machine interfaces or whatever, and kind of, like, understand what you're thinking. And so I think natural language will have a place, but I definitely don't think it will be the way most people program most of the time.


2386.287 - 2396.733 Lex Fridman

I'm really feeling the AGI with this editor. It feels like there's a lot of machine learning going on underneath. Tell me about some of the ML stuff that makes it all work.


2397.554 - 2416.111 Aman Sanger

Well, Cursor really works via this ensemble of custom models that we've trained, alongside the frontier models that are fantastic at the reasoning-intense things. And so Cursor Tab, for example, is a great example of where you can specialize this model to be even better than frontier models, if you look at evals on the task we set it at.


2416.741 - 2443.337 Aman Sanger

The other domain, which it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in apply. So I think the frontier models are quite good at sketching out plans for code and generating, like, rough sketches of the change. But actually creating diffs is quite hard for frontier models.


2444.238 - 2468.26 Aman Sanger

You try to do this with Sonnet, with o1, any frontier model, and it really messes up stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model sketch out this rough code block that indicates what the change will be, and we train a model to then apply that change to the file.
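To see why apply resists a deterministic solution, here is a hypothetical naive merge that expands a single elision marker by scanning the original file for the sketch's next anchor line. It handles the easy case below, but it falls apart as soon as edited lines also appear in the unchanged region, anchors repeat, or the sketch elides several regions, which is roughly why a trained model is needed for this step:

```python
MARKER = "# ... existing code ..."

def naive_apply(original: list[str], sketch: list[str]) -> list[str]:
    """Merge a rough sketch (with elided regions) into the full file.
    Deterministic and deliberately fragile: a stand-in for the kind of
    shallow matching that fails often in practice, not Cursor's model."""
    out, i, j = [], 0, 0
    while j < len(sketch):
        line = sketch[j]
        if line.strip() == MARKER:
            # copy original lines until we hit the sketch's next concrete line
            anchor = sketch[j + 1] if j + 1 < len(sketch) else None
            while i < len(original) and original[i] != anchor:
                out.append(original[i])
                i += 1
        else:
            out.append(line)
            if i < len(original) and original[i] == line:
                i += 1  # unchanged line: stay in sync with the original
        j += 1
    return out

original = ["import os", "", "def f():", "    return 1", "", "def g():", "    return 2"]
sketch = [MARKER, "def g():", "    return 20"]  # edit only g's body
merged = naive_apply(original, sketch)
assert merged == original[:6] + ["    return 20"]
```

Anything beyond this happy path (ambiguous anchors, reordered code, edits in the middle of a file) forces exactly the fuzzy matching that deterministic versions get wrong.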


2468.86 - 2485.659 Lex Fridman

And we should say that apply is: the model looks at your code and gives you a really damn good suggestion of what new things to do. And the step of combining the two, seemingly trivial for humans, you're saying is not so trivial.


2485.759 - 2489.22 Sualeh Asif

Contrary to popular perception. It is not a deterministic algorithm.


2489.501 - 2514.734 Aman Sanger

Yeah. I think, like, you see shallow copies of apply elsewhere and it just breaks, like, most of the time, because you think you can kind of try to do some deterministic matching, and then it fails, you know, at least 40% of the time. And that just results in a terrible product experience. I think, in general, we're in this regime where you're going to get smarter and smarter models.


2515.735 - 2540.309 Aman Sanger

So one other thing that Apply lets you do is it lets you use fewer tokens with the most intelligent models. This is both expensive in terms of latency for generating all these tokens and cost. So you can give this very, very rough sketch and then have your small models go and implement it because it's a much easier task to implement this very, very sketched out code.


2540.649 - 2562.701 Aman Sanger

And I think that this regime will continue, where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have, you know, maybe o1, maybe even more capable models, given an even higher-level plan that is kind of recursively implemented, applied by Sonnet and then an apply model.


2563.002 - 2565.884 Sualeh Asif

Maybe we should talk about how to make it fast. Yeah.


2565.904 - 2568.687 Aman Sanger

Fast is always an interesting detail. Fast is good.


2569.147 - 2570.689 Lex Fridman

Yeah. How do you make it fast?


2571.465 - 2596.622 Aman Sanger

Yeah, so one big component of making it fast is speculative edits. Speculative edits are a variant of speculative decoding, and maybe it'd be helpful to briefly describe speculative decoding. With speculative decoding, what you do is take advantage of the fact that, most of the time (with the caveat that this holds when you're memory-bound in language model generation),


2596.642 - 2613.591 Aman Sanger

If you process multiple tokens at once, it is faster than generating one token at a time. So this is the same reason why, if you look at tokens per second for prompt tokens versus generated tokens, it's much, much faster for prompt tokens.


2615.752 - 2635.521 Aman Sanger

So what we do is instead of using what speculative decoding normally does, which is using a really small model to predict these draft tokens that your larger model will then go in and verify, With code edits, we have a very strong prior of what the existing code will look like. And that prior is literally the same exact code.


2635.541 - 2653.465 Aman Sanger

So what you can do is you can just feed chunks of the original code back into the model. And then the model will just pretty much agree most of the time that, okay, I'm just going to spit this code back out. And so you can process all of those lines in parallel. And you just do this with sufficiently many chunks. And then eventually you'll reach a point of disagreement.


2654.085 - 2675.29 Aman Sanger

where the model will now predict text that is different from the ground truth original code. It'll generate those tokens, and then we kind of will decide after enough tokens match the original code to restart speculating in chunks of code. What this actually ends up looking like is just a much faster version of normal editing code.


2675.33 - 2687.797 Aman Sanger

So it looks like a much faster version of the model rewriting all the code. So we can use the same exact interface that we use for diffs, but it will just stream down a lot faster.
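The loop Aman describes can be simulated in miniature. Below, the final edited token sequence stands in as an oracle for the model; in a real system, one batched forward pass scores a whole draft chunk at once, which is where the speedup comes from. This is a sketch of the control flow, not Cursor's implementation:

```python
def speculative_edit(original, edited, chunk=8):
    """Use the original file's tokens as the draft sequence and accept
    them in chunks until the model (here: the 'edited' oracle) disagrees.
    Returns the produced tokens and the number of forward passes used."""
    out, passes = [], 0
    while len(out) < len(edited):
        draft = original[len(out):len(out) + chunk]
        passes += 1  # one batched pass verifies the whole draft chunk
        accepted = 0
        for token in draft:
            if len(out) + accepted < len(edited) and edited[len(out) + accepted] == token:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        if accepted < len(draft) or not draft:
            # disagreement (or draft exhausted): emit the model's own token
            if len(out) < len(edited):
                out.append(edited[len(out)])
    return out, passes

orig = list("def add(a, b):\n    return a + b\n")
new = list("def add(a, b):\n    return a - b\n")
result, passes = speculative_edit(orig, new, chunk=8)
assert "".join(result) == "".join(new)
assert passes < len(new)  # far fewer passes than one-token-at-a-time decoding
```

With only a single character changed, almost every chunk is accepted wholesale, so the edit streams out at close to prefill speed rather than generation speed.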


2688.277 - 2701.851 Sualeh Asif

And then the advantage is that while it's streaming, you can just also start reviewing the code before it's done, so there's no big loading screen. So maybe that is part of the advantage.


2702.812 - 2705.275 Lex Fridman

So the human can start reading before the thing is done.


2705.871 - 2719.991 Sualeh Asif

I think the interesting riff here is that speculation is a fairly common idea nowadays. It's not only in language models; I mean, there's obviously speculation in CPUs, and there's speculation for databases, and speculation all over the place.


2721.021 - 2737.288 Lex Fridman

Let me ask the ridiculous question of which LLM is better at coding. GPT, Claude, who wins in the context of programming? And I'm sure the answer is much more nuanced because it sounds like every single part of this involves a different model.


2738.408 - 2767.077 Aman Sanger

Yeah, I think there's no model that Pareto-dominates others, meaning it is better in all the categories that we think matter. The categories being speed, ability to edit code, ability to process lots of code, long context, you know, a couple of other things, and kind of coding capabilities. The one that I'd say right now is just kind of net best is Sonnet. I think this is a consensus opinion.


2767.578 - 2786.436 Aman Sanger

o1 is really interesting, and it's really good at reasoning. So if you give it really hard programming-interview-style problems or LeetCode problems, it can do quite well on them. But it doesn't feel like it understands your rough intent as well as Sonnet does.


2787.777 - 2803.588 Aman Sanger

Like, if you look at a lot of the other frontier models, one qualm I have is, it feels like, and I'm not saying they train on benchmarks, but they perform really well on benchmarks relative to kind of everything that's in the middle.


2804.149 - 2817.515 Aman Sanger

So if you try them on all these benchmarks, on things that are in the distribution of the benchmarks they're evaluated on, you know, they'll do really well. But when you push them a little bit outside of that, Sonnet's, I think, the one that does best at maintaining that same capability.


2817.795 - 2822.917 Aman Sanger

Like you kind of have the same capability in the benchmark as when you try to instruct it to do anything with coding.


2823.706 - 2835.249 Lex Fridman

What, another ridiculous question, is the difference between the normal programming experience versus what benchmarks represent? Like where do benchmarks fall short, do you think, when we're evaluating these models?


2835.609 - 2862.052 Sualeh Asif

By the way, that's a really, really hard and, like, critically important detail: how different benchmarks are versus real coding. Where real coding, it's not interview-style coding. You're doing these... You know, humans are saying, like, half-broken English sometimes, and sometimes you're saying, like, oh, do what I did before. Sometimes you're saying...


2863.817 - 2887.052 Sualeh Asif

you know, go add this thing and then do this other thing for me and then make this UI element. And then, you know, it's just like a lot of things are sort of context dependent. You really want to like understand the human and then do what the human wants as opposed to sort of this, maybe the way to put it is sort of abstractly is the interview problems are very well-specified.


2888.59 - 2896.316 Sualeh Asif

they lean a lot on specification while the human stuff is less specified. Yeah.


2896.956 - 2916.594 Michael Truell

I think that this benchmark question is both complicated by what Svali just mentioned, and then also to... What Aman was getting into is that even if you like, you know, there's this problem of like the skew between what can you actually model in a benchmark versus real programming. And that can be sometimes hard to encapsulate because it's like real programming is like very messy.


2916.654 - 2934.043 Michael Truell

And sometimes things aren't super well specified what's correct or what isn't. But then it's also doubly hard because of this public benchmark problem. And that's both because public benchmarks are sometimes kind of hill-climbed on, but then it's really, really hard to also get the data from the public benchmarks out of the models.


2934.684 - 2954.71 Michael Truell

And so, for instance, one of the most popular agent benchmarks, SWE-bench, is really, really contaminated in the training data of these foundation models. And so if you ask these foundation models to do a SWE-bench problem, but you actually don't give them the context of a codebase, they can hallucinate the right file paths, they can hallucinate the right function names.


2955.27 - 2959.111 Michael Truell

And so it's also just the public aspect of these things is tricky.


2959.731 - 2980.236 Aman Sanger

Yeah, like in that case, it could be trained on the literal issues or pull requests themselves. And maybe the labs will start to do a better job, or they've already done a good job, at decontaminating those things. But they're not going to omit the actual training data of the repository itself. These are all some of the most popular Python repositories; SymPy is one example.


2981.096 - 2990.223 Aman Sanger

I don't think they're going to handicap their models on SymPy and all these popular Python repositories in order to get true evaluation scores in these benchmarks.


2990.543 - 3009.818 Michael Truell

I think that, given the dearth of benchmarks, there have been a few interesting crutches that places that build systems with these models, or build these models, actually use to get a sense of whether they're going in the right direction or not. And in a lot of places, people will actually just have humans play with the things and give qualitative feedback on these things.


3010.599 - 3025.609 Michael Truell

Like, at one or two of the foundation model companies, they have people where that's a big part of their role. And, you know, internally, we also qualitatively assess these models and actually lean on that a lot, in addition to, like, private evals that we have. It's like the vibe. The vibe, yeah. It's like the vibe.


3026.269 - 3051.487 Lex Fridman

The vibe benchmark, human benchmark. Yeah. You pull in the humans to do a vibe check. Yeah. Okay. I mean, that's kind of what I do, like just like reading online forums and Reddit and X, just like, Well, I don't know how to properly load in people's opinions because they'll say things like, I feel like Claude or GPT has gotten dumber or something.


3051.507 - 3059.533 Lex Fridman

They'll say, I feel like, and then I sometimes feel like that too, but I wonder if it's the model's problem or mine.


3060.356 - 3086.808 Aman Sanger

Yeah, with Claude, there's an interesting take I heard, where I think AWS has different chips, and I suspect they have slightly different numerics than NVIDIA GPUs. And someone speculated that Claude's degraded performance had to do with maybe using the quantized version that existed on AWS Bedrock versus whatever was running on Anthropic's GPUs.


3087.128 - 3092.131 Lex Fridman

I interview a bunch of people that have conspiracy theories, so I'm glad you spoke to this conspiracy theory.


3092.151 - 3114.044 Sualeh Asif

Well, it's not a conspiracy theory as much. You know, humans are humans, and there are these details, and you're doing this crazy amount of flops, and, you know, chips are messy, and, man, you can just have bugs. It's hard to overstate how hard bugs are to avoid. Yeah.


3115.143 - 3136.213 Lex Fridman

What's the role of a good prompt in all of this? We mentioned that benchmarks have really structured, well-formulated prompts. What should a human be doing to maximize success? And what's the importance of what the human, you wrote a blog post on, you called it prompt design.


3137.15 - 3159.024 Arvid Lunnemark

Yeah, I think it depends on which model you're using. And all of them are slightly different, and they respond differently to different prompts. But I think the original GPT-4 and the original sort of earlier models from last year, they were quite sensitive to the prompts. And they also had a very small context window.


3160.004 - 3177.007 Arvid Lunnemark

And so we have all of these pieces of information around the code base that would maybe be relevant in the prompt. Like you have the docs, you have the files that you add, you have the conversation history. And then there's a problem like how do you decide what you actually put in the prompt and when you have a limited space.


3177.047 - 3195.436 Arvid Lunnemark

And even for today's models, even when you have long context, filling out the entire context window means that it's slower. It means that sometimes the model actually gets confused and some models get more confused than others. And we have this one system internally that we call pre-empt, which helps us with that a little bit.


3196.418 - 3228.243 Arvid Lunnemark

And I think it was built for the era before where we had 8,000 token context windows. And it's a little bit similar to when you're making a website. You want it to work on mobile. You want it to work on a desktop screen. And you have this dynamic information, which you don't have, for example, if you're designing a print magazine. You know exactly where you can put stuff.


3228.483 - 3243.614 Arvid Lunnemark

But when you have a website or when you have a prompt, you have these inputs. And then you need to format them to always work. Even if the input is really big, then you might have to cut something down. And so the idea was, okay, let's take some inspiration. What's the best way to design websites?


3243.814 - 3272.893 Arvid Lunnemark

Well, the thing that we really like is React and the declarative approach where you use JSX in JavaScript, and then you declare, this is what I want, and I think this has higher priority, or this has higher z-index than something else. And then you have this rendering engine. In web design, it's like Chrome, and in our case, it's a preempt renderer, which then fits everything onto the page.


3273.534 - 3291.342 Arvid Lunnemark

And so you declare what you want, and then it figures out how to render it. And so we have found that to be quite helpful. And I think the role of it has sort of shifted over time, where initially it was to fit to these small context windows. Now it's really useful because it helps us with...


3292.302 - 3315.097 Arvid Lunnemark

splitting up the data that goes into the prompt from the actual rendering of it. And so it's easier to debug, because you can change the rendering of the prompt and then try it on old prompts, because you have the raw data that went into the prompt, and then you can see, did my change actually improve it for, like, this entire eval set? So do you literally prompt with JSX? Yes, yes.


3315.618 - 3331.248 Arvid Lunnemark

So it kind of looks like React. There are components. We have one component that's a file component, and it takes in the cursor. Usually there's one line where the cursor is in your file, and that's probably the most important line because that's the one you're looking at. And so then you can give priorities.


3331.288 - 3342.976 Arvid Lunnemark

So that line has the highest priority, and then you subtract one for every line that is farther away. And then eventually when it's rendered, it figures out how many lines can actually fit, and it centers around that thing.
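The line-priority scheme Arvid describes can be reduced to a minimal stand-in: score each line by its distance from the cursor line, keep the highest-priority lines that fit the budget, and render them back in file order. (The real renderer handles arbitrary components with token budgets; this sketch uses a line budget for simplicity.)

```python
def render_file_context(lines, cursor_line, budget):
    """Toy priority-based renderer: the cursor's line has the highest
    priority, and each line loses one point per line of distance. We keep
    the `budget` highest-priority lines and emit them in original order,
    which naturally centers the window on the cursor."""
    by_priority = sorted(range(len(lines)), key=lambda i: abs(i - cursor_line))
    keep = sorted(by_priority[:budget])  # restore file order for rendering
    return [lines[i] for i in keep]

lines = [f"line {i}" for i in range(100)]
window = render_file_context(lines, cursor_line=50, budget=5)
assert window == ["line 48", "line 49", "line 50", "line 51", "line 52"]
```

The same priority idea extends to whole components (docs, conversation history, retrieved files), each competing for space in the final prompt.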


3343.416 - 3356.268 Aman Sanger

That's amazing. And you can do, like, other fancy things where if you have lots of code blocks from the entire code base, you could use retrieval and things like embedding and re-ranking scores to add priorities for each of these components.


3357.269 - 3369.08 Lex Fridman

So should humans, when they ask questions, also try to use something like that? Like, would it be beneficial to write JSX in the prompt? Or the whole idea is it should be loose and messy and...


3369.839 - 3382.974 Arvid Lunnemark

I think our goal is kind of that you should just do whatever is the most natural thing for you, and then our job is to figure out how we actually retrieve the relevant things so that your query actually makes sense.


3383.257 - 3392.365 Lex Fridman

Well, this is sort of the discussion I had with Aravind of Perplexity. His whole idea is that you should let the person be as lazy as they want.


3393.206 - 3409.4 Lex Fridman

But like, yeah, that's a beautiful thing. But I feel like you're allowed to ask more of programmers, right? Yes. So like if you say just do what you want, I mean, humans are lazy. There's a kind of tension between just being lazy versus like provide more as –


3411.192 - 3425.202 Lex Fridman

be prompted, almost like the system pressuring you or inspiring you to be articulate, not in terms of the grammar of the sentences, but in terms of the depth of thoughts that you convey inside the prompts.


3425.322 - 3453.039 Aman Sanger

I think even as the system gets closer to some level of perfection, often when you ask the model for something, not enough intent is conveyed to know what to do. And there are a few ways to resolve that intent. One is the simple thing of having the model just ask you: I'm not sure how to do these parts based on your query, could you clarify that? I think the other could be maybe...


3454.935 - 3463.501 Aman Sanger

If there are five or six possible generations given the uncertainty present in your query so far, why don't we just actually show you all of those and let you pick them?


3463.521 - 3481.973 Lex Fridman

How hard is it for the model to choose to talk back? Sort of versus generating. It's hard. It's sort of like how to deal with the uncertainty. Do I choose to ask for more information to reduce the ambiguity?


3482.458 - 3514.938 Sualeh Asif

So, I mean, one of the things we do, it's like a recent addition, is try to suggest files that you can add. So while you're typing, one can guess what the uncertainty is, and maybe suggest that, you know, maybe you're writing your API, and we can guess, using the commits that you've made previously in the same file, that the client and the server would be super useful.


3515.659 - 3534.292 Sualeh Asif

And there's a hard technical problem of how do you resolve it across all commits, which files are the most important given your current prompt? We still have just the initial version rolled out, and I'm sure we can make it much more accurate. It's very experimental.


3534.352 - 3550.464 Sualeh Asif

But then the idea is we show you, do you just want to add this file, this file, this file also to tell the model to edit those files for you? Because if maybe you're making the API, you should also edit the client and the server that is using the API and the other one resolving the API.


3550.544 - 3560.172 Sualeh Asif

So that'll be kind of cool as both there's the phase where you're writing the prompt and there's before you even click enter, maybe we can help resolve some of the uncertainty.


3560.714 - 3564.377 Lex Fridman

To what degree do you use agentic approaches? How useful are agents?


3565.438 - 3596.383 Arvid Lunnemark

We think agents are really, really cool. I think agents is like... It's like it resembles sort of like a human. It's sort of like you can kind of feel that you're getting closer to AGI because you see a demo where it acts as a human would. And it's really, really cool. I think... agents are not yet super useful for many things. I think we're getting close to where they will actually be useful.


3597.204 - 3620.508 Arvid Lunnemark

And so I think there are certain types of tasks where having an agent would be really nice. I would love to have an agent. For example, we have a bug where you sometimes can't command C and command V inside our chat input box, and that's a task that's super well specified. I just want to say in two sentences, this does not work, please fix it.


3620.788 - 3628.616 Arvid Lunnemark

And then I would love to have an agent that just goes off, does it, and then a day later I come back and I review the thing.


3629.196 - 3631.718 Lex Fridman

You mean it goes, finds the right file?


3631.818 - 3650.814 Arvid Lunnemark

Yeah, it finds the right files, it tries to reproduce the bug, it fixes the bug, and then it verifies that it's correct. And this could be a process that takes a long time. And so I think I would love to have that. And then I think a lot of programming, there is often this belief that agents will take over all of programming.


3652.555 - 3669.027 Arvid Lunnemark

I don't think we think that that's the case because a lot of programming, a lot of the value is in iterating or you don't actually want to specify something upfront because you don't really know what you want until you've seen an initial version and then you want to iterate on that and then you provide more information.


3669.467 - 3677.693 Arvid Lunnemark

And so for a lot of programming, I think you actually want a system that's instant that gives you an initial version instantly back and then you can iterate super, super quickly.


3678.736 - 3691.382 Lex Fridman

What about something like that recently came out, Replit Agent, that does also like setting up the development environment, installing software packages, configuring everything, configuring the databases, and actually deploying the app?


3692.062 - 3692.162 Arvid Lunnemark

Yeah.


3692.182 - 3695.524 Lex Fridman

Is that also in the set of things you dream about?


3696.264 - 3701.526 Arvid Lunnemark

I think so. I think that would be really cool. For certain types of programming, it would be really cool.


3701.586 - 3703.227 Lex Fridman

Is that within scope of Cursor?


3704.113 - 3723.936 Arvid Lunnemark

Yeah, we aren't actively working on it right now. But it's definitely like, we want to make the programmer's life easier and more fun. And some things are just really tedious and you need to go through a bunch of steps and you want to delegate that to an agent. And then some things you can actually have an agent in the background while you're working.


3724.176 - 3743.927 Arvid Lunnemark

Like, let's say you have a PR that's both backend and frontend, and you're working in the frontend, and then you can have a background agent that does some work and figure out kind of what you're doing. And then when you get to the backend part of your PR, then you have some like initial piece of code that you can iterate on. And so that would also be really cool.


3745.155 - 3767.159 Lex Fridman

One of the things we already talked about is speed. But I wonder if we can just linger on that some more, on the various technical details involved in making this thing really fast. Every single aspect of Cursor, most aspects of Cursor, feel really fast. Like I mentioned, apply is probably the slowest thing. And for me, I'm sorry, that's the pain.


3768.239 - 3772.1 Arvid Lunnemark

It's a pain. It's a pain that we're feeling and we're working on fixing it.


3772.12 - 3795.107 Lex Fridman

Yeah. Yeah, I mean, it says something that something that takes, I don't know, one second or two seconds feels slow. That actually shows that everything else is just really, really fast. So are there some technical details about how to make some of these models, how to make the chat fast, how to make the diffs fast? Is there something that just jumps to mind?


3795.515 - 3821.761 Aman Sanger

Yeah, I mean, so we can go over a lot of the strategies that we use. One interesting thing is cache warming. And so what you can do is, as the user is typing, you know you're probably going to use some piece of context, and you can know that before the user's done typing. So, as we discussed before, reusing the KV cache results in lower latency and lower cost across requests.


3822.241 - 3836.913 Aman Sanger

So as the user starts typing, you can immediately warm the cache with, let's say, the current file contents. And then when they press enter, there are very few tokens it actually has to pre-fill and compute before starting the generation. This will significantly lower TTFT, the time to first token.


3837.013 - 3839.135 Lex Fridman

Can you explain how KV cache works?


3839.395 - 3858.467 Aman Sanger

Yeah. So the way transformers work, one of the mechanisms that allows transformers to not just independently look at each token, but see previous tokens, are the keys and values of attention.


3858.988 - 3885.378 Aman Sanger

And generally the way attention works is you have, at your current token, some query, and then you have all the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And by default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model.


3885.698 - 3895.864 Aman Sanger

That's a lot of matrix multiplies that happen, and that is really, really slow. Instead, if you have already done that, and you stored the keys and values, and you keep that in the GPU...


3896.884 - 3915.834 Aman Sanger

Then, let's say I have stored the keys and values for the last n tokens. If I now want to compute the output for the n-plus-one-th token, I don't need to pass those first n tokens through the entire model, because I already have all those keys and values. And so you just need to do the forward pass through that last token.


3915.854 - 3925.659 Aman Sanger

And then when you're doing attention, you're reusing those keys and values that have been computed, which is the only kind of sequential part or sequentially dependent part of the transformer.
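A toy illustration of the point, with the attention math elided: counting per-token forward passes with and without a KV cache. The numbers are illustrative, not from any real model:

```python
# Toy illustration (not a real transformer) of why the KV cache helps:
# count how many per-token "forward passes" it takes to generate new
# tokens with and without a cache of earlier tokens' keys and values.

def decoding_passes(prompt_len, new_tokens, use_cache):
    passes = 0
    cached = 0            # number of tokens whose keys/values are stored
    seq_len = prompt_len
    for _ in range(new_tokens):
        if use_cache:
            # Only tokens not yet in the cache need a forward pass; after
            # the first step, that's just the single newest token.
            passes += seq_len - cached
            cached = seq_len
        else:
            # Without a cache, every token seen so far is re-processed
            # at every generation step.
            passes += seq_len
        seq_len += 1      # the newly generated token joins the sequence
    return passes
```

Generating 3 tokens after a 100-token prompt costs 100 + 101 + 102 = 303 passes without the cache, but only 102 with it, which is why pre-fill happens once and decoding then touches only the newest token.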


3926.3 - 3932.463 Lex Fridman

Is there, like, higher-level caching, like caching of the prompts, or that kind of stuff? Could that help?


3933.003 - 3957.293 Aman Sanger

Yeah, there are other types of caching you can kind of do. One interesting thing that you can do for Cursor Tab is you can basically predict ahead, as if the user had accepted the suggestion, and then trigger another request. And so then you've cached, you've done this speculative... It's a mix of speculation and caching, right?


3957.313 - 3980.478 Aman Sanger

Because you're speculating what would happen if they accepted it. And then you have this value that is cached, this suggestion. And then when they press tab, the next one would be waiting for them immediately. It's a kind of clever heuristic slash trick that uses higher-level caching, and it feels fast despite there not actually being any changes in the model.
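The speculate-and-cache trick can be sketched roughly like this. `model_suggest` is a hypothetical stand-in for the real model call; the point is the control flow:

```python
# Sketch of speculation-plus-caching for tab suggestions: after showing
# a suggestion, immediately pre-compute the suggestion that would follow
# if the user accepts, so the next tab press hits the cache instantly.

suggestion_cache = {}

def model_suggest(text):
    return " next"  # hypothetical stand-in for the real (slow) model request

def show_suggestion(text):
    # Serve from cache if we speculated correctly, else call the model.
    suggestion = suggestion_cache.pop(text, None) or model_suggest(text)
    # Speculate: assume the user accepts, and warm the cache for that state.
    suggestion_cache[text + suggestion] = model_suggest(text + suggestion)
    return suggestion

def on_tab(text, accepted_suggestion):
    # The user accepted; the follow-up suggestion is already waiting.
    return show_suggestion(text + accepted_suggestion)
```

When the user presses tab, `on_tab` finds the follow-up already sitting in the cache, so no model round-trip blocks the UI.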


3980.698 - 4000.396 Sualeh Asif

And if you can make the KV cache smaller, one of the advantages you get is, maybe you can speculate even more. Maybe you can guess: here are the 10 things that could be useful. Predict the next 10, and it's possible the user hits one of the 10. It's a much higher chance than the user hitting the exact one that you show them.


4001.197 - 4028.846 Sualeh Asif

Maybe they type another character, and we sort of hit something else in the cache. So there are all these tricks. The general phenomenon here, which I think is also super useful for RL, is: maybe a single sample from the model isn't very good, but if you predict ten different things, it turns out that one of the ten being right has a much higher probability.


4028.946 - 4051.141 Sualeh Asif

There are these pass@k curves. And, you know, part of what RL does is you can exploit this pass@k phenomenon to make many different predictions. And one way to think about this: the model sort of knows internally, has some uncertainty over, which of the k things is correct, or which of the k things the human wants.
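The pass@k phenomenon is easy to see with independent samples. A small illustrative sketch (real model samples are correlated, so treat this as a rough best case):

```python
# Illustrative sketch of pass@k: with independent samples, the chance
# that at least one of k is correct rises quickly with k, even when a
# single sample is usually wrong.

def pass_at_k(p_single, k):
    # P(at least one of k independent samples is correct)
    return 1 - (1 - p_single) ** k
```

With a 20% single-sample success rate, pass@1 is 0.20 while pass@10 is about 0.89, which is the gap that RL on preferred samples tries to close.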


4051.161 - 4076.557 Sualeh Asif

So when we RL our cursor tab model, one of the things we're doing is we're predicting which of the hundred different suggestions the model produces is more amenable for humans? Like, which of them do humans more like than other things? Maybe, like, there's something where the model can predict very far ahead versus, like, a little bit and maybe somewhere in the middle and...


4079.019 - 4094.43 Sualeh Asif

And then you can give a reward to the things that humans would like more, and sort of punish the things that they won't like, and then train the model to output the suggestions that humans would like more. You have these RL loops that are very useful, that exploit these pass@k curves. Aman maybe can go into even more detail.


4095 - 4117.425 Aman Sanger

Yeah, it is a little different than speed. But, I mean, technically you tie it back in, because you can get away with a smaller model if you RL your smaller model and it gets the same performance as the bigger one. And while I was mentioning stuff about reducing the size of your KV cache, there are other techniques there as well that are really helpful for speed.


4119.066 - 4142.317 Aman Sanger

So kind of back in the day, like all the way two years ago, people mainly use multi-head attention. And I think there's been a migration towards more efficient attention schemes like group query or multi-query attention. And this is really helpful for then with larger batch sizes, being able to generate the tokens much faster.


4143.041 - 4164.841 Aman Sanger

The interesting thing here is this has no effect on that time-to-first-token pre-fill speed. The thing this matters for is generating tokens. And why is that? Because when you're generating tokens, instead of being bottlenecked by doing these super-parallelizable matrix multiplies across all your tokens,


4165.302 - 4184.235 Aman Sanger

you're bottlenecked by how quickly, for long context with large batch sizes, you can read those cached keys and values. That's memory bandwidth, and how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these.


4185.616 - 4206.777 Aman Sanger

Whereas normally with multi-head attention you have some number of key/value heads and some number of query heads, multi-query just preserves the query heads and gets rid of all the key/value heads. So there's only one key/value head, and all the remaining query heads.


4207.662 - 4225.857 Aman Sanger

With group query, you instead preserve all the query heads, and then your keys and values are kind of... There are fewer heads for the keys and values, but you're not reducing it to just one. But anyways, the whole point here is you're just reducing the size of your KV cache.
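A back-of-the-envelope sketch of how fewer key/value heads shrink the cache. The model dimensions below are illustrative, not from any particular model:

```python
# Sketch of the KV cache size under multi-head, grouped-query, and
# multi-query attention. Dimensions are illustrative only.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    # Keys and values (factor of 2), per layer, per token, per KV head.
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_value

mha = kv_cache_bytes(4096, 32, n_kv_heads=32, head_dim=128)  # multi-head
gqa = kv_cache_bytes(4096, 32, n_kv_heads=8, head_dim=128)   # grouped-query
mqa = kv_cache_bytes(4096, 32, n_kv_heads=1, head_dim=128)   # multi-query
```

With these numbers, grouped-query cuts the cache 4x and multi-query 32x, which is exactly the memory-bandwidth saving being described.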


4226.818 - 4227.398 Arvid Lunnemark

And then there is MLA.


4228.814 - 4245.545 Aman Sanger

Yeah, multi-latent attention. That's a little more complicated. And the way that this works is it kind of turns the entirety of your keys and values across all your heads into this one latent vector that is then expanded at inference time.


4246.185 - 4266.31 Sualeh Asif

But MLA is from this company called DeepSeek. It's quite an interesting algorithm. Maybe the key idea is sort of in both MQA and in other places, what you're doing is you're sort of reducing the number of KV heads. The advantage you get from that is


4268.613 - 4295.882 Sualeh Asif

there's less of them, but maybe the theory is that you actually want a lot of different, like you want each of the keys and values to actually be different. So one way to reduce the size is you keep one big shared vector for all the keys and values. And then you have smaller vectors for every single token, so that you can store only the smaller thing. There's some sort of low-rank reduction.


4296.443 - 4309.113 Sualeh Asif

And with the low-rank reduction, at the end, when you eventually want to compute the final thing, remember that you're memory-bound, which means that you still have some compute left over that you can use for these things. So if you can expand the


4310.414 - 4336.634 Sualeh Asif

latent vector back out, this is far more efficient, because you're reducing, for example, by maybe a factor of 32 or something, the size of the vector that you're keeping. Yeah, there's perhaps some richness in having a separate set of keys and values and queries that kind of pairwise match up, versus compressing that all into one, and that interaction at least.
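The low-rank idea can be sketched with toy dimensions. The projection matrices here are hypothetical stand-ins for learned weights:

```python
# Toy sketch of the low-rank idea behind multi-latent attention (MLA):
# store one small latent vector per token instead of full keys/values,
# and expand it back with projection matrices at inference time, where
# spare compute is available because decoding is memory-bound.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

d_model, d_latent = 8, 2
# "Learned" down/up projections (here just fixed toy matrices).
W_down = [[1 if j == i else 0 for j in range(d_model)] for i in range(d_latent)]
W_up = [[1 if j == i else 0 for j in range(d_latent)] for i in range(d_model)]

hidden = [3, 1, 4, 1, 5, 9, 2, 6]
latent = matvec(W_down, hidden)    # the cache stores only this (2 numbers, not 8)
expanded = matvec(W_up, latent)    # reconstructed at inference time
```

The cache holds `d_latent` floats per token instead of `d_model`, a 4x reduction in this toy; real MLA chooses the projections so the expansion loses little of what attention needs.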


4338.4 - 4341.512 Lex Fridman

All of that is dealing with being memory bound. Yeah.


4344.016 - 4367.346 Aman Sanger

What, I mean, ultimately, how does that map to the user experience? Yeah, the two things that it maps to: one is you can now make your cache a lot larger, because you have less space allocated for the KV cache. You can maybe cache a lot more aggressively, and a lot more things, so you get more cache hits, which are helpful for reducing the time to first token, for the reasons that were described earlier. And then the second being, when you


4368.678 - 4377.587 Aman Sanger

start doing inference with more and more requests and larger and larger batch sizes, you don't see much of a slowdown in as it's generating the tokens, the speed of that.


4378.046 - 4380.648 Sualeh Asif

But it also allows you to make your prompt bigger for certain.


4380.668 - 4397.863 Aman Sanger

Yeah. So the size of your KV cache is the size of all your prompts multiplied by the number of prompts being processed in parallel. So you could increase either of those dimensions, right? The batch size, or the size of your prompts, without degrading the latency of generating tokens.
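The two dimensions trade off against a fixed memory budget. A toy calculation, with both constants assumed purely for illustration:

```python
# Toy calculation: total KV cache memory is roughly
# (bytes per token) x (prompt length) x (batch size),
# so shrinking any one factor buys room in the others.

KV_BYTES_PER_TOKEN = 64 * 1024      # assumed: 64 KB of keys/values per token
CACHE_BUDGET_BYTES = 4 * 1024**3    # assumed: a 4 GB cache budget

def max_batch(prompt_tokens, bytes_per_token=KV_BYTES_PER_TOKEN):
    # How many prompts of this length fit in the cache simultaneously.
    return CACHE_BUDGET_BYTES // (prompt_tokens * bytes_per_token)
```

Halving the per-token cache size, say via grouped-query or multi-latent attention, doubles either the batch size or the prompt length that fits in the same memory.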


4398.344 - 4403.809 Lex Fridman

Arvid, you wrote a blog post, Shadow Workspace: Iterating on Code in the Background. Yeah. So what's going on?


4404.877 - 4427.798 Arvid Lunnemark

So to be clear, we want there to be a lot of stuff happening in the background, and we're experimenting with a lot of things. Right now, we don't have much of that happening, other than the cache warming or figuring out the right context that goes into your command key prompts, for example. But the idea is, if you can actually spend computation in the background, then you can help...


4429.74 - 4453.133 Arvid Lunnemark

help the user maybe at a slightly longer time horizon than just predicting the next few lines that you're going to make. But actually, in the next 10 minutes, what are you going to make? And by doing it in the background, you can spend more computation doing that. And so the idea of the shadow workspace that we implemented, and we use it internally for experiments, is that


4454.033 - 4473.901 Arvid Lunnemark

to actually get an advantage out of doing stuff in the background, you want some kind of feedback signal to give back to the model. Because otherwise, you can get higher performance by just letting the model think for longer, and o1 is a good example of that. But another way you can improve performance is by letting the model iterate and get feedback.


4474.821 - 4497.886 Arvid Lunnemark

And so one very important piece of feedback when you're a programmer is the language server, which is this thing that exists for most different languages, and there's like a separate language server per language. And it can tell you, you know, you're using the wrong type here, and then gives you an error, or it can allow you to go to definition and sort of understands the structure of your code.


4498.266 - 4509.712 Arvid Lunnemark

So language servers are extensions developed by, like there's a TypeScript language server developed by the TypeScript people, a Rust language server developed by the Rust people, and then they all interface over the language server protocol to VS Code.


4509.852 - 4517.095 Arvid Lunnemark

So that VS Code doesn't need to have all of the different languages built into VS Code, but rather you can use the existing compiler infrastructure.
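For reference, a language server request on the wire looks roughly like this. `textDocument/definition` is a real LSP method, while the file URI and cursor position are made-up example values:

```python
# Rough sketch of a Language Server Protocol request as it travels
# between an editor and a language server (JSON-RPC over a framed
# byte stream). URI and position are illustrative values only.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///project/src/main.ts"},
        "position": {"line": 41, "character": 10},
    },
}
body = json.dumps(request)
# LSP frames every message with a Content-Length header.
message = f"Content-Length: {len(body)}\r\n\r\n{body}"
```

The same framing carries diagnostics (lint errors) and type information back from the server, which is what both the editor UI and, in the shadow workspace, the models consume.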


4517.215 - 4518.956 Lex Fridman

For linting purposes?


4519.256 - 4525.179 Arvid Lunnemark

It's for linting. It's for going to definition and for like seeing the right types that you're using.


4525.199 - 4527.36 Lex Fridman

So it's doing like type checking also?


4527.788 - 4538.877 Arvid Lunnemark

Yes, type checking and going to references. And that's like, when you're working in a big project, you kind of need that. If you don't have that, it's like really hard to code in a big project.


4539.118 - 4546.384 Lex Fridman

Can you say again how that's being used inside Cursor, the language server protocol communication thing?


4546.809 - 4566.608 Arvid Lunnemark

So it's being used in Cursor to show to the programmer, just like in VS Code. But then the idea is you want to show that same information to the models, the LLMs, and you want to do that in a way that doesn't affect the user, because you want to do it in the background. And so the idea behind the shadow workspace was: okay, one way we can do this is,


4568.108 - 4590.139 Arvid Lunnemark

we spawn a separate window of Cursor that's hidden. So you can set this flag in Electron that it's hidden. There is a window, but you don't actually see it. And inside of this window, the AI agents can modify code however they want, as long as they don't save it, because it's still the same folder, and then they can get feedback from the linters and go to definition and iterate on their code.


4590.439 - 4594.282 Lex Fridman

So like literally run everything in the background, like as if, right.


4594.942 - 4595.162 Arvid Lunnemark

Yeah.


4595.302 - 4596.843 Lex Fridman

Maybe even run the code?


4596.944 - 4618.258 Arvid Lunnemark

So that's the eventual version. Okay. That's what you want. And a lot of the blog post is actually about how do you make that happen? Because it's a little bit tricky. You want it to be on the user's machine so that it exactly mirrors the user's environment. And then on Linux, you can do this cool thing where you can actually mirror the file system and have the...


4619.779 - 4642.914 Arvid Lunnemark

AI make changes to the files and it thinks that it's operating on the file level, but actually that's stored in memory and you can create this kernel extension to make it work. Whereas on Mac and Windows it's a little bit more difficult, but it's a fun technical problem so that's why.


4643.554 - 4653.458 Aman Sanger

One maybe hacky but interesting idea that I like is holding a lock on saving. And so basically, you can then have the language model kind of hold the lock on saving to disk.


4653.838 - 4664.342 Aman Sanger

And then, instead of you operating in the ground-truth version of the files that are saved to disk, you are actually operating in what was the shadow workspace before, and these unsaved things that only exist in memory, that you still get linter errors for, and that you can


4664.762 - 4683.739 Aman Sanger

code in. And then when you try to maybe run code, there's just a small warning that there's a lock, and then you kind of will take back the lock from the language server, or from the shadow workspace, if you're trying to do things concurrently. That's such an exciting future, by the way. It's a bit of a tangent, but to allow a model to change files...


4684.813 - 4698.402 Lex Fridman

It's scary for people, but like, it's really cool to be able to just like let the agent do a set of tasks and you come back the next day and kind of observe like it's a colleague or something like that.


4698.422 - 4711.653 Aman Sanger

Yeah. And I think there may be different versions of, like, runnability, where, for the simple things where you're doing things in the span of a few minutes on behalf of the user as they're programming, it makes sense to make something work locally on their machine.


4711.673 - 4723.244 Aman Sanger

I think for the more aggressive things where you're making larger changes that take longer periods of time, you'll probably want to do this in some sandbox remote environment. And that's another incredibly tricky problem of how do you


4724.105 - 4742.949 Aman Sanger

exactly reproduce, or mostly reproduce to the point of it being effectively equivalent for running code, the user's environment, with this remote sandbox. I'm curious what kind of agency you want for coding. Do you want them to find bugs? Do you want them to implement new features? Like, what agency do you want?


4743.069 - 4758.81 Lex Fridman

So, by the way, when I think about agents, I don't think just about coding. For the practice of this particular podcast, there's video editing, and if you look in Adobe, there's code behind a lot of it. It's very poorly documented code, but you can


4759.551 - 4783.254 Lex Fridman

interact with Premiere, for example, using code. And basically all the uploading, everything I do on YouTube, everything, as you could probably imagine, I do all of that through code, including translation and overdubbing, all of this. So I envision all those kinds of tasks, automating many of the tasks that don't have to do directly with the editing. So that, okay, that's what I was thinking about.


4783.274 - 4804.24 Lex Fridman

But in terms of coding, I would be fundamentally thinking about bug finding, like many levels of bug finding, and also finding logical bugs. Not logical, like spiritual bugs or something. Ones like sort of big directions of implementation, that kind of stuff.


4805.017 - 4806.197 Sualeh Asif

Let's opine on bug finding.


4806.418 - 4817.301 Aman Sanger

Yeah. I mean, it's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.


4817.722 - 4819.062 Arvid Lunnemark

Even the smartest models.


4819.122 - 4821.823 Aman Sanger

Exactly. Even o1. How do you explain that?


4821.843 - 4823.864 Lex Fridman

Is there a good intuition?


4825.086 - 4845.802 Aman Sanger

I think these models are a really strong reflection of the pre-training distribution. And, you know, I do think they generalize as the loss gets lower and lower, but I don't think the loss is low enough such that they're really fully generalizing in code. The things that we use these things for, the frontier models,


4846.604 - 4870.362 Aman Sanger

that they're quite good at, are really code generation and question answering. And these things exist in massive quantities in pre-training, with all of the code on GitHub on the scale of many, many trillions of tokens, and questions and answers on things like Stack Overflow and maybe GitHub issues. And so when you try to push into these things that really don't exist


4871.075 - 4894.86 Aman Sanger

very much online, like, for example, the Cursor Tab objective of predicting the next edit given the edits done so far, the brittleness kind of shows. And then bug detection is another great example, where there aren't really that many examples of actually detecting real bugs and then proposing fixes, and the models just really struggle at it. But I think it's a question of transferring the model.


4895 - 4911.224 Aman Sanger

In the same way that you get this fantastic transfer from pre-trained models just on code in general to the cursor tab objective, you'll see a very, very similar thing with generalized models that are really good at code to bug detection. It just takes a little bit of nudging in that direction.


4911.816 - 4932.757 Sualeh Asif

To be clear, I think they sort of understand code really well. While they're being pre-trained, the representation that's being built up, almost certainly somewhere in the stream, the model knows that maybe there's something sketchy going on. It sort of has some sketchiness, but actually eliciting the sketchiness to...


4936.961 - 4955.6 Sualeh Asif

Part of it is that humans are really calibrated on which bugs are really important. It's not just actually saying there's something sketchy. It's like, is this sketchy trivial? Is this sketchy like you're going to take the server down? Part of it is maybe the cultural knowledge of... Like, why is a staff engineer a staff engineer?


4955.66 - 4976.236 Sualeh Asif

A staff engineer is good because they know that three years ago, like, someone wrote a really, you know, sketchy piece of code that took the server down. And as opposed to, like... As opposed to maybe just, like, you know, you just... this thing is like an experiment. So like a few bugs are fine. Like you're just trying to experiment and get the feel of the thing.


4976.756 - 4993.762 Sualeh Asif

And so if the model gets really annoying when you're writing an experiment, that's really bad. But if you're writing something for super production, you're writing a database, right? You're writing code in Postgres or Linux or whatever, you're Linus Torvalds, it's sort of unacceptable to have even an edge case. And just having the calibration of, like,


4995.612 - 5003.599 Aman Sanger

how paranoid is the user. But even then, if you're putting in maximum paranoia, it still just doesn't quite get it.


5003.619 - 5004.6 Sualeh Asif

Yeah, yeah, yeah.


5005.2 - 5022.495 Lex Fridman

I mean, but this is hard for humans too, to understand which line of code is important and which is not. I think one of the principles on your website says, if code can do a lot of damage, one should add a comment that says, this line of code is dangerous.


5023.382 - 5026.143 Arvid Lunnemark

And all caps repeated 10 times.


5027.083 - 5048.969 Lex Fridman

No, you say, like, for every single line of code inside the function, you have to... And that's quite profound. That says something about human beings, because the engineers move on. Even the same person might just forget how a single function can sink the Titanic. You might not intuit that quite clearly by looking at the single piece of code.


5049.309 - 5066.575 Arvid Lunnemark

Yeah, and I think that one is also partially for today's AI models, where if you actually write dangerous, dangerous, dangerous in every single line, the models will pay more attention to that and will be more likely to find bugs in that region.


5066.835 - 5073.838 Lex Fridman

That's actually just straight up a really good practice of labeling code of how much damage this can do.


5074.679 - 5080.342 Arvid Lunnemark

Yeah, I mean, it's controversial. Some people think it's ugly. Sualeh does not like it.


5080.762 - 5097.832 Sualeh Asif

In fact, I actually think this is one of the things I learned from Arvid is, you know, sort of aesthetically, I don't like it. But I think there's certainly something where, like, it's useful for the models. And humans just forget a lot. And it's really easy to make a small mistake and cause, like,


5099.468 - 5124.51 Aman Sanger

bring down, you know, just bring down the server. And of course we test a lot and whatever, but there are always these things that you have to be very careful about. Yeah, like with just normal docstrings, I think people will often just skim it when making a change and think, oh, I know how to do this. And you kind of really need to point it out to them so that it doesn't slip through. Yeah, you have to be reminded that you can do a lot of damage.


5126.069 - 5133.835 Lex Fridman

That's like, we don't really think about that. You think about, okay, how do I figure out how this works so I can improve it? You don't think about the other direction.


5135.096 - 5144.883 Arvid Lunnemark

Until we have formal verification for everything, then you can do whatever you want and you know for certain that you have not introduced a bug if the proof passed.


5145.264 - 5147.526 Aman Sanger

But concretely, what do you think that future would look like?


5148.402 - 5170.238 Arvid Lunnemark

I think people will just not write tests anymore. And the model will suggest, like you write a function, the model will suggest a spec and you review the spec. And in the meantime, smart reasoning model computes a proof that the implementation follows the spec. And I think that happens for most functions.
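The workflow being described might look something like this toy sketch, where the reviewed "spec" is an executable property and the "proof" is replaced by exhaustive checking over a tiny domain. All names here are invented; real formal verification would discharge the obligation with a prover, not a loop.

```python
# Toy sketch: you write the function, a (hypothetical) model proposes a
# spec you review, and a checker confirms the implementation satisfies it.
# Brute-force checking over a small domain stands in for a real proof.

def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, x))

def spec_clamp(x: int, lo: int, hi: int, out: int) -> bool:
    # The reviewed spec: the output lies within [lo, hi], and equals x
    # whenever x was already within bounds.
    in_bounds = lo <= out <= hi
    identity = (out == x) if lo <= x <= hi else True
    return in_bounds and identity

def check(fn, spec, domain) -> bool:
    # Stand-in for a prover: verify the spec on every input in the domain.
    return all(spec(x, lo, hi, fn(x, lo, hi))
               for x in domain for lo in domain for hi in domain
               if lo <= hi)

assert check(clamp, spec_clamp, range(-5, 6))
```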


5170.678 - 5184.633 Michael Truell

Don't you think this gets at a little bit some of the stuff you were talking about earlier with the difficulty of specifying intent for what you want with software? Where sometimes it might be because the intent is really hard to specify, it's also then going to be really hard to prove that it's actually matching whatever your intent is.


5184.754 - 5186.996 Arvid Lunnemark

Like you think that spec is hard to generate?


5188.157 - 5200.93 Michael Truell

Yeah, or just for a given spec, maybe you can... I think there is a question of can you actually do the formal verification? Is that possible? I think that there's more to dig into there.


5201.351 - 5205.046 Arvid Lunnemark

But then also... Even if you have the spec? If you have the spec. But how do you map the spec?


5205.066 - 5208.267 Michael Truell

Even if you have the spec. Is the spec written in natural language? Yeah, how do you map the spec?


5208.427 - 5210.288 Arvid Lunnemark

No, the spec would be formal.


5211.228 - 5216.59 Michael Truell

But how easy would that be to draw? So then I think that you care about things that are not going to be easily well-specified in the spec language.


5217.21 - 5217.69 Arvid Lunnemark

I see, I see.


5217.99 - 5221.932 Michael Truell

Yeah. Maybe an argument against formal verification is all you need.


5222.392 - 5227.353 Aman Sanger

Yeah. The worry is there's this massive document. Replacing something like unit tests, sure.


5227.473 - 5240.734 Arvid Lunnemark

Yeah, yeah. I think you can probably also evolve the spec languages to capture some of the things that they don't really capture right now. I don't know. I think it's very exciting.


5241.375 - 5246.038 Lex Fridman

And you're speaking not just about single functions. You're speaking about entire code bases.


5246.498 - 5268.411 Arvid Lunnemark

I think entire code bases is harder, but that is what I would love to have. And I think it should be possible. Because you can even... There's a lot of work recently where you can prove... formally verified down to the hardware. So you formally verify the C code, and then you formally verify through the GCC compiler, and then through the Verilog down to the hardware.


5269.552 - 5286.4 Arvid Lunnemark

And that's an incredibly big system, but it actually works. And I think big code bases are sort of similar in that they're a multi-layered system. And if you can decompose it and formally verify each part, then I think it should be possible. I think the specification problem is a real problem, but... How do you handle side effects?


5287.34 - 5311.513 Aman Sanger

Or how do you handle, I guess, external dependencies, like calling the Stripe API? Maybe Stripe would write a spec for it, but you can't do this for everything. Like, can you do this for everything you use? How do you do it if there's a language, like, maybe people will use language models as primitives in the programs they write, and there's a dependence on it, and how do you now include that? I think you might be able to prove that still.


5312.267 - 5313.468 Aman Sanger

Prove what about language models?


5313.969 - 5327.103 Arvid Lunnemark

I think it feels possible that you could actually prove that a language model is aligned, for example. Or like you can prove that it actually gives the right answer. That's the dream.


5327.683 - 5349.747 Lex Fridman

Yeah, that is, I mean, if it's possible, that's your, I have a dream speech. If it's possible, that will certainly help with, you know, making sure your code doesn't have bugs and making sure AI doesn't destroy all of human civilization. So the full spectrum of AI safety to just bug finding. So you said the models struggle with bug finding. What's the hope?


5350.187 - 5369.673 Sualeh Asif

You know, my hope initially is, and I can let Michael chime in too, that it should, you know, first help with the stupid bugs. Like, it should very quickly catch the stupid bugs. Like, off-by-one errors. Like, sometimes you write something in a comment and do it the other way. It's, like, very common. Like, I do this.


5369.753 - 5381.285 Sualeh Asif

I write, like, less than in a comment and, like, I maybe write the greater than sign or something like that. And the model is like, yeah, you look sketchy. Like, are you sure you want to do that? But eventually it should be able to catch harder bugs, too.
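A made-up example of exactly the kind of "stupid bug" being described: the comment and the comparison disagree, which is the sort of mismatch a fast background model could flag. Both functions are invented for illustration.

```python
# The comment promises "less than", but the buggy version checks
# "greater than" -- the classic comment/code sign flip just described.

def is_under_limit_buggy(n: int, limit: int) -> bool:
    # return True if n is less than limit
    return n > limit  # BUG: comparison is flipped relative to the comment

def is_under_limit_fixed(n: int, limit: int) -> bool:
    # return True if n is less than limit
    return n < limit
```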


5382.512 - 5399.38 Michael Truell

Yeah. And I think that it's also important to note that this is having good bug finding models feels necessary to get to the highest reaches of having AI do more and more programming for you, where you're going to, you know, if the AI is building more and more of the system for you, you need to not just generate, but also verify.


5400.42 - 5418.66 Michael Truell

And without that, some of the problems that we've talked about before with programming with these models will just become untenable. So it's not just for humans, like you write a bug, I write a bug, find the bug for me, but it's also being able to verify the AI's code and check it is really important.


5418.92 - 5435.372 Arvid Lunnemark

Yeah. And then how do you actually do this? Like, we have had a lot of contentious dinner discussions about how you actually train a bug model. But one very popular idea is, you know, it's potentially much easier to introduce a bug than to actually find the bug. And so you can train a model to introduce bugs in existing code.


5436.313 - 5448.861 Arvid Lunnemark

And then you can train a reverse bug model then that can find bugs using this synthetic data. So that's like one example, but yeah, there are lots of ideas for how to do this.
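One way to sketch the "introduce bugs to get synthetic training data" idea is a mutation pass over the AST. This toy injector just flips comparison operators to produce a (clean, buggy) pair; a real pipeline would obviously use far more varied and realistic mutations.

```python
import ast

class FlipComparisons(ast.NodeTransformer):
    # Toy bug injector: swaps < with > (and <= with >=) to create a
    # synthetic "buggy" variant of clean code for training data.
    SWAP = {ast.Lt: ast.Gt, ast.Gt: ast.Lt, ast.LtE: ast.GtE, ast.GtE: ast.LtE}

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [self.SWAP.get(type(op), type(op))() for op in node.ops]
        return node

def inject_bug(source: str) -> str:
    # Parse, mutate, and unparse (requires Python 3.9+ for ast.unparse).
    tree = ast.parse(source)
    return ast.unparse(FlipComparisons().visit(tree))

clean = "def is_adult(age):\n    return age >= 18\n"
buggy = inject_bug(clean)
# (clean, buggy) is one synthetic training pair for a bug-finding model.
```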


5449.921 - 5468.514 Michael Truell

You can also do a bunch of work, not even at the model level, of taking the biggest models and then maybe giving them access to a lot of information that's not just the code. it's kind of a hard problem to stare at a file and be like, where's the bug? And that's hard for humans often, right? And so often you have to run the code and being able to see things like traces and step through a debugger.


5469.615 - 5480.164 Michael Truell

There's a whole other direction where it kind of tends toward that. And it could also be that there are kind of two different product form factors here. It could be that you have a really specialty model that's quite fast that's kind of running in the background and trying to spot bugs.


5480.184 - 5492.591 Michael Truell

And it might be that sometimes, sort of to Arvid's earlier example about some nefarious input box bug, it might be that sometimes you want to like You know there's a bug. You're not just checking hypothesis-free. You're like, this is a problem. I really want to solve it.


5492.951 - 5498.552 Michael Truell

And you zap that with tons and tons and tons of compute, and you're willing to put in $50 to solve that bug or something even more.


5499.192 - 5515.036 Lex Fridman

Have you thought about integrating money into this whole thing? I would pay probably a large amount of money for if you found a bug or even generated code that I really appreciated. I had a moment a few days ago when I started using Cursor where it generated a perfect...


5517.4 - 5539.116 Lex Fridman

like perfect three functions for interacting with the YouTube API to update captions and for localization in different languages. The API documentation is not very good. And the code across, like if I Googled it for a while, I couldn't find exactly, there's a lot of confusing information, and Cursor generated it perfectly.


5539.616 - 5563.335 Lex Fridman

And I was like, I just sat back, I read the code, I was like, this is correct. I tested it, it's correct. I was like, I want a tip button that goes, here's $5. One that's really good, just to support the company and support what the interface is. And the other is that it probably sends a strong signal, like, good job. Right? There's a much stronger signal than just accepting the code, right?


5563.355 - 5579.881 Lex Fridman

You just actually send, like, a strong good job. That, and for bug finding, obviously, like, there's a lot of people... that would pay a huge amount of money for a bug, like a bug bounty thing. Right? Do you guys think about that?


5580.121 - 5605.945 Arvid Lunnemark

Yeah, it's a controversial idea inside the company. I think it sort of depends on how much you believe in humanity, almost. I think it would be really cool if you spend nothing to try to find a bug, and if it doesn't find a bug, you spend $0. And then if it does find a bug and you click Accept, then it also shows in parentheses $1. And so you spend $1 to accept the bug.


5606.866 - 5619.584 Arvid Lunnemark

And then, of course, there's the worry like, okay, we spent a lot of computation. Maybe people will just copy-paste. I think that's a worry. And then there is also the worry that introducing money into the product makes it kind of...


5620.725 - 5646.171 Arvid Lunnemark

You know, like, it doesn't feel as fun anymore. Like, you have to think about money, and all you want to think about is the code. And so maybe it actually makes more sense to separate it out, and, like, you pay some fee every month, and then you get all of these things for free. But there could be a tipping component, which is not like... Yes, but it still has that, like, dollar symbol. I think it's fine, but I also see the point where, like, maybe you don't want to introduce it.


5646.476 - 5652.898 Aman Sanger

Yeah, I was going to say the moment that feels like people do this is when they share it, when they have this fantastic example, they just kind of share it with their friends.


5653.238 - 5666.722 Michael Truell

There is also a potential world where there's a technical solution to this like honor system problem too, where if we can get to a place where we understand the output of the system more, I mean, to the stuff we were talking about with like, you know, error checking with the LSP and then also running the code.


5667.062 - 5674.864 Michael Truell

But if you could get to a place where you could actually somehow verify, oh, I have fixed the bug, maybe then the bounty system doesn't need to rely on the honor system too.


5675.564 - 5701.368 Lex Fridman

How much interaction is there between the terminal and the code? How much information is gained if you run the code in the terminal? Can you do a loop where it runs the code and suggests how to change the code if the code in runtime gives an error? Because right now they're separate worlds completely. I know you can do Ctrl-K inside the terminal to help you write the code.


5701.628 - 5718.991 Aman Sanger

You can use terminal context as well, inside of chat, Cmd+K, kind of everything. We don't have the looping part yet, though we suspect something like this could make a lot of sense. There's a question of whether it happens in the foreground too, or if it happens in the background, like what we've been discussing.
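The looping idea just discussed, run the code, capture the runtime error, and feed it back for a suggested fix, could be sketched like this. The `suggest_fix` callable is a hypothetical stand-in for a model call; none of this reflects how Cursor actually implements anything.

```python
import subprocess
import sys

def run_once(path):
    # Run the script; return stderr text on failure, None on success.
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True)
    return None if proc.returncode == 0 else proc.stderr

def run_fix_loop(path, suggest_fix, max_iters=3):
    # Loop: execute, and on error hand (source, traceback) to the
    # hypothetical model for a rewritten file, then try again.
    for _ in range(max_iters):
        err = run_once(path)
        if err is None:
            return True
        with open(path) as f:
            source = f.read()
        with open(path, "w") as f:
            f.write(suggest_fix(source, err))
    return run_once(path) is None
```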


5719.111 - 5727.973 Lex Fridman

Sure. The background is pretty cool. Like, we'd do running the code in different ways. Plus there's a database side to this, which is, how do you protect it from modifying the database? But okay. Yeah.


5729.613 - 5749.358 Sualeh Asif

I mean, there's certainly cool solutions there. There's this new API that is being developed for... It's not in AWS, but, you know, it certainly is. I think it's in PlanetScale. I don't know if PlanetScale was the first one to add it. It's this ability to sort of add branches to a database, which is...


5750.459 - 5767.659 Sualeh Asif

Like, if you're working on a feature and you want to test against the prod database, but you don't actually want to test against the prod database, you could sort of add a branch to the database. And the way to do that is to add a branch to the write-ahead log. And there's obviously a lot of technical complexity in doing it correctly. I guess database companies need new things to do.


5768.9 - 5796.828 Sualeh Asif

They have good databases now. And I think Turbopuffer, which is one of the databases we use, is going to add maybe branching to the write-ahead log. And so maybe the AI agents will use branching. They'll test against some branch, and it's sort of going to be a requirement for the database to support branching or something.
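A toy model of the database-branching idea: a key-value store whose writes go to a log, where a branch simply layers a new log on top of the parent's, so destructive test writes never touch the parent. All names are invented; PlanetScale's and Turbopuffer's actual mechanisms are surely very different.

```python
class BranchableKV:
    # Toy key-value store where all writes append to a log. A branch
    # shares the parent's state and appends its own entries, so creating
    # a branch is O(1) and the parent is never mutated.
    def __init__(self, parent=None):
        self.parent = parent
        self.log = []  # list of (key, value) write entries

    def put(self, key, value):
        self.log.append((key, value))

    def get(self, key, default=None):
        # Latest local write wins; otherwise fall through to the parent.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return self.parent.get(key, default) if self.parent else default

    def branch(self):
        return BranchableKV(parent=self)

prod = BranchableKV()
prod.put("users", 100)
test = prod.branch()   # an agent tests against the branch...
test.put("users", 0)   # ...and destructive writes stay on the branch
```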


5796.908 - 5799.029 Aman Sanger

It would be really interesting if you could branch a file system.


5799.329 - 5802.77 Sualeh Asif

Right. Yeah. I feel like everything needs branching. Yeah.


5803.43 - 5810.392 Lex Fridman

Yeah. That's the problem with the multiverse, right? If you branch on everything, that's a lot.


5810.768 - 5817.89 Sualeh Asif

I mean, there's obviously these super clever algorithms to make sure that you don't actually use a lot of space or CPU or whatever.


5818.711 - 5830.875 Lex Fridman

Okay, this is a good place to ask about infrastructure. So you guys mostly use AWS. What are some interesting details? What are some interesting challenges? Why did you choose AWS? Why is AWS still winning? Hashtag.


5831.403 - 5848.364 Arvid Lunnemark

AWS is just really, really good. It's really good. Whenever you use an AWS product, you just know that it's going to work. It might be absolute hell to go through the steps to set it up.


5848.504 - 5849.726 Lex Fridman

Why is the interface so horrible?


5850.583 - 5875.346 Arvid Lunnemark

Because it's just so good. It doesn't need... The nature of winning. I think it's exactly, it's just the nature that they're winning. Yeah, yeah. But AWS, you can always trust, like, it will always work. And if there is a problem, it's probably your problem. Uh, yeah. Okay, is there some interesting, like, challenges to you guys, a pretty new startup, to get scaling to, like, to so many people?


5875.683 - 5893.801 Michael Truell

Yeah, I think that it has been an interesting journey adding each extra zero to the requests per second. All of the general components you're using for caching and databases run into issues as you make things bigger and bigger. And now we're at the scale where we get int overflows on our tables and things like that.


5894.541 - 5910.303 Michael Truell

And then also, there have been some custom systems that we've built, like, for instance, our retrieval system for computing a semantic index of your codebase and answering questions about a codebase that have continually, I feel like, been one of the trickier things to scale.


5910.943 - 5931.839 Sualeh Asif

I have a few friends who are super senior engineers, and one of their lines is like, it's very hard to predict where systems will break when you scale them. You can sort of try to predict in advance, but there's always something weird that's going to happen when you add this extra zero. You thought you thought through everything, but you didn't actually think through everything.


5932.74 - 5959.976 Sualeh Asif

But I think for that particular system, we've... So for concrete details, the thing we do is obviously we upload, we chunk up all of your code and then we send up sort of the code for embedding and we embed the code. And then we store the embeddings in a database, but we don't actually store any of the code. And then there's reasons around making sure that


5960.996 - 5983.302 Sualeh Asif

We don't introduce client bugs because we're very, very paranoid about client bugs. We store much of the details on the server, like everything is sort of encrypted. So one of the technical challenges is always making sure that the local index, the local code base state is the same as the state that is on the server.


5984.395 - 6006.833 Sualeh Asif

And the way sort of technically we ended up doing that is, so for every single file, you can sort of keep this hash. And then for every folder, you can sort of keep a hash, which is the hash of all of its children. And you can sort of recursively do that until the top. And why do something complicated? One thing you could do is you could keep a hash for every file.


6007.294 - 6027.094 Sualeh Asif

Then every minute you could try to download the hashes that are on the server, figure out what are the files that don't exist on the server. Maybe you just created a new file. Maybe you just deleted a file. Maybe you checked out a new branch and try to reconcile the state between the client and the server. But that introduces, like, absolutely ginormous network overhead.


6027.494 - 6054.367 Sualeh Asif

Both on the client side, I mean, nobody really wants us to hammer their Wi-Fi all the time if you're using Cursor. But also, I mean, it would introduce, like, ginormous overhead in the database. It would sort of be reading this tens-of-terabytes database, sort of approaching, like, 20 terabytes or something, every second. That's just kind of crazy.


6054.407 - 6070.715 Sualeh Asif

You definitely don't want to do that. So what you do, you sort of, you just try to reconcile the single hash, which is at the root of the project. And then if something mismatches, then you go, you find where all the things disagree. Maybe you look at the children and see if the hashes match. And if the hashes don't match, go look at their children and so on.


6070.995 - 6075.997 Sualeh Asif

But you only do that in the scenario where things don't match. And for most people, most of the time, the hashes match.
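The hierarchical hashing scheme just described can be sketched as a hash tree: hash every file, hash each folder as the hash of its children, compare only the roots, and descend into a subtree only when its hashes mismatch. A minimal sketch, simplified so both sides are in-memory dicts rather than a client and a server:

```python
import hashlib

def h(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def tree_hash(node) -> str:
    # Files are strings; folders are dicts. A folder's hash covers its
    # children's names and hashes, so any change bubbles up to the root.
    if isinstance(node, str):
        return h(node)
    return h("".join(name + tree_hash(child)
                     for name, child in sorted(node.items())))

def diff(local, remote, path=""):
    # Compare roots first; recurse only into subtrees whose hashes differ.
    if tree_hash(local) == tree_hash(remote):
        return []
    if isinstance(local, str) or isinstance(remote, str):
        return [path]
    changed = []
    for name in sorted(set(local) | set(remote)):
        if name not in local or name not in remote:
            changed.append(path + "/" + name)
        else:
            changed.extend(diff(local[name], remote[name], path + "/" + name))
    return changed
```

In the common case the root hashes match and the comparison costs one hash check, which is the whole point of the scheme.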


6076.417 - 6080.861 Lex Fridman

So it's kind of like hierarchical reconciliation. Yeah, something like that.


6081.201 - 6082.402 Aman Sanger

Yeah, it's called the Merkle tree.


6082.742 - 6088.066 Lex Fridman

Yeah, Merkle. Yeah. I mean, so yeah, this is cool to see that you kind of have to think through all these problems.


6088.226 - 6093.651 Sualeh Asif

And I mean, the point of, like, the reason it's gotten hard is just because of, like, the number of people using it and...


6094.051 - 6113.982 Sualeh Asif

You know, if some of your customers have really, really large code bases to the point where, you know, we originally reordered our code base, which is big, but I mean, it's just not the size of some company that's been there for 20 years and sort of has a ginormous number of files. And you sort of want to scale that across programmers.


6114.522 - 6134.469 Sualeh Asif

There's all these details where, like, building a simple thing is easy, but scaling it to a lot of people, like, a lot of companies, is obviously a difficult problem. Which is sort of independent of, actually, so part of this is scaling our current solution, and part is also coming up with new ideas that obviously we're working on, but then scaling all of that in the last few weeks, months.


6134.75 - 6146.856 Aman Sanger

Yeah. And there are a lot of clever things, like additional things that go into this indexing system. For example, the bottleneck in terms of costs is not storing things in the vector database or the database. It's actually embedding the code.


6147.396 - 6157.984 Aman Sanger

And you don't want to re-embed the code base for every single person in a company that is using the same exact code, except for maybe they're in a different branch with a few different files or they've made a few local changes.


6158.725 - 6182.643 Aman Sanger

And so because, again, embeddings are the bottleneck, you can do one clever trick and not have to worry about the complexity of dealing with branches and the other databases, where you just have some cache on the actual vectors computed from the hash of a given chunk. And so this means that when the nth person at a company goes and embeds their code base, it's really, really fast.


6183.083 - 6191.25 Aman Sanger

And you do all this without actually storing any code on our servers at all. No code data is stored. We just store the vectors in the vector database and the vector cache.


6191.89 - 6211.888 Lex Fridman

What's the biggest gains at this time you get from indexing the code base? Just out of curiosity, what benefit do users have? It seems like longer term, there'll be more and more benefit, but in the short term, just asking questions of the code base, what's the usefulness of that?


6211.908 - 6229.538 Arvid Lunnemark

I think the most obvious one is just you want to find out where something is happening in your large code base. And you sort of have a fuzzy memory of, okay, I want to find the place where we do X. But you don't exactly know what to search for in a normal text search.


6229.858 - 6238.162 Arvid Lunnemark

And so you ask a chat, you hit command enter to ask with the codebase chat, and then very often it finds the right place that you were thinking of.


6238.944 - 6251.633 Aman Sanger

I think, like you mentioned, in the future, I think this is only going to get more and more powerful where we're working a lot on improving the quality of our retrieval. And I think the ceiling for that is really, really much higher than people give it credit for.


6252.574 - 6267.124 Lex Fridman

One question that's good to ask here, have you considered and why haven't you much done sort of local stuff to where you can do the... I mean, it seems like everything we just discussed is exceptionally difficult to do. To go to the cloud, you have to think about all these things with the caching and the...


6268.584 - 6284.752 Lex Fridman

you know large code base with a large number of programmers are using the same code base you have to figure out the puzzle of that a lot of it you know most software just does stuff this heavy computational stuff locally have you considered doing sort of embeddings locally


6285.245 - 6310.436 Arvid Lunnemark

Yeah, we thought about it, and I think it would be cool to do it locally. I think it's just really hard. And one thing to keep in mind is that some of our users use the latest MacBook Pro, but most of our users, like more than 80% of our users, are on Windows machines, and many of them are not very powerful. And so... local models really only work on the latest computers.


6310.656 - 6333.782 Arvid Lunnemark

And it's also a big overhead to build that in. And so even if we would like to do that, it's currently not something that we are able to focus on. And I think there are some people that do that, and I think that's great. But especially as models get bigger and bigger and you want to do fancier things with like bigger models, it becomes even harder to do it locally.


6334.803 - 6351.799 Sualeh Asif

And it's not a problem of like weaker computers. It's just that, for example, if you're some big company, you have big company code base, it's just really hard to process big company code base, even on the beefiest MacBook Pros. So even if it's not even a matter of like, if you're just like,


6353.06 - 6366.913 Sualeh Asif

a student or something. I think if you're, like, the best programmer at a big company, you're still going to have a horrible experience if you do everything locally. I mean, you could do edge and sort of scrape by, but, like, again, it wouldn't be fun anymore.


6367.193 - 6387.538 Aman Sanger

Yeah, like, approximate nearest neighbors on this massive code base is going to just eat up your memory and your CPU. And that's just that. Let's talk about also the modeling side where, as Arvid said, there are these massive headwinds against local models where, one, things seem to move towards MoEs.


6388.739 - 6414.176 Aman Sanger

One benefit is maybe they're more memory bandwidth bound, which plays in favor of local versus using GPUs or using NVIDIA GPUs. But the downside is these models are just bigger in total. And they're going to need to fit often not even on a single node, but multiple nodes. There's no way that's going to fit inside of even really good MacBooks. And I think especially for coding,


6415.262 - 6436.888 Aman Sanger

It's not a question as much of, like, does it clear some bar of, like, the model's good enough to do these things and then we're satisfied, which may be the case for other problems, and maybe where local models shine. But people are always going to want the best, the most intelligent, the most capable things. And that's going to be really, really hard to run for almost all people locally.


6437.668 - 6441.99 Sualeh Asif

Don't you want the most capable model? You want Sonnet?


6442.49 - 6466.117 Lex Fridman

And also with O1. I like how you're pitching me. Would you be satisfied with an inferior model? Listen, yes, I'm one of those, but there's some people that like to do stuff locally, especially like... There's a whole, obviously, open source movement that kind of resists. And it's good that they exist, actually, because you want to resist the power centers that are growing our...


6466.377 - 6489.28 Arvid Lunnemark

There's actually an alternative to local models that I am particularly fond of. I think it's still very much in the research stage, but you could imagine doing homomorphic encryption for language model inference. So you encrypt your input on your local machine, then you send that up, and then the server can use lots of computation.


6489.3 - 6508.519 Arvid Lunnemark

They can run models that you cannot run locally on this encrypted data, but they cannot see what the data is. And then they send back the answer and you decrypt the answer and only you can see the answer. So I think that's still very much research and all of it is about trying to make the overhead lower because right now the overhead is really big.
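The flow being described, the server computes on data it cannot read, can be illustrated with textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product. This is a toy with tiny, insecure parameters, nowhere near the fully homomorphic schemes LM inference would need:

```python
# Toy homomorphic-encryption flow using textbook RSA's multiplicative
# homomorphism: Enc(a) * Enc(b) mod n decrypts to a * b. Demo-sized,
# insecure parameters; real FHE schemes are entirely different.

p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)   # n = 3233
e = 17                               # public exponent
d = pow(e, -1, phi)                  # private exponent (client-side secret)

def encrypt(m: int) -> int:          # client encrypts locally
    return pow(m, e, n)

def decrypt(c: int) -> int:          # only the client holds d
    return pow(c, d, n)

def server_multiply(c1: int, c2: int) -> int:
    # The server computes on ciphertexts; it never sees the plaintexts.
    return (c1 * c2) % n

ciphertext = server_multiply(encrypt(7), encrypt(6))
assert decrypt(ciphertext) == 42     # client recovers 7 * 6
```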


6509.139 - 6531.715 Arvid Lunnemark

But if you can make that happen, I think that would be really, really cool. And I think it would be really, really impactful. Because I think one thing that's actually kind of worrisome is that as these models get better and better, they're going to become more and more economically useful. And so more and more of the world's information and data will flow through one or two centralized actors.


6532.696 - 6556.049 Arvid Lunnemark

And then there are worries about there can be traditional hacker attempts, but it also creates this kind of scary part where If all of the world's information is flowing through one node in plain text, you can have surveillance in very bad ways. And sometimes that will happen for, you know, initially will be like good reasons.


6556.149 - 6579.052 Arvid Lunnemark

Like people will want to try to protect against like bad actors using AI models in bad ways. And then you will add in some surveillance code and then someone else will come in and, you know, you're on a slippery slope and then you start... doing bad things with a lot of the world's data. And so I'm very hopeful that we can solve homomorphic encryption for language model inference.


6579.072 - 6597.342 Lex Fridman

Doing privacy preserving machine learning. But I would say that's the challenge we have with all software these days. It's like... There's so many features that can be provided from the cloud and all of us increasingly rely on it and make our life awesome, but there's downsides. And that's why you rely on really good security to protect from basic attacks.


6597.943 - 6609.313 Lex Fridman

But there's also only a small set of companies that are controlling that data, you know, and they obviously have leverage and they could be infiltrated in all kinds of ways. That's the world we live in.


6610.093 - 6633.129 Sualeh Asif

Yeah. I mean, the thing I'm just actually quite worried about is sort of the world where, I mean, so Anthropic has this responsible scaling policy, and so we're on, like, the low, low ASLs, which is the Anthropic security level or whatever, of the models. But as we get to, like, quote-unquote, ASL-3, ASL-4, whatever models, which are sort of very powerful, but,


6634.71 - 6661.183 Sualeh Asif

For mostly reasonable security reasons, you would want to monitor all the prompts. And I think that's reasonable and understandable where everyone is coming from. But, man, it'd be really horrible if all the world's information is monitored that heavily. It's way too centralized. It's like this really fine line you're walking where, on the one side, you don't want the models to go rogue.


6661.523 - 6669.566 Sualeh Asif

On the other side, it's humans. I don't know if I trust all the world's information to pass through three model providers.


6671.066 - 6673.307 Aman Sanger

Why do you think it's different than cloud providers?


6673.987 - 6696.237 Arvid Lunnemark

Because I think a lot of this data would never have gone to the cloud providers in the first place. This is often, like, you want to give more data to the AI models. You want to give personal data that you would never have put online in the first place to these companies or to these models.


6697.418 - 6717.706 Arvid Lunnemark

And it also centralizes control, where right now, for cloud, you can often use your own encryption keys, and it just can't really do much. But here, it's just centralized actors that see the exact plain text of everything.


6719.067 - 6736.281 Lex Fridman

On the topic of context, that's actually been a friction for me. When I'm writing code in Python, there's a bunch of stuff imported. You could probably intuit the kind of stuff I would like to include in the context. How hard is it to auto-figure out the context?


6737.495 - 6749.059 Michael Truell

It's tricky. I think we can do a lot better at computing the context automatically in the future. One thing that's important to note is there are trade-offs with including automatic context.


6750.059 - 6765.925 Michael Truell

So the more context you include for these models, first of all, the slower they are and the more expensive those requests are, which means you can then do less model calls and do less fancy stuff in the background. Also, for a lot of these models, they get confused if you have a lot of information in the prompt.


6766.585 - 6794.174 Michael Truell

So the bar for accuracy and for relevance of the context you include should be quite high. But already we do some automatic context in some places within the product. It's definitely something we want to get a lot better at. And I think that there are a lot of cool ideas to try there, both on learning better retrieval systems, like better embedding models, better re-rankers.

6794.835 - 6813.428 Michael Truell

I think that there are also cool academic ideas, stuff we've tried out internally, but also stuff the field writ large is grappling with, about whether you can get language models to a place where the model itself can just understand a new corpus of information. And the most popular talked-about version of this is, can you make the context windows infinite?

6813.929 - 6826.463 Michael Truell

Then if you make the context windows infinite, can you make the model actually pay attention to the infinite context? And after you can make it pay attention to the infinite context, to make it somewhat feasible to actually do, can you then do caching for that infinite context, so you don't have to recompute it all the time?

6827.341 - 6844.094 Michael Truell

But there are other cool ideas that are being tried that are a little bit more analogous to fine-tuning of actually learning this information and the weights of the model. And it might be that you actually get sort of a qualitatively different type of understanding if you do it more at the weight level than if you do it at the in-context learning level.

6844.114 - 6858.305 Michael Truell

I think the jury is still a little bit out on how this is all going to work in the end. But in the interim, us as a company, we are really excited about better retrieval systems and picking the parts of the code base that are most relevant to what you're doing. We could do that a lot better.

6858.905 - 6881.867 Aman Sanger

One interesting proof of concept for learning this knowledge directly in the weights is VS Code. So we're in a VS Code fork, and VS Code's code is all public. So these models in pre-training have seen all the code. They've probably also seen questions and answers about it, and then they've been fine-tuned and RLHFed to be able to answer questions about code in general.

6882.508 - 6903.59 Aman Sanger

So when you ask it a question about VS Code, sometimes it'll hallucinate, but sometimes it actually does a pretty good job at answering the question. And I think like this is just by, it happens to be okay at it. But what if you could actually like specifically train or post train a model such that it really was built to understand this code base?

6905.151 - 6918.176 Aman Sanger

It's an open research question, one that we're quite interested in. And then there's also uncertainty of, do you want the model to be the thing that end-to-end is doing everything, i.e. it's doing the retrieval in its internals and then answering the question, creating the code?

6918.236 - 6940.137 Aman Sanger

Or do you want to separate the retrieval from the frontier model where maybe, you know, you'll get some really capable models that are much better than like the best open source ones in a handful of months? Yeah. And then you'll want to separately train a really good open source model to be the retriever, to be the thing that feeds in the context to these larger models.

6940.643 - 6948.671 Lex Fridman

Can you speak a little more to post-training the model to understand the code base? What do you mean by that? Is this a synthetic data direction?

6948.751 - 6969.349 Aman Sanger

Is this... Yeah, I mean, there are many possible ways you could try doing it. There's certainly no shortage of ideas. It's just a question of going in and trying all of them and being empirical about which one works best. One very naive thing is to try to replicate what's done with VS Code and these frontier models.

6969.429 - 6978.958 Aman Sanger

So let's continue pre-training, some kind of continued pre-training that includes general code data, but also throws in a lot of the data of some particular repository that you care about.

6979.498 - 7000.884 Aman Sanger

And then in post-training, meaning in, let's just start with instruction fine-tuning, you have like a normal instruction fine-tuning data set about code, but you throw in a lot of questions about code in that repository. So you could either get ground truth ones, which might be difficult, or you could do what you kind of hinted at or suggested using synthetic data, i.e.,

7002.344 - 7024.682 Aman Sanger

kind of having the model ask questions about various pieces of the code. So you take the pieces of the code, then prompt the model, or have a model propose a question for that piece of code, and then add those as instruction fine-tuning data points. And then in theory, this might unlock the model's ability to answer questions about that code base.
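
The loop Aman sketches here is easy to make concrete. Below is a minimal illustration in Python; `ask_model` is a hypothetical stand-in for any LLM call, and the chunking and prompt wording are invented for the example, not anything Cursor has described using.

```python
# Sketch of the synthetic-data recipe: chunk a repository, have a model
# propose a question about each chunk, answer it, and emit the pairs as
# instruction fine-tuning examples. `ask_model` is a hypothetical stub.

def ask_model(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"[model response to: {prompt[:40]}...]"

def chunk_file(source: str, max_lines: int = 40) -> list[str]:
    """Split a source file into fixed-size line chunks."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]

def synthesize_qa_pairs(files: dict[str, str]) -> list[dict]:
    """Build instruction fine-tuning examples for one repository."""
    examples = []
    for path, source in files.items():
        for chunk in chunk_file(source):
            question = ask_model(
                f"Propose one question a developer might ask about "
                f"this code from {path}:\n{chunk}")
            answer = ask_model(
                f"Answer using this code:\n{chunk}\nQuestion: {question}")
            examples.append({"instruction": question, "output": answer})
    return examples

pairs = synthesize_qa_pairs({"util.py": "def add(a, b):\n    return a + b"})
```

The resulting pairs would be mixed into an ordinary instruction fine-tuning set, as described above.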

7025.748 - 7033.313 Lex Fridman

Let me ask you about OpenAI o1. What do you think is the role of that kind of test-time compute system in programming?

7033.673 - 7053.085 Aman Sanger

I think test time compute is really, really interesting. So there's been the pre-training regime, which will kind of, as you scale up the amount of data and the size of your model, get you better and better performance, both on loss and then on downstream benchmarks and just general performance when we use it for coding or other tasks.

7055.327 - 7061.852 Aman Sanger

We're starting to hit a bit of a data wall, meaning it's going to be hard to continue scaling up this regime.

7062.473 - 7079.747 Aman Sanger

And so scaling up test-time compute is an interesting way of increasing the number of inference-time flops that we use, where as you increase the number of flops used at inference time, you still get corresponding improvements in the performance of these models.

7079.787 - 7098.304 Aman Sanger

Traditionally, we just had to literally train a bigger model that always used that many more flops. But now we could perhaps use the same size model and run it for longer to be able to get an answer at the quality of a much larger model. And so the really interesting thing I like about this is there are some problems that perhaps require

7099.691 - 7113.43 Aman Sanger

hundred-trillion-parameter model intelligence trained on a hundred trillion tokens. But that's maybe 1%, maybe 0.1% of all queries. So are you going to spend all of this effort, all of this compute, training a model

7114.551 - 7132.986 Aman Sanger

that costs that much and then run it so infrequently? It feels completely wasteful, when instead you could train the model that's capable of doing the 99.9% of queries, and then have a way of running it longer at inference time for those few people that really, really want max intelligence.

7134.427 - 7147.777 Lex Fridman

How do you figure out which problem requires what level of intelligence? Is it possible to dynamically figure out when to use GPT-4, when to use a small model, and when you need o1?

7149.598 - 7170.535 Aman Sanger

I mean, yeah, that's an open research problem, certainly. I don't think anyone's actually cracked this model routing problem quite well. We'd like to. We have initial implementations of this for something like Cursor Tab. But at the level of going between 4o, Sonnet, and o1, it's a bit trickier.

7171.255 - 7184.986 Aman Sanger

There's also a question of what level of intelligence you need to determine if the thing is too hard for the 4o-level model. Maybe you need the o1-level model. It's really unclear.
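
As a toy illustration of the routing problem being discussed, here is a heuristic router in Python. The keyword list and thresholds are invented; real routers would likely be trained models, and this is only a sketch of the interface.

```python
# Toy model router: pick a model tier from cheap surface features of the
# query. Keyword list and thresholds are invented for illustration.

HARD_KEYWORDS = {"prove", "refactor", "architecture", "deadlock"}

def route(query: str) -> str:
    """Return a model tier: 'small', 'medium', or 'large'."""
    words = query.lower().split()
    if any(w in HARD_KEYWORDS for w in words):
        return "large"   # reasoning-heavy: reach for an o1-class model
    if len(words) > 30:
        return "medium"  # long queries get a mid-tier model
    return "small"       # short, easy queries stay cheap

tier = route("prove that this sorting routine terminates")
```

The hard part, as noted, is that deciding a query is "too hard" may itself require a capable model, which this heuristic sidesteps entirely.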

7185.746 - 7196.374 Lex Fridman

But you mentioned there's a pre-training process, then there's post-training, and then there's test-time compute, which are sort of separate. Where are the biggest gains there?

7197.08 - 7219.466 Aman Sanger

Um, well, it's weird, because for test-time compute there's a whole training strategy needed to get test-time compute to work. And the other really weird thing about this is, outside of the big labs, and maybe even just OpenAI, no one really knows how it works. There have been some really interesting papers that show hints of what they might be doing.

7220.146 - 7231.231 Aman Sanger

And so perhaps they're doing something with tree search using process reward models. But I think the issue is, we don't quite know exactly what it looks like.

7231.251 - 7243.637 Aman Sanger

So it would be hard to comment on where it fits in. I would put it in post-training, but maybe the compute spent on getting test-time compute to work for a model is going to dwarf pre-training eventually.

7245.191 - 7253.143 Lex Fridman

So we don't even know if o1 is using just, like, chain of thought RL. We don't know how they're using any of these. We don't know anything.

7253.684 - 7254.566 Aman Sanger

It's fun to speculate.

7256.969 - 7260.295 Lex Fridman

Like if you were to build a competing model, what would you do?

7261.468 - 7281.73 Aman Sanger

Yeah. So one thing to do would be, I think you probably need to train a process reward model. So maybe we can get into reward models, and outcome reward models versus process reward models. Outcome reward models are the kind of traditional reward models that people train for language modeling, and it's just looking at the final thing.

7281.79 - 7295.217 Aman Sanger

So if you're doing some math problem, let's look at that final thing you've done, everything, and let's assign a grade to it, how likely we think, like what's the reward for this outcome. Process reward models instead try to grade the chain of thought.

7295.697 - 7322.752 Aman Sanger

And so OpenAI had some preliminary paper on this, I think, last summer, where they used human labelers to get this pretty large, several-hundred-thousand-example data set of graded chains of thought. Ultimately, it feels like I haven't seen anything interesting in the ways that people use process reward models, outside of just using them as a means of affecting how we choose between a bunch of samples.

7322.792 - 7345.104 Aman Sanger

So what people do in all these papers is they sample a bunch of outputs from the language model, and then use the process reward models to grade all those generations, alongside maybe some other heuristics, and then use that to choose the best answer. The really interesting thing that people think might work, and want to work, is tree search with these process reward models.
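
The best-of-n selection described here can be sketched in a few lines. `prm_score` is a toy heuristic standing in for a trained process reward model, and taking the minimum over step scores is just one plausible aggregation choice, not necessarily what any lab does.

```python
# Best-of-n sampling with a process reward model (PRM): score every step
# of each sampled chain of thought and keep the best chain.
# `prm_score` is a toy heuristic standing in for a trained PRM.

def prm_score(step: str) -> float:
    # Toy heuristic: reward longer, more detailed steps (capped at 1.0).
    return min(len(step) / 50.0, 1.0)

def score_chain(steps: list[str]) -> float:
    """Aggregate per-step scores; `min` punishes any single weak step."""
    return min(prm_score(s) for s in steps)

def best_of_n(candidates: list[list[str]]) -> list[str]:
    """Choose the candidate chain with the highest aggregate PRM score."""
    return max(candidates, key=score_chain)

chains = [
    ["x = 2", "done"],
    ["Let x be the unknown; 2x = 4 so x = 2.", "Check: 2 * 2 = 4. Done."],
]
best = best_of_n(chains)
```

Tree search, mentioned next, would call the same per-step scorer while branching mid-chain, rather than only ranking finished samples.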

7345.144 - 7359.191 Aman Sanger

Because if you really can grade every single step of the chain of thought, then you can kind of branch out and explore multiple paths of this chain of thought. And then use these process reward models to evaluate how good is this branch that you're taking.

7360.487 - 7372.439 Lex Fridman

Yeah, when the quality of the branch is somehow strongly correlated with the quality of the outcome at the very end. So you have a good model of knowing which branch to take. So not just in the short term, like in the long term. Yeah.

7372.864 - 7386.377 Aman Sanger

And the interesting work that has been done, or at least the interesting work that has been open sourced and that people talk about, is how to train the process reward models, maybe in a more automated way.

7387.718 - 7398.068 Aman Sanger

I could be wrong here, could be not mentioning something, because I haven't seen anything that seems to work really well for using the process reward models creatively to do tree search in code.

7399.05 - 7414.769 Lex Fridman

This is kind of an AI safety, maybe a bit of a philosophy question. So OpenAI says that they're hiding the chain of thought from the user. And they've said that that was a difficult decision to make. Instead of showing the chain of thought, they're asking the model to summarize the chain of thought.

7415.646 - 7423.516 Lex Fridman

They're also in the background saying they're going to monitor the chain of thought to make sure the model is not trying to manipulate the user, which is a fascinating possibility.

7424.096 - 7441.358 Michael Truell

But anyway, what do you think about hiding the chain of thought? One consideration for OpenAI, and this is completely speculative, could be that they want to make it hard for people to distill these capabilities out of their model. It might actually be easier, if you had access to that hidden chain of thought, to replicate the technology.

7441.378 - 7448.129 Michael Truell

Because that's pretty important data, like seeing the steps that the model took to get to the final result. So you could probably train on that also.

7448.7 - 7470.355 Michael Truell

And there was sort of a mirror situation with this, with some of the large language model providers, and also this is speculation, but some of these APIs used to offer easy access to log probabilities for all the tokens that they're generating, and also log probabilities for the prompt tokens. And then some of these APIs took those away. And again, complete speculation, but...

7472.416 - 7492.011 Michael Truell

One of the thoughts is that the reason those were taken away is, if you have access to log probabilities, similar to this hidden chain of thought, that can give you even more information to try to distill these capabilities out of the APIs, out of these biggest models, into models you control. As an asterisk on also the previous discussion about us integrating o1.

7492.511 - 7511.585 Michael Truell

I think that we're still learning how to use this model. So we made o1 available in Cursor, because when we got the model, we were really interested in trying it out. I think a lot of programmers are going to be interested in trying it out. But o1 is not part of the default Cursor experience in any way yet.

7512.066 - 7535.446 Michael Truell

And we still haven't yet found a way to integrate it into the editor in a way that we reach for, sort of, every hour, maybe even every day. And so I think the jury's still out on how to use the model. And we haven't seen examples yet of people releasing things where it seems really clear, like, oh, that's now the use case.

7536.106 - 7548.594 Michael Truell

The obvious one to turn to is maybe this can make it easier for you to have these background things running, right? To have these models in loops, to have these models be agentic. But we're still discovering.

7548.874 - 7555.779 Sualeh Asif

To be clear, we have ideas. We just need to try and get something incredibly useful before we put it out there.

7555.939 - 7579.846 Aman Sanger

But it has these significant limitations. Even barring capabilities, it does not stream. And that means it's really, really painful to use for things where you want to supervise the output. And instead, you're just waiting for the wall of text to show up. Also, it does feel like the early innings of test time compute and search, where it's just very, very much a v0.

7581.006 - 7598.655 Aman Sanger

And there's so many things that... like don't feel quite right. And I suspect in parallel to people increasing the amount of pre-training data and the size of the models and pre-training and finding tricks there, you'll now have this other thread of getting search to work better and better.

7600.136 - 7621.023 Lex Fridman

So let me ask you about Strawberry, aka o1. It looks like GitHub Copilot might be integrating o1 in some kind of way. And I think some of the comments are saying, does this mean Cursor is done? I think I saw one comment saying that.

7621.423 - 7624.706 Arvid Lunnemark

I saw one, time to shut down Cursor. Time to shut down Cursor, thank you.

7625.847 - 7642.962 Michael Truell

So is it time to shut down Cursor? I think this space is a little bit different from past software spaces over the 2010s, where I think that the ceiling here is really, really, really incredibly high. And so I think that the best product in three to four years will just be so much more useful than the best product today.

7643.883 - 7667.139 Michael Truell

And you can wax poetic about moats this and brand that, and this is our advantage. But I think in the end, if you stop innovating on the product, you will lose. And that's also great for startups. That's great for people trying to enter this market, because it means you have an opportunity to win against people who have lots of users already.

7668.059 - 7683.199 Michael Truell

By just building something better. And so I think, yeah, over the next few years it's just about building the best product, building the best system. And that both comes down to the modeling engine side of things, and it also comes down to the editing experience.

7683.854 - 7706.632 Aman Sanger

Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast, like o1. It comes from all of the depth that goes into these custom models that you don't realize are working for you in every facet of the product, as well as the really thoughtful UX with every single feature.

7708.126 - 7714.369 Lex Fridman

All right, from that profound answer, let's descend back down to the technical. You mentioned you have a taxonomy of synthetic data.

7714.89 - 7715.29 Aman Sanger

Oh, yeah.

7716.07 - 7716.911 Lex Fridman

Can you please explain?

7716.931 - 7740.594 Aman Sanger

Yeah, I think there are three main kinds of synthetic data. So what is synthetic data first? There's normal, non-synthetic data, which is just data that's naturally created, i.e. usually it'll be from humans having done things. So from some human process, you get this data. The first kind of synthetic data would be distillation.

7741.174 - 7757.182 Aman Sanger

So having a language model output tokens, or probability distributions over tokens, and then you can train some less capable model on this. This approach is not going to get you a more capable model than the original one that produced the tokens,

7759.022 - 7777.449 Aman Sanger

but it's really useful when there's some capability you want to elicit from some really expensive, high-latency model; you can then distill that down into some smaller, task-specific model. The second kind is when one direction of the problem is easier than the reverse.

7778.389 - 7802.117 Aman Sanger

And so a great example of this is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable looking bugs than it is to actually detect them. And this is probably the case for humans too. And so what you can do is you can get a model that's not trained in that much data, that's not that smart, to introduce a bunch of bugs in code.

7802.137 - 7823.337 Aman Sanger

And then you can use that synthetic data to train a model that can be really good at detecting bugs. The last category, I think, is the main one that it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily.
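
A minimal sketch of the asymmetry just described: mechanically flip one comparison operator to produce a plausible bug, yielding (clean, buggy) pairs for a detector. This is an invented illustration; a real pipeline would presumably use a weak model to propose far more varied bugs.

```python
# Exploit the generation/verification asymmetry: it is easy to inject a
# plausible bug (flip one comparison operator), so we can mass-produce
# (clean, buggy) pairs to train a bug-detection model.
import random

OPERATOR_FLIPS = {"<=": ">", ">=": "<", "<": ">=", ">": "<="}

def inject_bug(code: str, rng: random.Random):
    """Return a buggy variant of `code`, or None if nothing to flip."""
    ops = [op for op in OPERATOR_FLIPS if op in code]
    if not ops:
        return None
    op = rng.choice(ops)
    return code.replace(op, OPERATOR_FLIPS[op], 1)

def make_detection_dataset(snippets: list[str], seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    data = []
    for snippet in snippets:
        data.append({"code": snippet, "label": "clean"})
        buggy = inject_bug(snippet, rng)
        if buggy is not None:
            data.append({"code": buggy, "label": "buggy"})
    return data

dataset = make_detection_dataset(["if x < 0:\n    x = -x"])
```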

7823.357 - 7838.666 Aman Sanger

So an extreme example of this is, if you have a verification system that can detect if language is Shakespeare-level, and then you have a bunch of monkeys typing on typewriters, you can eventually get enough training data to train a Shakespeare-level language model.

7839.146 - 7860.34 Aman Sanger

And this is very much the case for math, where verification is actually really, really easy for formal languages. Then what you can do is have an okay model generate a ton of rollouts, choose the ones that you know have actually proved the ground-truth theorems, and train on those further.

7861 - 7876.772 Aman Sanger

There's similar things you can do for code with LeetCode-like problems, where if you have some set of tests that you know correspond to the problem, i.e. if something passes these tests, it has actually solved the problem, you can do the same thing: verify that the output passed the tests, and then train the model on the outputs that passed the tests.
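
That verify-then-train loop can be sketched as rejection sampling against a test suite. `sample_solutions` below is a hypothetical stand-in for model rollouts; only candidates that pass the test survive into the training set.

```python
# Verify-then-train: sample candidate solutions, keep only those that
# pass a known test, and use the survivors as training data.
# `sample_solutions` is a hypothetical stand-in for model rollouts.

def sample_solutions() -> list[str]:
    # Pretend these came from an okay model asked to implement double(x).
    return [
        "def double(x):\n    return x + x",   # correct
        "def double(x):\n    return x * 3",   # wrong
        "def double(x):\n    return 2 * x",   # correct
    ]

def passes_tests(solution: str) -> bool:
    """Run the candidate against a fixed check: double(2) == 4."""
    namespace = {}
    try:
        exec(solution, namespace)
        return namespace["double"](2) == 4
    except Exception:
        return False

def verified_training_data() -> list[str]:
    return [s for s in sample_solutions() if passes_tests(s)]

kept = verified_training_data()
```

The hard part, as noted next, is that most open-ended tasks have no such crisp verifier.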

7878.072 - 7895.023 Aman Sanger

I think it's gonna be a little tricky getting this to work in all domains or just in general. Like having the perfect verifier feels really, really hard to do with just like open-ended miscellaneous tasks you give the model or more like long horizon tasks, even in coding.

7895.443 - 7902.288 Lex Fridman

That's because you're not as optimistic as Arvid. But yeah. So yeah, so that third category requires having a verifier.

7902.958 - 7916.268 Aman Sanger

Yeah. Verification, it feels like it's best when you know for a fact that it's correct. And then it wouldn't be using a language model to verify. It would be using tests or formal systems. Or running the thing, too.

7917.029 - 7920.311 Michael Truell

Doing the human form of verification where you just do manual quality control.

7920.711 - 7921.512 Aman Sanger

Yeah, yeah.

7928.26 - 7934.027 Aman Sanger

I think that's the category that is most likely to result in massive gains.

7934.687 - 7944.879 Lex Fridman

What about RL with the feedback side, RLHF versus RLAIF? What's the role of that in getting better performance on the models?

7946.505 - 7966.463 Aman Sanger

Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.

7967.843 - 7988.649 Aman Sanger

RLAIF is interesting, because you're depending on the constraint that verification is actually a decent bit easier than generation. Because otherwise it feels like, okay, what are you doing? Are you using this language model to look at the language model's outputs and then improve the language model?

7988.989 - 8013.903 Aman Sanger

But no, it actually may work if the language model has a much easier time verifying some solution than it does generating it. Then you actually could perhaps get this kind of recursive loop. I don't think it's going to look exactly like that. The other thing you could do, that we kind of do, is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct,

8013.983 - 8040.502 Aman Sanger

and this is in the case of Cursor Tab, at picking between two possible generations what the better one is. And then it just needs a little bit of human nudging, with only on the order of 50 to 100 examples, to align the prior the model has with exactly what you want. It looks different than normal RLHF, where you're usually training these reward models on tons of examples.

8042.022 - 8051.147 Lex Fridman

What's your intuition when you compare generation and verification or generation and ranking? Is ranking way easier than generation?

8052.228 - 8077.329 Aman Sanger

My intuition would just say, yeah, it should be. This is kind of going back to, if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify given a proof than to actually prove. I wonder if the same thing will prove P not equal to NP, or P equal to NP. That would be really cool.

8077.99 - 8085.717 Lex Fridman

That'd be a whatever Fields Medal by AI. Who gets the credit? Another open philosophical question.

8085.877 - 8097.347 Sualeh Asif

Whoever prompted it. I'm actually surprisingly curious what a good bet for when AI will get the Fields Medal will be.

8097.668 - 8098.949 Lex Fridman

Isn't this Aman's specialty?

8099.77 - 8101.251 Sualeh Asif

I don't know what Aman's bet here is.

8101.91 - 8103.991 Lex Fridman

Oh, sorry, Nobel Prize or Fields Medal first?

8104.011 - 8107.394 Sualeh Asif

Fields Medal. Oh, Fields Medal level. Fields Medal comes first, I think.

8107.634 - 8110.055 Lex Fridman

Fields Medal comes first. Well, you would say that, of course.

8110.575 - 8113.657 Arvid Lunnemark

But it's also this, like, isolated system that you can verify.

8113.958 - 8116.657 Sualeh Asif

No, sure. Like, I don't even know if I don't need to do.

8116.717 - 8133.665 Aman Sanger

I feel like I have much more to do there. It felt like the path to get to IMO was a little bit more clear, because it already could get a few IMO problems, and there was a bunch of low-hanging fruit, given the literature at the time, of what tactics people could take. I think I'm, one, much less versed in the space of theorem proving now.

8134.145 - 8140.889 Aman Sanger

And two, yeah, less intuition about how close we are to solving these really, really hard open problems.

8141.969 - 8147.933 Lex Fridman

So you think it'll be Fields Medal first? It won't be, like, in physics or in... Oh, 100%.

8148.013 - 8164.263 Sualeh Asif

I think that's probably more likely. Like, it's probably much more likely that it'll get there. Yeah, yeah, yeah. Well, I think it goes to, I don't know, BSD, which is the Birch and Swinnerton-Dyer conjecture, or the Riemann hypothesis, or any one of these hard, hard math problems that are just actually really hard.

8164.843 - 8171.287 Sualeh Asif

It's sort of unclear what the path to get even a solution looks like. Like, we don't even know what a path looks like, let alone...

8172.648 - 8181.716 Arvid Lunnemark

And you don't buy the idea that this is like an isolated system and you can actually, you have a good reward system and it feels like it's easier to train for that.

8182.417 - 8184.238 Aman Sanger

I think we might get the Fields Medal before AGI.

8184.919 - 8195.929 Sualeh Asif

I mean, I'd be very happy. I'd be very happy. But I don't know if I... I think 2028, 2030. For what, the Fields Medal? The Fields Medal. All right.

8199.394 - 8219.966 Lex Fridman

It feels like forever from now, given how fast things have been going. Speaking of how fast things have been going, let's talk about scaling laws. So for people who don't know, maybe it's good to talk about this whole idea of scaling laws. What are they? Where do you think stand? And where do you think things are going?

8219.986 - 8232.046 Aman Sanger

I think it's interesting. The original scaling laws paper by OpenAI was slightly wrong, because of some issues they had with learning rate schedules. And then Chinchilla showed a more correct version.
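
The Chinchilla result mentioned here is often summarized by a rule of thumb: roughly 20 training tokens per parameter, with training FLOPs approximated as C ≈ 6ND for N parameters and D tokens. A rough sketch with approximate constants, not the paper's exact fitted values:

```python
# Chinchilla-style rule of thumb: compute-optimal training uses roughly
# 20 tokens per parameter, with training FLOPs C ~ 6 * N * D for
# N parameters and D tokens. With D = 20 * N, C = 120 * N**2, so a
# FLOPs budget pins down both N and D. Approximate constants only.

TOKENS_PER_PARAM = 20  # rough Chinchilla-optimal ratio

def optimal_allocation(flops_budget: float):
    """Split a FLOPs budget into (parameters, tokens)."""
    n_params = (flops_budget / (6 * TOKENS_PER_PARAM)) ** 0.5
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# A ~5.76e23 FLOPs budget lands near Chinchilla itself
# (about 70B parameters and 1.4T tokens).
n, d = optimal_allocation(5.76e23)
```

The deviations discussed next, like overtraining small models for inference efficiency, amount to deliberately moving off this curve.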

8232.986 - 8258.829 Aman Sanger

And then from then, people have again kind of deviated from doing the compute-optimal thing, because people now start optimizing more for making the thing work really well given an inference budget. And I think there are a lot more dimensions to these curves than what we originally used, of just compute, number of parameters, and data. Like, inference compute is the obvious one.

8258.849 - 8275.153 Aman Sanger

I think context length is another obvious one. So if you care, like, let's say you care about the two things of inference compute and then context window, maybe the thing you want to train is some kind of SSM because they're much, much cheaper and faster at super, super long context.

8275.493 - 8293.4 Aman Sanger

And even if it maybe has 10x worse scaling properties during training, meaning you have to spend 10x more compute to train the thing to get the same level of capabilities, it's worth it, because you care most about that inference budget for really long context windows. So it'll be interesting to see how people play with all these dimensions.

8293.92 - 8305.907 Lex Fridman

So yeah, I mean, you speak to the multiple dimensions, obviously. The original conception was just looking at the variables of the size of the model as measured by parameters and the size of the data as measured by the number of tokens and looking at the ratio of the two.

8306.747 - 8320.325 Lex Fridman

And it's kind of a compelling notion that there is a number, or at least a minimum, and it seems like one was emerging. Do you still believe that there is a kind of bigger is better?

8322.085 - 8327.168 Aman Sanger

I mean, I think bigger is certainly better for just raw performance.

8327.949 - 8328.75 Sualeh Asif

And raw intelligence.

8328.85 - 8343.94 Aman Sanger

And raw intelligence. I think that the path that people might take is, I'm particularly bullish on distillation. And, like, yeah, how many knobs can you turn, if we spend a ton of money on training, to get the most capable, cheap model?

8344.72 - 8366.409 Aman Sanger

Really, really caring as much as you can about it. Because the naive version of caring as much as you can about inference-time compute is what people have already done with the Llama models, just overtraining the shit out of 7B models on way, way, way more tokens than is Chinchilla optimal. Right? But if you really care about it, maybe the thing to do is what Gemma did, which is, let's not just train on tokens, let's literally train on

8369.105 - 8386.136 Aman Sanger

minimizing the KL divergence with the distribution of Gemma 27B, right? So knowledge distillation there. And you're spending the compute of literally training this 27-billion-parameter model on all these tokens, just to get out this, I don't know, smaller model.
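
The KL-divergence distillation Aman describes can be written down in a few lines: instead of hard token labels, the student is trained to match the teacher's next-token distribution. A plain-Python sketch over a toy three-token vocabulary; all the probability numbers are invented.

```python
# Distillation via KL divergence: the student matches the teacher's
# next-token distribution (soft targets) rather than hard labels.
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_probs, student_probs):
    # Per-position loss; in practice averaged over positions and often
    # mixed with ordinary cross-entropy on ground-truth tokens.
    return kl_divergence(teacher_probs, student_probs)

teacher = [0.7, 0.2, 0.1]          # soft targets from a big teacher
student_good = [0.65, 0.25, 0.10]  # close to the teacher
student_bad = [0.10, 0.10, 0.80]   # far from the teacher

loss_good = distillation_loss(teacher, student_good)
loss_bad = distillation_loss(teacher, student_bad)
```

The full teacher distribution carries more signal per token than the single sampled token, which is the point made in the next exchange.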

8386.677 - 8389.979 Lex Fridman

And the distillation gives you just a faster model. Smaller means faster.

8390.179 - 8405.676 Aman Sanger

Yeah, distillation in theory is, I think, getting out more signal from the data that you're training on. And it's perhaps another way of getting over, not completely over, but partially helping with, the data wall, where you only have so much data to train on.

8405.736 - 8417.559 Aman Sanger

Let's like train this really, really big model on all these tokens and we'll distill it into a smaller one. And maybe we can get more signal per token for this much smaller model than we would have originally if we trained it.

8417.967 - 8435.57 Lex Fridman

So if I gave you $10 trillion, how would you spend it? I mean, you can't buy an island or whatever. How would you allocate it, in terms of improving the big model versus maybe paying for the HF in RLHF?

8435.87 - 8451.535 Aman Sanger

Yeah, I think there's a lot of these secrets and details about training these large models that I just don't know and are only privy to the large labs. And the issue is I would waste a lot of that money if I even attempted this because I wouldn't know those things.

8452.555 - 8475.724 Aman Sanger

Suspending a lot of disbelief and assuming you had the know-how, and the ability to operate... Or, if you're saying you have to operate with the limited information you have now... No, no, actually, I would say you swoop in and you get all the information, all the little heuristics, all the parameters that define how the thing is trained.

8476.977 - 8483.359 Lex Fridman

If we look at how to invest money for the next five years in terms of maximizing what you called raw intelligence.

8483.8 - 8500.766 Sualeh Asif

I mean, isn't the answer really simple? You just try to get as much compute as possible. Like, at the end of the day, all you need to buy is the GPUs, and then the researchers can find... you can tune whether you want to pre-train a big model or a small model.

8501.533 - 8506.978 Aman Sanger

Well, this gets into the question of like, are you really limited by compute and money or are you limited by these other things?

8508.58 - 8518.45 Sualeh Asif

I lean more toward Arvid's belief that we're sort of idea-limited, but there's always... But if you have a lot of compute, you can run a lot of experiments.

8519.07 - 8524.494 Lex Fridman

So you would run a lot of experiments versus, like, use that compute to train a gigantic model.

8525.014 - 8530.778 Arvid Lunnemark

I would, but I do believe that we are limited in terms of ideas that we have.

8531.018 - 8553.676 Aman Sanger

I think, yeah, because even with all this compute and like, you know, all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just like really good engineering. Like, even with all the capital in the world, would you really be able to assemble... Like, there aren't that many people in the world who really can, like, make the difference here.

8553.696 - 8563.065 Aman Sanger

And there's so much work that goes into research that is just, like, pure, really, really hard engineering work. As, like, a very...

8563.968 - 8584.52 Aman Sanger

kind of hand-wavy example, if you look at the original Transformer paper, you know, how much work was kind of joining together a lot of these really interesting concepts embedded in the literature versus then going in and writing all the code, like maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance, right?

8584.78 - 8602.446 Aman Sanger

Getting Noam to go in and do all this code, right? And Noam is probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on, you know, thousands of, or maybe tens of thousands of, V100s, which I think GPT-3 may have been.

8603.527 - 8628.837 Aman Sanger

There's just so much engineering effort that has to go into all of these things to make it work. If you really brought that cost down to... like, you know, maybe not zero, but just made it 10X easier, made it super easy for someone with really fantastic ideas to immediately get to the version of like the new architecture they dreamed up that is like getting 50, 40% utilization on the GPUs.

8629.258 - 8632.942 Aman Sanger

I think that would just speed up research by a ton.

8634.005 - 8660.231 Sualeh Asif

I mean, I think if you see a clear path to improvement, you should always sort of take the low-hanging fruit first, right? And I think probably OpenAI and all the other labs did the right thing to pick off the low-hanging fruit, where the low-hanging fruit is like sort of... You could scale up to a GPT 4.25 scale and you just keep scaling and things keep getting better.

8661.872 - 8681.503 Sualeh Asif

There's no point in experimenting with new ideas when everything is working. You should sort of bang on it and try to get as much juice out of it as possible. And then maybe when you really need new ideas for... I think if you're spending $10 trillion, you probably want to spend some... then actually re-evaluate your ideas. You're probably idea-limited at that point.

8681.923 - 8701.238 Aman Sanger

I think all of us believe new ideas are probably needed to get all the way there to AGI. And all of us also probably believe there exist ways of testing out those ideas at smaller scales and being fairly confident that they'll play out.

8702.198 - 8720.731 Aman Sanger

It's just quite difficult for the labs in their current position to dedicate their very limited research and engineering talent to exploring all these other ideas when there's this core thing that will probably improve performance for some decent amount of time.

8722.509 - 8746.284 Lex Fridman

Yeah, but also these big labs are, like, winning. So they're just going wild. Okay. So, big question, looking out into the future. You're now at the center of the programming world. How do you think programming, the nature of programming, changes in the next few months, in the next year, in the next two years, next five years, ten years?

8747.264 - 8768.428 Michael Truell

I think we're really excited about a future where the programmer's in the driver's seat for a long time. And you've heard us talk about this a little bit, but one that emphasizes speed and agency for the programmer, and control: the ability to modify anything you want to modify, the ability to iterate really fast on what you're building. And

8769.821 - 8795.962 Michael Truell

This is a little different, I think, than where some people are jumping to in this space, where I think one idea that's captivated people is, can you talk to your computer? Can you have it build software for you as if you're talking to an engineering department or an engineer over Slack? And can it just be this sort of isolated text box? And part of the reason we're not excited about that

8797.328 - 8808.177 Michael Truell

is some of the stuff we've talked about with latency. But then a big reason we're not excited about that is because that comes with giving up a lot of control. It's much harder to be really specific when you're talking in the text box.

8808.978 - 8823.807 Michael Truell

And if you're necessarily just going to communicate with a thing like you would be communicating with an engineering department, you're actually abdicating tons and tons of really important decisions to this bot. And this kind of gets at fundamentally what engineering is.

8825.028 - 8842.434 Michael Truell

I think that some people who are a little bit more removed from engineering might think of it as, you know, the spec is completely written out and then the engineers just come and they just implement. And it's just about making the thing happen in code and making the thing exist. But I think a lot of the best engineering, the engineering we enjoy,

8844.02 - 8854.024 Michael Truell

involves tons of tiny micro decisions about what exactly you're building and about really hard trade-offs between, you know, speed and cost and just all the other things involved in a system.

8854.944 - 8876.235 Michael Truell

And we want, as long as humans are actually the ones designing the software and the ones specifying what they want to be built, and it's not just a company run by all AIs, we think you'll really want the human in the driver's seat, dictating these decisions. And so the jury's still out on kind of what that looks like.

8877.076 - 8902.492 Michael Truell

I think that one weird idea for what that could look like is it could look like you can control the level of abstraction you view a codebase at. And you can point at specific parts of a codebase, and maybe you digest a codebase by looking at it in the form of pseudocode. And you can actually edit that pseudocode too, and then have changes get made down at the sort of formal programming level.

8902.992 - 8920.676 Michael Truell

And you keep the ability to gesture at any piece of logic in your software. You keep the in-flow text-editing component of programming, you keep the control, you can even go down into the code, and you can go to higher levels of abstraction, while also getting these big productivity gains.

8920.996 - 8924.857 Lex Fridman

It'd be nice if you can go up and down the abstraction stack. Yeah.

8925.168 - 8933.91 Michael Truell

And there are a lot of details to figure out there; it's sort of a fuzzy idea. Time will tell if it actually works. But these principles of control and speed, and the human in the driver's seat, we think are really important.

8935.07 - 8949.154 Michael Truell

We think for some things, like Arvid mentioned before, for some styles of programming, you can kind of hand it off chatbot style, you know, if you have a bug that's really well specified. But that's not most of programming. And that's also not most of the programming we think a lot of people value.

8950.114 - 8969.771 Lex Fridman

What about, like, the fundamental skill of programming? There are a lot of people, like young people, right now kind of scared, because they love programming, but they're thinking, will I be able to have a future if I pursue this career path? Do you think the very skill of programming will change fundamentally?

8970.444 - 8999.295 Michael Truell

I actually think this is a really, really exciting time to be building software. Like, we remember what programming was like in, you know, 2013, 2012, whatever it was. And there was just so much more cruft and boilerplate and, you know... looking up something really gnarly. That stuff still exists. It's definitely not at zero. But programming today is way more fun than back then.

8999.495 - 9019.144 Michael Truell

We're really getting down to the delight concentration. All the things that really draw people to programming, for instance, this element of being able to build things really fast and speed and also individual control, all those are just being turned up a ton. Um, and so I think it's just gonna be, I think it's gonna be a really, really fun time for people who build software.

9019.804 - 9040.19 Michael Truell

I think that the skills will probably change too. I think that people's taste and creative ideas will be magnified, and it will be maybe a little bit less about boilerplate text editing, maybe even a little bit less about carefulness, which I think is really important today if you're a programmer. I think it'll be a lot more fun. What do you guys think?

9041.536 - 9063.282 Arvid Lunnemark

I agree. I'm very excited for that to change. One thing that happened recently was we wanted to do a relatively big migration to our codebase. We were using AsyncLocalStorage in Node.js, which is known to be not very performant, and we wanted to migrate to our context object. And this is a big migration that affects the entire codebase.

9064.022 - 9088.383 Arvid Lunnemark

And Sualeh and I spent, I don't know, five days working through this, even with today's AI tools. And I am really excited for a future where I can just show a couple of examples, and then the AI applies that to all of the locations, and then it highlights, oh, this is a new example, like, what should I do? And then I show it exactly what to do there. And then that can be done in, like, 10 minutes.

9089.203 - 9107.098 Arvid Lunnemark

And then you can iterate much, much faster. Then you don't have to think as much upfront and stand at the blackboard and think, exactly how are we going to do this, because the cost is so high. You can just try something first, and you realize, oh, this is not actually exactly what I want, and then you can change it instantly again after.

9107.178 - 9112.582 Arvid Lunnemark

And so, yeah, I think being a programmer in the future is going to be a lot of fun.

9113.803 - 9117.807 Aman Sanger

I really like that point. It feels like a lot of the time with programming, there are

9118.527 - 9140.869 Aman Sanger

two ways you can go about it. One is, like, you think really hard, carefully, upfront about the best possible way to do it, and then you spend your limited time of engineering to actually implement it. But I much prefer just getting in the code and, like, you know, taking a crack at it, seeing how it kind of lays out, and then iterating really quickly on that. That feels more fun.

9142.21 - 9166.168 Lex Fridman

Yeah, like, just speaking to generate the boilerplate is great, so you just focus on the difficult, nuanced design decisions. Migration, I feel like, is a cool one. It seems like large language models are able to basically translate from one programming language to another, or, like, migrate in the general sense of what migrate is. But that's in the current moment.

9167.089 - 9185.549 Lex Fridman

So I mean, the fear has to do with like, okay, as these models get better and better, then you're doing less and less creative decisions. And is it going to kind of move to a place where you're operating in the design space of natural language, where natural language is the main programming language? And I guess I could ask that by way of advice.

9185.609 - 9199.114 Lex Fridman

Like, if somebody's interested in programming now, what do you think they should learn? Like, you guys started in Java, and... I forget the other. Oh, some PHP.

9199.414 - 9200.075 Michael Truell

Objective-C.

9200.495 - 9221.949 Lex Fridman

Objective-C. There you go. I mean, in the end, we all know JavaScript is going to win. And not TypeScript. It's going to be like vanilla JavaScript. It's going to eat the world. And maybe a little bit of PHP. And, I mean, it also brings up the question of, like, I think Don Knuth has this idea that some percent of the population is geeks.

9223.072 - 9237.015 Lex Fridman

And, like, there's a particular kind of psychology and mind required for programming. And it feels like more and more that expands; the kind of person that can do great programming might expand.

9238.995 - 9252.867 Aman Sanger

I think different people do programming for different reasons. But I think maybe the best programmers are the ones that really, just absolutely love programming.

9252.907 - 9278.594 Aman Sanger

For example, there are folks on our team who literally, when they get back from work, they go and boot up Cursor, and then they start coding on their side projects for the entire night, and they stay up till 3 a.m. doing that. And when they're sad, they say, I just really need to code. And I think, like,

9280.114 - 9294.977 Aman Sanger

You know, there's that level of programmer where like this obsession and love of programming, I think makes really the best programmers. And I think these types of people will really get into the details of how things work.

9295.817 - 9308.119 Lex Fridman

I guess the question I'm asking: that exact programmer, let's think about that person. When the super tab, the super awesome, praise-be-the-tab, succeeds, and you keep pressing tab,

9308.795 - 9311.738 Sualeh Asif

that person on the team loves Cursor Tab more than anybody else.

9312.118 - 9331.154 Arvid Lunnemark

Yeah. And it's also not just, like, pressing tab. Just pressing tab, that's the easy way to say it, and the catchphrase, you know. But what you're actually doing when you're pressing tab is that you're injecting intent all the time while you're doing it. Sometimes you're rejecting it, sometimes you're typing a few more characters.

9331.774 - 9343.814 Arvid Lunnemark

And that's the way that you're sort of shaping the thing that's being created. And I think programming will change a lot to just: what is it that you want to make?

9343.974 - 9353.597 Sualeh Asif

It's sort of higher bandwidth. The communication to the computer just becomes higher and higher bandwidth, as opposed to just typing, which is much lower bandwidth than communicating intent.

9354.09 - 9376.33 Lex Fridman

I mean, this goes to your manifesto, titled Engineering Genius: "We are an applied research lab building extraordinarily productive human-AI systems." So, speaking to this hybrid element: "To start, we're building the engineer of the future, a human-AI programmer that's an order of magnitude more effective than any one engineer."

9377.071 - 9398.323 Lex Fridman

"This hybrid engineer will have effortless control over their codebase and no low-entropy keystrokes. They will iterate at the speed of their judgment, even in the most complex systems. Using a combination of AI and human ingenuity, they will outsmart and out-engineer the best pure AI systems. We are a group of researchers and engineers."

9398.423 - 9413.377 Lex Fridman

"We build software and models to invent at the edge of what's useful and what's possible. Our work has already improved the lives of hundreds of thousands of programmers." And on the way to that, we'll at least make programming more fun. So thank you for talking today. Thank you.

9413.537 - 9414.598 Michael Truell

Thanks for having us. Thank you.

9416.291 - 9441.427 Lex Fridman

Thanks for listening to this conversation with Michael, Sualeh, Arvid, and Aman. To support this podcast, please check out our sponsors in the description. And now, let me leave you with a random, funny, and perhaps profound programming quote I saw on Reddit: "Nothing is as permanent as a temporary solution that works." Thank you for listening, and hope to see you next time.
