
Aman Sanger

Appearances

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1074.981

Yeah, I distinctly remember there was this one conversation I had with Michael where before I hadn't thought super deeply and critically about scaling laws. And he kind of posed the question, why isn't scaling all you need or why isn't scaling going to result in massive gains in progress? And I think I went through like the stages of grief.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1095.798

There is anger, denial, and then finally, at the end, just thinking about it, acceptance. And I think I've been quite hopeful and optimistic about progress since. One thing I'll caveat is that I think it also depends on which domains you're going to see progress in.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1114.487

Math is a great domain, especially formal theorem proving, because you get this fantastic signal of actually verifying whether the thing was correct. And so this means something like RL can work really, really well. And I think you could have systems that are perhaps very superhuman in math and still not technically have AGI.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1246.614

Yeah, I mean, I think this is a space that is quite interesting, perhaps quite unique, where if you look at previous tech waves, maybe there's kind of one major thing that happened and it unlocked a new wave of companies.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1260.54

But every single year, every single model capability or jump you get in model capabilities, you now unlock this new wave of features, things that are possible, especially in programming. And so I think in AI programming, being even just a few months ahead, let alone a year ahead, makes your product much, much, much more useful.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1282.092

I think the Cursor of a year from now will need to make the Cursor of today look obsolete. And I think, you know, Microsoft has done a number of fantastic things, but I don't think they're in a great place to really keep innovating and pushing on this in the way that a startup can, just rapidly implementing features.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1302.185

And kind of doing the research experimentation necessary to really push the ceiling.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1430.682

Often the same person even.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1628.824

There's this interesting thing where if you look at language model loss on different domains, I believe the bits per byte, which is kind of a character-normalized loss, is lower for code than for language, which means in general there are a lot of tokens in code that are super predictable, a lot of characters that are super predictable.
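
To make the bits-per-byte comparison concrete, here is a minimal sketch of converting an average per-token loss (in nats) into bits per byte; the numbers in the example are made up, purely for illustration.

```python
import math

def bits_per_byte(loss_nats_per_token: float, num_tokens: int, num_bytes: int) -> float:
    """Convert an average per-token cross-entropy loss (in nats) into bits per byte.

    Dividing by ln(2) converts nats to bits; scaling by tokens/bytes normalizes
    for the fact that code often tokenizes into longer byte spans than prose.
    """
    bits_per_token = loss_nats_per_token / math.log(2)
    return bits_per_token * num_tokens / num_bytes

# Hypothetical comparison: code tends to be both cheaper per token and longer per token,
# so its bits per byte come out lower than natural language.
print(bits_per_byte(loss_nats_per_token=1.7, num_tokens=1000, num_bytes=4200))  # prose-like
print(bits_per_byte(loss_nats_per_token=1.2, num_tokens=1000, num_bytes=4800))  # code-like
```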

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1647.436

And this is, I think, even magnified when you're not just trying to autocomplete code, but predicting what the user is going to do next in their editing of existing code. And so, you know, the goal of Cursor Tab is: let's eliminate all the low-entropy actions you take inside of the editor. When the intent is effectively determined, let's just jump you forward in time, skip you forward.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1677.853

Yeah. I think I can speak to a few of the details on how to make these things work. They're incredibly low latency, so you need to train small models on this task. In particular, they're incredibly prefill-token hungry. What that means is they have these really, really long prompts where they see a lot of your code, and they're not actually generating that many tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1701.385

And so the perfect fit for that is using a sparse model, meaning an MoE model. So that was one breakthrough we made that substantially improved performance at longer context. The other being a variant of speculative decoding that we built out called speculative edits.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1719.618

These are two, I think, important pieces of what make it quite high quality and very fast.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1737.552

Caching plays a huge role. Because you're dealing with this many input tokens, if for every single keystroke you're typing in a given line you had to rerun the model on all of those tokens passed in, you're just going to, one, significantly degrade latency, and two, you're going to kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching-aware.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1763.526

And then, yeah, you need to reuse the KV cache across requests just so that you're spending less work, less compute.
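
As a rough illustration of what a caching-aware prompt can look like, here is a minimal sketch; the function and section names are hypothetical, and it assumes an inference server that does prefix caching on byte-identical prompt prefixes.

```python
def build_tab_prompt(system_text: str, file_context: str, recent_edits: str, cursor_window: str) -> str:
    """Assemble a prompt so that everything stable comes first.

    If the system text and surrounding file context are byte-identical across
    requests, a server with prefix caching can reuse the KV cache for that
    prefix and only prefill the short, frequently-changing suffix.
    """
    stable_prefix = f"{system_text}\n### FILE CONTEXT\n{file_context}\n"
    changing_suffix = f"### RECENT EDITS\n{recent_edits}\n### CURSOR WINDOW\n{cursor_window}\n"
    return stable_prefix + changing_suffix

# Across consecutive keystrokes only the recent edits and cursor window change,
# so the expensive prefix is prefilled once and then served from cache.
```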

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1977.997

This is what we're talking about.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2070.243

And there's a chance this is also not the final version of it.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2206.451

Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. You know, code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2230.359

it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvid had described of maybe pointing you towards the regions that actually matter.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2247.762

I think also, if the code is produced by these language models and it's not produced by someone else... Like, the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, you don't have to care that much about their experience.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2268.506

And you can design the entire thing around the reviewers such that the reviewer's job is as fun, as easy, as productive as possible. And I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2397.554

Well, Cursor really works via this ensemble of custom models that we've trained alongside the frontier models that are fantastic at the reasoning-intense things. And so Cursor Tab, for example, is a great example of where you can specialize this model to be even better than frontier models, if you look at evals on the task we set it at.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2416.741

The other domain, which it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in Apply. So the frontier models are quite good at sketching out plans for code and generating rough sketches of the change. But actually creating diffs is quite hard for frontier models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2444.238

You try to do this with Sonnet, with O1, any frontier model, and it really messes up stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model kind of sketch out this rough code block that indicates what the change will be. And we train a model to then apply that change to the file.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2489.501

Yeah. I think you see shallow copies of Apply elsewhere and it just breaks most of the time, because you think you can kind of try to do some deterministic matching, and then it fails, you know, at least 40% of the time. And that just results in a terrible product experience. I think, in general, we're in this regime where you are going to get smarter and smarter models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2515.735

So one other thing that Apply lets you do is it lets you use fewer tokens with the most intelligent models, which are expensive both in terms of latency for generating all these tokens and in cost. So you can give this very, very rough sketch and then have your small models go and implement it, because it's a much easier task to implement this very, very sketched-out code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2540.649

And I think that this regime will continue where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have, you know, maybe O1, maybe even more capable models, given an even higher-level plan that is kind of recursively implemented and applied by Sonnet and then the Apply model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2565.904

Fast is always an interesting detail. Fast is good.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2571.465

Yeah, so one big component of making it fast is speculative edits. So speculative edits are a variant of speculative decoding. And maybe it'd be helpful to briefly describe speculative decoding. With speculative decoding, what you do is you can kind of take advantage of the fact that, most of the time, and I'll add the caveat that this is when you're memory-bound in language model generation.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2596.642

If you... process multiple tokens at once, it is faster than generating one token at a time. So this is the same reason why if you look at tokens per second with prompt tokens versus generated tokens, it's much, much faster for prompt tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2615.752

So what we do is, instead of using what speculative decoding normally does, which is using a really small model to predict these draft tokens that your larger model will then go in and verify, with code edits we have a very strong prior of what the existing code will look like. And that prior is literally the same exact code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2635.541

So what you can do is you can just feed chunks of the original code back into the model. And then the model will just pretty much agree most of the time that, okay, I'm just going to spit this code back out. And so you can process all of those lines in parallel. And you just do this with sufficiently many chunks. And then eventually you'll reach a point of disagreement.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2654.085

where the model will now predict text that is different from the ground truth original code. It'll generate those tokens, and then we kind of will decide after enough tokens match the original code to restart speculating in chunks of code. What this actually ends up looking like is just a much faster version of normal editing code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2675.33

So it looks like a much faster version of the model rewriting all the code. So we can use the same exact interface, that we use for diffs, but it will just stream down a lot faster.
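
Here is a minimal, line-level sketch of the speculative-edits idea described above: propose the original code as the draft, let the model verify it chunk by chunk, and fall back to normal generation at the first disagreement. The `next_line` interface and the toy model are hypothetical stand-ins; a real implementation works on tokens, verifies each chunk in a single parallel forward pass, and handles insertions and deletions rather than only line-for-line replacements.

```python
from typing import Callable, List

def speculative_edit(
    original: List[str],
    next_line: Callable[[List[str]], str],
    chunk_size: int = 8,
) -> List[str]:
    """Rewrite a file by speculating that most lines are unchanged.

    `next_line(output_so_far)` stands in for one decoding step of the real model.
    We feed the original lines back in chunks; if the model agrees, the line is
    accepted (cheap verification). At the first disagreement we take the model's
    edit for that line, then resume speculating from the next chunk.
    """
    output: List[str] = []
    i = 0
    while i < len(original):
        chunk = original[i : i + chunk_size]
        for line in chunk:
            predicted = next_line(output)   # verification step against the draft line
            output.append(predicted if predicted != line else line)
            i += 1
            if predicted != line:           # disagreement: stop trusting this chunk
                break
    return output

# Toy "model": upper-cases any line containing TODO and copies everything else.
def toy_model(out: List[str]) -> str:
    source = ["def f():", "    # TODO fix", "    return 1"]
    line = source[len(out)]
    return line.upper() if "TODO" in line else line

print(speculative_edit(["def f():", "    # TODO fix", "    return 1"], toy_model))
```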

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2738.408

Yeah, I think there's no model that Pareto-dominates others, meaning it is better in all the categories that we think matter. The categories being speed, ability to edit code, ability to process lots of code, long context, you know, a couple of other things, and kind of coding capabilities. The one that I'd say right now is just kind of net best is Sonnet. I think this is a consensus opinion.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2767.578

O1 is really interesting and it's really good at reasoning. So if you give it really hard programming-interview-style problems or LeetCode problems, it can do quite well on them. But it doesn't feel like it kind of understands your rough intent as well as Sonnet does.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2787.777

Like, if you look at a lot of the other frontier models, one qualm I have is it feels like they're not necessarily... I'm not saying they train on benchmarks, but they perform really well on benchmarks relative to kind of everything that's in between.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2804.149

So if you try them on all these benchmarks and things that are in the distribution of the benchmarks they're evaluated on, you know, they'll do really well. But when you push them a little bit outside of that, Sonnet is, I think, the one that does best at kind of maintaining that same capability.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2817.795

Like you kind of have the same capability in the benchmark as when you try to instruct it to do anything with coding.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2959.731

Yeah, like in that case, it could be trained on the literal issues or pull requests themselves. And maybe the labs will start to do a better job, or they've already done a good job, at decontaminating those things. But they're not going to omit the actual training data of the repository itself. Like, these are some of the most popular Python repositories; SymPy is one example.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2981.096

I don't think they're going to handicap their models on SymPy and all these popular Python repositories in order to get true evaluation scores on these benchmarks.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3060.356

Yeah, with Claude, there's an interesting take I heard where I think AWS has different chips, and I suspect they have slightly different numerics than NVIDIA GPUs. And someone speculated that Claude's degraded performance had to do with maybe using the quantized version that existed on AWS Bedrock versus whatever was running on Anthropic's GPUs.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3343.416

That's amazing. And you can do, like, other fancy things where if you have lots of code blocks from the entire code base, you could use retrieval and things like embedding and re-ranking scores to add priorities for each of these components.
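
A minimal sketch of that retrieve-then-re-rank idea, assuming precomputed chunk embeddings; the combination weights and the re-ranker below are placeholders, not the actual scoring used in the product.

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, k=20):
    """First stage: cosine similarity against precomputed chunk embeddings."""
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def rerank(candidate_ids, embed_scores, rerank_score):
    """Second stage: mix the embedding score with a (hypothetical) re-ranker score."""
    combined = [(0.3 * e + 0.7 * rerank_score(i), i) for i, e in zip(candidate_ids, embed_scores)]
    return [i for _, i in sorted(combined, reverse=True)]

# Toy data standing in for embedded code chunks; the re-ranker is a random placeholder
# where a cross-encoder or LLM scorer would actually go.
rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(200, 64))
query_vec = rng.normal(size=64)
ids, scores = retrieve(query_vec, chunk_vecs)
print(rerank(ids, scores, rerank_score=lambda i: rng.random())[:5])
```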

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3425.322

I think even as the system gets closer to some level of perfection, often when you ask the model for something, not enough intent is conveyed to know what to do. And there are a few ways to resolve that intent. One is the simple thing of having the model just ask you: I'm not sure how to do these parts based on your query, could you clarify that? I think the other could be maybe...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3454.935

If there are five or six possible generations given the uncertainty present in your query so far, why don't we just actually show you all of those and let you pick them?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3795.515

Yeah, I mean, so we can go over a lot of the strategies that we use. One interesting thing is cache warming. And so what you can do is, as the user is typing, you know you're probably going to use some piece of context, and you can know that before the user's done typing. So, you know, as we discussed before, reusing the KV cache results in lower latency and lower cost across requests.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3822.241

So as the user starts typing, you can immediately warm the cache with, let's say, the current file contents. And then when they press enter, there are very few tokens it actually has to prefill and compute before starting the generation. This will significantly lower TTFT.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3839.395

Yeah. So the way transformers work, the mechanism that allows transformers to not just independently look at each token but to see previous tokens, is the keys and values in attention.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3858.988

And generally the way attention works is you have, at your current token, some query, and then you have all the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And by default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3885.698

That's a lot of matrix multiplies that happen, and that is really, really slow. Instead, if you have already done that, and you stored the keys and values, and you keep that in the GPU...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3896.884

Then, let's say I have stored the keys and values for the last N tokens. If I now want to compute the output token for the N-plus-one-th token, I don't need to pass those first N tokens through the entire model, because I already have all those keys and values. And so you just need to do the forward pass through that last token.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3915.854

And then when you're doing attention, you're reusing those keys and values that have been computed, which is the only kind of sequential part or sequentially dependent part of the transformer.
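
A toy, single-head illustration of why the KV cache helps: after prefill, each new token only needs its own projections appended to the cached keys and values. Real models do this per layer and per head; the shapes and weights here are made up.

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    """Single-head attention for one query against cached keys/values."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Prefill: process the prompt once and store its keys/values (the KV cache).
prompt = rng.normal(size=(6, d))          # 6 "token embeddings"
K_cache = prompt @ Wk
V_cache = prompt @ Wv

# Decode: only project the one new token and append, instead of re-running
# the whole prompt through the layer.
new_token = rng.normal(size=d)
K_cache = np.vstack([K_cache, (new_token @ Wk)[None, :]])
V_cache = np.vstack([V_cache, (new_token @ Wv)[None, :]])
print(attend(new_token @ Wq, K_cache, V_cache).shape)  # (4,)
```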

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3933.003

Yeah, there are other types of caching you can kind of do. One interesting thing that you can do for Cursor Tab is you can basically predict ahead, as if the user had accepted the suggestion, and then trigger another request. And so then you've cached, you've done this speculative... it's a mix of speculation and caching, right?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3957.313

Because you're speculating what would happen if they accepted it. And then you have this value that is cached, this suggestion. And then when they press tab, the next one is waiting for them immediately. It's a kind of clever heuristic-slash-trick that uses higher-level caching, and it feels fast despite there not actually being any changes in the model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4095

Yeah, it is a little different than speed. But, I mean, technically you tie it back in, because you can get away with the smaller model if you RL your smaller model and it gets the same performance as the bigger one. And while I was mentioning stuff about reducing the size of your KV cache, there are other techniques there as well that are really helpful for speed.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4119.066

So kind of back in the day, like all the way two years ago, people mainly used multi-head attention. And I think there's been a migration towards more efficient attention schemes like grouped-query or multi-query attention. And this is really helpful, with larger batch sizes, for being able to generate the tokens much faster.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4143.041

The interesting thing here is this has no effect on that time-to-first-token, prefill speed. The thing this matters for is generating tokens. And why is that? Because when you're generating tokens, instead of being bottlenecked by doing these super-parallelizable matrix multiplies across all your tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4165.302

You're bottlenecked, for long context with large batch sizes, by how quickly you can read those cached keys and values. That's memory bandwidth. And how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4185.616

Where normally with multi-head attention you have some number of quote-unquote key-value heads and some number of query heads, multi-query just preserves the query heads and gets rid of all the key-value heads. So there's only one kind of key-value head, and there are all the remaining query heads.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4207.662

With group query, you instead preserve all the query heads, and then your keys and values are kind of... There are fewer heads for the keys and values, but you're not reducing it to just one. But anyways, the whole point here is you're just reducing the size of your KV cache.
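
A rough sketch of how the number of key-value heads drives KV-cache size, using a hypothetical 32-layer model; the exact figures depend on the architecture, but the MHA vs. GQA vs. MQA ratio is the point.

```python
def kv_cache_bytes(layers, seq_len, batch, n_kv_heads, head_dim, dtype_bytes=2):
    """Memory for the KV cache: 2 (keys + values) per layer, per token, per KV head."""
    return 2 * layers * seq_len * batch * n_kv_heads * head_dim * dtype_bytes

# Hypothetical 32-layer model, 32 query heads of dim 128, 8k context, batch of 8.
common = dict(layers=32, seq_len=8192, batch=8, head_dim=128)
for name, kv_heads in [("multi-head (MHA)", 32), ("grouped-query (GQA)", 8), ("multi-query (MQA)", 1)]:
    gib = kv_cache_bytes(n_kv_heads=kv_heads, **common) / 2**30
    print(f"{name:20s} {kv_heads:2d} KV heads -> {gib:5.1f} GiB")
```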

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4228.814

Yeah, multi-head latent attention. That's a little more complicated. And the way that this works is it kind of turns the entirety of your keys and values across all your heads into this one latent vector that is then kind of expanded at inference time.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4344.016

What, I mean, ultimately, how does that map to the user experience? Yeah, the two things that it maps to are: you can now make your cache a lot larger, because you have less space allocated for the KV cache, so you can maybe cache a lot more aggressively and a lot more things. So you get more cache hits, which are helpful for reducing the time to first token, for the reasons that were kind of described earlier. And then the second being, when you

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4368.678

start doing inference with more and more requests and larger and larger batch sizes, you don't see much of a slowdown in the speed at which it's generating the tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4380.668

Yeah. So basically, the size of your KV cache is the size of all your prompts multiplied by the number of prompts being processed in parallel. So you could increase either of those dimensions, right, the batch size or the size of your prompts, without degrading the latency of generating tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4643.554

One maybe hacky but interesting idea that I like is holding a lock on saving. And so basically, you can then have the language model kind of hold the lock on saving to disk.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4653.838

And then instead of you operating on the ground-truth version of the files that are saved to disk, you actually are operating on what was the shadow workspace before, and these unsaved things that only exist in memory, that you still get linter errors for, and you can

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4664.762

code in. And then when you try to maybe run code, it's just like there's a small warning that there's a lock, and then you kind of will take back the lock from the language server if you're trying to do things concurrently, or from the shadow workspace if you're trying to do things concurrently. That's such an exciting future, by the way. It's a bit of a tangent, but like, to allow a model to change files...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4698.422

Yeah. And I think there may be different versions of, like, runnability, where for the simple things, where you're doing things in the span of a few minutes on behalf of the user as they're programming, it makes sense to make something work locally in their machine.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4711.673

I think for the more aggressive things, where you're making larger changes that take longer periods of time, you'll probably want to do this in some sandboxed remote environment. And that's another incredibly tricky problem of how do you

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4724.105

exactly reproduce, or mostly reproduce to the point of it being effectively equivalent for running code, the user's environment with this remote sandbox. I'm curious what kind of agency you want for coding. Do you want them to find bugs? Do you want them to, like, implement new features? What agency do you want?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4806.418

Yeah. I mean, it's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4819.122

Exactly. Even O1. How do you explain that?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4825.086

I think these models are a really strong reflection of the pre-training distribution. And, you know, I do think they generalize as the loss gets lower and lower, but I don't think the loss, and the scale, is quite low enough such that they're really fully generalizing in code. Like, the things that we use these things for, the frontier models,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4846.604

that they're quite good at, are really code generation and question answering. And these things exist in massive quantities in pre-training, with all of the code on GitHub on the scale of many, many trillions of tokens, and questions and answers on things like Stack Overflow and maybe GitHub issues. And so when you try to push into these things that really don't exist

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4871.075

very much online, like, for example, the Cursor Tab objective of predicting the next edit given the edits done so far, the brittleness kind of shows. And then bug detection is another great example, where there aren't really that many examples of actually detecting real bugs and then proposing fixes, and the models just really struggle at it. But I think it's a question of transferring the model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4895

In the same way that you get this fantastic transfer from pre-trained models just on code in general to the Cursor Tab objective, you'll see a very, very similar thing with generalized models that are really good at code to bug detection. It just takes a little bit of nudging in that direction.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4995.612

how paranoid is the user? But even then, if you're putting in a maximum paranoia, it still just doesn't quite get it.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5099.468

Bring down, you know, like, just bring down the server. And, like, of course we test a lot and whatever, but there are always these things that you have to be very careful about. Yeah, like with just normal docstrings, I think people will often just skim it when making a change and think, oh, I know how to do this. And you kind of really need to point it out to them so that doesn't slip through. Yeah, you have to be reminded that you can do a lot of damage.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5145.264

But concretely, what do you think that future would look like?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5222.392

Yeah. The worry is there's this massive document. Replacing something like unit tests, sure.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5287.34

Or how do you handle, I guess, external dependencies, like calling the Stripe API? Maybe Stripe would write a spec for that. But, like, you can't do this for everything. Can you do this for everything you use? How do you do it if there's a language... Like, maybe people will use language models as primitives in the programs they write, and there's a dependence on it, and how do you now include that? I think you might be able to prove that still.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5312.267

Prove what about language models?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5646.476

Yeah, I was going to say the moment that feels like people do this is when they share it, when they have this fantastic example, they just kind of share it with their friends.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5701.628

You can use terminal context as well, inside of Cmd+K, kind of everything. We don't have the looping part yet, though we suspect something like this could make a lot of sense. There's a question of whether it happens in the foreground too, or if it happens in the background, like what we've been discussing.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5796.908

It would be really interesting if you could branch a file system.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6081.201

Yeah, it's called the Merkle tree.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6134.75

Yeah. And there are a lot of clever things, like additional things that go into this indexing system. For example, the bottleneck in terms of costs is not storing things in the vector database or the database. It's actually embedding the code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6147.396

And you don't want to re-embed the code base for every single person in a company that is using the same exact code, except for maybe they're in a different branch with a few different files or they've made a few local changes.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6158.725

And so because, again, embeddings are the bottleneck, you can do one clever trick and not have to worry about the complexity of dealing with branches and the other databases, where you just have some cache on the actual vectors computed from the hash of a given chunk. And so this means that when the nth person at a company goes and embeds their code base, it's really, really fast.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6183.083

And you do all this without actually storing any code on our servers at all. No code data is stored. We just store the vectors in the vector database and the vector cache.
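
A minimal sketch of an embedding cache keyed by a hash of the chunk contents, which is roughly the trick being described; the embedding function here is a toy stand-in for a real embedding model call.

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings keyed by a hash of the chunk contents (a sketch).

    Because the key is derived from the chunk text itself, different people
    indexing the same code hit the same cache entries, and only the vectors,
    never the code, need to live server-side.
    """

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn      # hypothetical call to an embedding model
        self.store = {}               # in production: a shared vector cache / DB

    @staticmethod
    def key(chunk: str) -> str:
        return hashlib.sha256(chunk.encode("utf-8")).hexdigest()

    def get(self, chunk: str):
        k = self.key(chunk)
        if k not in self.store:
            self.store[k] = self.embed_fn(chunk)   # only embed on a cache miss
        return self.store[k]

# Toy embedding function so the sketch runs end to end.
cache = EmbeddingCache(embed_fn=lambda text: [len(text), text.count("def")])
cache.get("def foo():\n    return 1\n")
cache.get("def foo():\n    return 1\n")   # second caller: cache hit, no re-embedding
print(len(cache.store))  # 1
```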

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6238.944

I think, like you mentioned, in the future, I think this is only going to get more and more powerful where we're working a lot on improving the quality of our retrieval. And I think the ceiling for that is really, really much higher than people give it credit for.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6367.193

Yeah, like, an approximate nearest neighbors on this massive code base is going to just eat up your memory and your CPU. And that's just that. Let's talk about also the modeling side where, as Arvid said, there are these massive headwinds against local models, where, one, things seem to be moving towards MoEs.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6388.739

One benefit is maybe they're more memory bandwidth bound, which plays in favor of local versus using GPUs or using NVIDIA GPUs. But the downside is these models are just bigger in total. And they're going to need to fit often not even on a single node, but multiple nodes. There's no way that's going to fit inside of even really good MacBooks. And I think especially for coding,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6415.262

It's not a question as much of, like, does it clear some bar of the model being good enough to do these things and then we're satisfied, which may be the case for other problems, and maybe where local models shine. But people are always going to want the best, the most intelligent, the most capable things. And that's going to be really, really hard to run for almost all people locally.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6671.066

Why do you think it's different than cloud providers?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6858.905

Like one interesting proof of concept for the learning this knowledge directly in the weights is with VS Code. So we're in a VS Code fork and VS Code, the code is all public. So these models in pre-training have seen all the code. They've probably also seen questions and answers about it, and then they've been fine-tuned and RLHFed to be able to answer questions about code in general.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6882.508

So when you ask it a question about VS Code, sometimes it'll hallucinate, but sometimes it actually does a pretty good job at answering the question. And I think like this is just by, it happens to be okay at it. But what if you could actually like specifically train or post train a model such that it really was built to understand this code base?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6905.151

It's an open research question, one that we're quite interested in. And then there's also uncertainty of like, do you want the model to be the thing that end to end is doing everything, i.e. it's doing the retrieval and its internals and then kind of answering the question, creating the code?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6918.236

Or do you want to separate the retrieval from the frontier model where maybe, you know, you'll get some really capable models that are much better than like the best open source ones in a handful of months? Yeah. And then you'll want to separately train a really good open source model to be the retriever, to be the thing that feeds in the context to these larger models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6948.751

Is this... Yeah, I mean, there are many possible ways you could try doing it. There's certainly no shortage of ideas. It's just a question of going in and trying all of them and being empirical about which one works best. One very naive thing is to try to replicate what's done with VS Code and these frontier models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6969.429

So let's continue pre-training, some kind of continued pre-training that includes general code data, but also throws in a lot of the data of some particular repository that you care about.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6979.498

And then in post-training, meaning in, let's just start with instruction fine-tuning, you have like a normal instruction fine-tuning data set about code, but you throw in a lot of questions about code in that repository. So you could either get ground truth ones, which might be difficult, or you could do what you kind of hinted at or suggested using synthetic data, i.e.,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7002.344

kind of having the model ask questions about various pieces of the code. So you kind of take the pieces of the code, then prompt the model, or have a model propose a question for that piece of code, and then add those as instruction fine-tuning data points. And then in theory, this might unlock the model's ability to answer questions about that code base.
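
A minimal sketch of that synthetic question-generation loop; the `llm` callable and the prompts are hypothetical stand-ins for whatever model and prompting you would actually use.

```python
def make_synthetic_qa(code_chunks, llm):
    """Turn repository chunks into (question, answer) pairs for instruction fine-tuning.

    `llm(prompt)` is a hypothetical text-in, text-out call to some capable model.
    """
    examples = []
    for chunk in code_chunks:
        question = llm(
            "Write one question a developer might ask about this code:\n\n" + chunk
        )
        answer = llm(
            f"Answer the question using only this code.\n\nCode:\n{chunk}\n\nQuestion:\n{question}"
        )
        examples.append({"instruction": question, "input": chunk, "output": answer})
    return examples

# examples = make_synthetic_qa(chunks_from_repo, llm=call_my_model)  # hypothetical usage
```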

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7033.673

I think test time compute is really, really interesting. So there's been the pre-training regime, which will kind of, as you scale up the amount of data and the size of your model, get you better and better performance, both on loss and then on downstream benchmarks and just general performance when we use it for coding or other tasks.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7055.327

We're starting to hit a bit of a data wall, meaning it's going to be hard to continue scaling up this regime.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7062.473

And so scaling up test-time compute is an interesting way of, you know, increasing the number of inference-time flops that we use, where as you increase the number of flops used at inference time, you still get corresponding improvements in the performance of these models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7079.787

Traditionally, we just had to literally train a bigger model that always used that many more flops. But now we could perhaps use the same size model and run it for longer to be able to get an answer at the quality of a much larger model. And so the really interesting thing I like about this is there are some problems that perhaps require

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7099.691

hundred-trillion-parameter-model intelligence, trained on a hundred trillion tokens. But that's, like, maybe 1%, maybe 0.1%, of all queries. So are you going to spend all of this effort, all of this compute, training a model

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7114.551

that costs that much and then run it so infrequently? It feels completely wasteful, when instead you train the model that's capable of doing the 99.9% of queries, and then you have a way of running it longer at inference time for those few people that really, really want max intelligence.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7149.598

I mean, yeah, that's an open research problem, certainly. I don't think anyone's actually cracked this model-routing problem quite well. We'd like to. We have initial implementations of this for something like Cursor Tab. But at the level of going between, say, 4o or Sonnet and O1, it's a bit trickier.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7171.255

There's also a question of what level of intelligence you need to determine if the thing is too hard for the 4o-level model. Maybe you need the O1-level model. It's really unclear.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

719.561

Yeah, so... I think a lot of us, well, all of us were originally Vim users.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7197.08

Well, it's weird, because with test-time compute, there's, like, a whole training strategy needed to get test-time compute to work. And the other really weird thing about this is that no one outside of the big labs, and maybe even just OpenAI, no one really knows how it works. There have been some really interesting papers that show hints of what they might be doing.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7220.146

And so perhaps they're doing something with tree search using process reward models. But yeah, I just I think the issue is, we don't quite know exactly what it looks like.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7231.251

So it would be hard to kind of comment on where it fits in. I would put it in post-training, but maybe the compute spent on getting test-time compute to work for a model is going to dwarf pre-training eventually.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7253.684

It's fun to speculate.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7261.468

Yeah. So one thing to do would be, I think you probably need to train a process reward model. So maybe we can get into reward models, and outcome reward models versus process reward models. Outcome reward models are the kind of traditional reward models that people train for language modeling, and it's just looking at the final thing.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

727.015

Pure Vim, yeah. No Neovim, just pure Vim in a terminal. And at least for myself, it was around the time that Copilot came out, so 2021, that I really wanted to try it. So I went into VS Code, the only platform, the only code editor, in which it was available. And even though I really enjoyed using Vim, just the experience of Copilot with VS Code was more than good enough to convince me to switch.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7281.79

So if you're doing some math problem, let's look at that final thing you've done, everything, and let's assign a grade to it, how likely we think, like what's the reward for this outcome. Process reward models instead try to grade the chain of thought.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7295.697

And so OpenAI had some preliminary paper on this, I think, last summer, where they use human labelers to get this pretty large, several-hundred-thousand-example dataset of graded chains of thought. Ultimately, it feels like I haven't seen anything interesting in the ways that people use process reward models outside of just using them as a means of affecting how we choose between a bunch of samples.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7322.792

So like what people do in all these papers is they sample a bunch of outputs from the language model and then use the process reward models to grade all those generations alongside maybe some other heuristics and then use that to choose the best answer. The really interesting thing that people think might work and people want to work is tree search with these process reward models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7345.144

Because if you really can grade every single step of the chain of thought, then you can kind of branch out and explore multiple paths of this chain of thought. And then use these process reward models to evaluate how good is this branch that you're taking.
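
A minimal best-of-n sketch of using a process reward model to score chains of thought, which is the non-tree-search version of the idea; `sample_chain` and `prm_score` are hypothetical interfaces, and taking the minimum over step scores is just one possible aggregation.

```python
def best_of_n_with_prm(problem, sample_chain, prm_score, n=8):
    """Sample n chains of thought and keep the one whose steps the PRM likes most.

    `sample_chain(problem)` returns a list of reasoning steps (hypothetical model call);
    `prm_score(problem, steps_so_far)` returns a score for the latest step.
    """
    best, best_score = None, float("-inf")
    for _ in range(n):
        steps = sample_chain(problem)
        step_scores = [prm_score(problem, steps[: i + 1]) for i in range(len(steps))]
        score = min(step_scores) if step_scores else float("-inf")
        if score > best_score:
            best, best_score = steps, score
    return best
```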

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7372.864

And the interesting work that I think has been done, or the interesting work that has been open-sourced and that people talk about, is how to train the process reward models, maybe in a more automated way.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7387.718

I could be wrong here, or could be failing to mention something, because I haven't seen anything that seems to work really well for using the process reward models creatively to do tree search and code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7555.939

But it has these significant limitations. Even barring capabilities, it does not stream. And that means it's really, really painful to use for things where you want to supervise the output. And instead, you're just waiting for the wall of text to show up. Also, it does feel like the early innings of test time compute and search, where it's just very, very much a v0.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

757.209

And so that kind of was the default until we started working on Cursor.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7581.006

And there's so many things that... like don't feel quite right. And I suspect in parallel to people increasing the amount of pre-training data and the size of the models and pre-training and finding tricks there, you'll now have this other thread of getting search to work better and better.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7683.854

Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast, like O1. It comes from all of the kind of depth that goes into these custom models that you don't realize are working for you in kind of every facet of the product, as well as the really thoughtful UX with every single feature.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7716.931

Yeah, I think there are three main kinds of synthetic data. The first is, so what is synthetic data first? So there's normal data, like non-synthetic data, which is just data that's naturally created, i.e. usually it'll be from humans having done things. So from some human process, you get this data. Synthetic data, the first one would be distillation.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7741.174

So having a language model kind of output tokens or probability distributions over tokens. And then you can train some less capable model on this. This approach is not gonna get you a net, like more capable model than the original one that has produced the tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7759.022

but it's really useful for if there's some capability you want to elicit from some really expensive high latency model, you can then distill that down into some smaller task specific model. The second kind is when like one direction of the problem is easier than the reverse.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7778.389

And so a great example of this is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable-looking bugs than it is to actually detect them. And this is probably the case for humans too. And so what you can do is you can get a model that's not trained on that much data, that's not that smart, to introduce a bunch of bugs in code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7802.137

And then you can use that synthetic data to train a model that can be really good at detecting bugs. The last category, I think, is, I guess, the main one that it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7823.357

So, like, you know, an extreme example of this is if you have a verification system that can detect if language is Shakespeare-level, and then you have a bunch of monkeys typing on typewriters, you can eventually get enough training data to train a Shakespeare-level language model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7839.146

And I mean, this is the case, very much the case, for math, where verification is actually really, really easy for formal languages. And then what you can do is you can have an okay model generate a ton of rollouts, and then choose the ones that you know have actually proved the ground-truth theorems and train on those further.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7861

There are similar things you can do for code with LeetCode-like problems, where if you have some set of tests that you know correspond to the problem, if something passes these tests, it has actually solved the problem. You can do the same thing, where you verify that it's passed the tests and then train the model on the outputs that passed the tests.
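
A minimal sketch of that verify-and-filter loop for synthetic data; `sample_solution` and `passes_tests` are hypothetical stand-ins for the model call and the trusted verifier (unit tests, a formal checker, or actually running the code).

```python
def generate_verified_examples(problems, sample_solution, passes_tests, attempts=16):
    """Keep only model outputs that pass a trusted verifier; train on the survivors."""
    dataset = []
    for problem in problems:
        for _ in range(attempts):
            solution = sample_solution(problem)       # hypothetical model call
            if passes_tests(problem, solution):       # trusted, non-LLM verification
                dataset.append({"prompt": problem, "completion": solution})
                break   # one verified solution per problem is enough for this sketch
    return dataset
```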

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7878.072

I think it's gonna be a little tricky getting this to work in all domains or just in general. Like having the perfect verifier feels really, really hard to do with just like open-ended miscellaneous tasks you give the model or more like long horizon tasks, even in coding.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7902.958

Yeah. Verification, it feels like it's best when you know for a fact that it's correct. And then it wouldn't be using a language model to verify. It would be using tests or formal systems. Or running the thing, too.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7928.26

I think that that's the category that is, um, most likely to result in like massive gains.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7946.505

Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7967.843

RLAIF is interesting because you're kind of depending on, like, this is actually kind of going to... it's depending on the constraint that verification is actually a decent bit easier than generation. Because it feels like, okay, what are you doing? Are you using this language model to look at the language model outputs and then improve the language model?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7988.989

But no, it actually may work, if the language model has a much easier time verifying some solution than it does generating it. Then you actually could perhaps get this kind of recursive loop. I don't think it's going to look exactly like that. The other thing we kind of do is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8013.983

And this is in the case of Cursor Tab, picking between two possible generations of what is the better one. And then it just needs a little bit of human nudging, with only on the order of 50, 100 examples, to kind of align that prior the model has with exactly what you want. It looks different than, I think, normal RLHF, where you're usually training these reward models on tons of examples.
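
A minimal sketch of the pairwise-preference objective this kind of nudging typically uses (a Bradley-Terry style loss on chosen vs. rejected generations); this is a generic formulation, not necessarily the exact objective used in the product.

```python
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss on a (chosen, rejected) pair of generations.

    With only tens of labeled pairs, this is less about training a reward model
    from scratch and more about nudging an already-decent scorer toward the
    preferences you actually want.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

print(pairwise_preference_loss(1.3, 0.2))  # small loss: the scorer already prefers the right one
```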

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8052.228

My intuition would just say, yeah, it should be. This is kind of going back to, like, if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify given a proof than actually proving it. I wonder if the same thing will prove P not equal to NP, or P equal to NP. That would be really cool.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8116.717

I feel like I have much more to do there. It felt like the path to get to IMO was a little bit more clear, because it already could get a few IMO problems, and there was a bunch of low-hanging fruit, given the literature at the time, of what tactics people could take. I think, one, I'm much less versed in the space of theorem proving now.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8134.145

And two, yeah, less intuition about how close we are to solving these really, really hard open problems.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8182.417

I think we might get a Fields Medal before AGI.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8219.986

I think it's interesting. The original scaling laws paper by OpenAI was slightly wrong, because of, I think, some issues they had with learning rate schedules. And then Chinchilla showed a more correct version.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8232.986

And then from there, people have again kind of deviated from doing the compute-optimal thing, because people now optimize more for making the thing work really well given an inference budget. And I think there are a lot more dimensions to these curves than what we originally used, of just compute, number of parameters, and data. Like, inference compute is the obvious one.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8258.849

I think context length is another obvious one. So if you care, like, let's say you care about the two things of inference compute and then context window, maybe the thing you want to train is some kind of SSM because they're much, much cheaper and faster at super, super long context.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8275.493

And even if maybe it is 10X worse scaling properties during training, meaning you have to spend 10X more compute to train the thing to get the same level of capabilities, it's worth it because you care most about that inference budget for really long context windows. So it'll be interesting to see how people kind of play with all these dimensions.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8322.085

I mean, I think bigger is certainly better for just raw performance.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8328.85

And raw intelligence. I think the path that people might take is, I'm particularly bullish on distillation. And, like, yeah, how many knobs can you turn, if we spend a ton of money on training, to get the most capable, cheap model?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8344.72

Like really, really caring as much as you can about it. Because the naive version of caring as much as you can about inference-time compute is what people have already done with, like, the Llama models, or just overtraining the shit out of 7B models on way, way, way more tokens than is Chinchilla-optimal, right? But if you really care about it, maybe the thing to do is what Gemma did, which is: let's not just train on tokens, let's literally train on

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8369.105

minimizing the KL divergence with the distribution of Gemma 27B, right? So, knowledge distillation there. And you're spending the compute of literally training this 27-billion-parameter model on all these tokens just to get out this, I don't know, smaller model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8390.179

Yeah, distillation in theory is, I think, getting out more signal from the data that you're training on. And it's perhaps another way of getting over, not completely over, but partially helping with, the data wall, where you only have so much data to train on.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8405.736

Let's like train this really, really big model on all these tokens and we'll distill it into a smaller one. And maybe we can get more signal per token for this much smaller model than we would have originally if we trained it.
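
A minimal sketch of a distillation loss that trains on the teacher's full next-token distribution rather than the hard label; shapes and temperature handling are simplified to a single vocabulary position.

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary for one position.

    Training on the teacher's full distribution gives the student more signal
    per token than training on the single "hard" next-token label.
    """
    def softmax(x):
        x = np.asarray(x, dtype=float) / temperature
        x = x - x.max()
        p = np.exp(x)
        return p / p.sum()

    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return float(np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12))))

print(distillation_loss([2.0, 0.5, -1.0], [2.2, 0.3, -1.1]))  # close distributions -> small KL
```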

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8435.87

Yeah, I think there's a lot of these secrets and details about training these large models that I just don't know and are only privy to the large labs. And the issue is I would waste a lot of that money if I even attempted this because I wouldn't know those things.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8452.555

Suspending a lot of disbelief and assuming you had the know-how, and could operate... or are you saying you have to operate with the limited information you have now? No, no, actually, I would say you swoop in and you get all the information, all the little heuristics, all the little parameters, all the parameters that define how the thing is trained.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8501.533

Well, this gets into the question of like, are you really limited by compute and money or are you limited by these other things?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8531.018

I think, yeah, because even with all this compute and like, you know, all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just like really good engineering. Like, even with all the capital in the world, would you really be able to assemble... Like, there aren't that many people in the world who really can, like, make the difference here.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8553.696

And there's so much work that goes into research that is just, like, pure, really, really hard engineering work. As, like, a very...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8563.968

kind of hand-wavy example, if you look at the original Transformer paper, you know, how much work was kind of joining together a lot of these really interesting concepts embedded in the literature, versus then going in and writing all the code, like maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance, right?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8584.78

Getting Noam to go in and do all this code, right? And Noam is probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on, you know, thousands of, or maybe tens of thousands of, V100s, which I think GPT-3 may have been.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8603.527

There's just so much engineering effort that has to go into all of these things to make it work. If you really brought that cost down to... like, you know, maybe not zero, but just made it 10X easier, made it super easy for someone with really fantastic ideas to immediately get to the version of like the new architecture they dreamed up that is like getting 50, 40% utilization on the GPUs.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8629.258

I think that would just speed up research by a ton.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8681.923

I think all of us believe new ideas are probably needed to get all the way there to AGI. And... All of us also probably believe there exist ways of testing out those ideas at smaller scales and being fairly confident that they'll play out.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8702.198

It's just quite difficult for the labs in their current position to dedicate their very limited research and engineering talent to exploring all these other ideas when there's this core thing that will probably improve performance for some decent amount of time.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9113.803

I really like that point about, it feels like a lot of the time with programming, there are

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9118.527

two ways you can go about it. One is you think really hard, carefully, up front about the best possible way to do it, and then you spend your limited time of engineering to actually implement it. But I much prefer just getting in the code and, you know, taking a crack at it, seeing how it kind of lays out, and then iterating really quickly on that. That feels more fun.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9238.995

I think different people do programming for different reasons. But I think the true, maybe like the best programmers are the ones that really love just like absolutely love programming.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9252.907

For example, there are folks on our team who literally when they get back from work, they go and then they boot up Cursor and then they start coding on their side projects for the entire night and they stay up till 3 a.m. doing that. And when they're sad, they said, I just really need to code. And I think like,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9280.114

You know, there's that level of programmer where like this obsession and love of programming, I think makes really the best programmers. And I think these types of people will really get into the details of how things work.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1677.853

Yeah. I think I can speak to a few of the details on how to make these things work. They need to be incredibly low latency, so you need to train small models on this task. In particular, they're incredibly prefill-token hungry. What that means is they have these really, really long prompts, where they see a lot of your code, and they're not actually generating that many tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1701.385

And so the perfect fit for that is using a sparse model, meaning an MoE model. So that was one breakthrough we made that substantially improved performance at longer context. The other being a variant of speculative decoding that we kind of built out, called speculative edits.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1719.618

These are two, I think, important pieces of what make it quite high quality and very fast.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1737.552

Caching plays a huge role. Because you're dealing with this many input tokens, if every single keystroke that you're typing in a given line, you had to rerun the model on all of those tokens passed in, you're just going to, one, significantly degrade latency, two, you're going to kill your GPUs with load. So you need to design the actual prompts used for the model such that they're caching aware.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

1763.526

And then, yeah, you need to reuse the KV cache across requests just so that you're spending less work, less compute.
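A toy illustration of what "caching-aware" prompt design means (the layout here is a made-up example, not Cursor's actual prompt format): keep the big, stable context as an exact prefix and put the part that changes on every keystroke at the very end, so the prefix's KV cache can be reused.

```python
# Illustrative sketch of caching-aware prompt construction: the large, stable
# context (file contents, instructions) is an exact, reusable prefix; only the
# small, rapidly-changing part (the line being typed) is appended at the end.
def build_prompt(system_instructions: str, file_contents: str, current_line: str) -> str:
    stable_prefix = f"{system_instructions}\n---\n{file_contents}\n---\n"  # cacheable
    volatile_suffix = f"cursor line: {current_line}\n"                     # changes per keystroke
    return stable_prefix + volatile_suffix

p1 = build_prompt("Complete the edit.", "def add(a, b):\n    ...", "ret")
p2 = build_prompt("Complete the edit.", "def add(a, b):\n    ...", "retu")
shared = len(p1) - len("cursor line: ret\n")
assert p1[:shared] == p2[:shared]  # identical prefix across keystrokes -> KV cache hit
```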

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2070.243

And there's a chance this is also not the final version of it.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2206.451

Yeah, I mean, so GitHub tries to solve this, right, with code review. When you're doing code review, you're reviewing multiple diffs across multiple files. But like Arvid said earlier, I think you can do much better than code review. You know, code review kind of sucks. Like, you spend a lot of time trying to grok this code that's often quite unfamiliar to you, and...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2230.359

it often doesn't even actually catch that many bugs. And I think you can significantly improve that review experience using language models, for example, using the kinds of tricks that Arvid had described of maybe pointing you towards the regions that actually matter.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2247.762

I think also if the code is produced by these language models and it's not produced by someone else, like the code review experience is designed for both the reviewer and the person that produced the code. In the case where the person that produced the code is the language model, You don't have to care that much about their experience.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2268.506

And you can design the entire thing around the reviewers such that the reviewer's job is as fun, as easy, as productive as possible. And I think that feels like the issue with just kind of naively trying to make these things look like code review. I think you can be a lot more creative and push the boundary on what's possible.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2397.554

Well, Cursor really works via this ensemble of custom models that we've trained alongside the frontier models that are fantastic at the reasoning intense things. And so CursorTab, for example, is a great example of where you can specialize this model to be even better than even frontier models if you look at evals on the task we set it at.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2416.741

The other domain, which it's kind of surprising that it requires custom models, but it's kind of necessary and works quite well, is in apply. So I think these models, the frontier models, are quite good at sketching out plans for code and generating rough sketches of the change. But actually creating diffs is quite hard for frontier models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2444.238

You try to do this with Sonnet, with O1, any frontier model, and it really messes up stupid things like counting line numbers, especially in super, super large files. And so what we've done to alleviate this is we let the model kind of sketch out this rough code block that indicates what the change will be. And we train a model to then apply that change to the file.
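A hedged sketch of what that two-stage flow could look like (the function names and prompts are hypothetical stand-ins, not Cursor's actual implementation):

```python
# Sketch of a sketch-then-apply pipeline: a frontier model writes a rough code
# block (with "existing code" markers instead of line numbers), and a smaller
# model rewrites the whole file with that change applied.

def call_frontier_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a large-model API call

def call_apply_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a small, fast "apply" model

def plan_edit(instruction: str, file_text: str) -> str:
    """Ask the big model for a rough sketch of the change, not a diff."""
    prompt = (
        "Sketch the edited code. Use '# ... existing code ...' for unchanged regions.\n"
        f"Instruction: {instruction}\n\nFile:\n{file_text}"
    )
    return call_frontier_model(prompt)

def apply_edit(file_text: str, rough_sketch: str) -> str:
    """Ask the small apply model to emit the full, updated file."""
    prompt = (
        "Rewrite the file with the sketched change applied. Output the full file.\n"
        f"File:\n{file_text}\n\nSketch:\n{rough_sketch}"
    )
    return call_apply_model(prompt)
```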

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2489.501

Yeah. I think like you see shallow copies of apply, um, elsewhere and it just breaks like most of the time, because you think you can kind of try to do some deterministic matching and then it fails, you know, at least 40% of the time. And that just results in a terrible product experience. Um, I think in general, this regime of you are going to get smarter and smarter models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2515.735

So one other thing that Apply lets you do is it lets you use fewer tokens with the most intelligent models. This is both expensive in terms of latency for generating all these tokens and cost. So you can give this very, very rough sketch and then have your small models go and implement it because it's a much easier task to implement this very, very sketched out code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2540.649

And I think that this regime will continue, where you can use smarter and smarter models to do the planning, and then maybe the implementation details can be handled by the less intelligent ones. Perhaps you'll have, you know, maybe O1, maybe it'll be even more capable models, given an even higher-level plan that is kind of recursively implemented, applied by Sonnet and an apply model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2565.904

Fast is always an interesting detail. Fast is good.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2571.465

Yeah, so one big component of making it fast is speculative edits. So speculative edits are a variant of speculative decoding. And maybe it'd be helpful to briefly describe speculative decoding. With speculative decoding, what you do is you can kind of take advantage of the fact that most of the time, and I'll add the caveat that it would be when you're memory bound in language model generation.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2596.642

If you... process multiple tokens at once, it is faster than generating one token at a time. So this is the same reason why if you look at tokens per second with prompt tokens versus generated tokens, it's much, much faster for prompt tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2615.752

So what we do is instead of using what speculative decoding normally does, which is using a really small model to predict these draft tokens that your larger model will then go in and verify, With code edits, we have a very strong prior of what the existing code will look like. And that prior is literally the same exact code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2635.541

So what you can do is you can just feed chunks of the original code back into the model. And then the model will just pretty much agree most of the time that, okay, I'm just going to spit this code back out. And so you can process all of those lines in parallel. And you just do this with sufficiently many chunks. And then eventually you'll reach a point of disagreement.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2654.085

where the model will now predict text that is different from the ground truth original code. It'll generate those tokens, and then we kind of will decide after enough tokens match the original code to restart speculating in chunks of code. What this actually ends up looking like is just a much faster version of normal editing code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2675.33

So it looks like a much faster version of the model rewriting all the code. So we can use the same exact interface, that we use for diffs, but it will just stream down a lot faster.
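Here's a simplified sketch of that idea (the `model` interface is hypothetical, and the real implementation isn't public; among other simplifications, this version resumes speculating immediately after a single corrected token rather than waiting to re-sync):

```python
# Simplified sketch of "speculative edits": use the original file itself as the
# draft, verify whole chunks in one parallel pass, and only fall back to normal
# one-token-at-a-time decoding where the model's edit diverges from the original.
# `model` is a hypothetical interface: greedy_tokens(prefix, draft) returns the
# model's greedy prediction at each draft position in a single forward pass;
# next_token(prefix) decodes one token; `eos` is the stop token.

def speculative_edit(model, prompt_tokens, original_tokens, chunk_size=64, max_new=4096):
    out = list(prompt_tokens)
    i = 0  # how far into the original file we've speculated
    while len(out) - len(prompt_tokens) < max_new:
        draft = list(original_tokens[i:i + chunk_size])
        if not draft:
            tok = model.next_token(out)          # past the original: decode normally
            out.append(tok)
            if tok == model.eos:
                break
            continue
        predicted = model.greedy_tokens(out, draft)   # one parallel pass over the chunk
        agree = 0
        while agree < len(draft) and predicted[agree] == draft[agree]:
            agree += 1
        out.extend(draft[:agree])                # accepted in parallel, "for free"
        if agree < len(draft):
            out.append(predicted[agree])         # the model's actual edit at the divergence
        i += agree + (1 if agree < len(draft) else 0)
        if out[-1] == model.eos:
            break
    return out
```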

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2738.408

Yeah, I think there's no model that Pareto dominates others, meaning it is better in all categories that we think matter. The categories being speed, ability to edit code, ability to process lots of code, long context, you know, a couple of other things, and kind of coding capabilities. The one that I'd say right now is just kind of net best is Sonnet. I think this is a consensus opinion.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2767.578

O1 is really interesting, and it's really good at reasoning. So if you give it really hard programming-interview-style problems or LeetCode problems, it can do quite, quite well on them. But it doesn't feel like it kind of understands your rough intent as well as Sonnet does.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2787.777

Like, if you look at a lot of the other frontier models, one qualm I have is, it feels like they're not necessarily, I'm not saying they train on benchmarks, but they perform really well on benchmarks relative to kind of everything that's in the middle.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2804.149

So if you try it in all these benchmarks and things that are in the distribution of the benchmarks they're evaluated on, you know, they'll do really well. But when you push them a little bit outside of that, Sonnet's I think the one that kind of does best at kind of maintaining that same capability.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2817.795

Like you kind of have the same capability in the benchmark as when you try to instruct it to do anything with coding.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2959.731

Yeah, like in that case, it could be trained on the literal issues or pull requests themselves. And maybe the labs will start to do a better job, or they've already done a good job at decontaminating those things. But they're not going to emit the actual training data of the repository itself. Like these are all like some of the most popular Python repositories, like SymPy is one example.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

2981.096

I don't think they're going to handicap their models on SymPy and all these popular Python repositories in order to get true evaluation scores in these benchmarks.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3060.356

Yeah, with Claude, there's an interesting take I heard where I think AWS has different chips. And I suspect they have slightly different numerics than NVIDIA GPUs. And someone speculated that Claude's degraded performance had to do with maybe using the quantized version that existed on AWS Bedrock versus whatever was running on Anthropic's GPUs.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3343.416

That's amazing. And you can do, like, other fancy things where if you have lots of code blocks from the entire code base, you could use retrieval and things like embedding and re-ranking scores to add priorities for each of these components.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3425.322

I think even as the system gets closer to some level of perfection, Often when you ask the model for something, not enough intent is conveyed to know what to do. And there are a few ways to resolve that intent. One is the simple thing of having the model just ask you, I'm not sure how to do these parts based on your query. Could you clarify that? I think the other could be maybe...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3454.935

If there are five or six possible generations given the uncertainty present in your query so far, why don't we just actually show you all of those and let you pick them?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3795.515

Yeah, I mean, so we can go over a lot of the strategies that we use. One interesting thing is cache warming. And so what you can do is, as the user is typing, you know you're probably going to use some piece of context, and you can know that before the user's done typing. So, you know, as we discussed before, reusing the KV cache results in lower latency and lower cost across requests.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3822.241

So as the user starts typing, you can immediately warm the cache with, let's say, the current file contents. And then when they press enter, there are very few tokens it actually has to pre-fill and compute before starting the generation. This will significantly lower TTFT, the time to first token.
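A small sketch of what cache warming might look like on the client side (async pseudocode; `send_prefill` is a hypothetical call into whatever serving stack owns the KV cache):

```python
# Sketch of cache warming: as the user types, speculatively send the context
# we'll almost certainly need (e.g. the current file) so the server has already
# pre-filled and cached it by the time the real request arrives.
import asyncio

async def send_prefill(prefix: str) -> None:
    ...  # hypothetical: run prefill on `prefix` and keep its KV cache warm

async def on_keystroke(file_contents: str, pending: dict) -> None:
    prefix = f"FILE:\n{file_contents}\n"
    if pending.get("prefix") != prefix:          # only re-warm when the prefix changed
        pending["prefix"] = prefix
        pending["task"] = asyncio.create_task(send_prefill(prefix))

async def on_enter(file_contents: str, user_request: str, pending: dict) -> str:
    if pending.get("task"):
        await pending["task"]                    # prefill is (usually) already done
    # Only the short suffix still needs to be pre-filled before generation starts.
    return f"FILE:\n{file_contents}\nREQUEST: {user_request}\n"
```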

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3839.395

Yeah. So the way transformers work, I mean, the mechanism that allows transformers to not just independently look at each token, but see previous tokens, are the keys and values in attention.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3858.988

And generally the way attention works is you have at your current token, some query, and then you've all the keys and values of all your previous tokens, which are some kind of representation that the model stores internally of all the previous tokens in the prompt. And By default, when you're doing a chat, the model has to, for every single token, do this forward pass through the entire model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3885.698

That's a lot of matrix multiplies that happen, and that is really, really slow. Instead, if you have already done that, and you stored the keys and values, and you keep that in the GPU...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3896.884

Then, let's say I have those stored for the last n tokens. If I now want to compute the output token for the n-plus-one-th token, I don't need to pass those first n tokens through the entire model, because I already have all those keys and values. And so you just need to do the forward pass through that last token.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3915.854

And then when you're doing attention, you're reusing those keys and values that have been computed, which is the only kind of sequential part or sequentially dependent part of the transformer.
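As a toy, single-head illustration of that (identity projections instead of learned weights, illustrative shapes):

```python
# Tiny single-head sketch of why the KV cache helps: to produce the next token
# you only run the new token forward and attend over stored keys and values,
# instead of re-running the whole prompt through the model.
import torch

def attend(q, K, V):
    # q: (1, d); K, V: (n, d) for the n tokens processed so far
    scores = (q @ K.T) / (K.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ V

d = 64
K_cache = torch.empty(0, d)     # keys for tokens 0..n-1, built up as we decode
V_cache = torch.empty(0, d)

def decode_step(new_token_hidden):           # (1, d) hidden state of the newest token
    global K_cache, V_cache
    # Identity "projections" here; a real model has learned weights per layer.
    q, k, v = new_token_hidden, new_token_hidden, new_token_hidden
    K_cache = torch.cat([K_cache, k])        # store this token's key/value once...
    V_cache = torch.cat([V_cache, v])
    return attend(q, K_cache, V_cache)       # ...and reuse all previous ones for free

for _ in range(5):
    out = decode_step(torch.randn(1, d))     # one forward step per new token
```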

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3933.003

Yeah, that that there's other types of caching you can kind of do. One interesting thing that you can do for cursor tab is you can basically predict ahead as if the user would have accepted the suggestion and then trigger another request. And so then you've cached, you've done this speculative, it's a mix of speculation and caching, right?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

3957.313

Because you're speculating what would happen if they accepted it. And then you have this value that is cached, this suggestion. And then when they press tab, the next one would be waiting for them immediately. It's a kind of clever heuristic slash trick that uses higher-level caching, and it can feel fast despite there not actually being any changes in the model.
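A rough sketch of that speculate-ahead caching (async pseudocode with a hypothetical `request_suggestion` call; not the actual client logic):

```python
# Sketch of the speculate-ahead trick: as soon as a suggestion is shown, request
# the *next* suggestion as if the user had already accepted this one, keyed by
# the buffer state that acceptance would produce.
import asyncio

speculative_cache: dict[str, asyncio.Task] = {}

async def request_suggestion(buffer: str) -> str:
    ...  # hypothetical call to the tab model
    return ""

async def show_suggestion(buffer: str) -> str:
    task = speculative_cache.pop(buffer, None)
    if task is not None:
        suggestion = await task                  # usually already finished
    else:
        suggestion = await request_suggestion(buffer)
    # Speculate: pretend the user accepts, and warm the next suggestion now.
    accepted_buffer = buffer + suggestion
    speculative_cache[accepted_buffer] = asyncio.create_task(request_suggestion(accepted_buffer))
    return suggestion

async def on_tab_accept(buffer: str, suggestion: str) -> str:
    # When they press tab, the next suggestion is usually already waiting.
    return await show_suggestion(buffer + suggestion)
```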

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4095

Yeah, it is a little different than speed. But, I mean, like, technically you tie it back in, because you can get away with the smaller model if you RL your smaller model and it gets the same performance as the bigger one. And while I was mentioning stuff about reducing the size of your KV cache, there are other techniques there as well that are really helpful for speed.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4119.066

So kind of back in the day, like all the way two years ago, people mainly use multi-head attention. And I think there's been a migration towards more efficient attention schemes like group query or multi-query attention. And this is really helpful for then with larger batch sizes, being able to generate the tokens much faster.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4143.041

The interesting thing here is this now has no effect on that time-to-first-token pre-fill speed. The thing this matters for is now generating tokens. And why is that? Because when you're generating tokens, instead of being bottlenecked by doing these super-parallelizable matrix multiplies across all your tokens,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4165.302

You're bottlenecked by how quickly, for long context with large batch sizes, by how quickly you can read those cache keys and values. That's memory bandwidth, and how can we make this faster? We can try to compress the size of these keys and values. Multi-query attention is the most aggressive of these.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4185.616

Where normally with multi-head attention, you have some number of, quote-unquote, key-value heads and some number of query heads. Multi-query just preserves the query heads and gets rid of all the key-value heads. So there's only one kind of key-value head, and there are all the remaining query heads.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4207.662

With group query, you instead preserve all the query heads, and then your keys and values are kind of... There are fewer heads for the keys and values, but you're not reducing it to just one. But anyways, the whole point here is you're just reducing the size of your KV cache.
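To make the KV-cache savings concrete, here's a back-of-the-envelope calculation with made-up but plausible dimensions:

```python
# Back-of-the-envelope KV-cache sizes for multi-head vs. grouped-query vs.
# multi-query attention. All shapes are illustrative assumptions.
def kv_cache_gb(batch, context, n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    # 2x for keys and values
    return batch * context * n_layers * n_kv_heads * head_dim * 2 * dtype_bytes / 1e9

batch, context, layers, head_dim = 32, 16_000, 32, 128
for name, kv_heads in [("multi-head (MHA)", 32), ("grouped-query (GQA)", 8), ("multi-query (MQA)", 1)]:
    print(f"{name:>20}: {kv_cache_gb(batch, context, layers, kv_heads, head_dim):6.1f} GB")
```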

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4228.814

Yeah, multi-latent. That's a little more complicated. And the way that this works is it kind of turns the entirety of your keys and values across all your heads into this kind of one latent vector that is then kind of expanded inference time.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4344.016

What, I mean, ultimately, how does that map to the user experience? Yeah, the two things that it maps to are: you can now make your cache a lot larger, because you have less space allocated for the KV cache. You can maybe cache a lot more aggressively, and a lot more things, so you get more cache hits, which are helpful for reducing the time to first token, for the reasons that were kind of described earlier. And then the second being, when you

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4368.678

start doing inference with more and more requests and larger and larger batch sizes, you don't see much of a slowdown in the speed at which it's generating the tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4380.668

Yeah. So like the basic, the size of your KV cache is both the size of all your prompts multiplied by the number of prompts being processed in parallel. So you could increase either those dimensions, right? The batch size or the size of your prompts without degrading the latency of generating tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4643.554

One maybe hacky but interesting idea that I like is holding a lock on saving. And so basically, you can then have the language model kind of hold the lock on saving to disk.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4653.838

And then instead of you operating in the ground truth version of the files that are saved to disk, you actually are operating what was the shadow workspace before and these unsaved things that only exist in memory that you still get linter errors for and you can

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4664.762

code in. And then when you try to maybe run code, there's just a small warning that there's a lock, and then you kind of will take back the lock from the language server if you're trying to do things concurrently, or from the shadow workspace if you're trying to do things concurrently. That's such an exciting future, by the way. It's a bit of a tangent, but, like, to allow a model to change files...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4698.422

Yeah. And I think there may be different versions of, like, runnability, where for the simple things, where you're doing things in the span of a few minutes on behalf of the user as they're programming, it makes sense to make something work locally in their machine.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4711.673

I think for the more aggressive things where you're making larger changes that take longer periods of time, you'll probably want to do this in some sandbox remote environment. And that's another incredibly tricky problem of how do you

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4724.105

exactly reproduce, or mostly reproduce to the point of it being effectively equivalent for running code, the user's environment, with this remote sandbox. I'm curious what kind of agency you want for coding. Do you want them to find bugs? Do you want them to implement new features? What agency do you want?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4806.418

Yeah. I mean, it's really interesting that these models are so bad at bug finding when just naively prompted to find a bug. They're incredibly poorly calibrated.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4819.122

Exactly. Even O1. How do you explain that?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4825.086

I think these models are really strong reflection of the pre-training distribution. And, you know, I do think they, they generalize as the loss gets lower and lower, but I don't think the, the loss and the scale is quite, or the loss is low enough such that they're like really fully generalizing in code. Like the things that we use these things for, uh, the frontier models, uh,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4846.604

that they're quite good at are really code generation and question answering. And these things exist in massive quantities in pre-training, with all of the code on GitHub on the scale of many, many trillions of tokens, and questions and answers on things like Stack Overflow and maybe GitHub issues. And so when you try to push into these things that really don't exist

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4871.075

very much online, like, for example, the cursor tap objective of predicting the next edit given the edits done so far. The brittleness kind of shows. And then bug detection is another great example where there aren't really that many examples of actually detecting real bugs and then proposing fixes. And the models just really struggle at it. But I think it's a question of transferring the model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4895

In the same way that you get this fantastic transfer from pre-trained models just on code in general to the cursor tab objective, you'll see a very, very similar thing with generalized models that are really good at code to bug detection. It just takes a little bit of nudging in that direction.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

4995.612

how paranoid is the user? But even then, if you're putting in a maximum paranoia, it still just doesn't quite get it.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5099.468

Bring down, you know, like, just bring down the server. And, like, of course we test a lot and whatever, but there's always these things that you have to be very careful about. Yeah, like with just normal docstrings, I think people will often just skim it when making a change and think, oh, I know how to do this, and you kind of really need to point it out to them so that doesn't slip through. Yeah, you have to be reminded that you can do a lot of damage.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5145.264

But concretely, what do you think that future would look like?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5222.392

Yeah. The worry is there's this massive document. Replacing something like unit tests, sure.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5287.34

Or how do you handle, I guess, external dependencies, like calling the Stripe API? Maybe Stripe would write a spec for that. But you can't do this for everything. Can you do this for everything you use? How do you do it if there's, like... maybe people will use language models as primitives in the programs they write, and there's a dependence on it, and how do you now include that? I think you might be able to prove that still.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5646.476

Yeah, I was going to say the moment that feels like people do this is when they share it, when they have this fantastic example, they just kind of share it with their friends.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5701.628

You can use terminal context as well, inside of chat, Cmd+K, kind of everything. We don't have the looping part yet, though we suspect something like this could make a lot of sense. There's a question of whether it happens in the foreground too, or if it happens in the background, like what we've been discussing.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

5796.908

It would be really interesting if you could branch a file system.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6134.75

Yeah. And there are a lot of clever things, like additional things that go into this indexing system. For example, the bottleneck in terms of costs is not storing things in the vector database or the database. It's actually embedding the code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6147.396

And you don't want to re-embed the code base for every single person in a company that is using the same exact code, except for maybe they're in a different branch with a few different files or they've made a few local changes.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6158.725

And so because, again, embeddings are the bottleneck, you can do one clever trick and not have to worry about the complexity of dealing with branches and the other databases, where you just have some cache on the actual vectors computed from the hash of a given chunk. And so this means that when the nth person at a company goes and indexes their code base, it's really, really fast.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6183.083

And you do all this without actually storing any code on our servers at all. No code data is stored. We just store the vectors in the vector database and the vector cache.
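A minimal sketch of that hash-keyed embedding cache (the `embed` call and the in-memory dict are stand-ins for a real embedding model and a shared server-side store):

```python
# Sketch of the chunk-hash embedding cache: vectors are keyed by a hash of the
# chunk's contents, so the nth person indexing the same code (or re-indexing
# after small edits) only pays to embed chunks that actually changed.
import hashlib

embedding_cache: dict[str, list[float]] = {}   # chunk hash -> vector

def embed(chunk: str) -> list[float]:
    ...  # hypothetical call to an embedding model
    return [0.0]

def index_codebase(chunks: list[str]) -> list[list[float]]:
    vectors = []
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key not in embedding_cache:          # only embed unseen content
            embedding_cache[key] = embed(chunk)
        vectors.append(embedding_cache[key])
    return vectors
```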

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6238.944

I think, like you mentioned, in the future, I think this is only going to get more and more powerful where we're working a lot on improving the quality of our retrieval. And I think the ceiling for that is really, really much higher than people give it credit for.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6367.193

Yeah, like an approximate nearest neighbors and this massive code base is going to just eat up your memory and your CPU. And that's just that. Let's talk about also the modeling side where, as Arvid said, there are these massive headwinds against local models where, one, things seem to move towards MOEs.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6388.739

One benefit is maybe they're more memory bandwidth bound, which plays in favor of local versus using GPUs or using NVIDIA GPUs. But the downside is these models are just bigger in total. And they're going to need to fit often not even on a single node, but multiple nodes. There's no way that's going to fit inside of even really good MacBooks. And I think especially for coding,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6415.262

It's not a question as much of like, does it clear some bar of like the models good enough to do these things? And then like we're satisfied, which may be the case for other problems and maybe where local models shine. But people are always going to want the best, the most intelligent, the most capable things. And that's going to be really, really hard to run for almost all people locally.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6671.066

Why do you think it's different than cloud providers?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6858.905

Like one interesting proof of concept for the learning this knowledge directly in the weights is with VS Code. So we're in a VS Code fork and VS Code, the code is all public. So these models in pre-training have seen all the code. They've probably also seen questions and answers about it, and then they've been fine-tuned and RLHFed to be able to answer questions about code in general.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6882.508

So when you ask it a question about VS Code, sometimes it'll hallucinate, but sometimes it actually does a pretty good job at answering the question. And I think like this is just by, it happens to be okay at it. But what if you could actually like specifically train or post train a model such that it really was built to understand this code base?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6905.151

It's an open research question, one that we're quite interested in. And then there's also uncertainty of like, do you want the model to be the thing that end to end is doing everything, i.e. it's doing the retrieval and its internals and then kind of answering the question, creating the code?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6918.236

Or do you want to separate the retrieval from the frontier model where maybe, you know, you'll get some really capable models that are much better than like the best open source ones in a handful of months? Yeah. And then you'll want to separately train a really good open source model to be the retriever, to be the thing that feeds in the context to these larger models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6948.751

Is this... Yeah, I mean, there are many possible ways you could try doing it. There's certainly no shortage of ideas. It's just a question of going in and trying all of them and being empirical about which one works best. One very naive thing is to try to replicate what's done with VS Code and these frontier models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6969.429

So let's continue pre-training, some kind of continued pre-training that includes general code data, but also throws in a lot of the data of some particular repository that you care about.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

6979.498

And then in post-training, meaning in, let's just start with instruction fine-tuning, you have like a normal instruction fine-tuning data set about code, but you throw in a lot of questions about code in that repository. So you could either get ground truth ones, which might be difficult, or you could do what you kind of hinted at or suggested using synthetic data, i.e.,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7002.344

kind of having the model ask questions about various pieces of the code. So you kind of take the pieces of the code, then prompt the model, or have a model propose a question for that piece of code, and then add those as instruction fine-tuning data points. And then in theory, this might unlock the model's ability to answer questions about that code base.
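A rough sketch of that synthetic-QA loop (the `generate` call and the prompts are hypothetical):

```python
# Sketch of synthetic QA for a specific repository: for each code chunk, have a
# model propose a question and answer it with the chunk in context, then keep
# the pair as an instruction fine-tuning example.
def generate(prompt: str) -> str:
    ...  # hypothetical model call
    return ""

def synthetic_qa_for_repo(code_chunks: list[str]) -> list[dict]:
    examples = []
    for chunk in code_chunks:
        question = generate(
            f"Ask one specific question a developer might have about this code:\n{chunk}"
        )
        answer = generate(f"Code:\n{chunk}\n\nQuestion: {question}\nAnswer concisely:")
        # The chunk is left out of the training input: the hope is the model
        # learns to answer from its weights rather than from retrieval.
        examples.append({"instruction": question, "output": answer})
    return examples
```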

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7033.673

I think test time compute is really, really interesting. So there's been the pre-training regime, which will kind of, as you scale up the amount of data and the size of your model, get you better and better performance, both on loss and then on downstream benchmarks and just general performance when we use it for coding or other tasks.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7055.327

We're starting to hit a bit of a data wall, meaning it's going to be hard to continue scaling up this regime.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7062.473

And so scaling up test-time compute is an interesting way of now, you know, increasing the number of inference-time flops that we use, but still getting, like, yeah, as you increase the number of flops used at inference time, getting corresponding improvements in the performance of these models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7079.787

Traditionally, we just had to literally train a bigger model that always used that many more flops. But now we could perhaps use the same size model and run it for longer to be able to get an answer at the quality of a much larger model. And so the really interesting thing I like about this is there are some problems that perhaps require

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7099.691

hundred trillion parameter model intelligence trained on a hundred trillion tokens. Um, but that's like maybe 1%, maybe like 0.1% of all queries. So are you going to spend all of this effort, all of this compute training model, uh,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7114.551

that costs that much and then run it so infrequently? It feels completely wasteful, when instead you train the model that's capable of doing the 99.9% of queries, and then you have a way of running it longer at inference time for those few people that really, really want max intelligence.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7149.598

I mean, yeah, that's an open research problem, certainly. I don't think anyone's actually cracked this model routing problem quite well. We'd like to. We have initial implementations of this for something like Cursor Tab. But at the level of going between 4o, Sonnet, and O1, it's a bit trickier.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7171.255

There's also a question of what level of intelligence you need to determine if the thing is too hard for the 4o-level model. Maybe you need the O1-level model. It's really unclear.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

719.561

Yeah, so... I think a lot of us, well, all of us were originally Vim users.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7197.08

Well, it's weird because, with test-time compute, there's a whole training strategy needed to get test-time compute to work. And the other really weird thing about this is no one, outside of the big labs and maybe even just OpenAI, really knows how it works. There have been some really interesting papers that show hints of what they might be doing.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7220.146

And so perhaps they're doing something with tree search using process reward models. But yeah, I just I think the issue is, we don't quite know exactly what it looks like.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7231.251

So it would be hard to kind of comment on like where it fits in, I would put it in post training, but maybe like the compute spent for this kind of for getting test time compute to work for a model is going to dwarf pre training eventually.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7261.468

Yeah. So one thing to do would be, I think you probably need to train a process reward model, which is, so maybe we can get into reward models and outcome reward models versus process reward models. Outcome reward models are the kind of traditional reward models that people are trained for language modeling. And it's just looking at the final thing.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

727.015

Pure Vim, yeah. No NeoVim, just pure Vim in a terminal. And at least for myself, it was around the time that Copilot came out. So 2021. that I really wanted to try it. So I went into VS Code, the only platform, the only code editor in which it was available. And even though I really enjoyed using Vim, just the experience of Copilot with VS Code was more than good enough to convince me to switch.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7281.79

So if you're doing some math problem, let's look at that final thing you've done, everything, and let's assign a grade to it, how likely we think, like what's the reward for this outcome. Process reward models instead try to grade the chain of thought.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7295.697

And so OpenAI had some preliminary paper on this, I think, last summer, where they use human labelers to get this pretty large, several hundred thousand data set of grading chains of thought. Ultimately, it feels like I haven't seen anything interesting in the ways that people use process reward models outside of just using it as a means of affecting how we choose between a bunch of samples.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7322.792

So like what people do in all these papers is they sample a bunch of outputs from the language model and then use the process reward models to grade all those generations alongside maybe some other heuristics and then use that to choose the best answer. The really interesting thing that people think might work and people want to work is tree search with these process reward models.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7345.144

Because if you really can grade every single step of the chain of thought, then you can kind of branch out and explore multiple paths of this chain of thought. And then use these process reward models to evaluate how good is this branch that you're taking.
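A hedged sketch of what PRM-guided search could look like, as a simple beam over chains of thought (the `propose_step` and `prm_score` calls are hypothetical model calls, and this is a simplification of what the labs might actually do):

```python
# Sketch of process-reward-model-guided search: expand several candidate next
# reasoning steps, score each partial chain with the PRM, and keep only the
# best branches (a simple beam search over chains of thought).
def propose_step(problem: str, chain: list[str]) -> list[str]:
    ...  # hypothetical: sample a few candidate next reasoning steps
    return []

def prm_score(problem: str, chain: list[str]) -> float:
    ...  # hypothetical: PRM grades the chain of thought so far
    return 0.0

def prm_beam_search(problem: str, beam_width: int = 4, max_steps: int = 8) -> list[str]:
    beams: list[list[str]] = [[]]
    for _ in range(max_steps):
        candidates = [chain + [step] for chain in beams for step in propose_step(problem, chain)]
        if not candidates:
            break
        candidates.sort(key=lambda chain: prm_score(problem, chain), reverse=True)
        beams = candidates[:beam_width]        # keep the highest-scored branches
    return beams[0] if beams else []
```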

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7372.864

And the interesting work that I think has been done, or the interesting work that has been open-sourced and that people talk about, is figuring out how to train the process reward models, maybe in a more automated way.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7387.718

I could be wrong here, could not be mentioning something because I haven't seen anything super that seems to work really well for using the process reward models creatively to do tree search and code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7555.939

But it has these significant limitations. Even barring capabilities, it does not stream. And that means it's really, really painful to use for things where you want to supervise the output. And instead, you're just waiting for the wall of text to show up. Also, it does feel like the early innings of test time compute and search, where it's just very, very much a v0.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

757.209

And so that kind of was the default until we started working on Cursor.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7581.006

And there's so many things that... like don't feel quite right. And I suspect in parallel to people increasing the amount of pre-training data and the size of the models and pre-training and finding tricks there, you'll now have this other thread of getting search to work better and better.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7683.854

Yeah, I think most of the additional value from Cursor versus everything else out there is not just integrating the new model fast like 01. It comes from all of the kind of depth that goes into these custom models that you don't realize are working for you in kind of every facet of the product, as well as like the really thoughtful UX with every single feature.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7716.931

Yeah, I think there are three main kinds of synthetic data. The first is, so what is synthetic data first? So there's normal data, like non-synthetic data, which is just data that's naturally created, i.e. usually it'll be from humans having done things. So from some human process, you get this data. Synthetic data, the first one would be distillation.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7741.174

So having a language model kind of output tokens or probability distributions over tokens. And then you can train some less capable model on this. This approach is not gonna get you a net, like more capable model than the original one that has produced the tokens.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7759.022

but it's really useful for if there's some capability you want to elicit from some really expensive high latency model, you can then distill that down into some smaller task specific model. The second kind is when like one direction of the problem is easier than the reverse.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7778.389

And so a great example of this is bug detection, like we mentioned earlier, where it's a lot easier to introduce reasonable looking bugs than it is to actually detect them. And this is probably the case for humans too. And so what you can do is you can get a model that's not trained in that much data, that's not that smart, to introduce a bunch of bugs in code.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7802.137

And then you can use that synthetic data to train a model that can be really good at detecting bugs. The last category, I think, is, I guess, the main one that it feels like the big labs are doing for synthetic data, which is producing text with language models that can then be verified easily.
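A toy sketch of that bug-injection idea (here a trivial rule-based mutator stands in for the model that introduces bugs):

```python
# Sketch of "the easy direction generates data for the hard direction": inject
# plausible bugs into known-good code, producing labeled (buggy, clean) pairs
# that a bug-detection model can be trained on.
import random

def inject_bug(code: str) -> str:
    # Toy mutator standing in for a model-based bug injector.
    swaps = [("<=", "<"), ("==", "!="), ("+ 1", "- 1"), (" and ", " or ")]
    applicable = [(a, b) for a, b in swaps if a in code]
    if not applicable:
        return code
    a, b = random.choice(applicable)
    return code.replace(a, b, 1)

def make_training_pairs(clean_functions: list[str]) -> list[dict]:
    pairs = []
    for fn in clean_functions:
        buggy = inject_bug(fn)
        if buggy != fn:
            pairs.append({"code": buggy, "label": "buggy", "fix": fn})
            pairs.append({"code": fn, "label": "clean"})
    return pairs
```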

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7823.357

So like, you know, extreme example of this is if you have a verification system that can detect if language is Shakespeare level and then you have a bunch of monkeys typing in typewriters, like you can eventually get enough training data to train a Shakespeare level language model.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7839.146

And I mean, this is the case, like very much the case for math where verification is, is, is actually really, really easy for formal, um, formal languages, and then what you can do is you can have an okay model, generate a ton of rollouts, and then choose the ones that you know have actually proved the ground truth theorems and then train that further.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7861

There are similar things you can do for code with LeetCode-like problems, where you have some set of tests that you know correspond to correctness, so if something passes those tests, it has actually solved the problem. You can do the same thing, where you verify that the output passed the tests and then train the model on the outputs that passed.
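A minimal sketch of that verify-then-train loop for code (generic rejection sampling against tests; the function names here are hypothetical, and in a real system the candidates would be sampled from a model):

from typing import Callable, List

def passes_tests(candidate_src: str, tests: List[Callable]) -> bool:
    namespace = {}
    try:
        exec(candidate_src, namespace)      # define the candidate's solve()
        solve = namespace["solve"]
        return all(test(solve) for test in tests)
    except Exception:
        return False

def collect_verified_examples(prompt: str, candidates: List[str], tests):
    # Keep only candidates the verifier accepts; these become new training pairs.
    return [(prompt, c) for c in candidates if passes_tests(c, tests)]

tests = [lambda f: f(2, 3) == 5, lambda f: f(-1, 1) == 0]
candidates = [
    "def solve(a, b): return a - b",   # fails the tests
    "def solve(a, b): return a + b",   # passes the tests
]
training_pairs = collect_verified_examples("Add two numbers.", candidates, tests)
# Fine-tuning on `training_pairs` is the "train on the outputs that passed" step.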

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7878.072

I think it's gonna be a little tricky getting this to work in all domains, or just in general. Like, having the perfect verifier feels really, really hard to do with just open-ended, miscellaneous tasks you give the model, or more long-horizon tasks, even in coding.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7902.958

Yeah. Verification, it feels like it's best when you know for a fact that it's correct. And then it wouldn't be using a language model to verify. It would be using tests or formal systems. Or running the thing, too.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7928.26

I think that that's the category that is most likely to result in massive gains.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7946.505

Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7967.843

RLAIF is interesting because it's depending on the constraint that verification is actually a decent bit easier than generation. Because it feels like, okay, what are you doing? Are you using this language model to look at the language model's outputs and then improve the language model?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7988.989

But no, it actually may work if the language model has a much easier time verifying some solution than it does generating it. Then you actually could perhaps get this kind of recursive loop. I don't think it's going to look exactly like that. The other thing we kind of do is a little bit of a mix of RLAIF and RLHF, where usually the model is actually quite correct.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8013.983

And this is in the case of Cursor Tab, picking between two possible generations which one is the better one. And then it just needs a little bit of human nudging, with only on the order of 50 to 100 examples, to kind of align that prior the model has with exactly what you want. It looks different than, I think, normal RLHF, where you're usually training these reward models on tons of examples.
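As a generic sketch of what a small pairwise preference setup looks like (a standard Bradley-Terry-style reward-model loss; Cursor's actual models and features are not public, so the tensors below are random stand-ins):

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    # Scores an embedding of (context, candidate completion); higher is better.
    def __init__(self, dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)

def preference_loss(model, chosen, rejected):
    # Push the score of the human-preferred candidate above the other one.
    return -F.logsigmoid(model(chosen) - model(rejected)).mean()

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Even a small set of labeled pairs (on the order of 50-100 total, per the
# quote) can nudge a model that already has a decent prior.
chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)
loss = preference_loss(model, chosen, rejected)
loss.backward()
optimizer.step()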

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8052.228

My intuition would just say, yeah, it should be. This is kind of going back to, if you believe P does not equal NP, then there's this massive class of problems that are much, much easier to verify given a proof than actually proving it. I wonder if the same thing will prove P not equal to NP, or P equal to NP. That would be really cool.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8116.717

I feel like I have much more to do there. It felt like the path to get to IMO was a little bit more clear, because it already could get a few IMO problems, and there was a bunch of low-hanging fruit given the literature at the time of what tactics people could take. I think I'm, one, much less versed in the space of theorem proving now.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8134.145

And two, yeah, less intuition about how close we are to solving these really, really hard open problems.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8182.417

I think we might get a Fields Medal before AGI.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8219.986

I think it's interesting. The original scaling laws paper by OpenAI was slightly wrong because, I think, of some issues they had with learning rate schedules. And then Chinchilla showed a more correct version.
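For reference, the Chinchilla result is often summarized as: with a training compute budget of roughly C = 6 * N * D FLOPs, the compute-optimal model size N and token count D should grow together, which works out to on the order of 20 training tokens per parameter. A back-of-the-envelope helper (the 6*N*D cost and the 20:1 ratio are the commonly cited approximations, not exact laws):

def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    # C ~ 6 * N * D and D ~ tokens_per_param * N  =>  N = sqrt(C / (6 * ratio))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

params, tokens = compute_optimal_split(5.76e23)  # roughly Chinchilla's budget
print(f"~{params / 1e9:.0f}B params, ~{tokens / 1e12:.1f}T tokens")  # ~69B, ~1.4T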

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8232.986

And then from then, people have again kind of deviated from doing the compute-optimal thing, because people now optimize more for making the thing work really well given an inference budget. And I think there are a lot more dimensions to these curves than what we originally used of just compute, number of parameters, and data. Like, inference compute is the obvious one.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8258.849

I think context length is another obvious one. So let's say you care about the two things of inference compute and then context window. Maybe the thing you want to train is some kind of SSM, a state space model, because they're much, much cheaper and faster at super, super long context.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8275.493

And even if maybe it has 10x worse scaling properties during training, meaning you have to spend 10x more compute to train the thing to get the same level of capabilities, it's worth it, because you care most about that inference budget for really long context windows. So it'll be interesting to see how people kind of play with all these dimensions.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8322.085

I mean, I think bigger is certainly better for just raw performance.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8328.85

And raw intelligence. I think the path that people might take is, I'm particularly bullish on distillation. And, like, yeah, how many knobs can you turn, if we spend a ton of money on training, to get the most capable, cheap model?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8344.72

Like, really, really caring as much as you can. Because the naive version of caring as much as you can about inference-time compute is what people have already done with, like, the Llama models, or just over-training the shit out of 7B models on way, way, way more tokens than is Chinchilla optimal, right? But if you really care about it, maybe the thing to do is what Gemma did, which is, let's not just train on tokens, let's literally train on

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8369.105

minimizing the KL divergence with the distribution of Gemma 27B, right? So knowledge distillation there. And you're spending the compute of literally training this 27-billion-parameter model on all these tokens just to get out this, I don't know, smaller model.
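A generic sketch of that kind of objective (the standard soft-target distillation loss; this is the textbook technique, not Gemma's exact recipe): instead of training the student on one-hot next tokens, you minimize the KL divergence between the student's per-position distribution and the frozen teacher's.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    # Both tensors are (batch, seq_len, vocab). KL(teacher || student) per position.
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_log_probs, log_target=True,
                    reduction="batchmean") * temperature ** 2

# Random logits stand in for real model outputs; the teacher would be the big
# frozen model (e.g. a 27B-parameter one) and the student the small one.
student_logits = torch.randn(2, 8, 32000, requires_grad=True)
teacher_logits = torch.randn(2, 8, 32000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()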

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8390.179

Yeah, distillation in theory is, I think, getting out more signal from the data that you're training on. And it's perhaps another way of getting over, not completely over, but partially helping with the data wall, where you only have so much data to train on.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8405.736

Let's like train this really, really big model on all these tokens and we'll distill it into a smaller one. And maybe we can get more signal per token for this much smaller model than we would have originally if we trained it.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8435.87

Yeah, I think there are a lot of these secrets and details about training these large models that I just don't know and that only the large labs are privy to. And the issue is, I would waste a lot of that money if I even attempted this, because I wouldn't know those things.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8452.555

Suspending a lot of disbelief and assuming, like, you had the know-how, and operate... or are you saying you have to operate with the limited information you have now? No, no, actually, I would say you swoop in and you get all the information, all the little heuristics, all the parameters that define how the thing is trained.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8501.533

Well, this gets into the question of like, are you really limited by compute and money or are you limited by these other things?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8531.018

I think, yeah, because even with all this compute and like, you know, all the data you could collect in the world, I think you really are ultimately limited by not even ideas, but just like really good engineering. Like, even with all the capital in the world, would you really be able to assemble... Like, there aren't that many people in the world who really can, like, make the difference here.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8553.696

And there's so much work that goes into research that is just, like, pure, really, really hard engineering work. As, like, a very...

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8563.968

kind of hand-wavy example, if you look at the original Transformer paper, you know, how much work was kind of joining together a lot of these really interesting concepts embedded in the literature, versus then going in and writing all the code, like maybe the CUDA kernels, maybe whatever else, I don't know if it ran on GPUs or TPUs originally, such that it actually saturated the GPU performance, right?

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8584.78

Getting Noam to go in and do all this code, right? And Noam is probably one of the best engineers in the world. Or maybe going a step further, like the next generation of models, having these things, like getting model parallelism to work and scaling it on, you know, thousands of, or maybe tens of thousands of, V100s, which I think GPT-3 may have been.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8603.527

There's just so much engineering effort that has to go into all of these things to make it work. If you really brought that cost down to... like, you know, maybe not zero, but just made it 10X easier, made it super easy for someone with really fantastic ideas to immediately get to the version of like the new architecture they dreamed up that is like getting 50, 40% utilization on the GPUs.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8629.258

I think that would just speed up research by a ton.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8681.923

I think all of us believe new ideas are probably needed to get all the way there to AGI. And all of us also probably believe there exist ways of testing out those ideas at smaller scales and being fairly confident that they'll play out.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8702.198

It's just quite difficult for the labs in their current position to dedicate their very limited research and engineering talent to exploring all these other ideas when there's this core thing that will probably improve performance for some decent amount of time.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9113.803

I really like that point about, it feels like a lot of the time with programming, there are

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9118.527

two ways you can go about it. One is, like, you think really hard, carefully, up front about the best possible way to do it, and then you spend your limited time of engineering to actually implement it. But I much prefer just getting in the code and, you know, taking a crack at it, seeing how it kind of lays out, and then iterating really quickly on that. That feels more fun.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9238.995

I think different people do programming for different reasons. But I think the true, maybe the best programmers are the ones that just absolutely love programming.

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9252.907

For example, there are folks on our team who literally, when they get back from work, they go and boot up Cursor and then start coding on their side projects for the entire night, and they stay up till 3 a.m. doing that. And when they're sad, they say, I just really need to code. And I think, like,

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

9280.114

You know, there's that level of programmer where this obsession and love of programming, I think, really makes the best programmers. And I think these types of people will really get into the details of how things work.