Edward Gibson
Appearances
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
as about maybe a little more than 40%, maybe 45% of the world's languages or more, I mean, 50% of the world's languages are verb final. Those tend to be postpositions. Those markers, they have the same kinds of markers as we do in English, but they put them after. So, sorry, they put them first, the markers come first.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's less of a value economically. It's kind of what drives this. It's not a... It's not just for fun. I mean, there are these groups that do want to learn language just for language's sake, and there's something to that. But those are rarities in general. Those are a few small groups that do that. Most people don't do that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's happening.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We're moving towards fewer and fewer languages. We are.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I completely agree.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, I completely agree. And there's a lot of work to try to create that identity so people want to do that. As a cognitive scientist and language expert, I hope that continues because I don't want languages to die. I want languages to survive because they're so interesting for so many reasons.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But I mean, I find them fascinating just for the language part, but I think there's a lot of connections to culture as well, which is also very important.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's going to be cases where it's going to be really hard, right? So there are concepts that are in one language and not in another. Like the most extreme kinds of cases are these cases of number information. So good luck translating a lot of English into Pirahã. It's just impossible. There's no way to do it because there are no words for these concepts that we're talking about.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's probably the flip side, right? There's probably stuff in Pirahã, which is going to be hard to translate into English on the other side. And so I just don't know what those concepts are. I mean, you know, the space, the world space is a little, is different from my world space. And so I don't know what, like, so that the things they talk about, things are...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's going to have to do with their life as opposed to my industrial life, which is going to be different. And so there's going to be problems like that always. Maybe it's not so bad in the case of some of these spaces, and maybe it's going to be harder in others. And so it's pretty bad in number. It's extreme, I'd say, in the number space, exact number space. But in the color dimension, right?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So you say, instead of, you know, talk about a book, you say a book about, the opposite order there in Japanese or in Hindi, you do the opposite. And the talk comes at the end. So the verb will come at the end as well. So instead of Mary kicked the ball, it's Mary ball kicked.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So that's not so bad. I mean, but it's a problem that you don't have ways to talk about
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes. Yeah. But so you were talking earlier about translation and about how translations, there's good and bad translations. I mean, now we're talking about translations of form, right? So what makes writing good, right?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's not just the content, it's how it's written. And translating that, that sounds difficult. I don't know how to do that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We can probably get measures of those.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I don't know. I'm optimistic that we could get measures of those things, and so maybe that's... Translatable. I don't know. I don't know, though. I have not worked on that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's your sense, huh? It's simple sentences, short? Yeah, yeah, yeah. I mean, that's when, if you have really long sentences, even if they don't have center embedding, like, they can have longer connections. Yeah. They can have longer connections. They don't have to, right? You can have a long, long sentence with a bunch of local words. Yeah, yeah. But it is much more likely to have the possibility of long dependencies with long sentences. Yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I would say yes. I think it's not crazy at all. I think it's quite reasonable. There's this sort of odd view, I think, to think that human language is somehow special. I mean, maybe it is. We can certainly do more than any of the other species. And maybe our language system is part of that, it's possible.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But people have often talked about how human, like Chomsky, in fact, has talked about how only human language has this compositionality thing that he thinks is sort of key in language. And the problem with that argument is he doesn't speak whale. And he doesn't speak crow, and he doesn't speak monkey. They say things like, well, they're making a bunch of grunts and squeaks.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then if it says Mary kicked the ball to John, it's John to, the to, the marker there, the preposition, it's a postposition in these languages. And so the interesting thing, a fascinating thing to me is that within a language that this order aligns. It's harmonic.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And the reasoning is like, that's bad reasoning. I'm pretty sure if you asked a whale what we're saying, they'd say, well, I'm making a bunch of weird noises. Exactly. And so it's like, this is a very odd reasoning to be making that human language is special because we're the only ones who have human language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I'm like, well, we don't know what those other, we just don't, we can't talk to them yet. And so there's probably a signal in there, and it might very well be something complicated like human language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, sure, with a small brain in lower species, there's probably not a very good communication system, but in these higher species where you have what seems to be abilities to communicate something, there might very well be a lot more signal there than we might have otherwise thought.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, I mean, I think some of them are not going to have much interesting to say, but some of them will. We don't know. We certainly don't know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, they're probably talking to other trees, right? They're not talking to us. And so to the extent they're talking, they're saying something interesting to some other, you know, conspecific as opposed to us, right? Yeah. And so there may be some signal there.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So there are people out there... Actually, it's pretty common to say that human language is special and different from any other animal communication system. And I just don't think the evidence is there for that claim. I think it's not obvious... We just don't know, because we don't speak these other communication systems until we get better.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I do think there are people working on that, as you pointed out, people working on whale speak, for instance. That's really fascinating.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so if it's one or the other, it's either verb initial or verb final, but then you'll have prepositions or postpositions. And that's across the languages that we can look at. We've got around 1,000 languages. There's around 7,000 languages on the earth right now. But we have information about, say, word order on around 1,000 of those, a pretty decent amount of information.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, I would want Dan Everett with me. He is like amazing at learning foreign languages. And so he, like, this is an amazing feat, right? To be able to go, this is a language, which has no translators before him. I mean, there were, he was a missionary. Well, there was a guy that had been there before, but he wasn't very good.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so he learned the language far better than anyone else had learned before him. He's like good at... He's a very social person. I think that's a big part of it, is being able to interact. So I don't know. It kind of depends on the species from outer space, how much they want to talk to us.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You start that, you actually start with like objects and just say, you know, just throw a stick down and say stick. And then you say, what do you call this? And then they'll say the word, whatever. And he says, the standard thing to do is to throw two sticks, two sticks.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then, you know, he learned pretty quick that there weren't any count words in this language because they didn't know this wasn't interesting. I mean, it was kind of weird. They'd say some or something, the same word over and over again. And so, but that is a standard thing. You just like try to, but you have to be pretty out there socially, like willing to talk to
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
which these are really very different people from you, and he's very social. And so I think that's a big part of this, is like that's how a lot of people know a lot of languages, is they're willing to talk to other people.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, oh God.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
When you see something interesting, just go and do it. Like, I do that. Like, that's something I do, which is kind of unusual for most people. So, like, when I saw the Pirahã, like, if Pirahã was available to go and visit, I was like, yes, yes, I'll go. And then when we couldn't go back, we had some trouble with the Brazilian government, there's some corrupt people there.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It was very difficult to go back in there. And so I was like, all right, I got to find another group. And so we searched around and we were able to find the Chimane, because I wanted to keep working on this kind of problem. And so we found the Chimane and just go there. I didn't really have, we didn't have contact. We had a little bit of contact and brought someone.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And that was, you know, we just kind of just try things. I say it's like... A lot of that's just like ambition, just try to do something that other people haven't done. Just give it a shot is what I, I mean, I do that all the time. I don't know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Thank you very much, Lex. It's been a pleasure.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And for those 1,000 which we know about, about 95% fit that pattern. So they will have either verb, it's about half and half, half are verb initial, like English, and half are verb final, like Japanese.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's correct. Yeah, the subject is generally first.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, it's pretty even. And those two are the most common by far. Those two words, the subject tends to be first. There's so many interesting things, but the thing I find so fascinating is there are these generalizations within and across a language. And there's actually a simple explanation, I think, for a lot of that. And that is you're trying to minimize dependencies between words.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's basically the story, I think, behind a lot of why word order looks the way it is, is we're always connecting. What is the thing I'm telling you? I'm talking to you in sentences. You're talking to me in sentences. These are sequences of words which are connected, and the connections are dependencies between the words.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And it turns out that what we're trying to do in a language is actually minimize those dependency links. It's easier for me to say things if the words that are connecting for their meaning are close together. It's easier for you in understanding if that's also true. If they're far away, it's hard to produce that, and it's hard for you to understand. And the languages of the world,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
within a language and across languages fit that generalization. It turns out that having verbs initial and then having prepositions ends up making dependencies shorter. And having verbs final and having postpositions ends up making dependencies shorter than if you cross them. If you cross them, it's possible. You can do it. You mean within a language? Within a language, you can do it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It just ends up with longer dependencies than if you didn't. So languages tend to go that way. They call it harmonic. So it was observed a long time ago, without the explanation, by a guy called Joseph Greenberg, who's a famous typologist from Stanford. He observed a lot of generalizations about how word order works, and these are some of the harmonic generalizations that he observed.
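The comparison between harmonic and crossed orders can be made concrete with a toy count. The following is only a sketch under simplifying assumptions that are not from the conversation: a three-word fragment, the verb linked to the adposition, the adposition linked to its noun, and dependency length measured as the distance in word positions. The names `total_length`, `DEPS`, and `ORDERS` are hypothetical.

```python
# Toy comparison of harmonic vs. crossed word orders.
# Assumption (illustrative only): verb -> adposition -> noun dependencies,
# with length measured as the absolute difference in word positions.

DEPS = [("verb", "adposition"), ("adposition", "noun")]

ORDERS = {
    "verb-initial + preposition (harmonic)": ["verb", "adposition", "noun"],
    "verb-initial + postposition (crossed)": ["verb", "noun", "adposition"],
    "verb-final + postposition (harmonic)": ["noun", "adposition", "verb"],
    "verb-final + preposition (crossed)": ["adposition", "noun", "verb"],
}


def total_length(order, deps):
    """Sum of distances between each head and its dependent."""
    position = {word: i for i, word in enumerate(order)}
    return sum(abs(position[head] - position[dep]) for head, dep in deps)


LENGTHS = {name: total_length(order, DEPS) for name, order in ORDERS.items()}

for name, length in LENGTHS.items():
    print(f"{name}: {length}")
```

In this toy setup the two harmonic orders come out with a shorter total than the two crossed orders, which is the direction of the effect described above. Real sentences and real dependency annotations are more involved than this three-word fragment.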
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, what I mean is in... In language, there's kind of three structures to, three components to the structure of language. One is the sounds. So cat is k, a, and t in English. I'm not talking about that part. I'm talking, then there's two meaning parts. And those are the words. And you were talking about meaning earlier. So words have a form and they have a meaning associated with them.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so cat is a full form in English and it has a meaning associated with whatever a cat is. And then the combinations of words, that's what I'll call grammar or syntax. And that's like when I have a combination like the cat or two cats, okay? So where I take two different words there and put them together and I get a compositional meaning from putting those two different words together.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so that's the syntax. And in any sentence or utterance, whatever I'm talking to you, you're talking to me, we have a bunch of words and we're putting together in a sequence. It turns out they are connected so that every word is connected to just one other word in that sentence. And so you end up with what's called technically a tree.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's a tree structure where there's a root of that utterance of that sentence. And then there's a bunch of dependents, like branches from that root that go down to the words. The words are the leaves in this metaphor for a tree.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah. It's a graph theoretical thing. It's a graph theory thing.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's right. And everyone agrees on that. So all linguists will agree with that. Oh, so this is not a controversial thing? That is not controversial.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No, I think in every language, I think everyone agrees that all sentences are trees at some level. Can I pause on that? Sure.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I think so. I've never heard of anyone disagreeing with that. That's weird. The details of the trees are what people disagree about.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, this is where, you know, depending on what your... There's different theoretical notions. I'm going to say the simplest thing, dependency grammar. It's like a bunch of people invented this. Tesnière was the first, a French guy back in... I mean, the paper was published in 1959, but he was working on it in the 30s and stuff.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And it goes back to, you know, philologist Pāṇini was doing this in ancient India, okay? And so, you know, doing something like this. The simplest thing we can think of is that there's just connections between the words to make the utterance. And so let's just say I have like two dogs entered a room. Okay, here's a sentence. And so we're connecting two and dogs together.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's like, there's some dependency between those words to make some bigger meaning. And then we're connecting dogs now to entered, right? And we connect a room somehow to entered. And so I'm going to connect a to room, and then room back to entered. That's the tree. The root is entered. The thing is like an entering event. That's what we're saying here.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And the subject, which is whatever that dog is, is two dogs, it was. And the connection goes back to dogs, which goes back to, then that goes back to two. I'm just, that's my tree. It starts at entered, goes to dogs, down to two. And on the other side, after the verb, the object, it goes to room, and then that goes back to the determiner or article, whatever you want to call that word.
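The tree described here for "two dogs entered a room" can be written down directly. This is a minimal sketch, assuming a simple encoding that is not from the conversation: each word's position maps to the position of the word it depends on, and the root maps to `None`. The names `WORDS`, `HEADS`, `find_root`, `path_to_root`, and `total_dependency_length` are hypothetical.

```python
# Dependency tree for "two dogs entered a room", as described in the text:
# two -> dogs -> entered (root), and a -> room -> entered.
# Assumption (illustrative encoding): HEADS maps a word's 0-based position
# to the position of its head; the root has head None.

WORDS = ["two", "dogs", "entered", "a", "room"]
HEADS = {0: 1, 1: 2, 2: None, 3: 4, 4: 2}


def find_root(heads):
    """Return the position of the single word that has no head."""
    roots = [i for i, head in heads.items() if head is None]
    assert len(roots) == 1, "a tree has exactly one root"
    return roots[0]


def path_to_root(index, heads, words):
    """Follow head links from one word up to the root."""
    path = []
    while index is not None:
        path.append(words[index])
        index = heads[index]
    return path


def total_dependency_length(heads):
    """Sum of distances, in word positions, between each word and its head."""
    return sum(abs(i - head) for i, head in heads.items() if head is not None)


print(WORDS[find_root(HEADS)])
print(path_to_root(0, HEADS, WORDS))
print(path_to_root(3, HEADS, WORDS))
print(total_dependency_length(HEADS))
```

Every word other than the root has exactly one head, which is what makes the structure a tree. The total dependency length is one way to put a number on how far apart connected words are.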
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So there's a bunch of categories of words here we're noticing. So there are verbs. Those are these things that typically mark... They refer to events and states in the world. And there are nouns, which typically refer to people, places, and things is what people say. But they can refer to other more... I think you've heard of events themselves as well. They're marked by...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
The category, the part of speech of a word is how it gets used in language. That's how you decide what the category of a word is. Not by the meaning, but how it gets used. How it's used.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Usually. Yes, yes.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, if I don't say a verb, then there won't be a verb, and so it'll be something else.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No. You're constrained by whatever language you're dealing with. Probably you have other constraints in poetry. Usually in poetry, there's multiple constraints that you want to... You want to usually convey multiple meanings is the idea, and maybe you have a rhythm or a rhyming structure as well. But you usually are constrained by the rules of your language for the most part. So you don't...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
violate those too much. You can violate them somewhat, but not too much. So it has to be recognizable as your language. Like in English, I can't say, dogs two entered room a. I mean, I meant, you know, two dogs entered a room. And I can't mess with the order of the articles and the nouns. You just can't do that. In some languages, you can mess around with the order of words much more.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, you speak Russian. Mm-hmm. Russian has a much freer word order than English. And so, in fact, you can move around words in, you know, I told you that English has the subject, verb, object word order. So does Russian. But Russian is much freer than English. And so you can actually mess around with the word order.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So probably Russian poetry is going to be quite different from English poetry because the word order is much less constrained.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, you can just mess with different things in each language. And so in Russian, you have case markers, which are these endings on the nouns, which tell you how it connects, each noun connects to the verb, right? We don't have that in English. And so when I say, Mary kissed John, I don't know who the agent or the patient is, except by the order of the words, right?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
In Russian, you actually have a marker on the end. If you're using a Russian name and each of those names, you'll also say, is it, you know, agent, it'll be the, you know, nominative, which is marking the subject, or an accusative will mark the object. And you could put them in the reverse order. You could put accusative first. You could put subject, you could put...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
the patient first, and then the verb, and then the subject. And that would be a perfectly good Russian sentence. And it would still mean, I could say John kissed Mary, meaning Mary kissed John, as long as I use the case markers in the right way. You can't do that in English.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Those are for kind of meaning. Those are meaning. And subject and object are generally used for position. So subject is just like the thing that comes before the verb, and the object is the one that comes after the verb. The agent is kind of like the thing doing it. That's kind of what that means, right? The subject is often the person doing the action, right? The thing.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I think it's pretty automatable at this point. People can figure out what the words are. They can figure out the morphemes, which are the, technically, morphemes are the minimal meaning units within a language, okay? And so, when you say eats or drinks, it actually has two morphemes in English.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's the root, which is the verb, and then there's some ending on it which tells you, you know, that's the third person singular. Can you say what morphemes are? Morphemes are just the minimal meaning units within a language. And a word is just kind of the things we put spaces between in English. And they have a little bit more. They have the morphology as well.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They have the endings, this inflectional morphology on the endings, on the roots.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah, yeah. And so we have a little bit of that in English, very little. You have much more in Russian, for instance. But we have a little bit in English. And so we have a little on the nouns. You can say it's either singular or plural. And you can say the same thing for verbs. Like simple past tense, for example. So, you know, notice in English we say drinks.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
you know, he drinks, but everyone else is, I drink, you drink, we drink. It's unmarked in a way. And then, but in the past tense, it's just drank. For everyone, there's no morphology at all for past tense. There is morphology, it's marking past tense, but it's kind of, it's an irregular now. So we don't even, you know, drink to drank, you know, it's not even a regular word.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So in most verbs, many verbs, there's an ed we kind of add. So walk to walked, we add that to say it's the past tense. I just happened to choose an irregular because it's a high-frequency word, and the high-frequency words tend to have irregulars in English. What's an irregular? Irregular is just, there isn't a rule. So drink to drank is an irregular.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
As opposed to walk, walked, talk, talked.
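The contrast between a regular rule and irregular forms can be sketched as a rule plus a lookup table of exceptions. This is a rough illustration only, assuming a bare "add ed" rule with no spelling adjustments and a tiny hand-picked exception list. The names `IRREGULAR_PAST` and `past_tense` are hypothetical.

```python
# Regular past tense as a rule, irregulars as stored exceptions.
# Assumption (illustrative only): the rule is a bare "add ed" with no
# spelling adjustments, and the exception table holds just one entry.

IRREGULAR_PAST = {
    "drink": "drank",  # no rule produces this form; it has to be listed
}


def past_tense(verb):
    """Return the stored irregular form if there is one, else apply the rule."""
    return IRREGULAR_PAST.get(verb, verb + "ed")


for verb in ["walk", "talk", "drink"]:
    print(verb, "->", past_tense(verb))
```

The lookup is consulted first, so a listed form overrides the rule, and any verb not in the table gets the regular ending.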
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's a lot of irregulars in English. The frequent ones, the common words, tend to be irregular. There's many, many more low-frequency words, and those tend to be, those are regular ones.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Morphology is the connections between the morphemes onto the roots. So in English, we mostly have suffixes. We have endings on the words, not very much, but a little bit, as opposed to prefixes. Some words, depending on your language, can have mostly prefixes, mostly suffixes, or both. And then even languages, several languages have things called infixes, where you have some kind of a general...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
form for the root, and you put stuff in the middle. You change the vowels. That's fascinating.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, in English, it's one or two. In English, it tends to be one or two. There can be more. In other languages, a language like Finnish, which has a very elaborate morphology, there may be 10 morphemes on the end of a root. And so there may be millions of forms of a given word.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's a really good question. That's a very good question. Why do languages have more morphology versus less morphology? I don't think we know the answer to this. I think there's just a lot of good solutions to the problem of communication. I believe, as you hinted, that language is an invented system by humans for communicating their ideas.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I think it comes down to we label the things we want to talk about. Those are the morphemes and words. Those are the things we want to talk about in the world. And we invent those things. And then we put them together in ways that are easy for us to convey, to process. But that's like a naive view. And I don't, I mean, I think it's probably right, right? It's naive and probably right.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, these are very interesting questions. We don't know really about how words, even words, get invented very much. Assuming they get invented, we don't really know how that process works and how these things evolve. What we have is... kind of a current picture of a few thousand languages, a few thousand instances. We don't have any pictures of really how these things are evolving, really.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then the evolution is massively confused by contact, right? So as soon as one language group, one group runs into another, we are smart. Humans are smart. And they take on whatever is useful in the other group. And so any kind of contrast which you're talking about, which I find useful, I'm going to start using as well.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So I worked a little bit in specific areas of words, in number words and in color words. And in color... So we have, in English, we have around 11 words that everyone knows for colors. And many more if you happen to be interested in color for some reason or other. If you're a fashion designer or an artist or something, you may have many, many more words. But we can see millions.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Like if you have normal color vision, normal trichromatic vision, you can see millions of distinctions in color. So we don't have millions of words. The most efficient, no, the most detailed color vocabulary would have over a million terms to distinguish all the different colors that we can see, but of course we don't have that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So it's somehow, it's been, it's kind of useful for English to have evolved in some way to, so there's 11 terms that people find useful to talk about, black, white, red, blue, green, yellow, purple, gray, pink, and I probably missed something there. Anyway, there's 11 that everyone knows. But you go to different cultures, especially the non-industrialized cultures, and there'll be many fewer.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So some cultures will have only two, believe it or not. The Dani in Papua New Guinea have only two labels that the group uses for color. Those are roughly black and white. They are very, very dark and very, very light, which are roughly black and white. And you might think, oh, they're dividing the whole color space into light and dark or something. And that's not really true.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They mostly just only label the black and the white things. They just don't talk about the colors for the other ones. And then there's other groups. I've worked with a group called the Tsimane' down in Bolivia in South America. And they have... three words that everyone knows, but there's a few others that several people, that many people know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so they have, it's kind of depending on how you count, between three and seven words that the group knows, okay? And again, they're black and white. Everyone knows those. And red, red is, you know, like that tends to be the third word that everyone, that cultures bring in. If there's a word, it's always red, the third one.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then after that, it's kind of all bets are off about what they bring in. And so after that, they bring in a sort of a big blue-green group. They have one for that. And then different people have different words that they'll use for other parts of the space. And so anyway, it's probably related to what they want to talk... Not what they see, because they see the same colors as we see.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So it's not like they have a weak... a low color palette in the things they're looking at. They're looking at a lot of beautiful scenery, okay? A lot of different colored flowers and berries and things. And so there's lots of things of very bright colors, but they just don't label the color in those cases.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And the reason probably, we don't know this, but we think probably what's going on here is that what you do, why you label something is you need to talk to someone else about it. And why do I need to talk about a color?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, if I have two things which are identical and I want you to give me the one that's different and the only way it varies is color, then I invent a word which tells you, you know, this is the one I want. So I want the red sweater off the rack, not the green sweater, right? There's two.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so those things will be identical because these are things we made and they're dyed and there's nothing different about them. And so in industrialized society, we have, you know, everything we've got is pretty much arbitrarily colored. But if you go to a non-industrialized group, that's not true. And so they don't... it's not only that they're not interested in color.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
If you bring bright-colored things to them, they like them just like we like them. Bright colors are great. They're beautiful. But they just don't need to talk about them. They don't have—
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We have a little bit of Old English to Modern English because there was a writing system, and we can see how Old English looked. So the word order changed, for instance, in Old English to Middle English to Modern English. And so we could see things like that, but most languages don't even have a writing system. So of the 7,000, only a small subset of those have a writing system.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And even if they have a writing system, it's not a very modern writing system. And so they don't have it. So we just basically have for Mandarin, for Chinese, we have a lot of evidence for a long time and for English and not for much else. Not for German a little bit, but not for a whole lot of long-term language evolution. We don't have a lot.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We just have snapshots is what we've got of current languages.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, English is changing. English changes all the time. All languages change all the time. So, you know, there's a famous result about the Queen's English. So if you look at the Queen's vowels, the Queen's English is supposed to be, you know, originally the proper way to talk was sort of defined by however the Queen talked, or the King, whoever was in charge.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so if you look at how her vowels changed, from when she first became queen in 1952 or 53, when she was coronated, the first, I mean, that's Queen Elizabeth who died recently, of course, until, you know, 50 years later, her vowels changed, her vowels shifted a lot. And so that, you know, even in the sounds of British English, in her, the way she was talking was changing.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
The vowels were changing slightly. So that's just, in the sounds, there's change. I don't know what's, you know, we're, we're, I'm interested. We're all interested in what's driving any of these changes. The word order of English changed a lot over a thousand years, right? So it used to look like German.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You know, it used to be a verb final language with case marking, and it shifted to a verb medial language. A lot of contact. So a lot of contact with French. And it became a verb medial language with no case marking. And so it became this, you know, verb initially thing. And so that's... It's evolving. It totally evolved. And so it may very well... I mean...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You know, it doesn't evolve maybe very much in 20 years is maybe what you're talking about. But over 50 and 100 years, things change a lot, I think.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's for sure.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I did. You were asking me before about how do I figure out what a dependency structure is. I'd say the dependency structures aren't that hard generally. I think there's a lot of agreement of what they are for almost any sentence in most languages. I think people will agree on a lot of that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There are other parameters in the mix such that some people think there's a more complicated grammar than just a dependency structure. And so, you know, like Noam Chomsky, he's the most famous linguist ever. And he is famous for proposing a slightly more complicated syntax. And so he invented phrase structure grammar. So he's... well-known for many, many things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But in the 50s, in the early 60s, like the late 50s, he was basically figuring out what's called formal language theory. And he figured out sort of a framework for figuring out how complicated a certain type of language might be, so-called phrase-structured grammars of language might be. And so his idea was that maybe
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We can think about the complexity of a language by how complicated the rules are. And the rules will look like this. They will have a left-hand side and they'll have a right-hand side. Something on the left-hand side will expand to the thing on the right-hand side. So say we'll start with an S, which is like the root, which is a sentence. And then we're going to expand to things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
like a noun phrase and a verb phrase is what he would say, for instance, okay? An S goes to an NP and a VP is a kind of a phrase structure rule. And then we figure out what an NP is. An NP is a determiner and a noun, for instance. And a verb phrase is something else, is a verb and another noun phrase and another NP, for instance. Those are the rules of a very simple phrase structure, okay?
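The rules just described (S goes to NP VP, NP goes to a determiner and a noun, VP goes to a verb and an NP) can be written down as a tiny generator. This is only a sketch: the vocabulary, the `RULES` table, and the function names are assumptions made up for illustration, not anything from the conversation.

```python
import random

# Toy phrase structure rules: each left-hand side expands to one of the
# listed right-hand sides. The word lists are illustrative assumptions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["room"]],
    "V": [["entered"], ["saw"]],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only words remain."""
    if symbol not in RULES:
        return [symbol]  # a word: nothing left to expand
    rhs = rng.choice(RULES[symbol])
    words = []
    for child in rhs:
        words.extend(expand(child, rng))
    return words


def generate(seed=0):
    """Generate one sentence starting from the root symbol S."""
    return " ".join(expand("S", random.Random(seed)))
```

Every sentence this grammar produces has the shape Det N V Det N, because each category on the left always expands the same way regardless of what surrounds it.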
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so he proposed phrase structure grammar, right, as a way to sort of cover human languages. And then he actually figured out that, well, depending on the formalization of those grammars, you might get more complicated or less complicated languages.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so he said, well, these are things called, you know, context-free languages, that rule that he thought, you know, human languages tend to be what he calls context-free languages. But there are simpler languages, which are so-called regular languages, and they have a more constrained form to the rules of the phrase structure of these particular rules.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So he basically discovered and kind of invented ways to describe the language. And those are phrase structure, a human language. And he was mostly interested in English initially in his work in the 50s.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes, and it doesn't have to be human language there. We can have computer languages, any kind of system which is generating some set of expressions in a language. And those could be like the... The statements in a computer language, for example. It could be that or it could be human language. So technically you can study programming languages. Yes, and have been.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, heavily studied using this formalism. There's a big field of programming languages within the formal language. Okay.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's a particular... formalism for describing language. And Chomsky was the first one. He's the one who figured that stuff out back in the 50s. And that's equivalent, actually. The context-free grammar is actually kind of equivalent in the sense that it generates the same sentences as a dependency grammar would. The dependency grammar is a little simpler in some way.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You just have a root and it goes, like, we don't have any of these, the rules are implicit, I guess. And we just have connections between words. The phrase structure grammar is kind of a different way to think about the dependency grammar. It's slightly more complicated, but it's kind of the same in some ways.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They're very close. So phrase structure grammar and dependency grammar aren't that far apart. I like dependency grammar because it's more perspicuous, it's more transparent about representing the connections between the words. It's just a little harder to see in phrase structure grammar. The place where Chomsky sort of devolved or went off from this is he also thought there was...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Um, something called movement, okay? And that's where we disagree, okay? That's the place where I would say we disagree. And, I mean, well, maybe we'll get into that later, but the idea is, if you want to... Do you want me to explain that? No, I would love... Can you explain movement? Movement, okay. So you're saying so many interesting things. Yeah, yeah, okay. So here's the movement is
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Chomsky basically sees English and he says, okay, I said, you know, we had that sentence earlier, like it was like two dogs entered the room. Let's change it a little bit, say two dogs will enter the room. And he notices that, hey, English, if I want to make a question, a yes-no question from that same sentence, I say, instead of two dogs will enter the room, I say, will two dogs enter the room?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Okay. There's a different way to say the same idea. And it's like, well, the auxiliary verb, that will thing, it's at the front as opposed to in the middle. Okay. And so, and he looked, you know, if you look at English, you see that that's true for all those modal verbs and for other kinds of auxiliary verbs in English. You always do that. You always put an auxiliary verb at the front.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And when he saw that, so if I say, I can win this bet, can I win this bet, right? So I move a can to the front. So actually, that's a theory. I just gave you a theory there. He talks about it as movement. That word in the declarative is the root, is the sort of default way to think about the sentence, and you move the auxiliary verb to the front. That's a movement theory, okay?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And he just thought that was just so obvious that it must be true. That there's nothing more to say about that, that this is how auxiliary verbs work in English. There's a movement rule such that to get from the declarative to the interrogative, you're moving the auxiliary to the front.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And it's a little more complicated as soon as you go to simple present and simple past, because if I say, you know, John slept, you have to say, did John sleep, not slept John, right? And so you have to somehow get an auxiliary verb. And I guess underlyingly, it's like slept is, it's a little more complicated than that, but that's his idea. There's a movement, okay?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so a different way to think about that, that isn't, I mean, then he ended up showing later, right? So he proposed this theory of grammar, which has movement. There's other places where he thought there's movement, not just auxiliary verbs, but things like the passive in English and things like questions, WH questions, a bunch of places where he thought there's also movement going on.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And in each one of those, he thinks there's words, well, phrases and words are moving around from one structure to another, which he called deep structure to surface structure. I mean, there's like two different structures in his theory, okay? There's a different way to think about this, which is there's no movement at all.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's a lexical copying rule such that the word will or the word can, these auxiliary verbs, they just have two forms. And one of them is the declarative and one of them is the interrogative. And you basically have the declarative one and, oh, I form the interrogative or I can form one from the other. It doesn't matter which direction you go.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I just have a new entry, which has the same meaning, which has a slightly different argument structure. Argument structure is just a fancy word for the ordering of the words. And so if I say, you know, it was the dogs, two dogs can or will enter the room, there's two forms of will.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
One is will declarative, and then, okay, I've got my subject to the left, it comes before me, and the verb comes after me in that one. And then the will interrogative, it's like, oh, I go first. Interrogative, will is first, and then I have the subject immediately after, and then the verb after that.
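The two entries for will can be pictured as two records that share a word and differ only in the order of their slots. The sketch below is a hypothetical illustration: `LexicalEntry`, `copy_as_interrogative`, and `realize` are invented names, and the three-slot "argument structure" is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    word: str
    form: str     # "declarative" or "interrogative"
    order: tuple  # linear order of the slots around the auxiliary


def copy_as_interrogative(entry):
    """Derive an interrogative entry from a declarative one (hypothetical rule)."""
    if entry.form != "declarative":
        raise ValueError("expected a declarative entry")
    # Same word and meaning, different ordering: the auxiliary goes first.
    return LexicalEntry(entry.word, "interrogative", ("aux", "subject", "verb"))


def realize(entry, subject, verb_phrase):
    """Lay the words out in the order the entry specifies."""
    slots = {"subject": subject, "aux": entry.word, "verb": verb_phrase}
    return " ".join(slots[name] for name in entry.order)


will_decl = LexicalEntry("will", "declarative", ("subject", "aux", "verb"))
will_int = copy_as_interrogative(will_decl)
```

With these entries, `realize(will_decl, "two dogs", "enter the room")` gives the declarative order and `realize(will_int, "two dogs", "enter the room")` gives the question order, with nothing moved between structures. Because each entry is stored per word, a lexicon built this way could simply lack one of the two forms for a given word.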
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so you can just generate from one of those words another word with a slightly different argument structure, with different ordering,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, but that's the lexical copying is similar. So then we do lexical copying for that same idea that maybe the declarative is the source and then we can copy it. And so an advantage is... Well, there's multiple advantages of the lexical copying story. It's not my story.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
This is like Ivan Sag, linguists, a bunch of linguists have been proposing these stories as well, you know, in tandem with the movement story. Okay, you know, Ivan Sag died a while ago, but he was one of the proponents of the non-movement of the lexical copying story. And so that is that a great advantage is, well...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Chomsky, really famously in 1971, showed that the movement story leads to learnability problems. It leads to problems for how language is learned. It's really, really hard to figure out what the underlying structure of a language is if you have both phrase structure and movement. It's really hard to figure out what came from what. There's a lot of possibilities there.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
If you don't have that problem, the learning problem gets a lot easier.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah. Yeah, yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, just learning English. So a baby is lying around listening to the crib, listening to me talk, and how are they learning English? Or maybe it's a two-year-old who's learning interrogatives and stuff. How are they doing that? Are they doing it vocally? So Chomsky said it's impossible to figure it out, actually. He said it's actually impossible, not hard, but impossible.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And therefore, that's where universal grammar comes from, is that it has to be built in. And so what they're learning is that there's some built-in movement that's built in in his story. It's absolutely part of your language module. And then you are... you're just setting parameters. You're said, depending on English, it's just sort of a variant of the universal grammar.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And you're figuring out, oh, which orders does English do these things? The non-movement story doesn't have this. It's like much more bottom-up. You're learning rules. You're learning rules one by one. And, oh, this word is connected to that word. Another advantage, it's learnable. Another advantage of it is that it predicts that not all auxiliaries might move.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It might depend on the word, depending on whether you... And that turns out to be true. So there's words that don't really work as auxiliary. They work in declarative and not in interrogative. So I can say, I'll give you the opposite first. I can say, aren't I invited to the party? And that's an interrogative form. But it's not from, I aren't invited to the party. There is no I aren't, right?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So that's interrogative only. And then we also have forms like ought. I ought to do this. And I guess some old British people can say— Ought I. Exactly. It doesn't sound right, does it? For me, it sounds ridiculous. I don't even think ought is great, but I mean, I totally recognize I ought to do it. It's not too bad, actually. I can say I ought to do this. That sounds pretty good.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I don't know. It just sounds completely out to me. Ought I. Anyway, so there are variants here. And a lot of these words just work in one versus the other. And that's fine under the lexical copying story. It's like, well, you just learn the usage. Whatever the usage is, is what you do with this word. But it's a little bit harder in the movement story.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
The movement story... That's an advantage, I think, of lexical copying. In all these different places, there's all these usage... which make the movement story a little bit harder to work.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, versus dependency grammar. Those are equivalent in some sense in that for any dependency grammar, I can generate a phrase structure grammar which generates exactly the same sentences. I just like the dependency grammar formalism because it makes something really salient, which is the lengths of dependencies between words, which isn't so obvious in the phrase structure.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
In the phrase structure, it's just kind of hard to see. It's in there. It's just very, very, it's opaque.
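To make the point about dependency lengths concrete, here is a minimal sketch that reads them straight off a dependency parse. The head-index encoding and the particular parse of the example sentence are assumptions chosen for illustration.

```python
def dependency_lengths(heads):
    """Return the distance, in words, between each word and its head.

    heads[i] is the 1-based position of the head of word i+1,
    or 0 when that word is the root of the sentence.
    """
    return [abs(head - pos) for pos, head in enumerate(heads, start=1) if head != 0]


# Assumed parse of "two dogs entered the room":
# two -> dogs, dogs -> entered, entered is the root, the -> room, room -> entered
words = ["two", "dogs", "entered", "the", "room"]
heads = [2, 3, 0, 5, 3]

lengths = dependency_lengths(heads)
total_length = sum(lengths)
```

In this encoding each connection between words is a single number, so its length is one subtraction away. Getting the same quantity from a phrase structure tree would first require working out which word heads each phrase.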
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Uh, technically, I think phrase structure grammar is mappable to dependency grammar. And vice versa. And vice versa, yeah. But there's like these little labels, S, NP, VP. Yeah. For a particular dependency grammar, you can make a phrase structure grammar which generates exactly those same sentences, and vice versa. But there are many phrase structure grammars which you can't really make a dependency grammar
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, you can do a lot more in a phrase structure grammar, but you get many more of these extra nodes, basically. You can have more structure in there. And some people like that, and maybe there's value to that. I don't like it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Absolutely.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No. Okay. Not at all. I mean, regular languages are too simple for human languages. It's a part of the hierarchy, but human languages in the phrase structure world are at least context-free, maybe a little bit more, a little bit harder than that. So there's something called context-sensitive as well, where you can have, like this is just the formal language description,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
In a context-free grammar, you have one... This is like a bunch of formal language theory we're doing here.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Okay. So you have a left-hand side category, and you're expanding to anything on the right. That's a context-free. The idea is that that category on the left expands independent of context to those things, whatever they are on the right. It doesn't matter what. And a context-sensitive... says, okay, I actually have more than one thing on the left.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I can tell you only in this context, you know, maybe you have like a left and a right context or just a left context or a right context. I have two or more stuff on the left tells you how to expand those things in that way. Okay, so it's context sensitive. A regular language is just more constrained. And so it... It doesn't allow anything on the right.
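One way to see the difference between these rule types is to check the shape of each rule. The sketch below uses the standard textbook definitions rather than anything stated in the conversation: a regular rule is taken to be right-linear, and a context-sensitive rule is taken to be any rule whose right side is at least as long as its left side. The all-caps convention for categories and the function names are assumptions.

```python
def is_category(symbol):
    """Assumed convention: categories are all caps, words are lowercase."""
    return symbol.isupper()


def classify(lhs, rhs):
    """Return the most restrictive class that a single rule fits.

    lhs and rhs are tuples of symbols, e.g. (("S",), ("NP", "VP")).
    """
    if len(lhs) == 1 and is_category(lhs[0]):
        # Right-linear shapes: A -> a  or  A -> a B
        if len(rhs) == 1 and not is_category(rhs[0]):
            return "regular"
        if len(rhs) == 2 and not is_category(rhs[0]) and is_category(rhs[1]):
            return "regular"
        # One category on the left, anything on the right
        return "context-free"
    if len(rhs) >= len(lhs):
        # More than one symbol on the left: the neighbors act as context
        return "context-sensitive"
    return "unrestricted"
```

For example, `classify(("S",), ("NP", "VP"))` is context-free, `classify(("A",), ("a", "B"))` is regular, and `classify(("A", "B"), ("A", "c", "B"))` is context-sensitive, since the expansion only applies when A and B sit next to each other.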
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It allows very... Basically, it's one very complicated rule is kind of what a regular language is. And so it doesn't have any... I was going to say long-distance dependencies. It doesn't allow recursion, for instance. There's no recursion. Yeah, recursion is where you... Human languages have recursion. They have embedding.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And you can't... Well, it doesn't allow center-embedded recursion, which human languages have, which is what... Center-embedded recursion.
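A standard toy illustration of center-embedded recursion is the rule S goes to a S b, which nests each new pair inside the previous one. In the sketch below, `a` and `b` are placeholder symbols standing for the two ends of a nested dependency, and the function names are made up for illustration.

```python
def generate_center_embedded(depth):
    """Apply S -> a S b (depth - 1) times, then S -> a b."""
    return "a" * depth + "b" * depth


def accepts(string):
    """Recognize a^n b^n for n >= 1 by counting open dependencies."""
    open_count = 0
    seen_b = False
    for ch in string:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' breaks the nesting
            open_count += 1
        elif ch == "b":
            seen_b = True
            open_count -= 1
            if open_count < 0:
                return False  # more closings than openings
        else:
            return False
    return seen_b and open_count == 0
```

The recognizer needs a counter that can grow without bound, because every `a` must be matched by a later `b`. A machine with a fixed, finite number of states has no such counter, which is the usual reason this pattern is said to be beyond regular languages.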
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, within a sentence. So here we're going to get to that. But the formal language stuff is a little aside. Chomsky wasn't proposing it for human languages even. He was just pointing out that human languages are context-free. Because that was kind of stuff we did for formal languages. And what he was most interested in was
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
human language, and that's like, the movement is where we, where he sort of set off on the, I would say, a very interesting, but wrong foot. It was kind of interesting, it's a very, I agree, it's a very interesting history. So he proposed this,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
multiple theories in '57 and then '65. They all have this framework, though, which was phrase structure plus movement, different versions of the phrase structure and the movement. In the '57, these are the most famous original bits of Chomsky's work. And then '71 is when he figured out that those lead to learning problems, that there's cases where a kid could never figure out which rule, um, which set of rules was intended. And so then he said, well, that means it's innate.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's kind of interesting. He just really thought the movement was just so obviously true that he couldn't... He didn't even entertain giving it up. It's just obvious. That's obviously right. And it was later where people figured out that there's all these subtle ways in which things which look like generalizations aren't generalizations across the category.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They're word-specific, and they kind of work, but they don't work across various other words in the category. And so it's easier to just think of these things as lexical copies. And I think he was very obsessed. I don't know. I'm just guessing. He really wanted this story to be simple in some sense. And language is a little more complicated in some sense. He didn't like words.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
He never talks about words. He likes to talk about combinations of words. And words are... You know, if you look up a dictionary, there's 50 senses for a common word, right? The word take will have 30 or 40 senses in it. So there'll be many different senses for common words. And he just doesn't think about that. He doesn't think that's language. I think he doesn't think that's language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
He thinks that words are distinct from combinations of words. I think they're the same. If you look at my brain in the scanner while I'm listening to a language I understand, and you compare, I can localize my language network in a few minutes, in like 15 minutes.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And what you do is I listen to a language I know, I listen to, you know, maybe some language I don't know, or I listen to muffled speech, or I read sentences and I read non-words. Like I can do anything like this, anything that's sort of really like English and anything that's not very like English. So I've got something like it and not, and I've got a control.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And the voxels, which is just, you know, the 3D pixels in my brain that are responding most, is a language area. And that's this left lateralized area in my head. And wherever I look in that network, if you look for the combinations versus the words, it's everywhere. It's the same. That's fascinating. And so it's like hard to find, there are no areas that we know. I mean, that's,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's a little overstated right now. At this point, the technology isn't great. It's not bad. But we have the best way to figure out what's going on in my brain when I'm listening or reading language is to use fMRI, functional magnetic resonance imaging. And that's a very good localization technique.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
method, so I can figure out where exactly these signals are coming from, pretty, you know, down to millimeters, you know, cubic millimeters or smaller, okay? Very small. We can figure those out very well. The problem is the when, okay? It's measuring oxygen, okay? And oxygen takes a little while to get to those cells, so it takes on the order of seconds. So I talk fast, I probably listen fast, and I can probably understand things really fast. So a lot of stuff happens in two seconds.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so to say that we know what's going on, that the words right now in that network, our best guess is that whole network is doing something similar, but maybe different parts of that network are doing different things. And that's probably the case. We just don't have very good methods to figure that out right at this moment. And so...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, I mean, I've been at MIT for 31 years, since 1993, and Chomsky's been there much longer. So I met him, I knew him, I met him when I first got there, I guess, and we would interact every now and then. I'd say our biggest difference is our methods. And so that's the biggest difference between me and Noam, is that I gather data from people.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I do experiments with people and I gather corpus data, whatever corpus data is available. And we do quantitative methods to evaluate any kind of hypothesis we have. He just doesn't do that. So, you know, he has never once been associated with any experiment or corpus work ever. And so it's all thought experiments. It's his own intuitions.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So I just don't think that's the way to do things. Yeah. That's an across-the-street kind of difference, they're across the street from us, between Brain and CogSci and linguistics. I mean, some of the linguists, depending on what they do, the more speech-oriented ones, do more quantitative stuff.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But in meaning, words and, well, combinations of words, syntax, semantics, they tend not to do experiments and corpus analyses.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, I mean, I'm a psychologist. So I would say we're in psychology. Brain and Cognitive Science is MIT's old psychology department. It was a psychology department up until 1985, and it became the Brain and Cognitive Science department. And so, I mean, my training is math and computer science, but I'm a psychologist. I mean, I don't know what I am.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I am what I am, but I'm happy to be called a linguist, I'm happy to be called a computer scientist, I'm happy to be called a psychologist, any of those things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Those are theories. But I think the reason we differ in part is because of how we evaluate the theories. And so I evaluate theories quantitatively, and Noam doesn't. Got it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So the reason I like dependency grammar, as I've said before, is that it's very transparent about its representation of distance between words. So it's like, all it is, is you've got a bunch of words, you're connecting together to make a sentence. And...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
A really neat insight, which turns out to be true, is that the further apart the pair of words that you're connecting, the harder it is to do the production and the harder it is to do the comprehension. It's hard to produce and hard to understand when the words are far apart. When they're close together, it's easy to produce and it's easy to comprehend. Let me give you an example. Okay, so...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We have, in any language, we have mostly local connections between words, but they're abstract. The connections are abstract, they're between categories of words. And so you can always make things further apart if you add modification, for example, after a noun, so a noun in English comes before a verb, the subject noun comes before a verb, and then there's an object after, for example.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So I can say what I said before, you know, the dog entered the room or something like that. So I can modify dog. If I say something more about dog after it, then what I'm doing is,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
indirectly, I'm lengthening the dependency between dog and entered by adding more stuff to it. So just to make it explicit here, if I say, the boy who the cat scratched cried. We're going to have a mean cat here. And so what I've got here is the boy cried. It would be a very short, simple sentence. And I just told you something about the boy.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I told you it was the boy who the cat scratched, okay?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Right? And so I can do that. I can say that. That's a perfectly fine English sentence. And I can say, the cat which the dog chased ran away or something. Okay? I can do that. But it's really hard now. I've got, you know, whatever I have here. I have the boy who the cat... Now let's say I try to modify cat. Okay? The boy who the cat... which the dog chased, scratched, ran away.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Oh my God, that's hard, right? I'm sort of just working that through in my head, how to produce it, and it's really just horrendous to understand. It's not so bad. At least I've got intonation there to sort of mark the boundaries and stuff, but that's really complicated. That's sort of English in a way. I mean, that follows the rules of English.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So what's interesting about that is that what I'm doing is nesting dependencies there. I've got a subject connected to a verb there, and then I'm modifying that with a clause, another clause, which happens to have a subject and a verb relation. I'm trying to do that again on the second one. And what that does is it
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
lengthens out the dependencies. Multiple dependencies actually get lengthened out there. The dependencies get longer: the outside ones get long, and even the ones in between get kind of long. And so what's fascinating is that that's bad, that's really horrendous in English, but it's horrendous in any language. And so no matter what language you look at, if you
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
just figure out some structure where I'm going to have some modification following some head, which is connected to some later head, and I do it again, it won't be good. Guaranteed. Like 100%, that will be uninterpretable in that language in the same way that was uninterpretable in English.
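To make the nesting example concrete, here is a minimal Python sketch that counts dependency lengths in words for the plain sentence, one level of nesting, and two levels of nesting. The arcs are hand-annotated for illustration, and the attachment choices (the relative-clause verb attached to the noun it modifies, the relative pronoun attached to that verb) are an assumption rather than an annotation scheme named in the conversation. The doubly nested sentence uses "cried" as its final verb so that all three versions are comparable.

```python
# Each arc is (head_index, dependent_index), with words numbered from 0.
# The annotations below are illustrative assumptions, not parser output.

def dependency_lengths(arcs):
    """Return the distance in words between each head and its dependent."""
    return [abs(head - dep) for head, dep in arcs]

SENTENCES = {
    # the(0) boy(1) cried(2)
    "the boy cried": [(1, 0), (2, 1)],
    # the(0) boy(1) who(2) the(3) cat(4) scratched(5) cried(6)
    "the boy who the cat scratched cried": [
        (1, 0), (6, 1), (1, 5), (5, 2), (4, 3), (5, 4),
    ],
    # the(0) boy(1) who(2) the(3) cat(4) which(5) the(6) dog(7)
    # chased(8) scratched(9) cried(10)
    "the boy who the cat which the dog chased scratched cried": [
        (1, 0), (10, 1), (1, 9), (9, 2), (4, 3),
        (9, 4), (4, 8), (8, 5), (7, 6), (8, 7),
    ],
}

totals = {s: sum(dependency_lengths(a)) for s, a in SENTENCES.items()}
longest = {s: max(dependency_lengths(a)) for s, a in SENTENCES.items()}

for sentence in SENTENCES:
    print(f"{sentence!r}: total={totals[sentence]}, longest={longest[sentence]}")
```

Under these assumed annotations, the longest arc in each case is the one between "boy" and "cried", and both the total and the longest dependency grow with each added level of nesting.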
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's a good question. I'll just say words. Words or morphemes between, we don't know that. Actually, that's a very good question. What is the distance metric? But let's just say it's words.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They will have a hard time either producing or comprehending it. They might tell you that's not their language. It's sort of their language. They'll agree with each of those pieces as part of their language, but somehow that combination will be very, very difficult to produce and understand.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
is... Well, I'm giving you an explanation. I'm giving you two kinds of explanations. I'm telling you that center embedding, that's nesting, those are synonyms for the same concept here. And the explanation for why... Those are always hard. Center embedding and nesting are always hard. And I gave you an explanation for why they might be hard, which is long-distance connections.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
When you do center embedding, when you do nesting, you always have long-distance connections between the dependents. You just... So that's not necessarily the right explanation. I can go through reasons why that's probably a good explanation. And it's not really just about one of them.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So probably it's a pair of them or something of these dependencies that get long that drives you to be really confused in that case. And so what's the behavioral consequence there? I mean, this is kind of methods, like how do we get at this? You could try to do experiments to get people to produce these things. They're going to have a hard time producing them.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You can try to do experiments to get them to understand them and see how well they understand them, can they understand them. Another method you can do is give people partial materials and ask them to complete them, those center-embedded materials, and they'll fail. Yeah. So I've done that. I've done all these kinds of things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No, no. Nesting is the same thing. Center embedding. Those are totally equivalent terms. I'm sorry. I sometimes use one and sometimes use the other. Got it. Got it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah, you could. I mean, there's multiple ways to do that. I mean, there's the simplest way: just ask people, how good does it sound? How natural does it sound? That's a very blunt, but very good measure. It's very reliable; people will do the same thing.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so it's like, I don't know what it means exactly, but it's doing something such that we're measuring something about the confusion, the difficulty associated with those.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So if you give them a partial sentence, say I say, the book... which the author who, and I ask you to now finish that off for me. I mean, either say it. Yeah, yeah, but you can just say it's written in front of you and you can just type and have as much time as you want. They will, even though that one's not too hard, right?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So if I say it's like the book, it's like, oh, the book which the author who I met wrote. That's a very simple completion for that. If I give that partial sentence online somewhere to a crowdsourcing platform and ask people to complete that, they will miss off a verb very regularly, like half the time, maybe two-thirds of the time. They'll just leave off one of those verb phrases.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Even with that simple, so to say, the book... which the author who, and they'll say, you need three verbs, right? I need three verbs here. Who I met, wrote, was good. And they'll give me two. They'll say, who was famous was good or something like that. They'll just give me two. And that'll happen about 60% of the time.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So 40%, maybe 30, they'll do it correctly, meaning they'll do a three-verb phrase. I don't know what's correct or not. This is hard. It's a hard task.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
If you look, it's a little easier than listening. Listening is pretty tough, because there's no trace of it. You have to remember the words that I'm saying, which is very hard auditorily. We wouldn't do it this way. You do it written. You can look at it and figure it out. It's easier in many dimensions in some ways, depending on the person. It's easier to gather written data.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I work in psycholinguistics, psychology of language and stuff, and so a lot of our work is based on written stuff because it's so easy to gather data from people doing written kinds of tasks. Spoken tasks are just more complicated to administer and analyze because people do weird things when they speak, and it's harder to analyze what they do, but they generally point to the same kinds of things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
All languages. All languages. All languages have short dependencies. You can actually measure that. So an ex-student of mine, Richard Futrell, who is at University of California, Irvine, did a thing a bunch of years ago now where he looked at all the languages we could look at, which was about 40 initially. And now I think there's about 60 for which there are dependency structures.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So meaning there's got to be like a big text, a bunch of texts, which have been parsed for their dependency structures. And there's about 60 of those which have been parsed that way. And for all of those, what he did was take any sentence in one of those languages and you can do the dependency structure and then start at the root. We're talking about dependency structures. That's pretty easy now.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And he's trying to figure out what a control way you might say the same sentence is in that language. And so he's just like, all right, there's a root. Let's say the sentence is, let's go back to, you know, two dogs entered the room. So entered is the root. And entered has two dependents. It's got dogs and it has room, okay? And what he does is like, let's scramble that order.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's three things, the root, which is the head, and the two dependents, put into some random order, just random. And then just do that for all the dependents down the tree. So now do it for two and dogs, and for the and room. And that's, you know, a very short sentence.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
When sentences get longer and you have more dependents, there's more scrambling that's possible. And what he found was, so that's one, you can figure out one scrambling for that sentence. He did this like a hundred times for every sentence in every one of these texts, every corpus.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then he just compared the dependency lengths in those random scramblings to what actually happened, what the English or the French or the German was in the original language, or Chinese or whatever, all these like 80, no, 60 languages, okay? And the dependency lengths are always shorter in the real language compared to this kind of a control.
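The comparison described here can be sketched in a few lines of Python. This is a toy illustration on the single example sentence, not the code or corpora from the study: the arcs for "two dogs entered the room" are hand-written, the sample count of 100 follows the number mentioned in the conversation, and the scrambling used is the simplest possible one (a free shuffle of word positions, which can produce crossed dependencies).

```python
import random

WORDS = ["two", "dogs", "entered", "the", "room"]
# (head_index, dependent_index); "entered" is the root. Hand-annotated toy tree.
ARCS = [(2, 1), (1, 0), (2, 4), (4, 3)]

def total_dependency_length(arcs, position):
    """Sum of head-dependent distances, given a word -> position mapping."""
    return sum(abs(position[h] - position[d]) for h, d in arcs)

def random_baseline(n_words, arcs, n_samples=100, seed=0):
    """Mean total dependency length over random reorderings of the words."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        order = list(range(n_words))
        rng.shuffle(order)  # free scramble: crossings are allowed
        position = {word: i for i, word in enumerate(order)}
        totals.append(total_dependency_length(arcs, position))
    return sum(totals) / len(totals)

# The attested order is just the identity mapping of word to position.
observed = total_dependency_length(ARCS, {i: i for i in range(len(WORDS))})
baseline = random_baseline(len(WORDS), ARCS)

print(f"observed total length: {observed}")
print(f"mean over random scrambles: {baseline:.2f}")
```

On a real corpus the same comparison would be repeated for every parsed sentence, with the observed and scrambled totals aggregated per language.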
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And there's another, it's a little more rigid, his control. So... The way I described it, you could have crossed dependencies. By scrambling that way, you could scramble in any way at all. Languages don't do that. They tend not to cross dependencies very much. So the dependency structure, they tend to keep things non-crossed.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's a technical term they call that, projective, but it's just non-crossed is all that is, projective. And so if you just constrain the scrambling so that it only gives you projective, sort of non-crossed, orders, the same thing holds. So still the dependency lengths in human languages are much shorter than in this kind of a control. So there's like, what it means is that there...
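A projective-only control can be sketched by scrambling each head together with its dependents' subtrees, keeping every subtree contiguous, and repeating that down the tree. The Python below is a minimal illustration under assumptions: the tree for "two dogs entered the room" is hand-annotated, the helper names are hypothetical, and 1000 samples is an arbitrary choice for the toy.

```python
import random

WORDS = ["two", "dogs", "entered", "the", "room"]
ROOT = 2  # "entered"
# head_index -> list of dependent indices (hand-annotated toy tree)
CHILDREN = {2: [1, 4], 1: [0], 4: [3]}
ARCS = [(h, d) for h, deps in CHILDREN.items() for d in deps]

def projective_scramble(head, rng):
    """Randomly order a head and its dependents' subtrees, recursively.

    Each subtree stays contiguous, so the result never has crossed arcs.
    """
    units = [[head]] + [projective_scramble(d, rng) for d in CHILDREN.get(head, [])]
    rng.shuffle(units)
    return [word for unit in units for word in unit]

def total_length(order):
    """Total dependency length of the tree under the given word order."""
    position = {word: i for i, word in enumerate(order)}
    return sum(abs(position[h] - position[d]) for h, d in ARCS)

def has_crossing(order):
    """True if any two arcs cross when drawn above the given word order."""
    position = {word: i for i, word in enumerate(order)}
    spans = [tuple(sorted((position[h], position[d]))) for h, d in ARCS]
    return any(a1 < a2 < b1 < b2 for a1, b1 in spans for a2, b2 in spans)

rng = random.Random(0)
samples = [projective_scramble(ROOT, rng) for _ in range(1000)]

observed = total_length(list(range(len(WORDS))))
baseline_mean = sum(total_length(o) for o in samples) / len(samples)

print(f"observed total length: {observed}")
print(f"mean over projective scrambles: {baseline_mean:.2f}")
print("example scramble:", " ".join(WORDS[i] for i in samples[0]))
```

For contrast, an order such as "dogs entered two the room" puts "entered" between "dogs" and "two", so the arc between "dogs" and "two" crosses the arc between "entered" and "room"; `has_crossing([1, 2, 0, 4, 3])` flags that case.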
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
In every language, we're trying to put things close relative to this kind of a control. It doesn't matter about the word order. Some of these are verb final. Some of them are verb medial, like English. And some are even verb initial. There are a few languages in the world which have VSO word order, verb, subject, object languages. I haven't talked about those.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's right. So, I mean, the story here is just about communication. It is just about production, really. It's about ease of production is the story.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's easier for me to say things when the... What I'm doing whenever I'm talking to you is somehow I'm formulating some idea in my head and I'm putting these words together. And it's easier for me to do that to say something where the words are closely connected in a dependency as opposed to separated by putting something in between and over and over again.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's just hard for me to keep that in my head. That's the whole story. The story is basically that dependency grammar sort of gives that to you. Just like, long is bad, short is good. It's easier to keep in mind, because you have to keep it in mind, probably for production. It probably matters in comprehension as well.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But I would guess it's probably evolved for production. It's about producing. It's what's easier for me to say that ends up being easier for you also. That's very hard to disentangle. This idea of who is it for? Is it for me, the speaker? Or is it for you, the listener? I mean, part of my language is for you. Like the way I talk to you is going to be different from how I talk to different people.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I'm definitely angling what I'm saying to who I'm saying, right? It's not like I'm just talking the same way to every single person. And so I am sensitive to my audience. But does that work itself out in the dependency length differences? I don't know. Maybe that's about just the words, that part, you know, which words I select.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We have different senses, I guess. I'm very selfish. Yeah. And you're like, I'm like, I think it's like, it's all about me. I'm like, I'm just doing what's easiest for me. I don't want to, I'm like, I'll, I mean, but I have to, of course, choose the words that I think you're going to know. I'm not going to choose words you don't know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
In fact, I'm going to fix that when I, you know, so there it's about, but maybe for the syntax, for the combinations, it's just about me. I feel like it's, I don't know though. It's very hard.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's about the listener. It's a little circular there too then. Okay.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, let's control for what it is I want to say. I'm saying let's control for the thing, the message. Control for the message.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's the goal. Oh, but that's the meaning. So I'm still talking about the form. Just the form of the meaning. How do I frame the form of the meaning is all I'm talking about. You're talking about a harder thing, I think. It's like, how am I trying to change the meaning? Let's keep the meaning constant. Got it. If you keep the meaning constant, how can I phrase whatever it is I need to say?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I've got to pick the right words, and I'm going to pick the order so it's easy for me. That's what I think it's probably like.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But look, for any event... There's an unbounded, I don't want to say infinite, but sort of ways that I might communicate that same event. This two dogs entered a room, I can say in many, many different ways. I can say, hey, there's two dogs. They entered the room. Hey, the room was entered by something. The thing that was entered was two dogs.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, it's kind of awkward and weird and stuff, but those are all similar messages, right? With different forms, different ways I might frame it. And of course, I use the same words there all the time. I could have referred to the dogs as a Dalmatian and a poodle or something. I could have been more specific or less specific about what they are.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I could have been more abstract about the number. So I'm trying to keep the meaning, which is this event. And then how am I going to describe that to get that to you? It kind of depends on what you need to know, right? And what I think you need to know. But I'm like, let's control for all that stuff.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I'm just like choosing, but I'm doing something simpler than you're doing, which is just forms.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That might be, yeah. Yeah, that would be changing. Oh, that would be changing the meaning for sure.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's changing the meaning. But say, even if we keep that constant, we can still talk about what's easier or hard for me, right? The listener and the, right? Which phrase structures I use, which combinations, which, you know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But I think that's why... these large language models are so successful is because they're good at form and form isn't that hard in some sense. And meaning is tough still. And that's why they're not there. You know, they don't understand what they're doing.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We're going to talk about that later maybe, but like we can distinguish in our, forget about large language models, like humans, maybe you'll talk about that later too, is like the difference between language, which is a communication system and thinking, which is meaning. So language is a communication system for the meaning. It's not the meaning.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so that's why, I mean, and there's a lot of interesting evidence we can talk about relevant to that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, you or anyone has to think of a task which they think is a good thinking task. And there's lots and lots of tasks which should be good thinking tasks. And whatever those tasks are, let's say it's playing chess or that's a good thinking task or playing some game or doing some complex puzzles. Maybe...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
maybe remembering some digits, that's thinking, maybe just listening to music is thinking. There's a lot of different tasks we might think of as thinking. There's this woman in my department, Ev Fedorenko, and she's done a lot of work on this question about what's the connection between language and thought.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so she uses, I was referring earlier to MRI, fMRI, that's her primary method. And so she has been really fascinated by this question about whether
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
what language is. Okay, and so as I mentioned earlier, you can localize my language area, your language area, in a few minutes, like 15 minutes. I can listen to language, listen to non-language or backward speech or something, and we'll find areas, a left-lateralized network in my head, which is very sensitive to language as opposed to whatever that control was.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Just sentences. You know, I'm listening to English of any kind, a story, or I can read sentences, anything at all that I understand, if I understand it, then it'll activate my language network. So right now my language network is going like crazy when I'm talking and when I'm listening to you because we're both communicating.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, it's incredibly stable. So I happen to be married to this woman, Ev Fedorenko, and so I've been scanned by her over and over and over since 2007 or 2006 or something. And so my language network is exactly the same, you know, like a month ago as it was back in 2007. It's amazingly stable. It's astounding. It's a really fundamentally cool thing. My language network is like my face.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's not changing much over time inside my head.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We don't know. That's a very hard question. They're working on that right now because of the problem of scanning little kids. Trying to do the localization on little children in this scanner. You're lying in the fMRI scan. That's the best way to figure out where something's going on inside our brains.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
and the scanner is loud, and you're in this tiny little area, you're claustrophobic, and it doesn't bother me at all. I can go to sleep in there. But some people are bothered by it, and little kids don't really like it, and they don't like to lie still. And you have to be really still, because if you move around, that messes up the coordinates of where everything is. And so...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Your question is, how and when is language developing? How does this left-lateralized system come to play? And it's really hard to get a two-year-old to do this task. But they're starting to get three and four and five-year-olds to do this task for short periods. And it looks like it's there pretty early.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes. And I can find my network in 15 minutes. And now we can find my network, find yours, find 20 other people to do this task. And we can do some other tasks, anything else you think of as thinking. I can do a spatial memory task. I can do a music perception task. I can do a programming task, if I program, where I can understand computer programs.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
None of those tasks tap the language network at all. At all. There's no overlap. They're highly activated in other parts of the brain. There's a bilateral network which I think she tends to call the multiple demand network, which does anything kind of hard. And so anything that's kind of difficult in some ways will activate that multiple demand network. I mean, music will be in some music area.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
There's music-specific kinds of areas. But none of them are activating the language area at all, unless there's words. So if you have music and there's a song and you can hear the words, then you get the language area.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
This is all comprehension of any kind. That is fascinating. This network doesn't make any difference if it's written or spoken. The thing that Fedorenko calls the language network is this high-level language. It's not about the spoken language, and it's not about the written language. It's about either one of them.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so when you do speech, you listen to speech and you subtract away some language you don't understand, or you subtract away backwards speech, which sounds like speech, but isn't. And then, so you take away the sound part altogether. And then if you do written, you get exactly the same network.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So for just reading the language versus reading sort of nonsense words or something like that, you'll find exactly the same network. And so this is about high-level comprehension. Comprehension of language, yeah, in this case. And the same thing happens in production. Production's a little harder to run in the scanner, but you get the same network.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So production's a little harder, right? You have to figure out how do you run a task in the network such that you're doing some kind of production. And I can't remember what, they've done a bunch of different kinds of tasks there where you get people to produce things, yeah, figure out how to produce. And the same network goes on there, exactly the same place.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, if you read things like,
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah. Lewis Carroll's Twas Brillig. Jabberwocky, right? They call that Jabberwocky speech.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Not as much. There are words in there. There's function words and stuff. So it's lower activation. Fascinating. Yeah, yeah. So there's like, basically the more language-like it is, the higher it goes in the language network. And that network is there from when you speak, from as soon as you learn language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And it's there, like you speak multiple languages, the same network is going for your multiple languages. So you speak English, you speak Russian, both of them are hitting that same network, if you're fluent in those languages.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Not at all. Isn't that amazing? Even if you're a really good programmer, that is not a human language. It's just not conveying the same information. And so it is not in the language network.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It is amazing. That's really weird. So that's like one set of data. This is hers, and it shows that what you might think is thinking is not language. Language is just this conventionalized system that we've worked out in human languages. Oh, another fascinating little tidbit is that even if they're these constructed languages like Klingon or... I don't know the languages from Game of Thrones.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I'm sorry. I don't remember those languages. There's a lot of people offended right now. There's people that speak those languages. They really speak those languages, because the people that wrote the languages for the shows did an amazing job of constructing something like a human language. And that lights up the language area.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Because they can speak pretty much arbitrary thoughts in a human language. It's a constructed human language, and probably it's related to human languages because the people that were constructing them were making them like human languages in various ways. But it also activates the same network, which is pretty cool.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It has to be doing. So it's doing in communication, right? It is translating from thought, whatever that is, is more abstract. And it's doing that. That's what it's doing. Like it is, that is kind of what it is doing. It's like kind of a meaning network, I guess.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
set of concepts that we... Well, there's connections right between what these things mean, and then there's probably other parts of the brain about what these things mean. And so, you know, when I'm talking about whatever it is I want to talk about, it'll be represented somewhere else. That knowledge of whatever that is will be represented somewhere else.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's correct. Isn't that cool? And that's so interesting. So people, I mean, this is like hard to do experiments on, but there is this idea of an inner voice. And a lot of people have an inner voice. And so if you do a poll on the internet and ask, do you hear yourself talking when you're just thinking or whatever, about 70 or 80% of people will say yes. Most people have an inner voice. I don't.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so I always find this strange. So when people talk about an inner voice, I always thought this was a metaphor. And they hear, I know most of you, whoever's listening to this thinks I'm crazy now because I don't have an inner voice and I just don't know what you're listening to.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It sounds so kind of annoying to me to have this voice going on while you're thinking, but I guess most people have that and I don't have that and we don't really know what that connects to.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I don't know. I don't know. I mean, this could be speechy, right? Do you have an inner voice? I don't think so. Oh. A lot of people have this sense that they hear themselves, and then say they read someone's email. I've heard people tell me that they hear that other person's voice when they read other people's emails. And I'm like, wow, that sounds so disruptive.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, you probably don't have an inner voice.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
People have an inner voice. People have this strong percept of hearing sound in their heads when they're just thinking.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Majority, absolutely. What? It's like two-thirds or three-quarters. It's a lot. Whenever I ask a class, and when I go on the internet, they always say that. So you're in a minority.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Mm-hmm. So that's one set. One set of data from Fedorenko's group is that no matter what task you do, if it doesn't have words and combinations of words in it, then it won't light up the language network. It'll be active somewhere else, but not there. So that's one. And then this other...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
piece of evidence relevant to that question is that it turns out there are these, this group of people who've had a massive stroke on the left side and wiped out their language network. And as long as they didn't wipe out everything on the right as well, in that case, they wouldn't be, you know, cognitively functionable.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But if they just wiped out language, which is pretty tough to do because it's very expansive on the left, but if they have, then there's patients like this, so-called global aphasics, who can do any task just fine, but not language. You can't talk to them. I mean, they don't understand you. They can't speak. They can't write. They can't read. But they can play chess.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They can drive their cars. They can do all kinds of other stuff. They can do math. So math is not in the language area, for instance. You do arithmetic and stuff. That's not language area. It's got symbols. So people sort of confuse some kind of symbolic processing with language, and symbolic processing is not the same. So there are symbols and they have meaning, but it's not language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's not a conventionalized language system. And so math isn't there. And so they can do math. They do just as well as their age-matched controls on all these tasks. This is Rosemary Varley over at University College London, who has a bunch of patients in whom she's shown this, that they're just... So that sort of combination suggests that language isn't necessary for thinking.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It doesn't mean you can't think in language. You could think in language, because language allows a lot of expression, but it's just, you don't need it for thinking. It suggests that language is a separate system from thinking.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's cool, isn't it?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It sure does. And they've been working on that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's kind of a big theory, but the reason it's arguably the best is that it does the best at predicting what's English, for instance. It's incredibly good, better than any other theory. But it's not sort of – there's not enough detail. Or it's opaque.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's a black box. It's another black box. But I think it is a theory.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, I don't know. Maybe I'm just being loose there. I think it's not a great theory, but it's a theory. It's a good theory in one sense in that it covers all the data. Like anything you want to say in English, it does. And so that's how it's arguably the best, in that no other theory is as good as a large language model in predicting exactly what's good and what's bad in English.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Now you're saying, is it a good theory? Well, probably not, because I want a smaller theory than that. It's too big. I agree.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, you know, that's, I mean, that presumes, and there's some evidence for this, that some large language models are implementing something like dependency grammar inside them. And so there's work from a guy called Chris Manning and colleagues over at Stanford in natural language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And they looked at, I don't know how many large language model types, but certainly BERT and some others where you do some kind of
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
fancy math to figure out exactly what kind of abstractions of representations are going on, and they were saying it does look like dependency structure is what they're constructing. So it's actually a very, very good map. They are constructing something like that. Does it mean that they're using that for meaning? I mean, probably, but we don't know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's just a general theory of language such that there's a form and a meaning pair for lots of pieces of the language. And so it's primarily usage-based, is the construction grammar. It's trying to deal with the things that people actually say, actually say and actually write. And so it's a usage-based idea. And what's a construction?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
A construction is either a simple word, sort of like a morpheme plus its meaning, or a combination of words. It's basically combinations of words, like the rules. But it's unspecified as to what the form of the grammar is underlyingly. And so I would argue that the dependency grammar is maybe the right form to use for the types of construction grammar.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Construction grammar typically isn't kind of formalized quite. And so maybe the formalization, a formalization of that, it might be in dependency grammar. I mean, I would think so, but I mean, it's up to people, other researchers in that area, if they agree or not, so.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, I would argue they're doing the form. They're doing the form and doing it really, really well. And are they doing the meaning? No, probably not. I mean, there's lots of these examples from various groups showing that they can be tricked in all kinds of ways. They really don't understand the meaning of what's going on.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so there's a lot of examples that he and other groups have given which show they don't really understand what's going on. So, you know, the Monty Hall problem is this silly problem, right? Where, you know, if you have three doors, it's Let's Make a Deal, it's this old game show, and there's three doors, and there's a prize behind one, and there's some...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
junk prizes behind the other two, and you're trying to select one. And, you know, Monty, he knows where the target item is, the good thing. He knows everything that is back there. And he gives you a choice. You choose one of the three. And then he opens one of the doors and it's some junk prize. And then the question is, should you trade to get the other one?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And the answer is yes, you should trade because he knew which ones you could turn around. And so now the odds are two thirds. Okay.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then you just change that a little bit for the large language model. The large language model has just seen that explanation so many times that, if you change the story a little bit but make it sound like it's the Monty Hall problem when it's not, you just say, oh, there's three doors, and behind one of them is a good prize, and there's two bad doors.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I happen to know it's behind door number one. The good prize, the car, is behind door number one. So I'm going to choose door number one. Monty Hall opens door number three and shows me nothing there. Should I trade for door number two? Even though I know the good prize is in door number one.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then the large language model will say, yes, you should trade, because it just goes through the forms that it's seen before so many times on these cases: yes, you should trade because your odds have shifted from one in three now to two out of three. It doesn't have any way to remember that actually you have 100% probability behind that door number one.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You know that. That's not part of the scheme that it's seen hundreds and hundreds of times before. And so you can't, even if you try to explain to it that it's wrong, that they can't do that, it'll just keep giving you back the problems.
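To make the contrast concrete, here is a minimal simulation sketch of the two stories described above. The function names (`play`, `win_rate`) and the `known_prize_door` parameter are illustrative assumptions, not anything from the conversation. In the classic game the contestant picks blindly; in the modified story the contestant already knows where the prize is and picks that door.

```python
import random

def play(switch, rng, known_prize_door=None):
    """Play one round; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    if known_prize_door is None:
        # Classic game: prize placed at random, contestant picks blindly.
        prize = rng.choice(doors)
        pick = rng.choice(doors)
    else:
        # Modified story: contestant knows the prize door and picks it.
        prize = known_prize_door
        pick = known_prize_door
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Trade for the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0, known_prize_door=None):
    """Estimate the probability of winning over many simulated rounds."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng, known_prize_door) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print("classic, trade:", win_rate(True))
    print("classic, stay: ", win_rate(False))
    print("known door, trade:", win_rate(True, known_prize_door=0))
    print("known door, stay: ", win_rate(False, known_prize_door=0))
```

In the classic game, trading should win roughly two thirds of the time. In the modified story, trading always loses, because the contestant was already standing on the prize.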
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, you don't have to convince me of that. I am very, very impressed, but does it do, I mean, you're giving a possible world where maybe someone's gonna train some other version such that it'll be somehow abstracting away from types of forms. I mean, I don't think that's happened.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's not the inference. So I don't want to make that, the inference I wouldn't want to make was that inference. The inference I'm trying to push is just that is it like humans here? It's probably not like humans here. It's different. So humans don't make that error. If you explain that to them, they're not going to make that error. They don't make that error.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so that's something, it's doing something different from humans that they're doing in that case.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I'm just saying the error there is like, if I explain to you there's 100% chance that the car is behind this case, this door, well, do you want to trade? People say no. But this thing will say yes, because it's so, that trick, it's so wound up on the form that it's, that's an error that a human doesn't make, which is kind of interesting.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Look, the places where large language models are, the form is amazing. So let's go back to nested structures, center-embedded structures, okay? If you ask a human to complete those, they can't do it. Neither can a large language model. They're just like humans in that. If you ask, if I ask a large language model- That's fascinating, by the way. The central embedding?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
As a kid in school, when we had to structure sentences in English grammar, I found that process interesting. I found it confusing as to what it was I was told to do. I didn't understand what the theory was behind it, but I found it very interesting.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Just like humans. Exactly like humans. Exactly the same way as humans. And that's not trained. So that is a similarity. But that's not meaning. This is form. But when we get into meaning, this is where they get kind of messed up. When you start just saying, oh, what's behind this door? Oh, this is the thing I want. Humans don't mess that up as much. The form, it's just like.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
The form of the match is amazing, without being trained to do that. I mean, it's trained in the sense that it's getting lots of data, which is just like human data, but it's not being trained on, you know, bad sentences and being told what's bad. It just can't do those. It'll actually say things like, those are too hard for me to complete or something, which is kind of interesting, actually.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Kind of, how does it know that? I don't know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I think so. Yeah, I think so.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, that's what this group argues. So the same group, Fedorenko's group, has a recent paper arguing exactly that. There's a guy called Kyle Mahowald who's here in Austin, Texas, actually. He's an old student of mine, but he's faculty in linguistics at Texas, and he was the first author on that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You know, I... I don't see any limits to their form.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah, yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, it's just the same as humans. It seems the same.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No, but I want it to be like humans. I want a model of humans.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, that's the mechanism. If it's modeling, I think it's kind of really interesting that it can't. That it's really interesting. I think it's potentially underlyingly modeling something like the way the form is processed.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, I think that's right. I didn't know I was going to work on this at all at that point. I was really just... I was kind of a math geek person, computer scientist. I really liked computer science. And then I found... Language is a neat puzzle to work on from an engineering perspective, actually.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So what I like about dependency grammar is it makes the cognitive cost associated with longer-distance connections very transparent. It turns out there is a cost associated with producing and comprehending connections between words which are just not beside each other. The further apart they are, the worse it is, according to, well, we can measure that.
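As a rough illustration of how dependency distance can be counted, the sketch below sums the word-position gap across every dependency arc in a sentence. The two example sentences and their arcs are hand-annotated assumptions made up for illustration, and counting distance in words is the simplification mentioned in the conversation, not a full cost model.

```python
def total_dependency_length(arcs):
    """Sum of |dependent position - head position| over all arcs.

    `arcs` is a list of (dependent_index, head_index) pairs using
    0-based word positions; the root word has no arc.
    """
    return sum(abs(dep - head) for dep, head in arcs)

# Hypothetical hand annotation (an assumption, not from a parser):
# "The reporter who the senator attacked admitted the error"
#   0    1       2   3    4       5        6       7    8
CENTER_EMBEDDED_ARCS = [
    (0, 1),  # The -> reporter
    (1, 6),  # reporter -> admitted (subject separated from its verb)
    (2, 5),  # who -> attacked
    (3, 4),  # the -> senator
    (4, 5),  # senator -> attacked
    (5, 1),  # attacked -> reporter (relative clause)
    (7, 8),  # the -> error
    (8, 6),  # error -> admitted
]

# "The senator attacked the reporter who admitted the error"
#   0    1       2        3    4       5    6       7    8
RIGHT_BRANCHING_ARCS = [
    (0, 1),  # The -> senator
    (1, 2),  # senator -> attacked
    (3, 4),  # the -> reporter
    (4, 2),  # reporter -> attacked
    (5, 6),  # who -> admitted
    (6, 4),  # admitted -> reporter (relative clause)
    (7, 8),  # the -> error
    (8, 6),  # error -> admitted
]

if __name__ == "__main__":
    print(total_dependency_length(CENTER_EMBEDDED_ARCS))  # 18
    print(total_dependency_length(RIGHT_BRANCHING_ARCS))  # 11
```

With the same nine words, the center-embedded ordering comes out with a larger total than the right-branching one under this annotation, mostly because the subject "reporter" sits five words away from its verb.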
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And there is a cost associated with that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Sure. And how do you measure it? Oh, well, you can measure it in a lot of ways. The simplest is just asking people to say how good a sentence sounds. That's one way to measure. And you try to triangulate then across sentences and across structures to try to figure out what the source of that is. You can look at...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
reading times in controlled materials, in certain kinds of materials, and then we can measure the dependency distances there. There's a recent study which looked at, we're talking about the brain here, we could look at the language network, okay? We could look at the language network and we could look at the activation in the language network
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And how big the activation is depending on the length of the dependencies. And it turns out in just random sentences that you're listening to, so these are people listening to stories here, the longer the dependency is, the stronger the activation in the language network.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so there's some measure, there's a bunch of different measures we could do. That's kind of a neat measure, actually, of actual activations. Activation in the brain.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, it's complicated, but probably it's doable. I would guess it's doable. I tried to do that a while ago, and I was reasonably successful, but for some reason I stopped working on it. I agree with you that it would be nice to figure out. So there's like some way to figure out the cost. I mean, it's complicated. Another issue you raised before was like, how do you measure distance?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Is it words? It probably isn't. Part of the problem is that some words matter more than others, and probably, you know, nouns might matter, and then it maybe depends on which kind of noun. Is it a noun we've already introduced or a noun that's already been mentioned? Is it a pronoun versus a name? Like all these things probably matter.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So probably the simplest thing to do is just like, oh, let's forget about all that and just think about words or morphemes.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I sort of accidentally decided after I finished my undergraduate degree, which was computer science and math at Queen's University in Canada, I decided to go to grad school. That's what I always thought I would do. And I went to Cambridge, where they had a master's program in computational linguistics. And I hadn't taken a single language class before.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I think it's an exponential. So we think it's probably an exponential such that the longer the distance, the less it matters. And so then it's the sum of those is my... That was our best guess a while ago. So you've got a bunch of dependencies. If you've got a bunch of them that are being connected at some point, that's...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
At the ends of those, the cost is some exponential function of those, is my guess. Because the reason it's probably an exponential is it's not just the distance between two words. Because I can make a very, very long subject-verb dependency by adding lots and lots of noun phrases and prepositional phrases, and it doesn't matter too much.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's when you do nested, when I have multiple of these, then things go really bad, go south.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, that's probably a function of the memory here is the access, is trying to find those earlier things. It's kind of hard to figure out what was referred to earlier. Those are those connections. That's the sort of notion of working, as opposed to a storage-y thing, but trying to connect things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
retrieve those earlier words, depending on what was in between. And then we're talking about interference of similar things in between. The right theory probably has that kind of notion, and it is an interference of similar things. And so I'm dealing with an abstraction over the right theory, which is just, you know, let's count words. It's not right, but it's close. And then maybe you're right, though, there's some sort of an exponential or something on the
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
to figure out the total so we can figure out a function for any given, for any given sentence in any given language. But, you know, it's funny, you know, people haven't done that too much, which I do think is, I'm interested that you find that interesting. I really find that interesting. And a lot of people haven't found it interesting.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I don't know why I haven't got people to want to work on that. I really like that too.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, that's why I like it too. It's so simple.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And yet it explains some very complicated phenomena. If I write these very complicated sentences, it's kind of hard to know why they're so hard. And you can like, oh, nail it down. I can give you a math formula for why each one of them is bad and where. And that's kind of cool. I think that's very neat.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No, no, simple answer is no, that does, there's probably things you can do in that kind of direction.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We might, you know, we're going to talk about legalese at some point. And so maybe we'll talk about that kind of thinking later, as applied to legalese.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's right.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Oh, well, legalese is what you think it is. It's just any legal language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So I'm just talking about language in laws and language in contracts. So the stuff that you have to run into, we have to run into every other day or every day, and you skip over because it reads poorly. Or partly it's just long, right? There's a lot of text there that we don't really want to know about.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
All I'd taken was CS, computer science, math classes, pretty much, mostly, as an undergrad. And I just thought, oh, this was an interesting thing to do for a year, because it was a single-year program. And then I end up spending my whole life doing it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But the thing I'm interested in – so I've been working with this guy called Eric Martinez, who is a – he was a lawyer – who was taking my class. I was teaching a psycholinguistics lab class, and I have been teaching it for a long time at MIT, and he was a law student at Harvard.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And he took the class because he had done some linguistics as an undergrad, and he was interested in the problem of why legalese sounds hard to understand. So why is it hard to understand and why do they write that way if it is hard to understand? It seems apparent that it's hard to understand. The question is, why is it? And so we didn't know. And we did an evaluation of a bunch of contracts.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Actually, we just took a bunch of random contracts. Because I don't know, you know, there's contracts and laws might not be exactly the same, but contracts are kind of the things that most people have to deal with most of the time. And so that's kind of the most common thing that humans have, like, that adults in our industrialized society have to deal with a lot. And so that's what we pulled.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And we didn't know what was hard about them, but it turns out that the way they're written is very center-embedded, has nested structures in them. So it has low-frequency words as well. That's not surprising. Lots of texts have low-frequency words. It does have, surprisingly, slightly lower-frequency words than other kinds of control texts, even sort of academic texts. Legalese is even worse.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It is the worst that we were able to find.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, you know, it's interesting. Now you're getting at why. And so now you're saying they're doing it intentionally. I don't think they're doing it intentionally. It's an emergent phenomenon. Yeah, yeah, yeah. We'll get to that. We'll get to that. But we wanted to see why. So we see what first. Because it turns out that we're not the first to observe that legalese is weird.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Like back to Nixon had a Plain Language Act in 1970, and Obama had one. And boy, a lot of presidents have said, oh, we've got to simplify legal language, must simplify it. But if you don't know how it's complicated, it's not easy to simplify it. You need to know what it is you're supposed to do before you can fix it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so you need a psycholinguist to analyze the text and see what's wrong with it before you can fix it. You don't know how to fix it. How am I supposed to fix something? I don't know what's wrong with it. And so what we did was just, that's what we did. We figured out, okay, we just took a bunch of contracts, had people, and we encoded them for a bunch of features.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so another feature, one of them was center embedding. And so that is basically how often a person
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
a clause would intervene between a subject and a verb, for example. That's one kind of a center embedding of a clause, okay? And it turns out they're massively center-embedded. Like, so I think in random contracts and in random laws, I think you get about 70% or 80-something... 70% of sentences have a center-embedded clause, which is insanely high. If you go to any other text, it's down to 20% or something.
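As a rough illustration of the measure described here (a clause intervening between a subject and its verb), the check can be sketched over a dependency parse. This is not the study's actual coding procedure: the `Token` type, the `CLAUSE_LABELS` set, the function names, and the two hand-annotated example sentences are all assumptions made up for this sketch, with labels loosely following Universal Dependencies naming.

```python
from dataclasses import dataclass

# Dependency labels treated as heads of an embedded clause.
# The exact set is an assumption for this sketch.
CLAUSE_LABELS = {"acl", "acl:relcl", "advcl", "ccomp", "csubj"}


@dataclass
class Token:
    text: str
    head: int    # index of the head token, -1 for the root
    deprel: str  # dependency label, e.g. "nsubj"


def has_center_embedding(sentence):
    """True if a clause head sits strictly between a subject and its verb."""
    for i, tok in enumerate(sentence):
        # Only look at subjects that precede their verb.
        if tok.deprel != "nsubj" or tok.head <= i:
            continue
        if any(t.deprel in CLAUSE_LABELS for t in sentence[i + 1:tok.head]):
            return True
    return False


def embedding_rate(sentences):
    """Share of sentences that contain at least one center-embedded clause."""
    return sum(has_center_embedding(s) for s in sentences) / len(sentences)


# "The report that the committee wrote surprised everyone."
embedded = [
    Token("The", 1, "det"),
    Token("report", 6, "nsubj"),
    Token("that", 5, "obj"),
    Token("the", 4, "det"),
    Token("committee", 5, "nsubj"),
    Token("wrote", 1, "acl:relcl"),
    Token("surprised", -1, "root"),
    Token("everyone", 6, "obj"),
]

# "The committee wrote the report."
flat = [
    Token("The", 1, "det"),
    Token("committee", 2, "nsubj"),
    Token("wrote", -1, "root"),
    Token("the", 4, "det"),
    Token("report", 2, "obj"),
]

print(embedding_rate([embedded, flat]))  # 0.5
```

In practice the parses would come from an automatic parser rather than hand annotation, and passive subjects or other clause types would need their own labels.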
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's so much higher than any control you can think of, including you think, oh, people think, oh, technical academic text. No, people don't write center-embedded sentences in technical academic text. I mean, they do a little bit, but it's in the 20%, 30% realm as opposed to 70%. And so there's that, and there's low-frequency words. And then people, oh, maybe it's passive.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
People don't like the passive. Passive, for some reason, the passive voice in English has a bad rap, and I'm not really sure where that comes from. And there is a lot of passive. There's much more passive voice in legalese than there is in other texts.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No, no, no, no. Those are separate. Those are separate.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, sucks is different. That's a judgment on passive. Yeah, yeah, yeah. Drop the judgment. It's just like, these are frequent. These are things which happen in legalese text. Then we can ask. The dependent measure is how well you understand those things with those features. And it turns out the passive makes no difference.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So it has a zero effect on your comprehension ability, on your recall ability. Nothing at all. It has no effect. The words matter a little bit. Low-frequency words are going to hurt you in recall and understanding. But what really hurts is the center embedding. That kills you. That slows people down. That makes them very poor at understanding.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They can't recall what was said as well, nearly as well. And we did this not only on laypeople. We did it on a lot of laypeople. We ran it on 100 lawyers. We recruited lawyers from a wide range of sort of different levels of law firms and stuff. And they have the same pattern. So they also, like when they did this, I did not know what would happen.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I thought maybe they could process, they're used to legalese, they can process it just as well as if it was normal. No, no, they're much better than laypeople. So they can much better recall, much better understanding, but they have the same main effects as laypeople, exactly the same. So they also much prefer the non-center. So we constructed non-center embedded versions of each of these.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
As an engineer, I'd say, I mean, to be frank, I had taken an AI class, I guess it was 83 or 84, 85, somewhere 84 in there a long time ago. And there was a natural language section in there. And it didn't impress me. I thought there must be more interesting things we can do. It didn't seem very, it seemed just a bunch of... to me. It didn't seem like a real theory of things in any way.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We constructed versions which have... higher frequency words in those places, and we un-passivized. We turned them into active versions. The passive-active made no difference. The words made a little difference, and the un-center embedding makes big differences in all the populations. Un-center embedding.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, yeah. So there's automatic parsers for English, which are pretty good. And they can detect center embedding. Oh, yeah. Or I guess nesting. Perfectly. Yeah, pretty much.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, we are in this case, in these cases. But long dependencies, they're highly correlated.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Can I read a sentence for you from these things? I mean, this is just like one of the things that, this is just typical.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So here we go. Because in the event that any payment or benefit by the company, all such payments and benefits, including the payments and benefits under Section 3A hereof, being hereinafter referred to as a total payment, would be subject to the excise tax, then the cash severance payments shall be reduced. So that's something we pulled from a regular text, from a contract. Wow.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And the center-embedded bit there is just, for some reason, there's a definition. They throw the definition of what... payments and benefits are in between the subject and the verb. How about don't do that? How about put the definition somewhere else as opposed to in the middle of the sentence? And so that's very, very common, by the way. That's what happens.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You just throw your definitions, you use a word, a couple words, and then you define it, and then you continue the sentence. Like, just don't write like that. And you ask, so then we asked lawyers, we thought, oh, maybe lawyers like this. Lawyers don't like this. They don't like this. They don't want to write like this.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We asked them to rate materials with the same meaning, un-center-embedded and center-embedded, and they much preferred the un-center-embedded versions.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, and we asked them, would you hire someone who writes like this or this? We asked them all kinds of questions, and they always preferred the less complicated version, all of them. So I don't even think they want it this way. Yeah, but how did it happen? How did it happen? That's a very good question. And the answer is, I still don't know.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, our best theory at the moment is that there's actually some kind of a performative meaning in the center embedding in the style which tells you it's legalese. We think that that's the kind of a style which tells you it's legalese. Like, that's a reasonable guess. And maybe it's just... So, for instance, if you're... Like, it's like... So we kind of call this the magic spell hypothesis.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So when you tell someone to put a magic spell on someone, what do you do? People know what a magic spell is and they do a lot of rhyming. That's kind of what people will tend to do. They'll do rhyming and they'll do some kind of poetry kind of thing.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah. And maybe there's a syntactic sort of reflex here of a magic spell, which is center embedding. And so that's like, oh, it's trying to tell you this is something which is true, which is what the goal of law is, right? It's telling you something that... we want you to believe as certainly true, right? That's what legal contracts are trying to enforce on you, right?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so maybe that's like a form which has, this is like an abstract, very abstract form, center embedding, which has a meaning associated with it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That was one of our working hypotheses. We just couldn't find any evidence of that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so I just thought this seemed like an interesting area where there wasn't enough good work.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I'm suspicious as well. I'm still suspicious. And I hear what you're saying. It could be kind of no individual, and even average of individuals, it could just be a few bad apples in a way, which are driving the effect in some way.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But it is kind of interesting that among our hundred lawyers, they did not...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's the best thing. They really didn't like it.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But they had the same difference.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Exact same difference. But they wanted it fixed. So they also... And so that gave us hope that because it actually isn't very hard to construct a material which is un-center-embedded and has the same meaning, it's not very hard to do. Just basically in that situation, you're just putting definitions outside of the subject-verb relation in that particular example.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And that's pretty general, what they're doing, is just throwing stuff in there which you didn't have to put in there. There's extra words involved. Typically, you may need a few extra words to refer to the things that you're defining outside in some way. Because if you only use it in that one sentence, then there's no reason to introduce extra terms.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So we might have a few more words, but it'll be easier to understand. So I have hope that now that maybe we can make legalese less... less convoluted in this way.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Eric Martinez is the guy you should really put in there. I mean, yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's fascinating.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So that's about communication. And so this is going back to Shannon. So Shannon, Claude Shannon was a... student at MIT in the 40s. And so he wrote this very influential piece of work about communication theory or information theory. And he was interested in human language, actually. He was interested in this problem of communication, of getting a message from my head to your head.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And he was concerned or interested in what was a robust way to do that. And so assuming we both speak the same language, we both already speak English, whatever the language is, we speak that. What is a way that I can say the language so that it's most likely to get the signal that I want to you? And then the problem there
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
in the communication is the noisy channel, is that there's a lot of noise in the system. I don't speak perfectly. I make errors. That's noise. There's background noise. You know that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Literal background noise. There is white noise in the background or some other kind of noise. There's some speaking going on that you're at a party. That's background noise. You're trying to hear someone. It's hard to understand them because there's all this other stuff going on in the background.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And then there's noise on the receiver side, so that you have some problem maybe understanding me for stuff that's just internal to you in some way. So you've got some other problems, whatever, with understanding for whatever reasons. Maybe you've had too much to drink. Who knows why you're not able to pay attention to the signal? So that's the noisy channel.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so that language, if it's a communication system, we are trying to optimize in some sense the passing of the message from one side to the other. One idea is that maybe aspects of word order, for example, might have optimized in some way to make language a little more easy to be passed from speaker to listener. So Shannon's the guy that did this stuff way back in the 40s.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
I mean, I probably did, but I wasn't as interested in it. I was trying to do the easier problems first, the ones I could, thought maybe were handleable, which seems like the syntax is easier, which is just the forms as opposed to the meaning. When you're starting to talk about the meaning, that's a very hard problem, and it still is a really, really hard problem. But the forms is easier.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It's very interesting. Historically, he was interested in working in linguistics. He was at MIT, and this was his master's thesis of all things. It's crazy how much he did for his master's thesis in 1948, I think, or 49, something. And he wanted to keep working in language, and it just wasn't a popular thing. Communication as a reason, a source for what language was, wasn't popular at the time.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So Chomsky was moving in there, and he just wasn't able to get a handle there, I think. And so he moved to Bell Labs and worked on communication from a mathematical point of view and did all kinds of amazing work. And so he's just- More on the signal side versus the language side. Yeah, mm-hmm.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
He was interested in that. His examples in the 40s are very language-like things. We can show that there's a noisy channel process going on when you're listening to me. You know, you can often sort of guess what I meant by what I, you know, what you think I meant given what I said.
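One common way to write down "what you think I meant given what I said" is Bayes' rule: the posterior over intended sentences is proportional to a prior over what a speaker would plausibly mean, times the probability that noise turned that intended sentence into what was heard. The sketch below assumes that formulation; the function name, the example sentences, and every probability in it are invented for illustration and are not taken from any experiment.

```python
def infer_intended(perceived, prior, likelihood):
    """Posterior over intended sentences given the perceived sentence.

    prior[s]          -- P(speaker intended s)
    likelihood[s][p]  -- P(listener perceives p | speaker intended s)
    """
    scores = {s: prior[s] * likelihood[s].get(perceived, 0.0) for s in prior}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("perceived sentence is impossible under this model")
    return {s: score / total for s, score in scores.items()}


plausible = "the mother gave the candle to the daughter"
literal = "the mother gave the candle the daughter"

# Made-up numbers: the plausible meaning is far more likely a priori,
# and losing a short word like "to" is assumed to happen 5% of the time.
prior = {plausible: 0.99, literal: 0.01}
likelihood = {
    plausible: {plausible: 0.95, literal: 0.05},
    literal: {literal: 0.95, plausible: 0.05},
}

posterior = infer_intended(literal, prior, likelihood)
for sentence, p in posterior.items():
    print(f"{p:.2f}  {sentence}")
```

With these invented numbers the listener ends up favoring the plausible reading even though the literal string was heard, which is the kind of guess being described here.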
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I mean, with respect to sort of why language looks the way it does, we might, there might be sort of, as I alluded to, there might be ways in which word order is somewhat optimized for, because of the noisy channel in some way.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah. Well, dependency length is really about memory, right? I think that's like about sort of what's easier or harder to produce in some way. And these other ideas are about sort of robustness to communication. So the problem of potential loss of signal due to noise. So there may be aspects of word order, which is somewhat optimized for that. And we have this one guess in that direction.
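For readers who want the dependency-length idea made concrete, here is a minimal sketch that assumes the usual simple definition: the distance, in words, between each word and its head, summed over a sentence. The function name and the hand-annotated head indices are assumptions for illustration only.

```python
def total_dependency_length(heads):
    """Sum of |position - head position| over all non-root words.

    heads[i] is the index of word i's head, or -1 if word i is the root.
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h >= 0)


# "The report that the committee wrote surprised everyone."
center_embedded = [1, 6, 5, 4, 5, 1, -1, 6]

# The same content as two sentences:
# "The committee wrote the report." / "It surprised everyone."
unembedded = [
    [1, 2, -1, 4, 2],
    [1, -1, 1],
]

print(total_dependency_length(center_embedded))                 # 16
print(sum(total_dependency_length(s) for s in unembedded))      # 7
```

The embedded version pays mostly for the long subject-to-verb link across the intervening clause; pulling that clause out shortens the links, at the cost of an extra word to refer back.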
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
These are kind of just-so stories, I have to be pretty frank. They're not... I can't show this is true. All we can do is look at the current languages of the world. We can't see how languages change or anything because we've got these snapshots of a few hundred or a few thousand languages. We can't do the right modifications to test these things experimentally.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
You know, so just take this with a grain of salt, okay, from here, this stuff. The dependency stuff, I'm much more solid on. I'm like, here's what the lengths are, and here's what's hard, and here's what's easy, and this is a reasonable structure. I think I'm pretty reasonable. Here's like, why does the word order look the way it does? We're now into shaky territory, but it's kind of cool. Yeah.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes, yes.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We think that there's at least three different kinds of things going on there. And we probably don't want to treat them all as the same. And so I think the right model, a better model of a noisy channel would have three different sources of noise, which are background noise, speaker-inherent noise, and listener-inherent noise. And those are all different things.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so I thought at least figuring out the forms of human language, which sounds really hard, but is actually maybe more tractable.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, how about just form still though? Like just what language you know? Like, so how well you know those languages?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
language. And so if it's a second language for you versus a first language, and maybe what other languages you know, these are still just form stuff, and that's potentially very informative. And, you know, how old you are. These things probably matter, right? So like a child learning a language has, you know, a noisy representation of English grammar, depending on how old they are. So maybe when they're six, they're perfectly formed, but
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah, well, of all the world's languages, none that we know of right now is any better than any other with respect to sort of optimizing dependency lengths, for example. They all kind of do it, do it well. They all keep them low. So I think of every human language as some kind of an optimization problem, a complex optimization problem, for this communication problem.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And so they've, like, they've solved it. They're, you know, just sort of noisy solutions to this problem of communication. There's just so many ways you can do this.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They're probably less for communication. And learning. So yes, one of the factors, which is, yeah, so learning is messing this up a bit. And so, for example, if it were just about minimizing dependency lengths and that was all that mattered, you know, then we might find grammars which didn't have regularity in their rules.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But languages always have regularity in their rules. So what I mean by that is that if I wanted to say something to you in the optimal way to say it, and all that mattered was keeping the dependencies as close together as possible, then I would have a very lax set of phrase structure or dependency rules. I wouldn't have very many of those. I would have very little of that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And I would just put the words as close, the things that refer to the things that are connected right beside each other. But we don't do that. There are word order rules. And depending on the language, they're more and less strict. So you speak Russian, they're less strict than English. English has very rigid word order rules. We order things in a very particular way. And so why do we do that?
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's probably not about communication. That's probably about learning. Then we're talking about learning. It's probably easier to learn regular things, things which are very predictable and easy to... So that's probably about learning, is our guess, because that can't be about communication.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Well, if it were just about communication, then we should have languages which have very, very free word order, and we don't have that. We have freer, but not free... Like, there's always...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yeah. I think that's what they're good at, is form. Exactly. And that's why they're good, because they can do form. Meaning's hard.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Yes, that's true for a second language.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But that depends on what you started with.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So it really depends on how close that second language is to the first language you've got. And so yes, it's very, very hard to learn Arabic if you've started with English, or it's hard to learn Japanese, or if you've started with... Chinese, I think, is the worst. There's like, the Defense Language Institute in the United States has like a list of how hard it is to learn what language from English. I think Chinese is the worst. You're saying babies don't care.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
No. There's no evidence that there's anything harder or easier about any language learned. By three or four, they speak that language. And so there's no evidence of anything harder or easier about any human language. They're all kind of equal.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
The answer is I don't know, of course. I'm an engineer at heart, I guess, and I think it's fine to postulate that a lot of it's learned. I'm guessing that a lot of it's learned. I think the reason Chomsky went with the innateness is because he hypothesized movement in his grammar. He was interested in grammar and movement's hard to learn. I think he's right.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Movement is a hard, it's a hard thing to learn, to learn these two things together and how they interact. And there's like a lot of ways in which you might generate exactly the same sentences and it's like really hard. And so he's like, oh, I guess it's learned. Sorry, I guess it's not learned, it's innate.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And if you just throw out the movement and just think about that in a different way, then you get some messiness. But the messiness is human language, which it actually fits better. That messiness isn't a problem. It's actually a valuable asset of the theory. And so I think I don't really see a reason to postulate much innate structure.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And that's kind of why I think these large language models are learning so well, is because I think you can learn the form, the forms of human language from the input. I think it's likely to be true.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
It doesn't have to be innate. So like lots of stuff is modular in the brain that's learned. It doesn't have to, you know, so there's something called the visual word form area in the back. And so it's in the back of your head near the, you know, the visual cortex. Okay. And that is very specialized language, sorry, very specialized brain area, which does a
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
visual word processing if you read, if you're a reader, okay? If you don't read, you don't have it, okay? Guess what? You spend some time learning to read and you develop that brain area, which does exactly that. And so the modularization is not evidence for innateness. So the modularization of a language area doesn't mean we're born with it. We could have easily learned that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
We might have been born with it. We just don't know at this point. We might very well have been born with this left-lateralized area. I mean, there's a lot of other interesting components here, features of this kind of argument. So some people get a stroke or something goes really wrong on the left side, where the language area would be, and that isn't there. It's not available.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And it develops just fine on the right. So it's not about the left. It goes to the left. This is a very interesting question. Why are any of the brain areas the way that they are and how did they come to be that way? There's these natural experiments which happen where people get these strange events in their brains at very young ages which wipe out sections of their brain.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And they behave totally normally and no one knows anything was wrong. And we find out later, because they happen to be accidentally scanned for some reason, it's like, what happened to your left hemisphere? It's missing. There's not many people who've missed their whole left hemisphere, but they'll be missing some other section of their left or their right.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And they behave absolutely normally, we'd never know. So that's like a very interesting, you know, current research hypothesis. This is another project that this person, Ev Fedorenko, is working on. She's got all these people contacting her because she's scanned some people who have been missing sections. One person missed a section of her brain and was scanned in her lab.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And she happened to be a writer for the New York Times. And there was an article in the New York Times just about the scanning procedure and about what might be learned by sort of the general process of MRI and language. And because she's writing for the New York Times, all these people started writing to her who also have similar kinds of deficits because they've been accidentally
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
scanned for some reason and found out they're missing some section. They volunteer to be scanned.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Natural experiments. They're kind of messy, but natural experiments. It's kind of cool. She calls them interesting brains.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's right. Absolutely.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
That's right. So, you know, he's basically a philosopher, philosopher of language in a way, thinking about these things. It's a fine thought. You can't test it with his methods. You can't do a thought experiment to figure that out. You need a scanner. You need brain-damaged people. You need something, you need ways to measure that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
And that's what, you know, fMRI offers as a, and, you know, patients are a little messier. fMRI is pretty unambiguous, I'd say. It's like very unambiguous. There's no way to say that the language network is doing any of these tasks. There's like, you should look at those data. It's like, there's no chance that you can say that those networks are overlapping. They're not overlapping.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
They're just like completely different. And so, you know, you can always make, you know, it's only two people, it's four people or something for the patients. And there's something special about them. We don't know. But these are just random people and with lots of them and you find always the same effects and it's very robust, I'd say.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
So that term WEIRD is from Joe Henrich. He's at Harvard. He's a Harvard evolutionary biologist. And so he works on lots of different topics. And he basically was pushing that observation that we should be careful about the inferences we want to make when we're talking in psychology or social... Yeah, mostly in psychology, I guess, about...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Humans, if we're talking about undergrads at MIT and Harvard, those aren't the same, right? These aren't the same things. And so if you want to make inferences about language, for instance, there's a lot of other kinds of languages in the world than English and French and Chinese. And so maybe for language, we care about how culture...
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
What I find beautiful about human language is some of the generalizations that happen across the human languages, within and across a language. So let me give you an example of something which I find kind of remarkable. That is if a language, if it has... a word order such that the verbs tend to come before their objects. And so that's like English does that.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Because cultures can be very... I mean, of course, English and Chinese cultures are very different, but hunter-gatherers are much more different in some ways. And so if culture has an effect on what language is, then we kind of want to look there as well as looking... It's not like the industrialized cultures aren't interesting. Of course they are.
Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
But we want to look at non-industrialized cultures as well. And so I've worked with two. I've worked with the Tsimane', which are in Bolivia... in the Amazon, both in the Amazon in these cases. And they are so-called farmer-foragers, which is not hunter-gatherers.
It's sort of one up from hunter-gatherers in that they do a little bit of farming as well, a lot of hunting as well, but a little bit of farming. And the kind of farming they do is the kind of farming that I might do if I ever were to grow like tomatoes or something in my backyard. So it's not like big field farming, it's just farming for a family, a few things you do that.
And so that's the kind of farming they do. And the other group I've worked with are the Pirahã, which are also in the Amazon and happen to be in Brazil.
And that's with a guy called Dan Everett, who is a linguist anthropologist who actually lived and worked in the, I mean, he was a missionary, actually, initially, back in the 70s, working with, trying to translate languages so they could teach them the Bible, teach them Christianity. What can you say about that? Yeah.
So the two groups I've worked with, the Chimane and the Pirahã, are both isolate languages, meaning there are no known connected languages at all. They're just like on their own. There's a lot of those. And most of the isolates occur in the Amazon or in Papua New Guinea and these places where the world has sort of stayed still for a long enough time. And so there aren't earthquakes.
There are certainly no earthquakes in the Amazon jungle. And the climate isn't bad. So you don't have droughts. And so in Africa, you've got a lot of moving of people because there's drought problems. And so they get a lot of language contact. When people have to, you gotta move because you got no water, then you gotta get going.
And then you run into contact with other tribes, other groups. In the Amazon, that's not the case. And so people can stay there for hundreds and hundreds and probably thousands of years, I guess. And so the Chimane and the Pirahã are both isolates in that sense. I guess they've just lived there for ages and ages with minimal contact with other outside groups.
And so, I mean, in these cases, I'm interested in their words. I would love to study their syntax, their orders of words, but I'm mostly just interested in how languages are connected to their cultures in this way. And so with the Pirahã, the most interesting, I was working on number there, number information.
And so the basic idea is I think language is invented, right? That's what I get from the words here is that I think language is invented. We talked about color earlier. It's the same idea so that what you need to talk about with someone else is what you're going to invent words for, okay? And so we invent labels for colors that I need, not that I...
that I can see, but that, but the things I need to tell you about so that I can get objects from you or get you to give me the right objects. And I just don't need a word for teal or, or a word for aquamarine in, in the, in the Amazon jungle for the most part, because I don't have two things which differ on those colors. I just don't have that.
So we have the first, the subject comes first in a simple sentence. So I say, you know, the dog chased the cat or Mary kicked the ball. So the subject's first. And then after the subject, there's the verb. And then we have objects. All these things come after in English. So it's generally a verb. And most of the stuff that we want to say comes after the subject. It's the objects.
And so numbers are really another fascinating source of information here where you might, you know, naively, I certainly thought that all humans would have words for exact counting. And the Pirahã don't. So they don't have any words for even one. There's not a word for one in their language. And so there's certainly not a word for two, three, or four.
So that kind of blows people's minds off.
That's pretty weird.
You just don't. And so that's just not a thing you can possibly ask in Pirahã. It's not possible. There's no words for that. So here's how we found this out, okay? So it was thought to be a one, two, many language. There are three words for quantifiers for sets. And people had thought that those meant one, two, and many. But what they really mean is few, some, and many. Many is correct.
It's few, some, and many. And so the way we figured this out, and this is kind of cool, is that we gave people, we had a set of objects, okay? These happened to be spools of thread. It doesn't really matter what they are. Identical objects. And when I sort of start off here, I just give you one of those and say, what's that? Okay, so you're a Pirahã speaker and you tell me what it is.
And then I give you two and say, what's that? And nothing's changing in the set except for the number, okay? And then I just ask you to label these things. We just do this for a bunch of different people. And frankly, I did this task.
And it's a weird, it's a little bit weird. So they say the word that we thought was one, it's few, but for the first one. And then maybe they say few or maybe they say some for the second. And then for the third or the fourth, they start using the word many for the set. And then five, six, seven, eight, I go all the way to 10. Okay. And it's always the same word.
And they look at me like I'm stupid because they told me what the word was for six, seven, eight. And I'm going to continue asking them at nine and 10. I'm like, I'm sorry. I just, I just, they understand that I want to know their language. That's the point of the task is I'm trying to learn their language. And so that's okay.
But it does seem like I'm a little slow because they already told me what the word for many was, five, six, seven, and I keep asking. So it's a little funny to do this task over and over. We did this with a guy called... Dan was our translator. He's the only one who really speaks Pirahã fluently. He's a good bilingual for a bunch of languages, but also English and Pirahã.
And then a guy called Mike Frank was also a student with me down there. He and I did these things. And... So you do that, okay? And everyone does the same thing. All, you know, we asked like 10 people and they all do exactly the same labeling for one up. And then we just do the same thing down on like random order. Actually, we do some of them up, some of them down first, okay?
And so we do, instead of one to 10, we do 10 down to one. And so I give them 10, nine, and at eight, they start saying the word for some. And then when you get to four, everyone is saying the word for few, which we thought was one. So it's like the context determined what that quantifier they used was. So it's not a count word. They're not count words. They're just approximate words.
Yeah, I don't know what that means. That's going to depend on the context. I think it's true in English too, right? If you ask an English person what a few is, I mean, that's going to depend completely on the context.
Yeah, it might be. It might still be there, yeah.
There's a lot of things we want to say that come after. And there's a lot of languages like that. About 40% of the languages of the world look like that. They're subject-verb-object languages. And then these languages tend to have prepositions, these little markers on the nouns that connect nouns to other nouns or nouns to verbs.
Yeah, so the words aren't there. And so then we do these other things. Well, if they don't have the words... Can they do exact matching kinds of tasks? Can they even do those tasks? And the answer is sort of yes and no. And so, yes, they can do them. So here's the tasks that we did. We put out those spools of thread again. Okay, so I'm going to put like three out here.
And then we gave them some objects. And those happen to be uninflated red balloons. It doesn't really matter what they are. It's just they're a bunch of exactly the same thing. And it was easy to put them down right next to these spools of thread, okay? And so then I put out three of these, and your task was to just put one against each of my three things. And they could do that perfectly.
So, I mean, I would actually do that. It was a very easy task to explain to them because I did this with this guy, Mike Frank, and he would be my... I'd be the experimenter telling him to do this and showing him to do this. And then we'd just say, just do what he did. You'll copy him. I didn't have to speak Pirahã except to know how to say, copy him.
Like do what he did is like all we had to be able to say. And then they would do that just perfectly. And so we'd move it up. We'd do some sort of random number of items up to 10 and they basically do perfectly on that. They never get that wrong. I mean, that's not a counting task, right? That is just a match.
You just put one against, it doesn't matter how many, I don't need to know how many there are there to do that correctly. And they would make mistakes, but very, very few and no more than MIT undergrads. I'm just going to say, like, there's no, these are low stakes. So, you know, you make mistakes.
That's right. Not at all. Okay. And so that's our control. And this guy had gone down there before and said that they couldn't do this task, but I just don't know what he did wrong there because they can do this task perfectly well. And I can train my dog to do this task. So of course they can do this task. And so it's not a hard task.
But the other task that was sort of more interesting is like, so then we do a bunch of tasks where you need... some way to encode the set. So like one of them is just, I just put an opaque sheet in front of the things. I put down a bunch, a set of these things and I put an opaque sheet down. And so you can't see them anymore. And I tell you, do the same thing you were doing before, right?
You know, and it's easy if it's two or three, it's very easy. But if I don't have the words for eight, it's a little harder. Like maybe, you know, with practice went, well, no,
For us, it's easy because we just count them. It's just so easy to count them. But they can't count them because they don't count. They don't have words for this thing. And so they would do approximate. It's totally fascinating. So they would get them approximately right after four or five. Basically, you always get four right, three or four. That's something we can visually see.
But after that, you kind of have... It's an approximate number. And so then... And there's a bunch of tasks we did and they all failed as... I mean, failed. They did approximate after five on all those tasks. And it kind of shows that the words... You kind of need the words to be able to do these kinds of tasks.
Yeah, here language is just, is the words. Here is the words. Like the words for exact count is the limiting factor here. They just don't have them.
So a preposition like in or on or of or about, I say I talk about something. The something is the object of that preposition. These little markers come, just like verbs, they come before their nouns. So now we look at other languages like Japanese or Hindi. These are so-called verb final languages. Those...
That's going to be true, yeah. So it's probably, I mean, we don't know. This is one of those problems with the snapshot of just current languages is that we don't know what causes a culture to discover slash invent a counting system. But the hypothesis is, the guess out there is something to do with farming. So if you have a bunch of goats, and you want to keep track of them.
And say you have 17 goats and you go to bed at night and you get up in the morning. Boy, it's easier to have a count system to do that. That's an abstraction over a set. People often ask me when I talk to them about this kind of work, they say, well, don't these Pirahã, don't they have kids? Don't they have a lot of children? I'm like, yeah, they have a lot of children. And they do.
They often have families of three or four or five kids. And they go, well, don't they need the numbers to keep track of their kids? And I always ask the person who says this, like, do you have children? And the answer is always no, because that's not how you keep track of your kids. You care about their identities. It's very important to me when I go, I think I have five children.
It doesn't matter which, yeah, it matters which five. It's like... If you replaced one with someone else, I would care. A goat, maybe not, right? That's the kind of point. It's an abstraction. Something that looks very similar to the one wouldn't matter to me, probably.
You're absolutely right. But I'm saying it is an abstraction such that you don't have to care about their identities to do this thing fast. That's the hypothesis, not mine. From... Anthropologists are guessing about where words for counting came from is from farming maybe. Yeah.
Well, my guess is the function of a language is to do something in a community. I mean, unless there's some function to that language in the community, it's not going to survive. It's not going to be useful. So here's a great example. Language death is super common, okay? Languages are dying all around the world. And here's why they're dying.
And it's like, yeah, I see this in, you know, it's not happening right now in either the Chimane or the Pirahã, but it probably will. And so there's a neighboring group called Mosetén, which is, I said that it's isolated. It's actually, there's a dual. There's two of them, okay? So it's actually, there's two languages which are really close, which are Mosetén and
and Chimane, which are unrelated to anything else. And Mosetén is unlike Chimane in that it has a lot of contact with Spanish and it's dying. So that language is dying. The reason it's dying is there's not a lot of value for the local people in their native language. So there's much more value in knowing Spanish because they want to feed their families. And how do you feed your family?
You learn Spanish so you can make money, so you can get a job and do these things, and then you make money. And so they want Spanish things. And so Mosetén is in danger and is dying. And that's normal. And so basically the problem is that people... The reason we learn language is to communicate. And we use it to make money and to do whatever it is to feed our families.
And if that's not happening, then it won't take off. It's not like a game or something. This is like something we use. Like, why is English so popular? It's not because it's an easy language to learn. Maybe it is. I don't really know. But that's not why it's popular.
It's all it is. It's all about money. And so there's a motivation to learn Mandarin. There's a motivation to learn Spanish. There's a motivation to learn English. These languages are very valuable to know because there's so, so many speakers all over the world.