Lex Fridman Podcast
#426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Wed, 17 Apr 2024
Edward Gibson is a psycholinguistics professor at MIT and heads the MIT Language Lab. Please support this podcast by checking out our sponsors: - Yahoo Finance: https://yahoofinance.com - Listening: https://listening.com/lex and use code LEX to get one month free - Policygenius: https://policygenius.com/lex - Shopify: https://shopify.com/lex to get $1 per month trial - Eight Sleep: https://eightsleep.com/lex to get special savings Transcript: https://lexfridman.com/edward-gibson-transcript EPISODE LINKS: Edward's X: https://x.com/LanguageMIT TedLab: https://tedlab.mit.edu/ Edward's Google Scholar: https://scholar.google.com/citations?user=4FsWE64AAAAJ TedLab's YouTube: https://youtube.com/@Tedlab-MIT PODCAST INFO: Podcast website: https://lexfridman.com/podcast Apple Podcasts: https://apple.co/2lwqZIr Spotify: https://spoti.fi/2nEwCF8 RSS: https://lexfridman.com/feed/podcast/ YouTube Full Episodes: https://youtube.com/lexfridman YouTube Clips: https://youtube.com/lexclips SUPPORT & CONNECT: - Check out the sponsors above, it's the best way to support this podcast - Support on Patreon: https://www.patreon.com/lexfridman - Twitter: https://twitter.com/lexfridman - Instagram: https://www.instagram.com/lexfridman - LinkedIn: https://www.linkedin.com/in/lexfridman - Facebook: https://www.facebook.com/lexfridman - Medium: https://medium.com/@lexfridman OUTLINE: Here's the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time. (00:00) - Introduction (10:53) - Human language (14:59) - Generalizations in language (20:46) - Dependency grammar (30:45) - Morphology (39:20) - Evolution of languages (42:40) - Noam Chomsky (1:26:46) - Thinking and language (1:40:16) - LLMs (1:53:14) - Center embedding (2:19:42) - Learning a new language (2:23:34) - Nature vs nurture (2:30:10) - Culture and language (2:44:38) - Universal language (2:49:01) - Language translation (2:52:16) - Animal communication
The following is a conversation with Edward Gibson, or Ted, as everybody calls him. He is a psycholinguistics professor at MIT. He heads the MIT Language Lab that investigates why human languages look the way they do, the relationship between culture and language, and how people represent, process, and learn language.
Also, he has a book titled Syntax: A Cognitive Approach, published by MIT Press, coming out this fall. So look out for that. And now a quick few-second mention of each sponsor. Check them out in the description. It's the best way to support this podcast. We got Yahoo Finance for basically everything you've ever needed if you're an investor. Listening for listening to research papers.
Policygenius for insurance. Shopify for selling stuff online. And Eight Sleep for naps. Choose wisely, my friends. Also, if you want to work with our amazing team or just get in touch with me, go to lexfridman.com/contact. And now, onto the full ad reads. As always, no ads in the middle. I try to make this interesting, but if you must skip them, friends, please still check out the sponsors.
I enjoyed their stuff, maybe you will too. This episode is brought to you by Yahoo Finance, a new sponsor. And they got a new website that you should check out. It's a website that provides financial management reports, information, and news for investors. Yahoo itself has been around forever. Yahoo Finance has been around forever. I don't know how long, but it must be over 20 years.
It survived so much. It evolved rapidly, adjusting, improving, all of that. The thing I use it for now is there's a portfolio that you can add your account to. Ever since I had zero money, I used, boy, I think it's called TD Ameritrade. I still use that same thing, just getting a basic mutual fund account.
And I think TD Ameritrade got bought by Charles Schwab or acquired or merged. I don't know. I don't know how these things work. All I know is that Yahoo Finance can integrate that and just show me everything I need to know about my quote-unquote portfolio. I don't have anything interesting going on, but it is still good to kind of monitor it, to stay in touch.
Now, a lot of people I know have a lot more interesting stuff going on investment-wise, so all of that could be easily integrated into Yahoo Finance, and you can look at all that stuff, the charts, blah, blah, blah. It looks beautiful and sexy and just helps you be informed.
Now, that's about your own portfolio, but there's also the finance information for the entirety of the world. That's all there: the big news, the analysis of everything that's going on, everything like that. And I should also mention that I would like to do more and more financial episodes. I've done a couple of conversations with Ray Dalio.
A lot of that is about finance, but some of that is about sort of geopolitics and the bigger context of finance. I just recently did a conversation with Bill Ackman, very much about finance. And I did a series of conversations on cryptocurrency. Lots and lots of brilliant people, Michael Saylor, so on.
Charles Hoskinson, Vitalik, I mean just lots of brilliant people in that space thinking about the future of money, future of finance. Anyway, you can keep track of all of that with Yahoo Finance. For comprehensive financial news and analysis, go to yahoofinance.com. That's yahoofinance.com. This episode is also brought to you by Listening, an app that allows you to listen to academic papers.
It's a thing I've always wished existed, and I always kind of suspected it's very difficult to pull off, but these guys pulled it off. Basically, it's any kind of formatted text brought to life through audio. Now for me, the thing I care about most, and I think that's at the foundation of listening, is academic papers.
So I love to read academic papers, and there's several levels of rigor in the actual reading process, but listening to them, especially after I skimmed it, or after I did a deep dive, listening to them is just such a beautiful experience. It solidifies the understanding. It brings to life all kinds of thoughts.
And I'm doing this while I'm cooking, while I'm running, while I'm going to grab a coffee, all that kind of stuff. It does require an elevated level of focus, especially the kind of papers I listen to, which are computer science papers. But you can load in all kinds of stuff. You can do philosophy papers. You can do psychology papers, like this very topic of linguistics.
I've listened to a few papers on linguistics. I went back to Chomsky and listened to papers. It's great. Papers, books, PDFs, webpages, articles, all that kind of stuff. Even email newsletters. And the voices they got are pretty sexy. It's great. It's pleasant to listen to. I think that's what's ultimately most important is it shouldn't feel like a chore to listen to it. Like I really enjoy it.
Normally you'd get a two-week free trial, but listeners of this podcast get one month free. So go to listening.com slash Lex. That's listening.com slash Lex. This episode is brought to you by Policygenius, a marketplace for insurance: life, auto, home, disability, all kinds of insurance. There are really nice tools for comparison. I'm a big fan of nice tools for comparison.
Like I have to travel to harsh conditions soon, and I have to figure out how I need to update my equipment to make sure it's weatherproof, waterproof even, just resilient to harsh conditions. And it would be nice to have sort of comparisons. I have to resort to Reddit posts or forum posts kind of debating different audio recorders and cabling and microphones and...
and waterproof containers, all that kind of stuff. I would love to be able to do a rigorous comparison of them. Of course, going to Amazon, you get the reviews, and those are actually really, really solid. And so I think Amazon's been the giant gift to society in that way, that you kind of can lay out all the different options and get a lot of structured analysis of how good this thing is. So Amazon's been great at that. Now, what Policygenius did is the Amazon thing, but for insurance. So the tools for comparison are really my favorite thing. It's just really easy to understand the full marketplace of insurance. With Policygenius, you can find life insurance policies that start at just $292 per year for $1 million of coverage.
Head to policygenius.com slash Lex or click the link in the description to get your free life insurance quotes and see how much you can save. That's policygenius.com slash Lex. This episode is also brought to you by Shopify, a platform designed for anyone to sell anywhere with a great looking online store.
I'm not name-dropping here, but I recently went on a hike with the CEO of Shopify, Toby. He's brilliant. I've been a fan of his for a long time, long before Shopify was a sponsor. I don't even know if he knows that Shopify sponsors this podcast. Now, just to clarify, it really doesn't matter.
Nobody in this world can put pressure on me to have a sponsor or not to have a sponsor, or for a sponsor to put pressure on me about what I can and can't say. I, when I wake up in the morning, feel completely free to say what I want to say and to think what I want to think. I've been very fortunate in that way in many dimensions of my life.
And I also have always lived a frugal life and a life of discipline, which is where the freedom of speech and the freedom of thought truly comes from. So I don't need anybody. I don't need a boss. I don't need money. I'm free to exist in this world in the way I see as right.
Now, on top of that, of course, I'm surrounded by incredible people, many of whom I disagree with and have arguments with, so I'm influenced by those conversations and those arguments, and I'm always learning, always challenging myself, always humbling myself. I have a kind of intellectual humility. I kind of suspect I'm kind of an idiot.
I start my approach to the world of ideas from that place, assuming I'm an idiot and everybody has a lesson to teach me. Anyway, not sure why I went off on that tangent, but the hike was beautiful. Nature, friends, is beautiful. Anyway, I have a Shopify store, lexfridman.com slash store. It's very minimal, which is how I like, I think, most things. If you want to set up a store, it's super easy.
It takes a few minutes. Even I figured out how to do it. Sign up for a $1 per month trial period at shopify.com slash lex. That's all lowercase. Go to shopify.com slash lex to take your business to the next level today. This episode is also brought to you by Eight Sleep and its Pod 3 cover, the source of my escape.
The door, when opened, allows me to travel away from the troubles of the world into this ethereal universe of calmness. A cold bed surface with a warm blanket, a perfect 20-minute nap, and it doesn't matter how dark the place my mind is in, a nap will pull me out, and I see the beauty of the world again. Technologically speaking, Eight Sleep is just really cool. You can control temperature with an app.
It's become such an integral part of my life that I've begun to take it for granted. Typical human. So the app controls the temperature. I set it; currently I'm setting it to negative five. And it's just a super nice, cool surface. It's something I really look forward to, especially when I'm traveling, when I don't have one of those. It really makes me feel like home.
Check it out and get special savings when you go to eightsleep.com slash Lex. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Edward Gibson. When did you first become fascinated with human language?
As a kid in school, when we had to structure sentences in English grammar, I found that process interesting. I found it confusing as to what it was I was told to do. I didn't understand what the theory was behind it, but I found it very interesting.
So when you look at grammar, you're almost thinking about it like a puzzle, almost like a mathematical puzzle?
Yeah, I think that's right. I didn't know I was going to work on this at all at that point. I was kind of a math geek person, a computer scientist. I really liked computer science. And then I found language is a neat puzzle to work on from an engineering perspective, actually.
I sort of accidentally decided, after I finished my undergraduate degree, which was computer science and math at Queen's University in Canada, to go to grad school. That's what I always thought I would do. And I went to Cambridge, where they had a master's program in computational linguistics. And I hadn't taken a single language class before.
All I'd taken was CS, computer science and math classes, pretty much, as an undergrad. And I just thought, oh, this was an interesting thing to do for a year, because it was a single-year program. And then I ended up spending my whole life doing it.
So fundamentally, your journey through life was one of a mathematician and a computer scientist, and then you kind of discovered the puzzle, the problem of language, and approached it from that angle. To try to understand it from that angle, almost like a mathematician or maybe even an engineer.
As an engineer, I'd say. I mean, to be frank, I had taken an AI class, I guess it was '83 or '84, '85, somewhere in there, a long time ago. And there was a natural language section in there. And it didn't impress me. I thought there must be more interesting things we can do. It just seemed like a bunch of... to me. It didn't seem like a real theory of things in any way.
And so I just thought this seemed like an interesting area where there wasn't enough good work.
Did you ever come across the philosophy angle of logic? So if you think about the 80s with AI, the expert systems where you try to kind of maybe sidestep the poetry of language and some of the syntax and the grammar and all that kind of stuff and go to the underlying meaning that language is trying to communicate and try to somehow compress that in a computer-representable way.
Do you ever come across that in your studies?
I mean, I probably did, but I wasn't as interested in it. I was trying to do the easier problems first, the ones I thought maybe were handleable, and the syntax seems easier, because it's just the forms as opposed to the meaning. When you're starting to talk about the meaning, that's a very hard problem, and it still is a really, really hard problem. But the forms are easier.
And so I thought at least figuring out the forms of human language, which sounds really hard, is actually maybe more tractable.
So it's interesting. You think there is a big divide. There's a gap. There's a distance between form and meaning. Because that's a question you discuss a lot with LLMs, because they're damn good at form.
Yeah. I think that's what they're good at, is form. Exactly. And that's why they're good, because they can do form. Meaning's hard.
Do you think there's... oh, wow. I mean, it's an open question, right? How close form and meaning are. We'll discuss it, but to me, studying form, maybe it's a romantic notion, gives you... form is like the shadow of the bigger meaning thing underlying language. Language is how we communicate ideas. We communicate with each other using language.
So in understanding the structure of that communication, I think you start to understand the structure of thought and the structure of meaning behind those thoughts and communication to me. But to you, big gap.
Yeah.
What do you find most beautiful about human language? Maybe the form of human language, the expression of human language.
What I find beautiful about human language is some of the generalizations that happen across the human languages, within and across languages. So let me give you an example of something which I find kind of remarkable. That is, if a language has a word order such that the verbs tend to come before their objects... English does that.
So we have the first, the subject comes first in a simple sentence. So I say, you know, the dog chased the cat or Mary kicked the ball. So the subject's first. And then after the subject, there's the verb. And then we have objects. All these things come after in English. So it's generally a verb. And most of the stuff that we want to say comes after the subject. It's the objects.
There's a lot of things we want to say that come after. And there's a lot of languages like that. About 40% of the languages of the world look like that. They're subject-verb-object languages. And then these languages tend to have prepositions, these little markers that connect nouns to other nouns or nouns to verbs.
So a preposition like 'in' or 'on' or 'of' or 'about'. I say I talk about something; the something is the object of that preposition. These little markers, just like the verbs, come before their nouns. So now we look at other languages like Japanese or Hindi. These are so-called verb-final languages. Those are maybe a little more than 40%, maybe 45% of the world's languages. Those tend to have postpositions. They have the same kinds of markers as we do in English, but instead of coming before their nouns, they come after.
So instead of, you know, 'talk about a book', you'd say 'a book about': the opposite order. In Japanese or in Hindi, you do the opposite, and the 'talk' comes at the end. So the verb will come at the end as well. So instead of 'Mary kicked the ball', it's 'Mary ball kicked'.
And if it's 'Mary kicked the ball to John', it's 'John to': the 'to', the marker there, the preposition, is a postposition in these languages. And so the interesting thing, a fascinating thing to me, is that within a language, this order aligns. It's harmonic.
And so it's either verb-initial or verb-final, and then correspondingly you'll have prepositions or postpositions. And that's across the languages that we can look at. We've got around 1,000 languages. There are around 7,000 languages on Earth right now, but we have information about, say, word order on around 1,000 of those, a pretty decent amount of information.
And for those 1,000 which we know about, about 95% fit that pattern. It's about half and half: half are verb-initial, like English, and half are verb-final, like Japanese.
So just to clarify, verb-initial is subject, verb, object. That's correct. And verb-final is still subject, object, verb.
That's correct. Yeah, the subject is generally first.
That's so fascinating. I ate an apple or I apple ate. Yes. Okay, and it's fascinating that there's a pretty even division in the world amongst those, 40, 45%.
Yeah, it's pretty even. And those two are the most common by far. In those two orders, the subject tends to be first. There's so many interesting things, but the thing I find so fascinating is there are these generalizations within and across languages. And there's actually a simple explanation, I think, for a lot of that. And that is you're trying to minimize dependencies between words.
That's basically the story, I think, behind a lot of why word order looks the way it is, is we're always connecting. What is the thing I'm telling you? I'm talking to you in sentences. You're talking to me in sentences. These are sequences of words which are connected, and the connections are dependencies between the words.
And it turns out that what we're trying to do in a language is actually minimize those dependency links. It's easier for me to say things if the words that are connecting for their meaning are close together. It's easier for you in understanding if that's also true. If they're far away, it's hard to produce that, and it's hard for you to understand. And the languages of the world,
within a language and across languages, fit that generalization. It turns out that having verbs initial and then having prepositions ends up making dependencies shorter, and having verbs final and having postpositions ends up making dependencies shorter than if you cross them. If you cross them, it's possible. You can do it. You mean within a language? Within a language, you can do it.
It just ends up with longer dependencies than if you didn't. So languages tend to go that way. They call it harmonic. So it was observed a long time ago, without the explanation, by a guy called Joseph Greenberg, who's a famous typologist from Stanford. He observed a lot of generalizations about how word order works, and these are some of the harmonic generalizations that he observed.
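To make the dependency-length claim concrete, here is a minimal sketch in Python. The four-word example and its head assignments are my own toy illustration, not data from the conversation: it totals the distance between each word and the one word it depends on, under two harmonic orders and one crossed order.

```python
# Toy illustration of dependency length minimization.
# heads[i] is the position of the word that word i depends on (None = root).
def total_dependency_length(heads):
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

# "went to the store" under three hypothetical word orders:
orders = {
    # Harmonic: verb-initial + preposition -> went(0) to(1) the(2) store(3)
    "verb-initial + preposition": [None, 0, 3, 1],
    # Harmonic: verb-final + postposition -> the(0) store(1) to(2) went(3)
    "verb-final + postposition": [1, 2, 3, None],
    # Crossed: verb-final + preposition -> to(0) the(1) store(2) went(3)
    "verb-final + preposition (crossed)": [3, 2, 0, None],
}
for name, heads in orders.items():
    print(name, total_dependency_length(heads))
# The harmonic orders give totals of 4 and 3; the crossed order gives 6.
```

The crossed order forces the adposition's link to the verb to span the whole phrase, which is exactly the longer-dependency cost being described here.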
Harmonic generalizations about word order. There's so many things I want to ask you. Okay, good. Let me just ask some basics first. You mentioned dependencies a few times.
Yeah, yeah.
What do you mean by dependencies?
Well, what I mean is, in language, there are kind of three components to the structure of language. One is the sounds. So 'cat' is k, a, and t in English. I'm not talking about that part. Then there's two meaning parts. One is the words, and you were talking about meaning earlier. So words have a form and they have a meaning associated with them.
And so 'cat' is a form in English, and it has a meaning associated with whatever a cat is. And then there are the combinations of words. That's what I'll call grammar or syntax. That's when I have a combination like 'the cat' or 'two cats', okay? I take two different words there, put them together, and I get a compositional meaning from putting those two different words together.
And so that's the syntax. And in any sentence or utterance, whether I'm talking to you or you're talking to me, we have a bunch of words and we're putting them together in a sequence. It turns out they are connected, so that every word is connected to just one other word in that sentence. And so you end up with what's called technically a tree.
It's a tree structure where there's a root of that utterance of that sentence. And then there's a bunch of dependents, like branches from that root that go down to the words. The words are the leaves in this metaphor for a tree.
So a tree is also sort of a mathematical construct.
Yeah, yeah. It's a graph theoretical thing.
Yeah, yeah. So it's fascinating that you can break down a sentence into a tree, and then every word is hanging on to another. It's depending on it.
That's right. And everyone agrees on that. So all linguists will agree with that. Oh, so this is not a controversial thing? That is not controversial.
There's nobody sitting here listening mad at you. I do not think so. I don't think so. Okay. There's no linguist sitting there mad at this.
No, I think in every language, I think everyone agrees that all sentences are trees at some level. Can I pause on that? Sure.
Because to me, just as a layman, it's surprising that you can break down sentences in all languages into a tree.
I think so. I've never heard of anyone disagreeing with that. That's weird. The details of the trees are what people disagree about.
Well, okay, so what's at the root of a tree? How do you construct? How hard is it? What is the process of constructing a tree from a sentence?
Well, this is where, you know, it depends on your theoretical notions. There are different ones. I'm going to say the simplest thing, dependency grammar. A bunch of people invented this. Tesnière was the first, a French guy. I mean, the paper was published in 1959, but he was working on it in the 30s and stuff.
And it goes back to, you know, the philologist Pāṇini, who was doing this in ancient India, okay? Doing something like this. The simplest thing we can think of is that there are just connections between the words to make the utterance. And so let's just say I have 'two dogs entered a room'. Okay, here's a sentence. And so we're connecting 'two' and 'dogs' together.
There's some dependency between those words to make some bigger meaning. And then we're connecting 'dogs' to 'entered', right? And we connect 'a room' somehow to 'entered'. So I'm going to connect 'a' to 'room' and then 'room' back to 'entered'. That's the tree. The root is 'entered'. The thing is an entering event. That's what we're saying here.
And the subject, which is that 'two dogs' thing, the connection goes back to 'dogs', which then goes back to 'two'. That's my tree. It starts at 'entered', goes to 'dogs', down to 'two'. And on the other side, after the verb, the object goes to 'room', and then that goes back to the determiner or article, whatever you want to call that word.
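As a concrete encoding of the tree just walked through, here is a minimal sketch in Python; the head map is exactly the set of connections named above (real treebanks would also label each arc).

```python
# "two dogs entered a room" as a dependency tree: each word points to the
# single word it depends on; the root ("entered") points to None.
heads = {
    "two": "dogs",       # determiner -> its noun
    "dogs": "entered",   # subject -> verb
    "entered": None,     # root: the entering event
    "a": "room",         # article -> its noun
    "room": "entered",   # object -> verb
}

# One head per word and exactly one root is what makes this a tree.
assert sum(1 for h in heads.values() if h is None) == 1
assert all(h is None or h in heads for h in heads.values())
```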
So there's a bunch of categories of words here we're noticing. There are verbs; those typically refer to events and states in the world. And there are nouns, which typically refer to people, places, and things, as people say, but they can refer to more abstract things; they can be events themselves as well. And the category, the part of speech of a word, is determined by how it gets used in language. That's how you decide what the category of a word is: not by the meaning, but by how it gets used. How it's used.
What's usually the root? Is it going to be the verb that defines the event?
Usually. Yes, yes.
Okay.
I mean, if I don't say a verb, then there won't be a verb, and so it'll be something else.
What if you're messing... Are we talking about correct language? What if you're doing poetry and messing with stuff? Then rules go out the window, right?
No. You're constrained by whatever language you're dealing with. Probably you have other constraints in poetry. Usually in poetry there are multiple constraints: you usually want to convey multiple meanings, and maybe you have a rhythm or a rhyming structure as well. But you usually are constrained by the rules of your language for the most part. So you don't...
violate those too much. You can violate them somewhat, but not too much. It has to be recognizable as your language. Like in English, I can't say 'dogs two entered room a' and mean, you know, 'two dogs entered a room'. I can't mess with the order of the articles and the nouns. You just can't do that. In some languages, you can mess around with the order of words much more.
I mean, you speak Russian. Mm-hmm. Russian has a much freer word order than English. And so, in fact, you can move around words in, you know, I told you that English has the subject, verb, object word order. So does Russian. But Russian is much freer than English. And so you can actually mess around with the word order.
So probably Russian poetry is going to be quite different from English poetry because the word order is much less constrained.
Yeah, there's a much more extensive culture of poetry throughout the history of the last hundred years in Russia. And I always wondered why that is. But it seems that there's more flexibility in the way the language is used. You're morphing the language easier by altering the words, altering the order of the words, messing with it.
Well, you can just mess with different things in each language. And so in Russian, you have case markers, which are these endings on the nouns that tell you how each noun connects to the verb, right? We don't have that in English. And so when I say 'Mary kissed John', I don't know who the agent or the patient is, except by the order of the words, right?
In Russian, you actually have a marker on the end. If you're using Russian names, each of those names will say whether it's the agent or the patient: it'll be nominative, which marks the subject, or accusative, which marks the object. And you could put them in the reverse order. You could put the accusative first. You could put...
the patient first, and then the verb, and then the subject. And that would be a perfectly good Russian sentence. And it would still mean, I could say John kissed Mary, meaning Mary kissed John, as long as I use the case markers in the right way. You can't do that in English.
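To illustrate the point, here's a toy sketch in Python. The transliterated names and the single '-u' accusative ending are drastically simplified stand-ins for real Russian case morphology; the point is just that role assignment reads the endings, not the positions.

```python
# Toy case-marked "Russian": the accusative (patient) is marked with -u,
# the nominative (agent) is unmarked. Word order then carries no role info.
def roles(words):
    out = {}
    for w in words:
        if w[0].isupper():                 # crude heuristic: names only
            out["patient" if w.endswith("u") else "agent"] = w
    return out

print(roles(["Ivan", "potseloval", "Mashu"]))  # {'agent': 'Ivan', 'patient': 'Mashu'}
print(roles(["Mashu", "potseloval", "Ivan"]))  # same roles, different word order
```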
And so... I love the terminology of agent and patient and the other ones you used. Those are sort of linguistic terms, correct? Correct.
Those are for meaning. And subject and object are generally used for position. The subject is just the thing that comes before the verb, and the object is the one that comes after the verb. The agent is the thing doing the action; that's kind of what that means, right? The subject is often the person doing the action.
Okay, this is fascinating. So how hard is it to form a tree in general? Is there... Is there a procedure to it? Like, if you look at different languages, is it supposed to be a very natural, like, is it automatable, or is there some human genius involved?
I think it's pretty automatable at this point. People can figure out what the words are. They can figure out the morphemes, which are, technically, the minimal meaning units within a language, okay? And so when you say 'eats' or 'drinks', it actually has two morphemes in English.
There's the root, which is the verb, and then there's some ending on it which tells you, you know, that's the third person singular. Can you say what morphemes are? Morphemes are just the minimal meaning units within a language. And a word is just kind of the things we put spaces between in English. And they have a little bit more. They have the morphology as well.
They have the endings, this inflectional morphology on the roots.
It modifies something about the word that adds additional meaning.
Yeah, yeah, yeah. And so we have a little bit of that in English, very little. You have much more in Russian, for instance. But we have a little bit in English. And so we have a little on the nouns. You can say it's either singular or plural. And you can say the same thing for verbs. Like simple past tense, for example. So, you know, notice in English we say drinks.
you know, 'he drinks', but everyone else is unmarked in a way: 'I drink', 'you drink', 'we drink'. And then in the past tense, it's just 'drank' for everyone. There's no person morphology at all in the past tense. There is morphology marking past tense, but it's irregular: 'drink' to 'drank' isn't even a regular form.
So in most verbs, many verbs, there's an '-ed' we add. So 'walk' to 'walked': we add that to say it's the past tense. I just happened to choose an irregular because it's a high-frequency word, and high-frequency words tend to have irregulars in English. What's an irregular? An irregular is just one where there isn't a rule. So 'drink' to 'drank' is an irregular.
Drink, drank, okay.
As opposed to walk, walked; talk, talked.
And there's a lot of irregulars in English.
There's a lot of irregulars in English. The frequent ones, the common words, tend to be irregular. There are many, many more low-frequency words, and those tend to be regular.
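As a minimal sketch of the regular/irregular split just described (the irregular list here is just a few illustrative entries): high-frequency irregulars are stored as whole forms, while the regular rule adds '-ed'.

```python
# Irregular past tenses are lexical lookups; regulars fall through to a rule.
IRREGULAR_PAST = {"drink": "drank", "eat": "ate", "sleep": "slept", "go": "went"}

def past_tense(verb: str) -> str:
    if verb in IRREGULAR_PAST:   # stored whole, no rule applies
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):       # bake -> baked
        return verb + "d"
    return verb + "ed"           # walk -> walked, talk -> talked

for v in ["drink", "walk", "talk", "eat"]:
    print(v, "->", past_tense(v))
```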
The evolution of the irregulars is fascinating, because it's essentially slang that's sticky. You're breaking the rules, and then everybody uses it and doesn't follow the rules, and they say screw it to the rules. It's fascinating. So you said morphemes. Lots of questions. So morphology is what, the study of morphemes?
Morphology is the connections between the morphemes and the roots. So in English, we mostly have suffixes: endings on the words, not very much, but a little bit, as opposed to prefixes. Depending on your language, words can have mostly prefixes, mostly suffixes, or both. And then several languages have things called infixes, where you have some kind of a general...
form for the root, and you put stuff in the middle. You change the vowels. That's fascinating.
That is fascinating. So in general, there's, what, two morphemes per word? Usually one or two? Or three?
Well, in English, it tends to be one or two. There can be more. In other languages, like Finnish, which has a very elaborate morphology, there may be 10 morphemes on the end of a root. And so there may be millions of forms of a given word.
Okay, I'll ask the same question over and over. But... sometimes to understand things like morphemes, it's nice to just ask: how do these kinds of things evolve? So you have a great book studying
how language is used for communication, the mathematical notion of how effective language is for communication, and what role that plays in the evolution of language. But just high level: how does a language evolve to where English has one or two morphemes per word and Finnish has near infinity per word? How does that happen? Is it just...
That's a really good question. Why do languages have more morphology versus less morphology? I don't think we know the answer to this. I think there are just a lot of good solutions to the problem of communication. I believe, as you hinted, that language is an invented system by humans for communicating their ideas.
And I think it comes down to we label the things we want to talk about. Those are the morphemes and words. Those are the things we want to talk about in the world. And we invent those things. And then we put them together in ways that are easy for us to convey, to process. But that's like a naive view. And I don't, I mean, I think it's probably right, right? It's naive and probably right.
Well, I don't know if it's naive. I think it's simple. Simple. I think naive is an indication that it's incorrect somehow, that it's trivial, too simple. I think it could very well be correct. But it's interesting how sticky it is. It feels like once a couple of people figure out certain aspects of a language, it just becomes sticky and the tribe forms around that language.
Maybe the language, maybe the tribe forms first and then the language evolves. And then you just kind of agree and you stick to whatever that is.
I mean, these are very interesting questions. We don't really know how words, even words, get invented. Assuming they get invented, we don't really know how that process works and how these things evolve. What we have is kind of a current picture: a few thousand languages, a few thousand instances. We don't have any pictures of how these things are really evolving.
And then the evolution is massively confused by contact, right? As soon as one language group runs into another... we are smart. Humans are smart, and they take on whatever is useful in the other group. And so any kind of contrast which you're talking about, which I find useful, I'm going to start using as well.
So I worked a little bit in specific areas of words: in number words and in color words. In English, we have around 11 words that everyone knows for colors, and many more if you happen to be interested in color for some reason or other. If you're a fashion designer or an artist or something, you may have many, many more words. But we can see millions.
Like if you have normal color vision, normal trichromatic vision, you can see millions of distinctions in color. So we don't have millions of words. The most detailed color vocabulary would have over a million terms to distinguish all the different colors that we can see, but of course we don't have that.
So somehow English has evolved so that there are 11 terms that people find useful to talk about: black, white, red, blue, green, yellow, purple, gray, pink, and I've probably missed something there. Anyway, there's 11 that everyone knows. But you go to different cultures, especially non-industrialized cultures, and there'll be many fewer.
So some cultures will have only two, believe it or not. The Dani in Papua New Guinea have only two labels that the group uses for color. Those are roughly black and white: 'very, very dark' and 'very, very light', which are roughly black and white. And you might think, oh, they're dividing the whole color space into light and dark or something. And that's not really true.
They mostly just label the black and the white things. They just don't talk about the colors of the other ones. And then there are other groups. I've worked with a group called the Tsimane' down in Bolivia, in South America. And they have three words that everyone knows, but there are a few others that many people know.
And so, depending on how you count, they have between three and seven words that the group knows, okay? And again, it's black and white; everyone knows those. And red. Red tends to be the third word that cultures bring in. If there's a third word, it's always red.
And then after that, all bets are off about what they bring in. After that, they bring in a sort of big blue-green group; they have one word for that. And then different people have different words that they'll use for other parts of the space. And so, anyway, it's probably related to what they want to talk about. Not what they see, because they see the same colors as we see.
So it's not like they have a low color palette in the things they're looking at. They're looking at a lot of beautiful scenery, okay? A lot of different colored flowers and berries and things. There are lots of things of very bright colors, but they just don't label the color in those cases.
And the reason, probably, we don't know this, but we think what's probably going on here is that you label something when you need to talk to someone else about it. And why do I need to talk about a color?
Well, if I have two things which are identical and I want you to give me the one that's different and the only way it varies is color, then I invent a word which tells you, you know, this is the one I want. So I want the red sweater off the rack, not the green sweater, right? There's two.
And so those things will be identical, because these are things we made, and they're dyed, and there's nothing different about them. In industrialized society, everything we've got is pretty much arbitrarily colored. But if you go to a non-industrialized group, that's not true. And it's not that they're not interested in color.
If you bring bright-colored things to them, they like them just like we like them. Bright colors are great. They're beautiful. But they just don't need to talk about them. They don't have...
So probably color words is a good example of how language evolves from sort of function. When you need to communicate the use of something, then you kind of invent different variations. And basically, you can imagine that the evolution of a language has to do with what the early tribes were doing.
What kinds of problems were facing them, and they quickly figured out how to efficiently communicate the solutions to those problems, whether aesthetic or functional, all that kind of stuff, running away from a mammoth or whatever.
I think what you're pointing to is that we don't have data on the evolution of language, because many languages were formed a long time ago, so you don't get the chatter anymore.
We have a little bit from Old English to Modern English, because there was a writing system, and we can see how Old English looked. The word order changed, for instance, from Old English to Middle English to Modern English. And so we can see things like that. But most languages don't even have a writing system. Of the 7,000, only a small subset have a writing system.
And even if they have a writing system, it's not a very old writing system, so we don't have the history. Basically, for Mandarin, for Chinese, we have a lot of evidence over a long time, and for English. Not for much else. For German a little bit, but not for a whole lot of long-term language evolution. We don't have a lot.
We just have snapshots is what we've got of current languages.
Yeah, you get an inkling of that from the rapid communication on certain platforms, like on Reddit, there's different communities, and they'll come up with different slang, usually from my perspective, driven by a little bit of humor, or maybe mockery or whatever, just talking shit in different kinds of ways. And you could see the evolution of language there.
because I think a lot of things on the internet, you don't want to be the boring mainstream. So you like want to deviate from the proper way of talking. And so you get a lot of deviation, like rapid deviation. Then when communities collide, you get like, just like you said, humans adapt to it. And you can see it through the lens of humor.
I mean, it's very difficult to study, but you can imagine like 100 years from now, well, if there's a new language born, for example, we'll get really high resolution data.
I mean, English is changing. English changes all the time. All languages change all the time. There's a famous result about the Queen's English. If you look at the Queen's vowels... the Queen's English is supposed to be the proper way to talk, which was originally sort of defined by however the Queen or the King, whoever was in charge, talked.
And so if you look at how her vowels changed from when she first became queen in 1952 or '53, when she was coronated, that's Queen Elizabeth who died recently, of course, until 50 years later, her vowels shifted a lot. So even in the sounds of British English, the way she was talking was changing.
The vowels were changing slightly. So just in the sounds, there's change. We're all interested in what's driving any of these changes. The word order of English changed a lot over a thousand years, right? It used to look like German.
You know, it used to be a verb-final language with case marking, and it shifted to a verb-medial language. A lot of contact, a lot of contact with French, and it became a verb-medial language with no case marking. It's evolving; it totally evolved. And so it may very well... I mean...
You know, it maybe doesn't evolve very much in 20 years, which is maybe what you're talking about. But over 50 and 100 years, things change a lot, I think.
We'll now have good data on it, which is great.
That's for sure.
Can you talk to what is syntax and what is grammar? So you wrote a book on syntax.
I did. You were asking me before about how I figure out what a dependency structure is. I'd say the dependency structures aren't that hard, generally. I think there's a lot of agreement on what they are for almost any sentence in most languages. I think people will agree on a lot of that.
There are other parameters in the mix such that some people think there's a more complicated grammar than just a dependency structure. And so, you know, like Noam Chomsky, he's the most famous linguist ever. And he is famous for proposing a slightly more complicated syntax. And so he invented phrase structure grammar. So he's... well-known for many, many things.
But in the late 50s and early 60s, he was basically figuring out what's called formal language theory. He figured out a framework for determining how complicated a certain type of language might be, how complicated so-called phrase structure grammars of a language might be. And so his idea was that maybe
We can think about the complexity of a language by how complicated the rules are. And the rules will look like this. They will have a left-hand side and they'll have a right-hand side. Something on the left-hand side will expand to the thing on the right-hand side. So say we'll start with an S, which is like the root, which is a sentence. And then we're going to expand to things.
like a noun phrase and a verb phrase, is what he would say, for instance, okay? 'An S goes to an NP and a VP' is a kind of phrase structure rule. And then we figure out what an NP is: an NP is a determiner and a noun, for instance. And a verb phrase is something else: a verb and another noun phrase, another NP, for instance. Those are the rules of a very simple phrase structure grammar, okay?
And so he proposed phrase structure grammar, right, as a way to sort of cover human languages. And then he actually figured out that, well, depending on the formalization of those grammars, you might get more complicated or less complicated languages.
And so he said, well, there are things called context-free languages, and he thought human languages tend to be what he calls context-free languages. But there are simpler languages, so-called regular languages, which have a more constrained form to the rules of their phrase structure.
So he basically discovered and kind of invented ways to describe language, and those are phrase structure grammars for human language. And he was mostly interested in English initially, in his work in the 50s.
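To make the rule format concrete, here is a minimal sketch in Python of exactly the rules named above (S goes to NP VP, NP goes to Det N, VP goes to V NP), with a made-up toy lexicon; it rewrites categories top-down to generate sentences.

```python
import random

# The phrase structure rules named above, plus a toy lexicon (my invention).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["two"], ["a"], ["the"]],
    "N":   [["dogs"], ["room"], ["cat"]],
    "V":   [["entered"], ["chased"]],
}

def generate(symbol="S"):
    """Rewrite a category using one of its right-hand sides, recursively."""
    if symbol not in GRAMMAR:                  # a word: nothing left to expand
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])
    return [w for part in rhs for w in generate(part)]

print(" ".join(generate()))   # e.g. "two dogs entered a room"
```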
So quick questions around all this. So formal language theory is the big field of just studying language formally.
Yes, and it doesn't have to be human language. We can have computer languages, any kind of system which is generating some set of expressions in a language. And those could be the statements in a computer language, for example. It could be that, or it could be human language. So technically you can study programming languages? Yes, and they have been.
I mean, heavily studied using this formalism. There's a big field of programming language research within formal language theory. Okay.
And then phrase structure grammar is this idea that you can break down language into this S, NP, VP type of thing?
It's a particular formalism for describing language. And Chomsky was the first one; he's the one who figured that stuff out back in the 50s. And it's equivalent, actually: a context-free grammar is kind of equivalent in the sense that it generates the same sentences as a dependency grammar would. The dependency grammar is a little simpler in some ways.
You just have a root, and the rules are implicit, I guess; we just have connections between words. The phrase structure grammar is kind of a different way to think about the dependency grammar. It's slightly more complicated, but it's kind of the same in some ways.
So to clarify, dependency grammar is the framework under which you see language, and you make the case that this is a good way to describe language. And Noam Chomsky is watching this, he's very upset right now, so... just kidding. But where's the place of disagreement between phrase structure grammar and dependency grammar?
They're very close. So phrase structure grammar and dependency grammar aren't that far apart. I like dependency grammar because it's more perspicuous, it's more transparent about representing the connections between the words. It's just a little harder to see in phrase structure grammar. The place where Chomsky sort of devolved or went off from this is he also thought there was...
something called movement, okay? And that's where we disagree. That's the place where I would say we disagree. And, I mean, maybe we'll get into that later, but... Do you want me to explain it? No, I would love... Can you explain movement? You're saying so many interesting things. Yeah, okay. So here's what movement is.
Chomsky basically sees English and he says, okay, I said, you know, we had that sentence earlier, like it was like two dogs entered the room. Let's change it a little bit, say two dogs will enter the room. And he notices that, hey, English, if I want to make a question, a yes, no question from that same sentence, I say, instead of two dogs will enter the room, I say, will two dogs enter the room?
Okay. There's a different way to say the same idea. And it's like, well, the auxiliary verb, that will thing, it's at the front as opposed to in the middle. Okay. And so, and he looked, you know, if you look at English, you see that that's true for all those modal verbs and for other kinds of auxiliary verbs in English. You always do that. You always put an auxiliary verb at the front.
And when he saw that... So if I say, 'I can win this bet', 'Can I win this bet', right? I move the 'can' to the front. So actually, that's a theory; I just gave you a theory there. He talks about it as movement: the word order in the declarative is the root, the sort of default way to think about the sentence, and you move the auxiliary verb to the front. That's a movement theory, okay?
And he just thought that was just so obvious that it must be true. That there's nothing more to say about that, that this is how auxiliary verbs work in English. There's a movement rule such that to get from the declarative to the interrogative, you're moving the auxiliary to the front.
And it's a little more complicated as soon as you go to simple present and simple past, because if I say, you know, 'John slept', you have to say 'did John sleep', not 'slept John', right? So you have to somehow get an auxiliary verb in there. And underlyingly, it's a little more complicated than that, but that's his idea: there's a movement, okay?
And so a different way to think about that, that isn't, I mean, then he ended up showing later, right? So he proposed this theory of grammar, which has movement. There's other places where he thought there's movement, not just auxiliary verbs, but things like the passive in English and things like questions, WH questions, a bunch of places where he thought there's also movement going on.
And in each one of those, he thinks there are phrases and words moving around from one structure to another, which he called deep structure and surface structure. There are two different structures in his theory, okay? There's a different way to think about this, which is that there's no movement at all.
There's a lexical copying rule such that the word will or the word can, these auxiliary verbs, they just have two forms. And one of them is the declarative and one of them is the interrogative. And you basically have the declarative one and, oh, I form the interrogative or I can form one from the other. It doesn't matter which direction you go.
And I just have a new entry which has the same meaning but a slightly different argument structure. Argument structure is just a fancy word for the ordering of the words. And so if I say, you know, 'two dogs can enter the room' or 'will enter the room', there's two forms of 'will'.
One is will declarative, and then, okay, I've got my subject to the left, it comes before me, and the verb comes after me in that one. And then the will interrogative, it's like, oh, I go first. Interrogative, will is first, and then I have the subject immediately after, and then the verb after that.
And so you can just generate from one of those words another word with a slightly different argument structure, with a different ordering.
And these are just lexical copies. They're not necessarily moving from one to another. There's no movement. There's a romantic notion that you have like one main way to use a word and then you could move it around. Right, right. Which is essentially what movement is implying.
Yeah, but the lexical copying is similar. So in lexical copying, for that same idea, maybe the declarative is the source and then we copy it. And an advantage... well, there are multiple advantages of the lexical copying story. It's not my story.
This is Ivan Sag; a bunch of linguists have been proposing these stories as well, in tandem with the movement story. Ivan Sag died a while ago, but he was one of the proponents of the non-movement, lexical copying story. And a great advantage is, well...
Chomsky, really famously in 1971, showed that the movement story leads to learnability problems. It leads to problems for how language is learned. It's really, really hard to figure out what the underlying structure of a language is if you have both phrase structure and movement. It's really hard to figure out what came from what. There's a lot of possibilities there.
If you don't have that problem, the learning problem gets a lot easier.
Just say there's lexical copies.
Yeah. Yeah, yeah.
When we say the learning problem, do you mean humans learning a new language?
Yeah, just learning English. So a baby is lying in the crib, listening to me talk. How are they learning English? Or maybe it's a two-year-old who's learning interrogatives and stuff. How are they doing that? So Chomsky said it's impossible to figure it out, actually. He said it's actually impossible, not hard, but impossible.
And therefore, that's where universal grammar comes from: it has to be built in. And so what they're learning is just parameters; the movement is built in, in his story. It's absolutely part of your language module. And you're just setting parameters: English is just sort of a variant of the universal grammar.
And you're figuring out, oh, which orders does English do these things in? The non-movement story doesn't have this. It's much more bottom-up: you're learning rules one by one. Oh, this word is connected to that word. One advantage is it's learnable. Another advantage is that it predicts that not all auxiliaries might move.
It might depend on the word, and that turns out to be true. So there are words that don't really work both ways as auxiliaries: they work in the declarative and not in the interrogative, or vice versa. I'll give you the opposite first. I can say, 'Aren't I invited to the party?', and that's an interrogative form. But it's not from 'I aren't invited to the party'. There is no 'I aren't', right?
So that's interrogative-only. And then we also have forms like 'ought'. 'I ought to do this.' And I guess some old British people can say 'Ought I...'. Exactly. It doesn't sound right, does it? For me, it sounds ridiculous. I don't even think 'ought' is great, but I totally recognize 'I ought to do it'. It's not too bad, actually. I can say 'I ought to do this'. That sounds pretty good.
If I'm trying to sound sophisticated, maybe.
I don't know. 'Ought I' just sounds completely odd to me. Anyway, so there are variants here, and a lot of these words just work in one versus the other. And that's fine under the lexical copying story: you just learn the usage. Whatever the usage is, is what you do with this word. But it's a little bit harder in the movement story.
The movement story... That's an advantage, I think, of lexical copying. In all these different places, there are all these usage details which make the movement story a little bit harder to make work.
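Here's a toy sketch in Python of the lexical-copying idea as described (my own simplified encoding, not Sag's actual formalism): each auxiliary stores one entry per usage, with its own word order, and a form like 'ought' can simply lack an interrogative entry.

```python
# Each stored entry is a word-order template; no movement rule relates them.
LEXICON = {
    "will": [
        {"mood": "declarative",   "order": ["SUBJ", "AUX", "VERB"]},
        {"mood": "interrogative", "order": ["AUX", "SUBJ", "VERB"]},
    ],
    "ought": [  # for most speakers: a declarative copy only
        {"mood": "declarative", "order": ["SUBJ", "AUX", "VERB"]},
    ],
}

def realize(aux, mood, subj, verb):
    for entry in LEXICON[aux]:
        if entry["mood"] == mood:
            slots = {"SUBJ": subj, "AUX": aux, "VERB": verb}
            return " ".join(slots[s] for s in entry["order"])
    return None  # no stored copy -> the form just doesn't exist

print(realize("will", "declarative", "two dogs", "enter the room"))
print(realize("will", "interrogative", "two dogs", "enter the room"))
print(realize("ought", "interrogative", "I", "to do this"))  # None
```

Learning under this story is just adding entries one by one from usage, which is why defective forms like interrogative-only "aren't I" fall out for free.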
So one of the main divisions here is the movement story versus the lexical copy story. That has to do with the auxiliary words and so on. But if you rewind to the phrase structure grammar
versus dependency grammar? Yeah. Those are equivalent in some sense, in that for any dependency grammar, I can generate a phrase structure grammar which generates exactly the same sentences. I just like the dependency grammar formalism because it makes something really salient, which is the lengths of dependencies between words. That isn't so obvious in the phrase structure.
In the phrase structure, it's just kind of hard to see. It's in there; it's just very opaque.
Technically, I think phrase structure grammar is mappable to dependency grammar. And vice versa. And vice versa, yeah. But there are these little labels, S, NP, VP. Yeah. For a particular dependency grammar, you can make a phrase structure grammar which generates exactly those same sentences, and vice versa. But there are many phrase structure grammars for which you can't really make a dependency grammar.
I mean, you can do a lot more in a phrase structure grammar, but you get many more of these extra nodes, basically. You can have more structure in there. And some people like that, and maybe there's value to that. I don't like it.
Well, for you, so we should clarify: in dependency grammar, each word depends on exactly one other word, and you form these trees. It really puts priority on those dependencies: you get a tree in which you can measure the distance of the dependency from one word to another. Those distances can then be mapped onto the cognitive processing of sentences, how easy they are to understand and all that kind of stuff. So it puts the focus on the mathematical distance of the dependency between words. It's just a different focus.
Absolutely.
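To make the equivalence concrete, here is a minimal Python sketch, not from the conversation: given a projective dependency parse, with each word pointing at its head, it builds a phrase-structure-style bracketing whose yield is exactly the same sentence. The sentence, head indices, and unlabeled brackets are illustrative assumptions.

```python
# "The dog entered the room" as a dependency parse: (word, head index),
# with -1 marking the root. These indices are one plausible analysis.
parse = [("The", 1), ("dog", 2), ("entered", -1), ("the", 4), ("room", 2)]

def bracket(head: int) -> str:
    """Project a phrase for `head`: recursively bracket its left and
    right dependents around the head word itself."""
    left = [i for i, (_, h) in enumerate(parse) if h == head and i < head]
    right = [i for i, (_, h) in enumerate(parse) if h == head and i > head]
    parts = [bracket(i) for i in left] + [parse[head][0]] + [bracket(i) for i in right]
    return parts[0] if len(parts) == 1 else "[" + " ".join(parts) + "]"

root = next(i for i, (_, h) in enumerate(parse) if h == -1)
print(bracket(root))  # [[The dog] entered [the room]]
```

The reverse direction is the harder one, as he noted: a phrase structure grammar can carry extra nodes and labels (the S, NP, VP structure) that a dependency tree has no place for.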
Let's continue on the thread of Chomsky, because it's really interesting. As you're discussing the disagreement, to the degree there is disagreement, you're also telling the history of the study of language, which is really awesome. So you mentioned context-free versus regular. Does that distinction come into play for dependency grammars?
No.

Okay.

Not at all. I mean, regular languages are too simple for human languages. They're part of the hierarchy, but human languages, in the phrase structure world, are at least context-free, and maybe a little bit harder than that. There's something called context-sensitive as well. This is just the formal language description; it's a bunch of formal language theory we're doing here.
I love it.
Okay. So in a context-free grammar, you have a single left-hand-side category, and you're expanding it to anything on the right. The idea is that the category on the left expands, independent of context, to those things on the right, whatever they are; it doesn't matter what's around it. A context-sensitive grammar says, okay, I actually have more than one thing on the left: maybe a left and a right context, or just a left context or a right context. Two or more symbols on the left tell you how to expand the category in that particular context. So that's context-sensitive. A regular language is just more constrained: it doesn't allow arbitrary stuff on the right.
It allows very little. Basically, a regular language amounts to one very constrained kind of rule. And so it doesn't have any... I was going to say long-distance dependencies. It doesn't allow recursion, for instance. Well, more precisely, it doesn't allow center-embedded recursion, which human languages have. Human languages have recursion; they have embedding.

Center-embedded recursion.
So within a sentence?

Within a sentence, yeah.
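A toy sketch of the formal point (my illustration, not anything from the episode): the language aⁿbⁿ is the standard stand-in for center embedding, since each "a" (say, a subject noun) must pair with a matching "b" (its verb) from the inside out. A context-free rule handles it; a regular expression, which describes a regular language, cannot count the matches.

```python
import re

def cfg_accepts(s: str) -> bool:
    """Context-free rule S -> 'a' S 'b' | 'ab', applied recursively."""
    if s == "ab":
        return True
    return s.startswith("a") and s.endswith("b") and cfg_accepts(s[1:-1])

# A regular language can only demand "some a's, then some b's";
# it cannot require that the counts match.
regex = re.compile(r"a+b+")

for s in ["ab", "aabb", "aaabbb", "aabbb"]:
    print(s, cfg_accepts(s), bool(regex.fullmatch(s)))
# "aabbb": the regex wrongly accepts it; the context-free check rejects it.
```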
We'll get to that. But the formal language stuff is a little aside. Chomsky wasn't even proposing it for human languages; he was just pointing out that human languages are not regular, that they're at least context-free, because that was the kind of work being done on formal languages. What he was most interested in was human language, and the movement story is where he set off on, I would say, a very interesting but wrong foot.

It's a very interesting history, I agree.

So he proposed multiple theories, in '57 and then '65. They all have this framework, though: phrase structure plus movement, with different versions of the phrase structure and of the movement. The '57 and '65 ones are the most famous, original bits of Chomsky's work. And then '71 is when he figured out that those lead to learning problems: there are cases where a kid could never figure out which set of rules was intended. And so then he said, well, that means it's innate.
It's kind of interesting. He just thought the movement story was so obviously true that he didn't even entertain giving it up. It's just obvious; that's obviously right. And it was only later that people figured out all these subtle ways in which things that look like generalizations aren't generalizations across the category.
They're word-specific, and they kind of work, but they don't work across various other words in the category. And so it's easier to just think of these things as lexical copies. And I think he was very obsessed. I don't know, I'm just guessing: he really wanted the story to be simple, and language is a little more complicated than that. He didn't like words.
He never talks about words. He likes to talk about combinations of words. And words are... You know, if you look in a dictionary, there are 50 senses for a common word, right? The word "take" will have 30 or 40 senses in it. So there'll be many different senses for common words, and he just doesn't think about that. I think he doesn't consider that to be language.
He thinks that words are distinct from combinations of words. I think they're the same. If you look at my brain in the scanner while I'm listening to a language I understand, I can localize my language network in a few minutes, like 15 minutes.
And what you do is: I listen to a language I know; I listen to, maybe, some language I don't know; I listen to muffled speech; I read sentences and I read non-words. I can use anything like this, anything that's really like English and anything that's not very like English, so I've got the thing itself and I've got a control.
And the voxels, which are just, you know, the 3D pixels in my brain that are responding most, those are the language area. That's this left-lateralized area in my head. And wherever I look in that network, if you compare the combinations versus the words, it's everywhere. It's the same.

That's fascinating.

And so there are no areas that we know of that separate them. I mean, that's a little overstated right now; at this point the technology isn't great, though it's not bad. The best way we have to figure out what's going on in my brain when I'm listening to or reading language is fMRI, functional magnetic resonance imaging. That's a very good localization method, so I can figure out where exactly these signals are coming from, down to millimeters, cubic millimeters or smaller. Very small; we can figure that out very well. The problem is the when. fMRI measures oxygen, and oxygen takes a little while to get to those cells, so it's on the order of seconds. I talk fast, I probably listen fast, and I can probably understand things really fast, so a lot of stuff happens in two seconds. So to say that we know what the words are doing right now in that network would be overstated. Our best guess is that the whole network is doing something similar, but maybe different parts of the network are doing different things. That's probably the case; we just don't have very good methods to figure that out at this moment.
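For readers who want the localizer logic in concrete form, here is a schematic numpy sketch. The data are simulated and the contrast and thresholding choices are my assumptions for illustration, not Ted's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 10_000                      # toy stand-in for a whole brain

# Simulated per-voxel mean responses (think GLM betas) per condition.
sentences = rng.normal(0.0, 1.0, n_voxels)
nonwords = rng.normal(0.0, 1.0, n_voxels)
sentences[:500] += 2.0                 # pretend 500 voxels are "language" voxels

# The localizer contrast: language-like condition minus control,
# then keep the most responsive slice of voxels.
contrast = sentences - nonwords
language_mask = contrast > np.percentile(contrast, 95)
print(language_mask.sum(), "voxels selected as the language network")
```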
Since we're kind of talking about the history of the study of language: what other interesting disagreements or tensions of ideas are there between you and Noam Chomsky? You're both at MIT, or were for a long time. And we should say that Noam was in the linguistics department, and you, I guess, were for a time affiliated there, but you're primarily in the brain and cognitive sciences department, which is another way of studying language, and you've been talking about fMRI. Is there something else interesting to bring to the surface about the disagreement between the two of you, or other people in the discipline?
Yeah, I mean, I've been at MIT for 31 years, since 1993, and Chomsky's been there much longer. I met him when I first got there, I guess, and we would interact every now and then. I'd say our biggest difference is our methods. That's the biggest difference between me and Noam: I gather data from people. I do experiments with people, and I gather corpus data, whatever corpus data is available, and we use quantitative methods to evaluate any kind of hypothesis we have. He just doesn't do that. He has never once been associated with any experiment or corpus work. It's all thought experiments, his own intuitions.
So I just don't think that's the way to do things. That's a difference between brain and cognitive sciences and linguistics, which is right across the street from us. I mean, some of the linguists, depending on what they do, the more speech-oriented ones, do more quantitative stuff. But on the meaning side, words and combinations of words, syntax, semantics, they tend not to do experiments or corpus analyses.
That's the biggest difference, the method. But the method is a symptom of a bigger approach: for Noam it's more of a psychology-and-philosophy side, and for you it's more data-driven, almost a mathematical approach.
Yeah, I mean, I'm a psychologist. So I would say we're in psychology. Brain and Cognitive Science is MIT's old psychology department. It was a psychology department up until 1985, and it became the Brain and Cognitive Science department. And so, I mean, my training is math and computer science, but I'm a psychologist. I mean, I don't know what I am.
So, a data-driven psychologist. Well, you are...
I am what I am, but I'm happy to be called a linguist, I'm happy to be called a computer scientist, I'm happy to be called a psychologist, any of those things.
But beyond the methodology, how does that difference manifest itself? In these subtle differences, like the movement story versus the lexical copying story?
Those are theories. But I think the reason we differ in part is because of how we evaluate the theories. And so I evaluate theories quantitatively, and Noam doesn't. Got it.
Okay, well, let's explore the theories that you explore in your book. Let's return to this dependency grammar framework of looking at language. What's a good justification for why the dependency grammar framework is a good way to explain language? What's your intuition?
So the reason I like dependency grammar, as I've said before, is that it's very transparent about its representation of the distance between words. All it is, is you've got a bunch of words that you're connecting together to make a sentence. And a really neat insight, which turns out to be true, is that the further apart the pair of words you're connecting are, the harder the production is and the harder the comprehension is. It's harder to produce and harder to understand when the words are far apart; when they're close together, it's easy to produce and easy to comprehend. Let me give you an example.
In any language, we have mostly local connections between words, but they're abstract: the connections are between categories of words. And you can always make things further apart if you add modification, for example, after a noun. In English the subject noun comes before the verb, and then there's an object after, for example. So I can say what I said before, you know, the dog entered the room or something like that. And I can modify "dog": if I say something more about the dog after it, then what I'm doing is indirectly lengthening the dependency between "dog" and "entered" by adding more stuff to it. Let me just make it explicit here. If I say "The boy who the cat scratched cried" (we're going to have a mean cat here), what I've got is "The boy cried," which would be a very short, simple sentence, and I've just told you something about the boy.
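Here is a small sketch making that explicit; the head indices are one plausible dependency analysis, my assumption rather than anything from the book:

```python
def dep_lengths(parse):
    """parse: list of (word, head_index), with -1 for the root.
    Returns (word, head_word, distance) for every non-root word."""
    return [(w, parse[h][0], abs(i - h))
            for i, (w, h) in enumerate(parse) if h != -1]

short = [("The", 1), ("boy", 2), ("cried", -1)]
# "The boy who the cat scratched cried": the relative clause sits
# between the subject noun and its verb.
long_ = [("The", 1), ("boy", 6), ("who", 5), ("the", 4),
         ("cat", 5), ("scratched", 1), ("cried", -1)]

for p in (short, long_):
    print(max(d for *_, d in dep_lengths(p)), dep_lengths(p))
# "boy" -> "cried" has distance 1 in the short sentence; with the relative
# clause in between, the same dependency stretches to distance 5.
```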