Podcast episode: Roger Levy

June 10, 2022

With Chris Potts

From genes to memes, evidence in linguistics, central questions of computational psycholinguistics, academic publishing woes, and the benefits of urban density.

Show notes

Transcript

Chris Potts:All right. I am delighted to welcome Roger Levy to the CS224U podcast. Roger is an incredible scholar whose research spans multiple fields – linguistics, cognitive psychology, NLP, and probably others I'm forgetting, maybe anthropology. We'll get to that a little bit later, I think.

Roger completed his PhD in Stanford linguistics in 2005, and as part of that work, he helped to lead a new wave of research on probabilistic approaches to linguistics that truly has reshaped the field in exciting ways. After that, he became a professor of linguistics at UC San Diego, and in 2016, he moved to MIT, where he is now a professor in the Department of Brain and Cognitive Sciences. And he's now a leading voice in research on computational psycholinguistics with deep learning, and he continues to do all sorts of innovative things in linguistics, cognitive psychology, and other areas.

So, welcome, Roger. Let's kick this off by talking a bit about how you seem to be unable to choose a field for yourself. For a long time, you've worked really fluidly across theoretical linguistics, psycholinguistics, NLP, cognitive psychology. What are the big questions that guide you as you work in all these areas?

Roger Levy:Wow, thank you, Chris. It's a delight to be here. Thank you for that really generous and undeserved introduction.

Chris Potts:Oh, come now!

Roger Levy:I'm very excited to be in this conversation. This is great. So I'm happy to talk about moving across fields and being unable to choose. In fact, why don't I kick it off by telling you the story of how I got into this field or this set of fields.

Chris Potts:Sure.

Roger Levy:I started my undergraduate work in mathematics and physics, and I wound up majoring in mathematics, mainly because I really liked taking math classes and learning math, and I couldn't decide on what else to major in, so math seemed as good as anything else. When I was an undergraduate, it was becoming pretty popular in the United States to spend a semester or a year studying abroad. And when I was going into my senior year, I had the opportunity to study abroad for a year in Singapore. And I spent that time doing a number of things, but the most life-changing experience that I had was starting to study Mandarin Chinese.

Now, at this time, I had finished three out of my four years of college, and I thought that where I was headed was to do graduate work in mathematical evolutionary biology – population genetics. That field studies the transmission of information in genetic form from organism to organism. And of course, that's the origin of modern biological life as we know it. Now, I was very influenced by the book The Selfish Gene, which maybe you might or might not know of. Many listeners may not know at this point, but it was a book by Richard Dawkins that was written early in his career and was very influential, I think, in the popular scientific gestalt when I was an undergraduate. That was the book that introduced the term "meme" to popular culture. And what was a meme? A meme was a unit of transmissible information from human to human in cultural form. And I thought that was an extremely interesting idea.

When I spent such a long time outside of my own familiar cultural environment and linguistic environment, I got very interested in, well, what are the transmissible units of information really like, and what would a science of those units of information really be like? And that got me increasingly interested in disciplines that were specific to studying us humans, to anthropology, to sociology, to history.

At that point, I didn't know what to do, because I thought I was going to go into evolutionary biology, but I was worried that I would spend all of my time working on the organisms whose units of transmissible information we can study in finer detail, because we can experiment on them more freely – like bacteria or plants. I wanted to spend more time with people and understanding us as people.

At this time, I knew nothing at all about linguistics. Linguistics, in my view, has always needed to do better PR at the pre-college level than it does. I had no understanding of what the discipline was when I went to college, and it never crossed my mind that that would be something that I would want to study. So, what I did converge on was anthropology. There were some anthropologists, actually, at Stanford University, where I wound up going and doing my graduate work, who had taken the methods of population genetics, and more generally, the ideas of evolutionary biology, and applied them in a variety of ways to looking at human cultural variation, human skills, learning, acquisition, and that seemed like the direction that I should go.

So instead of going into a PhD in evolutionary biology, I started a PhD in anthropology at Stanford. At the time that I got there, though, I had come to realize that I was really, really interested in language. At this point, I had finished my undergraduate degree. I went back to East Asia for two years. I spent a year in Taiwan on a Fulbright continuing my study of Mandarin Chinese. And I also spent a year in Japan after that at the University of Tokyo, doing half time evolutionary anthropology research, and half time learning Japanese. And I had come to realize that my love of languages and of language was deep enough that I sort of ought to know a little bit about linguistics.

So my first year in graduate school, I took two graduate classes in linguistics with Stanford faculty, who you, of course, know well: Lexical Functional Grammar with Joan Bresnan and sociolinguistics with John Rickford. And that very rapidly convinced me that, really, linguistics was the right discipline for me – the analysis underlying trying to figure out what those units of transmissible information are, and what the parameters of universality and of variation across the languages of the world are. So that's what got me into the discipline as a whole.

So my first exposure was in theoretical linguistics and sociolinguistics, which sound very different, but if you're coming from a perspective where you had done math and anthropology, they're actually not that strange of a pairing.

I got a master's degree in anthropology instead of a PhD, and I transferred into the linguistics program. Chris Manning had just arrived, and I took Statistical Natural Language Processing, and I had always really loved probability of all the parts of mathematics, and I really liked Stat NLP. And then I continued to work with Chris, and with many others in the department during my graduate work. But whereas NLP has historically been, and today still is, in large part, a more engineering-oriented discipline, I was always interested in the science of the human mind. And over time, I learned how to connect natural language processing, representational ideas, and theoretical linguistics with the psychology of language. I learned to do experimental linguistics in my postdoc, and those are the tools that I've brought together.

So, in a way, the underlying questions are still the same. We as a species are able to communicate with each other to encode thoughts into forms, use those forms to put those thoughts in other people's heads, or approximations of those thoughts. And that is deeply, intrinsically fascinating and challenging from a scientific understanding perspective in and of its own right. It's also the thing that gives rise to human society and civilization as we know it. And so, those are the fundamental questions. How do we represent? How do we mean? How do we communicate? What are the constraints on those things? And how do we learn how to do those things?

And so those are the central questions that guide me. I think everybody is motivated by slightly different central questions, but if you follow the central questions that motivate you, the chances are reasonable that they will lead you to cross-disciplinary boundaries. So I think it's happened to be the case that the same set of central questions that motivated me has led me to cross these boundaries. And I've been very lucky. I've been able to bring disparate fields, ideas, and methods together in ways that have turned out to be productive.

Chris Potts:So, from genotype to phenotype to memotype, short story? "Memotype" is a word I just made up.

Roger Levy:Well, I wouldn't say the phenotype. Phenotype might be at the end of both genotype and memotype. Yeah.

Chris Potts:But I have to ask, wait, so I probably know this in some sense, but where did you do your undergrad?

Roger Levy:It was at the University of Arizona.

Chris Potts:So did you know Noah Goodman already as an undergrad?

Roger Levy:I did know Noah Goodman. Noah was a few years behind me in the program, but yeah, we did know each other. My younger brother was also at University of Arizona. And he and Noah were pretty good friends, actually, during that time. So, yeah, Noah and I go back a very, very long way. I don't know about Noah, but I certainly could not have forecast as an undergraduate that I would've wound up in the field that I'm in. So it's sort of amazing the paths that we take.

Chris Potts:Though it is unusual, right, because you were doing this physics/math thing. Noah was also a mathematician at the time, right?

Roger Levy:Yeah.

Chris Potts:Then he went off and did real estate or something for a while, and you went off and did all these adventures in Southeast Asia, and then you both ended up back essentially doing the same kinds of things. What a trip!

Roger Levy:That's exactly right. That's right. It's amazing. Well, I think it speaks also to how compelling our field is, that it's just very attractive. It's a very cool field to study. We, of course, we have an anthropocentric type bias or self-centered bias in terms of thinking about that, but it sure does seem like an appealing field to work in to me.

Chris Potts:Do you ever long, though, for the simplicity of the human genome as compared to language – or at least systematicity?

Roger Levy:Well... Yeah. So in terms of what I miss from evolutionary biology and population genetics, there are a few things. Actually, a lot of it is the opacity to us as observers. We can introspect a lot about language, and we can get a lot of headway there. That means, actually, in some ways, I think our discipline is very scientifically far forward in terms of how much we understand about the content matter, because we have that massive advantage over, say, the proverbial Martian coming down to study language. In contrast, we understand intrinsically so much less about how genomes work. We don't have any intuitions about it. The evidentiary basis of the field has no intrinsic link to our own subjective intuitions or experiences. So, that also means that from a scientist's point of view, there's a lot of untilled soil. There's a lot we don't know that is there to discover. And of course, the impact is absolutely massive, and it's a very different kind of impact than the impact of natural language processing and linguistics, which is sort of transforming the way we communicate versus perhaps transforming our bodies. But, in some sense, they're very similar disciplines, because they deal with the representation and structuring and expression of information.

Chris Potts:Before we move on from that, just because you said something that's really interesting to me. If I understood correctly, you said that it's an asset in linguistics that we have intuitions.

Roger Levy:Hmm. Yeah.

Chris Potts:But I think, also, this can get in the way. There are some times when I kind of long for the idea that we as scientists, if we were always working on languages we didn't speak, that we might, in some sense, be better off. Does that sound off the wall to you? Or do you see what I mean?

Roger Levy:I see what you mean. Well, that's a rich and complex topic. I think the discipline would be very different if we had access to... Well, there are two different we's. There's we the people who are the subject of the kind of investigation that we do when we study language, and then there's we the individual and small groups of researchers that have our own favorite theories and ideas, and I think that it's the latter that is the most dangerous in terms of the thing that we have to be on guard about. Me relying on my intuitions while I build my theory is, I think, fraught, highly fraught.

Chris Potts:Yeah.

Roger Levy:But I do think that if we didn't have intuitions at all, then we would be missing out, potentially, on a whole lot of what we could understand. A whole lot of the depth and subtlety of linguistic meaning, for example, I think would be very difficult to achieve. That understanding would be difficult to achieve, without using intuitions that we can then validate independently in some form. So the non-self-interested individual who has intuitions about, "This means a certain thing," or, "I wouldn't use that expression in this context" – those things are extremely valuable, in my view.

Chris Potts:Oh, totally. But what about... Try this on for size, as a claim. Some of the most robust results in linguistics are from the areas where we as people, linguists, whoever, have the least conscious access. I'm thinking of things like articulatory phonetics, acoustic phonetics, right? We've got rock solid results. Even some stuff about online sentence processing seems really solid, and it seems not incidental that people aren't really attuned to those things at all, as compared to their intuitions about what a word means or what the constituents in a sentence are.

Roger Levy:Yeah. That's a great point. Well, okay. So I would say what a word means is tough.

Chris Potts:Yeah. Fair, fair. We could return to that, yes! We might delude ourselves into thinking we know!

Roger Levy:Here's the bird's-eye view picture that I like to draw. We linguists are fond of saying that language has different levels, and there's the phonetic level, there's the phonological level, so you go from sound or sign to abstract, discrete, combinatorial units that are expressed as sounds or signs, to morphology, the minimal building blocks of words, to words, to syntax, to semantics and pragmatics. And a lot of that, in the middle, we don't have direct access to, right? But language is sort of pinned to externally measurable things on both sides – to, as you say, articulatory phonetics or the articulatory characteristics of signs. That's the form-realization end. We can measure that.

That was one of the things that got me so passionate about linguistics. Of all human cultural symbolic activity, language is what we can most easily record, but it's also the thing that, in a lot of ways, we can introspect about very effectively. On the other end, the meaning side, I mean, that is pinned to something we can measure. The most raw form of that is, of course, people's behaviors in response to what they hear or read, or what they decide to say in response to the context. But in the best situations, the intuitions are sort of a convenient shortcut to get to approximations of that. That's on either side, and so the stuff in the middle, when you can't directly pin it to something observable, I absolutely agree.

Chris Potts:Yeah, wait, because you've given me another way to frame my idea, which is: the things that are purely behavioral, like the way you pronounce things and even stuff that's like how you modulate vowel quality in different word forms based on the context, stuff that you really, even as a linguist, might have trouble accessing consciously. Often, our most robust results are about them, and everything gets kind of wobbly when it is the kind of thing where a linguist, or even just a person who thinks about language, can purport to offer you a direct intuition, like, "This is the meaning," or, "This is this syntactic structure." What do you think of that?

Roger Levy:Well, not all the results are always as robust as one might hope, but I think that we do have a lot of robust results in that area. I'm a little less sure that I would say we don't have robust results in some of the other intermediate areas, like the way morphology and syntax work. I think there are some pretty robust results there.

Chris Potts:Well, they might be robust assuming we are willing to admit some uncertainty, some inherent uncertainty or instability, right, some probabilistic component? When they become categorical, I start to feel nervous again.

Roger Levy:Yeah. Well, you're opening a very big, deep, interesting theoretical and empirical question. I mean, in some ways, it's a theoretical question. What's the role of the categorical versus the noncategorical in our representations and in our theory building?

I spend a lot of time in my work, in psycholinguistics most of all – and in some ways, you might say that almost all or all of my work is psycholinguistics in some form – thinking about the linking functions between our theories and the observables. And I think that one thing we've learned over the last couple of decades is that we need to be very careful about stating those linking hypotheses and not moving fast and loose between the observable things and the theoretical implications. We have to be very careful about being explicit, because a lot of the time, and probably most of the time, the observables will underspecify at least some of the theoretical parts that are required. Maybe one version of that is we need to be very thoughtful about linking hypotheses.

Chris Potts:Agreed. Yeah. Yeah. Related to that, what about different notions of evidence that we can bring to bear on linguistic questions? It seems to me that, across all of linguistics and NLP, we make use of observational corpus data, intuitions from linguists and also from just people who speak a language, and also active manipulations in the lab – more like controlled psycholinguistic experiments. What do you think about all these sources of evidence? Do you think of them as complementing each other? Are some more valuable than others?

Roger Levy:Well, there are a couple of different axes there. So one is the naturalism versus experimental control axis, and then the other one, the nature of the evidence – like, is it a reaction time? Is it an acceptability judgement? Is it a corpus occurrence? Is it some kind of brain recording or an articulatory recording?

The one that I guess I'm most fond of commenting on is the naturalistic versus experimental control axis. I spend a lot of time designing controlled experiments, and you get painfully aware, when you design a controlled experiment, of a couple of things.

One is how you need a lot of data. Well, this depends a little bit on the signal-to-noise ratio of your measurement, but for many things that I do – for example, reaction time, response time, reading time data, and this is also extremely, maybe even more true of brain response data – the signal-to-noise ratio on the individual instance level, that I read this word in that sentence, and what do my eyes do, is very poor. So you have to aggregate over a lot of data to get clear patterns. But the patterns are there, and the patterns are robust, and the patterns are theoretically informative.

So you spend a lot of time thinking, "How do I get decent statistical power to really sharpen my lens, so that I have a crisp view of the underlying pattern that I care about – whether it exists or not, or is different than what I have hypothesized?" You get painfully aware of that. And you also get painfully aware of how that kind of repetition probably leads to very unnatural experiences for the person from whom you're collecting data.
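
A minimal simulation of the signal-to-noise point here, with made-up numbers (a hypothetical 20 ms reading-time effect against roughly 300 ms of trial-level noise), just to illustrate why aggregation over many trials is needed:

```python
# Toy power illustration: a small reading-time effect only becomes visible
# once you aggregate over many trials. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
true_effect_ms = 20          # hypothetical condition difference
noise_sd_ms = 300            # trial-to-trial variability in reading times

for n_trials in (10, 100, 1000, 10000):
    cond_a = rng.normal(350, noise_sd_ms, n_trials)
    cond_b = rng.normal(350 + true_effect_ms, noise_sd_ms, n_trials)
    estimate = cond_b.mean() - cond_a.mean()
    se = np.sqrt(cond_a.var(ddof=1) / n_trials + cond_b.var(ddof=1) / n_trials)
    print(f"n={n_trials:>5}: estimated effect = {estimate:6.1f} ms (SE ~ {se:5.1f} ms)")
```

With tens of trials the standard error dwarfs the effect; with thousands of trials per condition the estimate settles near the true 20 ms.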

Conversely, I've also spent a lot of time working with corpus data to try to test theories. You do that, and you become painfully aware of the sparsity of the phenomena of theoretical interest a lot of the time, and the presence of confounds that you don't even understand. Things are correlated with each other. This is not unique to language data. This is like any kind of naturalistic data, pretty much, out there in the world, even if you're studying other areas of the human sciences, or even ecology, say.

So I'm always most fond of finding converging evidence from multiple methods. So, here's a really great example. There's a wonderful student in our program, Thomas Clark, who is doing some work on alternation in the Russian comparative construction. So Russian has two different ways of doing a comparative construction, one of which is a lot like the "than" construction in English, like, "So-and-so is faster than so-and-so." But there's another one which doesn't involve a word for "than". It involves just a case marking change. You change from the nominative case to the genitive case. And it turns out that there are some very interesting theoretical things we can say about that.

We started off doing a corpus study, and we got this very intriguing pattern, and my first reaction was, "This is really cool. Let's go try to break that pattern by changing the way we do the analysis, because if we can break the pattern, then I don't believe it." And we couldn't break the pattern. The next thing was, "Let's do a controlled experiment that tests at least part of what we found." And so we did the controlled experiment with my colleague Ted Gibson and also Ethan Wilcox, and the controlled experiment came out the same way, pretty much, as the corpus study in terms of the qualitative effects. And so that's the kind of situation that makes me comfortable. That's what I go for. You have different kinds of evidence that should both adjudicate on the same theoretical question and will point in the same direction, because science is hard, and we need to look at the same thing from multiple different angles to get the clearest view.

Chris Potts:I love it. Yeah. I feel exactly the same way.

It's a hallmark of the Chomskyan position that native speaker intuitions are privileged above all of the other sources of evidence that you mentioned. What do you think about that claim?

Roger Levy:Depending on what you're doing, intuitions may be reliable or not. As I said before, I think the self-interested scientist's intuitions are always the most suspect intuitions.

There's two dimensions of this. One is the intuitions part, and the other is the native speaker part. And I think we talked a little bit about the intuitions part. Of course, I've actually tried to avoid intuition data as much as I can, but I do think that they're valuable. First of all, I think, not as a source of evidence, but as a guide of where to look for evidence, I think intuitions are extraordinarily valuable.

For example, acceptability judgements. I think that there's something to the science of acceptability judgements. One thing about the human mind and psychophysics that blows me away is that you can actually ask people to rate how good sentences are, and you ask different people to rate the goodness of different sentences, and you can get anything at all that is interpretable and meaningful. You can! And that's sort of an amazing thing – both the access and the responses that we have to our internally-generated experiences or externally-generated sounds or written sequences, and also our ability to map those things to numbers. It blows me away that that works, but it does. And acceptability judgements are intuitions, but they're the intuitions of a disinterested third party, and that, I think, is very important.

Now, the native part. I happen to be good at learning languages. I was inspired to study linguistics because of my experiences with Mandarin and with Japanese. And I had a lot of pride for a long time in my Mandarin. I'm pretty good at Mandarin. And I remember I would try to see how well I could do matching up to the intuitions there, or the judgments that are in the literature, when I was a graduate student, in Mandarin, and I was pretty good at it. But over time, it's become very clear to me that my intuitions are different than those of somebody who grew up from childhood speaking Mandarin.

So they're different. Now, that doesn't mean that they're the most important thing. But I do think there is something to being a native speaker – whether that comes from quantity of experience and time of immersion, and setting aside questions of critical period – while I'll punt on exactly what a native speaker is, because I think that's an open, interesting scientific question.

Another thing is that, at the end state of acquisition, there is variability, but it also has an impressive degree of homogeneity in a lot of cases. And those are convenient populations to study, in a sense, because you get this large pool of people who have rather consistent states of their language knowledge and use. If for no other reason, that's a very convenient thing in some cases.

But it carries a risk. Well, actually, multilingualism is an extraordinarily common thing, and I think there's no question that we as a field don't study it enough. The study of, let's call it non-native – of non-native language knowledge, use, and learning – doesn't receive as much glory, historically. And I think that's something we can all work to change and improve upon. I do think it's qualitatively more complex, because, in my impression – and this is not a scientific judgement – I think, if we look at it really carefully, we'll see that the experiences of non-native speakers are more heterogeneous than the experiences of monolingual speakers, and so there's that much more complexity to get your head around, but it's important.

I have a former postdoc, Yevgeni Berzak, who is now at the Technion. He joined my group as a postdoc, and he was passionate about machine learning, NLP, eye movements in reading, and non-native language processing. And so he collected this large corpus of eye movements in reading from native speakers of English and also non-native speakers of English on the same texts – non-native speakers of English who are native in Spanish, Portuguese, Arabic, Chinese, and Japanese, and who also vary in their proficiency levels of English.

You can, to a reasonable degree, predict how good somebody is at English if they're a non-native speaker by looking at their eye movements in reading. You can even get a little bit of signal about what their native language was among those languages. Those are all extremely, extremely interesting things. And we can also sort of see that the way linguistic knowledge is manifested in eye movement patterns is different for the less proficient speakers, but becomes more and more native-like for the more proficient speakers. So this has opened my eyes to this whole space that we can study. And I think there's a huge amount out there. I think that's very much the future.

Chris Potts:I think all of that makes sense. Let me share a few thoughts, though, to get your reaction. First, here's the dynamic that I worry about. Let's say you give a conference talk. You have marked example (7) as grammatical, and someone stands up and says, "I actually find that that example is outside of my experience. It seems ungrammatical to me." And so now we have a bit of a tension. And the way we resolve the tension culturally as linguists sometimes is by figuring out who's the native speaker and then weighing the contribution that they have offered on that basis. And if one of them is not a native speaker, they're very likely to have their contribution dismissed on those grounds, whereas I feel like both of those intuitions, if offered in a way that we regard as scientifically important, should be further investigated. Linguistics is one of the only sciences where your intuition about the world could also be the endpoint of your investigation, whereas elsewhere, all these intuitions would come together in something that you would call an experiment. What do you think of that?

Roger Levy:Yeah. I think the experiment should have been done before the presentation happened. And, basically, the variability would be there in the data or not. There would be an understanding of the population that was studied and of where the data came from, and the experiment could actually be replicated in a particular population.

Chris Potts:I feel like that's something that you all ushered in. That's something you offered. And I think, ultimately, this is going to make linguists happier, because instead of having that standoff about judgments, you instead just say, "Well, we did this experiment. Let's engage at the level of the materials and methods and not worry so much about what's happening in the room right here with these two people introspecting quickly about their language."

Roger Levy:Oh, absolutely.

Chris Potts:I think that's so important, methodologically, for the field.

Roger Levy:Yes. And, frankly, I think there's going to be a time in the future, and I don't think it's that far in the future, where we're going to look back and think, "I can't believe we spent our time, in real time, disagreeing about intuitions during a presentation." You can disagree about them when you're conceiving a project and figuring out how you're going to get data, but, I mean, that should have been sewn up long ago, well before the presentation!

Chris Potts:To the extent that we can, because we're always going to observe some variation.

Roger Levy:Oh, yeah, that's absolutely true. But then it wouldn't be challenging the data. Well, one can always challenge data, of course, but it would be of the form of, "Well, this isn't the way it is for me," and probably, there's some information in that.

Chris Potts:Exactly. You're in a minority of speakers, or you might have some attributes about your experience that have led you to a different intuition. We can figure it out. It's no longer deciding who was "right" based on this pair of opposing judgements.

Roger Levy:Absolutely. Absolutely. And in fact, of course, the studied population that the data came from could have been a minority as well, because there's correlational structure in terms of regional variation, social networks, and so forth.

Chris Potts:This is the sense in which even a question of syntax will become a question of sociolinguistics, just like you observed when you first got into the field.

The second dimension that I wanted to highlight here: so when I worry about native speaker intuitions, it's not that I think everyone is an equally proficient speaker of all languages. And if I learn Russian now, I don't think that my judgments are going to be especially valuable to you and your experiments on Russian. But I worry about the very conservative way in which "native speaker" is often used, which would be, for example, even to exclude someone who had been speaking a language with their parents since they were seven years old, leaving them to wonder whether that has qualified them to be a native speaker, despite all of their experiences using the language in a social context and all of that stuff. That's the dark side, I think – that we would actually be weighing such questions, because I'm not sure that there is a categorical distinction that we could be thinking about here.

Roger Levy:Well, I think there are monolingual native speakers, and then there are multilingual native speakers. Even in the case of the monolingual native speaker, there is a question of when a native... Native speaker, of course, is itself a theoretical construct, and I think it's a reasonably useful one. There are boundary cases even among monolingual speakers, because there are people who were, for tragic reasons of neglect or of failure by the parents to understand the situation, not exposed to language until they were relatively older, and their outcomes do look different, so they may be monolingual non-native speakers.

Chris Potts:But I could be a boundary speaker if I had just spent my first four years in the US and then moved to Japan for a long enough time that I did my elementary schooling there and then returned or gone to Germany for college. And at that point, I feel like linguists might ask me, "Do you have a native language?" And I might feel anxious about answering the question as a result of all of these diverse experiences.

Roger Levy:Yeah. That's a very good point.

Chris Potts:They don't need to be tragic to be tragic in the sense that I'm worried about for people and identity.

Roger Levy:That's very interesting. So, right. So certainly there are people who report feeling that they don't have a native language. And I think we want to understand that phenomenon better. So my impression is that sometimes that feeling indicates that there are speakers who are highly proficient at interacting in multiple milieux, and the context of interaction is correlated with the subject matter of what is being talked about. So, I could conjecture that such a speaker might even be, in a meaningful sense, more proficient, if you sort of go across all of the languages and kinds of contexts in which they interact, than a typical monolingual. And so, yeah, I guess you're pointing to the loadedness of the term "native speaker" and its potential use for marginalization and othering, to use one term for this. And I agree. I think that's a risk. Yeah.

Chris Potts:I assume that here, again, the field is moving in a good direction with study of things like heritage languages and intuitions that people who have heritage languages have, and just more awareness of the role of being multilingual in the context of our experiences of language and cognition. Yeah. So maybe I'm worried for no reason, because I think things are getting better.

Roger Levy:I think that's true. In general, I think that's true. The field is growing, and this is fertile territory. And so I think you're right.

Chris Potts:That's great. Yeah. And I think the experimental turn has been instrumental in this – yeah, absolutely, the merging of psycholinguistics with the rest of the field.

But let me circle back a little bit. So if you're at a party with a bunch of MIT professors who don't know you, and they say, "Hey, Roger, what do you work on?" what do you say in response?

Roger Levy:I work on the cognitive science of language.

Chris Potts:Ah. And you don't mention this huge computational aspect to your work, the NLP connections, all that stuff?

Roger Levy:I consider that as part of the cognitive science!

Chris Potts:Oh, okay!

Roger Levy:The way that I do it, yeah, absolutely.

Chris Potts:Yeah. That was a big question that I had for you, one I'm just so curious about – which is: for this thriving field of computational linguistics with deep learning, which I feel is full of exciting research and researchers, what are the central questions guiding research in that area?

Roger Levy:Well, there are a number. So one of the things that I've been very excited about is that now that we have models, data sets, and compute power that allow us to actually learn from approximations of human childhoods or lifetimes worth of linguistic input, we can start to put... And this is now going to be doubly-hedged. We can put a lower bound on the upper bound of what could be acquired under certain conditions. It's a lower bound because we're talking about "from linguistic input only," and of course, humans learn from much richer multimodal information, and we're informed not only by the joint use of language or the use of language in context, but we're learning in non-linguistic contexts as well, and that is very plausibly feeding into how we make sense of the linguistic input itself.

So that's the lower bound part. The upper bound part is that these models, token for token, are more efficient at extracting information than humans are. So it's doubly-hedged, but I still think it's a very useful thing, which is we can ask what generalizations can be extracted, with what degree of signal-to-noise ratio, in terms of the behavior of the resulting system, from what kind of input? And it turns out that deep learning is really good for this. So, for example, we've used this to show that very abstract features of syntax, like filler-gap dependencies and island constraints on filler-gap dependencies – those actually do show up if you just apply these models to a childhood's worth of data. And it's so far beyond what I had anticipated.

You look at the incredible decreases in perplexity, the better and better ability to predict words in corpora, and you ask: where's that coming from? And having spent a lot of time looking at naturalistic corpora and their difference from, "I'm carefully constructing this grammatically interesting example," I think, well, you can win a lot of predictive value just by memorizing big multi-word sequences and pasting them together. Is that what they're doing? And it turns out they're doing things that are much more sophisticated than that. And it was not at all obvious to me that that would be the case. So that's very exciting. As you say, the models are incredibly powerful and incredibly flexible, and we can do amazing things with them.

Chris Potts:You used the phrase "a childhood's worth of data." Can you just say a bit more about what that is?

Roger Levy:Sure. I did this as a back-of-the-envelope exercise one time for a footnote in a 2012 paper, and it's the most useful part of the paper, which is about something totally different. So, there are various ways you can get to this computation, but, in the WEIRD environment of a middle-class American family, a child is probably getting about 15 million words of spoken language input a year, within a factor of two. Obviously, there's going to be variation from child to child, from household to household. Hart and Risley and many others have written about the correlation with socioeconomic status, for example. So there are these variations, but they're unlikely to be order of magnitude variations, except in very extreme cases. So I use like 15 million words, which is in the middle of Hart and Risley's bands. So 15 million words, you extrapolate that out. One good LSTM, by Kristina Gulordava, was trained on 90 million words, so that's six years. That's sort of a childhood's worth of data.
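
To make the arithmetic concrete, here is the same back-of-the-envelope calculation in a few lines of Python; the 15-million-words-per-year figure is Roger's rough estimate, and everything here is order-of-magnitude only:

```python
# Back-of-the-envelope: how many years of child-directed speech does a
# language model's training corpus correspond to? Rough numbers only.
WORDS_PER_YEAR = 15_000_000      # approximate spoken input per year (within a factor of two)

def years_of_input(training_words: int) -> float:
    """Years of childhood linguistic input that a training corpus roughly corresponds to."""
    return training_words / WORDS_PER_YEAR

# The LSTM Roger mentions (Gulordava et al.) was trained on about 90 million words:
print(years_of_input(90_000_000))   # -> 6.0, i.e. roughly a childhood's worth
```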

Chris Potts:But this might relate to your upper/lower-bound framing, because it's only one very thin slice of the overall social-linguistic input that these kids are presumably getting. And so, to just listen to 15 million words of podcasts with nothing attached to the voice coming through the speaker is not going to be the same as what these kids are experiencing, right? And that's important.

Roger Levy:Right. But then, yeah, yeah, yeah. So that's why I'm careful to frame it as learnability and not actually learning, right? So showing what's learnable is showing that, hey, there's some system that can get this out of the data. And you could say, "Well, that's not that exciting," but actually, the learnability debate in cognitive science was considered perhaps the most important theoretical debate for many decades. And now, we can actually put some bounds from data on it. It's just very interesting.

And, right, and so you can say, "Well, look. From this characterization, this thin-sliced characterization of the data, you could learn this much. How much more could we learn with something thicker that's closer to real human experience?" That's a complex future frontier. Obviously, many, many smart people and many, many wealthy companies are working on this kind of problem in various ways, and so I'm sure we'll learn more about that going forward.

Chris Potts:The learnability thing seems super important to me, because the strong version of the poverty of stimulus hypothesis would say: some aspects of our linguistic behavior that are systematic are unattested in data, or so vanishingly rare, that it would be impossible to induce them from data-driven processes alone, and therefore, they must be innate. And it does seem like a deep learning model, even if it just does have a weird slice of the world coming from text, if it could induce some of those systematic behaviors even when they're unattested, that should be enormously informative to people advocating for that poverty of stimulus, right?

Roger Levy:I think so. And I think, basically, we see now that those arguments were wrong. And I think one important thing for us to keep in mind is that it's easy for us to be insufficiently imaginative about what would constitute the right kind of evidence to learn a generalization. You hear a particular generalization, you might state it in a certain set of terms, and there might be another set of terms under which the ingredients are there. And well then, you go look in a corpus and see whether the generalization you're looking for is attested or not. That sort of depends on the way that you represent the generalization, right? And deep learning systems learn their own abstractions on which to generalize. The abstractions themselves are emergent. And so I think that, in a sense, the possibilities for being insufficiently imaginative are much... It's a much better picture. They're able to find generalizations very effectively.

So I just think that those arguments were wrong. And so we can move past that, and we can move to the question, for example, of why we see some structures commonly, perhaps universally – and whether things are universal or not is itself an open question. But we can move beyond the, "Well, it's because of learnability considerations," at least for some of these things. We can move to, "No, we may need to look for other kinds of explanations." And I think there are other kinds of explanations.

Chris Potts:This is so important, because I think one mistake that you see made, which is understandable in retrospect, is saying, "This particular construction that we have knowledge of is never in data, and therefore, it has to be innate," overlooking the possibility that that construction might be the product of a bunch of more primitive things that are widely attested. It was a kind of easy solution.

Roger Levy:Yes.

Chris Potts:There's a wonderful paper by Fernando Pereira that just talks about sentences being attested or not and how complicated it is to associate attestation with probability if you have a sophisticated language model, and he really does explode some basic assumptions of that Chomskyan argument in that paper, and I feel like those arguments, now in the era of deep learning, have only gotten stronger.

Roger Levy:Gotten much stronger. So the Pereira argument was about n-grams. It was about computing n-grams on word classes rather than specific words. It gets that much more powerful when you're dealing with a deep learning system, where the abstractions are layered on top of each other and potentially overlapping.

Chris Potts:Yeah. His argument was dead simple, which is: "Colorless green ideas sleep furiously" has a non-negligible probability for a language model trained on data where that sentence, of course, never appears – much higher probability than that sentence in reverse order, whereas Chomsky said they would have the same probability of zero, and it's that zero assumption that is the failure of imagination.

Roger Levy:So I have a conjecture. My read is that Chomsky thought... Because he didn't say zero. He would not say zero probability, but he would talk about "probability indistinguishable from zero".

Chris Potts:Okay, fair.

Roger Levy:And my conjecture is that he's thinking in raw probability spaces. If people would take one thing away from my work, my whole body of work, it's: "think in log probabilities." And of course, it's not just me. It's the whole field of NLP also saying: compute your perplexities and your cross-entropies. They're not raw probabilities. So if you're in raw probability space, I mean, Chomsky was right! Even in Pereira's model, with the "colorless green ideas" case, if you're talking about raw probabilities, they're not very different. But in log probability space, they're vastly different. And I feel very proud that we've been able to show that log probability is what matters for a lot of things for humans.
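
As a toy illustration of the raw-versus-log distinction: two sentences can both have probabilities that look "indistinguishable from zero" on a raw scale while being enormously different in log space. The per-word probabilities below are invented for illustration, not taken from any real model:

```python
# Raw vs. log probabilities for two sentences under hypothetical per-word
# probabilities (made-up numbers).
import math

colorless_green = [1e-4, 1e-3, 1e-4, 1e-5, 1e-4]   # "Colorless green ideas sleep furiously"
reversed_order  = [1e-4, 1e-7, 1e-8, 1e-8, 1e-7]   # "Furiously sleep ideas green colorless"

p1 = math.prod(colorless_green)
p2 = math.prod(reversed_order)

print(f"raw probabilities:   {p1:.1e} vs {p2:.1e}")            # both look like ~0
print(f"log10 probabilities: {math.log10(p1):.1f} vs {math.log10(p2):.1f}")
# The raw values differ by a vanishingly small absolute amount; the log values
# differ by 14 orders of magnitude, which is the scale perplexities and
# cross-entropies live on.
```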

Chris Potts:Yeah. That's one thing you've taught me. Log probabilities are the only probabilities we should be thinking of in this context. But now I have my question. If I'm at a Chomsky talk, I'll try to ask a question if there's an opportunity, and my question will be, "When you gave this argument, were you thinking of probabilities or log probabilities?"

Roger Levy:Oh, I think you'd be summarily dismissed for thinking that it might matter.

Chris Potts:No, to his credit, he would answer in a dismissive tone, but he would definitely answer, right?

Roger Levy:Okay. That's fair.

Chris Potts:This is great. So the deep learning models are a kind of proxy learner, but why not just study people at that point? I mean, you're accustomed to having kids, adults, come into your lab. Why not study them directly as opposed to bringing in all these artifacts?

Roger Levy:Well, specifically with respect to the learnability case, I mean, those are counterfactual studies.

Chris Potts:Got it, yeah.

Roger Levy:It's: if we expose a learner to data of type X, what comes out? And it's crucial, because, well, we don't understand the inductive bias of the deep learning models. They do have an inductive bias, and we need to recognize that. And it would be nice if we understood better. Some of the nicest studies that sort of start to get at what's relevant for learnability are ones that show that the architectures can learn patterns that are very rarely occurring in natural languages. So that's a good way of showing that whatever the inductive bias is, it's not a sharply human language-like inductive bias.

This also brings me to another topic – another area where I think that these models are just so valuable for somebody even studying humans, which is doing expectation estimation. There are a lot of things in the kind of theories and hypotheses that I work with where human language use or human language processing is sensitive to the expectation of something – often a word, but not always a word – conditioned on some context. And boy, if you want to do that at scale, it can be very cumbersome! It's very hard, actually, to do that with humans. You have to worry about the relationship between units. You have to worry about the linking function from me giving you the context and you making a prediction of an expectation or a prediction about what'll happen in that context. And that linking function is actually hard to understand. We could talk more about that if you want, but we've written a lot about that.

In contrast, the models – it's practically effortless, and they do very well. There are going to be systematic discrepancies in those kinds of expectations between humans and machines, but I think those are also ones that are theoretically interesting to study, because they may actually shed light on what humans are bringing to the table in both acquisition but also in processing.
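
For concreteness, here is a minimal sketch of what expectation estimation with a language model looks like in practice, assuming the Hugging Face transformers library and GPT-2; it illustrates the general idea, not the actual pipeline used in this work:

```python
# Estimate the surprisal (negative log2 probability) of a continuation given
# a context with a pretrained autoregressive LM. Illustrative sketch only.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def surprisal(context: str, continuation: str) -> float:
    """Sum of per-token surprisals (in bits) of `continuation` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total_bits = 0.0
    for i in range(cont_ids.shape[1]):
        pred_pos = ctx_ids.shape[1] + i - 1        # position whose logits predict this token
        token_id = input_ids[0, pred_pos + 1]
        total_bits += -log_probs[0, pred_pos, token_id].item() / math.log(2)
    return total_bits

print(surprisal("In the spring and", " summer"))
print(surprisal("In the spring and", " early"))
```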

Chris Potts:Very cool. So they're like idealized human subjects that you don't require IRB approval to do weird experiments on.

Roger Levy:Yeah. Yeah.

Roger Levy:So let me give you a really, really early version. This is... We have a paper on this, and I'll do the experiment on you. So, back before we were using deep learning, we had the n-grams, and so this was at a moment in time when we had the Google Ngrams web and Books corpora. And so, if you had a very short beginning of a sentence, you could actually just compute a relative frequency. If I give you a prefix, what comes next? So if I give you the prefix, "In the spring and," what comes next?

Chris Potts:"Summer."

Roger Levy:Okay. Can we keep on going? What other things, besides "summer"?

Chris Potts:Oh, so "in the spring and fall," "in the spring and summer." I guess I do all the seasons first.

Roger Levy:You do all the seasons first. Yeah. Okay. So that's what people do. If you look at corpora, "early" is more common than "fall" or "winter".

Chris Potts:Ah.

Roger Levy:And so why is that? So we have a hypothesis, which is that our expectations, at least in this particular task, they're sort of biased away from the "ground truth" of our linguistic experience, of our raw linguistic experience, by things like word–word associations.

Chris Potts:Right!

Roger Levy:So that's just one very specific kind of example. The question's like, "What's the relationship between the experience that we have and what we deploy to understand and use language?" And so within that big picture, that's just one sort of little microexample. Well, people are really, really good – we're really, really good at tuning our expectations so that they're pretty well-matched to the linguistic environment, which is good, because that allows us to understand and speak efficiently – but there are also these interesting sort of refractive processes that sort of change the angle of what goes in and what comes out.

Chris Potts:Your mention of the Google Ngrams, which I haven't thought about in a while, I confess, it reminded me to ask: you've been doing computational psycholinguistics for a long time now, since way before the return of deep learning. How has it changed? Is it different now that we have all these powerful language models and things?

Roger Levy:Well, I mean, I think the single biggest thing is that the quality of the research just keeps on getting better and better. There's just more and more good people entering the field and really wanting to bridge cognitive science, psychology, and linguistics and NLP. And so, that's the thing that I'm just the most thrilled about, is just the personnel. We have just a really, really robust field now. And part of that is just growth. All of NLP has grown, and so all the fields have grown. But I feel like it's really just made a really, really, really tangible difference in terms of the sense of community and the frequency of exciting results that I see.

One perennial question in human language processing is, "What is the causal role of the underlying representations – for example, syntactic structures – in how we understand and how we predict and how we learn?" And researchers in the field, myself included, spend a lot of time hypothesizing what the representations are like, or what representations we should use to ask the questions that we want. And so there were a lot of parsing models, and that, of course, has changed a lot, because we are attracted to the tools that are being used most ubiquitously. And so, there's a lot less of that, and a lot more of simply autoregressive language models or sequence-to-sequence or bidirectional masked language models. Those, like everywhere else, are being used more and more in our field.

Now, I happen to believe that the causal role of the underlying grammatical structures is actually quite significant. But it was actually hard before to show that. And interestingly, I think we're in a better position to show it, because the neural language models give us a clearer picture of what things would be like if those grammatical representations weren't causally involved in the processes. So you can build models that do have the symbolic component explicitly represented and playing a causal role, and then you can build the models in which those are only implicitly represented and they play no direct causal role, and so we can start to do theory comparisons that way.

And it's made us a more quantitative field, rather than a qualitative field. A lot of my early computational psycholinguistics papers were showing qualitative effects in probability space that arose, and now, we can actually build quantitative theories and really quantitatively calibrate them. We can aspire to and sometimes succeed in quantitatively calibrating them to human language processing, and we can do that much better than we used to be able to.

Chris Potts:Really cool. I'm curious, when you, right now, in 2022, when you think about syntax in the context of all this computational psycholinguistics, do you think about designing a tree-structured model, or do you think about having that syntax somehow be inferred or emergent or found latent in an otherwise plain old flat Transformer-like autoregressive model?

Roger Levy:I think that, for the most part, my leading hypothesis – for me – is that we compose meaning in our heads in ways that are largely tree-structured and that that is very important in human language understanding. And now, of course, Transformers do too. They just do so implicitly, right?

Chris Potts:Well, that was kind of my question, yeah, yeah.

Roger Levy:There can be some non-tree-structured things, too. For example, anaphoric dependencies, like between a pronoun and its antecedent – those break tree structure if you take them to be of the same kind of link as a syntactic tree. And those are dependencies that we clearly represent and we use.

So here's one big question, one perennial question. The role of the discrete compositional structure being causal in a theory is maybe best explicated by the serial/parallel processing debate in psycholinguistics.

The serial position is that language is ambiguous, but generally, we can only pursue one interpretation at once. And the parallel view is that we can pursue more than one interpretation at once. The full parallel view is that we pursue all the interpretations all the time. That's probably implausible.

But there's no such thing as ambiguity in terms of the actual representations that are deployed in any deep learning model. You have a context. You get new input. You have new representations. And it's a vector. And that's all deterministic, right? And so, it implicitly represents ambiguity. A cognitive theory built on those representations would have to say either that we actually are full parallel – all the possible interpretations are represented implicitly, and we're entertaining them all at once – or that the representations intrinsically fail to represent all the possible ambiguities, all the possible resolutions. So you're forced into one of those positions. Whereas if you have multiple competing structures and you can do things like put weights or probabilities on them, then you can actually have that debate. And I think that it's a pretty rich debate.

I don't think full parallel is right. The most obvious way of seeing that is it's hard for you to even bring to mind all of the interpretations of a sentence at once. So there's something to the notion of an interpretation of a sentence when a sentence can have multiple interpretations. So I do think that these discrete structures, different paths of composition, do play a causal role, an important causal role in human language understanding, and they're going to need to be there in the ultimate theory of language in the human mind.

Chris Potts:Let me make sure I understand. So a deep learning model, in the terms you're giving, is a deterministic device for potentially representing many, many hypotheses that could be highly structured, and I think those structures could play a causal role in whatever the final behavior is. Where does a device like that fit into that psycholinguistic debate?

Roger Levy:It's an engineering convenience. If it's implicitly representing the alternative possibilities that, in the human mind, are being either simultaneously explicitly represented or selected among, then there's an engineering convenience of packing them all together.

Chris Potts:Right, but it might be important that the deep learning model doesn't select.

Roger Levy:That's true. No, that's absolutely true. I was saying within that particular theory.

Chris Potts:I see.

Roger Levy:Another possibility is that some of the mechanisms that we see deep learning models use are real for people. This is a very wonky detail point, but one of the remarkable things to me about the GPT architecture is that it's a Transformer decoder. That means that a word cannot influence the representation of previous words. So if a word late in a sentence influences the correct interpretation or the preferred interpretation of a word early on, that information has to be packed onto the late word's representation. And we've done some initial experiments with this. GPT can do that, and there's something really interesting about how it does that. And maybe there are some lessons to be learned for us.
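
For readers who want the wonky detail: the constraint Roger describes comes from the causal attention mask in a decoder-only Transformer, which a few lines of PyTorch can illustrate (a sketch of the general mechanism, not any specific model's code):

```python
# Causal (decoder) attention mask: position i may attend only to positions <= i,
# so later words cannot change the representations computed for earlier words.
import torch

seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).bool()   # lower-triangular
print(causal_mask.int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])

# Disallowed positions get -inf before the softmax, so any information a late
# word carries about an early word has to live in the late word's own state.
scores = torch.randn(seq_len, seq_len)
scores = scores.masked_fill(~causal_mask, float("-inf"))
weights = torch.softmax(scores, dim=-1)
```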

Chris Potts:But it can do that only because, at a global level, it knows about the influences, right? So the representations are kind of already infused with whatever interaction you're seeing that would go back in time, right? Sorry, forward in time.

Roger Levy:Yeah. Yeah, yeah. That's right. It only ever manifests in the prediction of future stuff, of course. I think that's what you're saying.

Chris Potts:Oh, okay. Yeah.

Roger Levy:Yeah. But my point is that a lot of information has to be loaded onto late words. But I think that also highlights an important respect in which the Transformer decoder architecture specifically is psychologically implausible – and this is if you want to ask, "Is a GPT-like model something like what we have in the human mind?" – which is that we know that late information can influence our representation of context. We revise our representations of context as humans. And, furthermore, there's the point that the recurrent architectures seem, intuitively at least, to be a pretty nice match for how language is processed in real time.

And as far as we know, we don't have a buffer that can hold 1,024 tokens of input in high fidelity form in the human mind. But on the other hand, maybe another way of saying this is that there are interesting properties and behaviors that the models have that minimally can serve as hypotheses. In particular, I think they're the most interesting when they're hypotheses for the parts where it's very hard for the theories that are built purely on symbolic mechanisms to make a lot of headway.

So, a longstanding area of interest before the present era of deep learning was whether, for example, we could derive human performance limitations in language processing – like the inability to process center embeddings well – from neural network processing architectures, whether those limitations would just fall out. And I would say that the evidence, even to date, has been mixed. But minimally, we now have another tool in the arsenal to generate interesting hypotheses. And there's very interesting work out there today that tries to test those hypotheses with humans. Yeah.

Chris Potts:So, stepping back a little bit from our general question here, I think you've articulated some really nice general questions for computational psycholinguistics that would be an argument for presenting all of this work to the Cognitive Science Society, because they should know about this new set of tools and the way they could be used as proxies for human subjects, with caveats. But that would be the new methodological thing, which is an old methodological thing, but maybe now more successful. But a lot of this research is appearing at the ACL conferences. What do you make of that? Is there also a message you have from cognitive science to NLP about all these questions, or what?

Roger Levy:Oh, definitely. I mean, well, one beautiful thing about the *ACL community is that it has always been very welcoming of psycholinguistic research that is computationally respectable. That's been true for decades – it goes back to, for example, some of Fernando Pereira's early work in the 1980s. Mark Johnson, Stuart Shieber. Many, many figures in our field have found psycholinguistic theory interesting and have tried to work on it themselves. And I think that's been a wonderful feature of the community.

That's a different question from, "How do I justify my relevance to the current community?" And I think there, actually, there's a very easy answer, which is that a complete model for an agent that needs to interact with humans through language has to have a good model embedded inside of it of how humans use language. So, the cognitive science of language is intrinsically of interest for natural language processing. And it may appear not to be of interest only to the extent that NLP hasn't gotten as far as it needs to yet – maybe we're not there, we're not ready to actually build and embed the model of the human language user inside real NLP systems. But that speaks to how rudimentary NLP technology still is. I think it's inevitable that all the research in computational psycholinguistics is intrinsically of interest to NLP.

Chris Potts:And I think smart scholars will look to cognitive science for the future in all of this. I'm fond of pointing out that in the era when Geoff Hinton felt like he was an outcast from the AI community, he had received the highest honor from the Cognitive Science Society in recognition of all his contributions to what we now call deep learning. So the next big thing for NLP and AI is probably happening somewhere at the CogSci conference this year.

Do you have time for a few questions that are kind of about you, as a way of wrapping up?

Roger Levy:Yes. Yeah, sure.

Chris Potts:One question that's kind of intellectual is, I think, it's fair to say that you were doing probabilistic linguistics way before it was cool. And I think it was cool at Stanford first. That might have helped. And I think you might have played a part in making it kind of cool at Stanford. But how has your feeling about all of this changed over the years as probabilistic linguistics has become more accepted, and what was that journey like for you?

Roger Levy:Wow, that's a very interesting question. I feel like there are some papers that I wanted to write that I never wrote, which would've been more relevant before it was cool.

I'm not going to take this as the more personal question, but I think it's a very interesting intellectual question. The question of lens focus – how sharp our view is – is very, very interesting and important. When I was a graduate student, the debates at Stanford about probabilistic versus nonprobabilistic approaches to language variation, generally, revolved around, "Well, is it really probabilistic, or are we just not sharpening our lens enough, and so we're not seeing the right conditioning environments?"

Chris Potts:Right.

Roger Levy:So if you have a blurry lens, and you're just averaging over – technically speaking, marginalizing over – different kinds of conditioning environments, variable behavior will look probabilistic, but maybe it's much, much less probabilistic once you refine your view. I think that issue has receded in salience, but it's still there. The question is how rich and sensitive our conditioning is – and this can be formalized in terms of conditional entropies. Basically, the conditional entropies of the systems, of the choices, of the usage patterns, don't go to zero, so there's still probabilistic residue. But one thing we have seen is that there's mutual information among all sorts of things. So, as you condition on more and more, things do become lower entropy and a bit more systematic. What that overall picture looks like – I think those issues are still there.
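As a toy numerical illustration of the "blurry lens" point (the probabilities here are invented, not drawn from any corpus): a binary choice that is fairly systematic within each of two hidden conditioning environments still looks maximally variable when you marginalize over them, and even after conditioning, the entropy need not go to zero.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_x = np.array([0.5, 0.5])            # P(X): two hidden conditioning environments
p_y_given_x = np.array([[0.9, 0.1],   # P(Y | X = x1): fairly systematic
                        [0.1, 0.9]])  # P(Y | X = x2): fairly systematic, reversed

p_y = p_x @ p_y_given_x               # marginal P(Y): the "blurry lens" view

H_marginal = entropy(p_y)                                  # 1.00 bit
H_conditional = sum(px * entropy(row)
                    for px, row in zip(p_x, p_y_given_x))  # ~0.47 bits

print(H_marginal, H_conditional)      # conditioning lowers entropy, but not to zero
```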

I've been around long enough to see questions taken up and then set aside, not because they were resolved, but because people got interested in doing other things instead. And that's a weird feeling, because I had heard that this happens, and it's a strange thing to actually see it happen in communities that I'm part of.

Chris Potts:This is so interesting, because it's making me think like... So Ivan Sag, right, our dearly departed colleague, over the course of his career, it seems like he went from having a pretty full-on syntactic view of syntactic islands, as we call them – extraction islands – to embracing the fact that there could be many, many factors shaping our judgements about sentences that look like they involve an island violation. But I don't know whether that meant that he had embraced probabilities as part of his linguistic theory or whether he was just becoming increasingly open-minded about the factors that you would want to include in your theory. And I guess I don't really care either way, because the important thing for me was that he led the way in terms of bringing in all that evidence, which had been neglected for so long. Does that make sense?

Roger Levy:It does make sense. I'm trying to think of what I think.

Chris Potts:Whereas Joan [Bresnan], it seems like what Joan did is the same thing of like, while a lot of factors could be impinging on this construction, let's develop probabilistic grammars to attack them, right?

Roger Levy:Yeah. I think that's right. And I think Ivan never... For whatever reason, he didn't quite want to go in that direction, or he just wasn't inclined to – not that he would reject them, but that wasn't what he wanted to focus on.

Chris Potts:That might be right, yeah.

Roger Levy:Yeah, yeah. Yeah, I think that's right.

Chris Potts:But you never experienced the thing of like a reviewer saying, "I didn't actually have to read beyond the first sentence of this abstract, because I know that language doesn't use numbers. Reject!"?

Roger Levy:I didn't submit enough to the wrong linguistic journals to have that. I submitted to Language when I submitted to linguistics journals.

Chris Potts:Oh, and was that more open?

Roger Levy:Oh yeah. Yes, yes, yes.

Chris Potts:Okay, that's good.

Roger Levy:Yeah. Yeah. I mean, my paper on binomials – well, it's really Sarah Benor's paper on binomials that I was fortunate to be a part of – was... They were like, "Oh, this is great. We now have probabilities on these things." Yeah, no, they were extremely positive. But you are making me think about... I think there might have been some discussion about that in the action editor's letter at some point, about like, "I think this is the paper we need in Language now," or something like that.

Chris Potts:Okay. Signs of changing times. No, that's good.

Your pinned tweet right now is about how you won't review for Elsevier journals. So what are your thoughts on that particular stand and also on scientific publishing in general?

Roger Levy:Oh, absolutely. So somehow, I got attuned to these issues around 2010 or so. I don't know how. If you're a psycholinguist, you're in a field that is pretty heavily Elsevier-controlled. Two of the crucial journals, Cognition and the Journal of Memory and Language, which are top, coin-of-the-realm journals, are Elsevier journals. And at some point, I started reading the author agreements for these things, and I just got very offended, realizing that basically I was handing over my hard labor for free to this publisher, which was then proceeding to make an enormous profit margin on it.

Oh, it was a combination of that and the terrible experiences I had with copy editing at Elsevier, where basically, very rarely if ever did things improve between the version of the paper that I submitted and the thing that they were going to publish. Usually, they introduced a lot of problems that I had to fix, and I had to insist on their fixing them, because they wouldn't unless you really made a massive fuss. So I started getting very offended, and I got sensitized to open access. Eric Bakovic at UC San Diego, where I was faculty at the time, was quite involved in open access. He brought Stuart Shieber, who has been a real leader in that area, to give a talk at UC San Diego. And I just got very passionate about this.

That has led to many things, but one of them – one of the biggest academic service things that I'm involved in – is that since 2017 or '18, I've been the chair of MIT's committee on the library system, which is the committee that basically provides supervision and guidance of MIT Libraries' activities. MIT has been a longtime leader in open access. And I have been sort of an advisor, and sometimes a collaborator, in doing things to help advance that. One of the things we did was that, in 2019, MIT Libraries and my committee collaborated to produce what's called the MIT Framework for Publisher Contracts, which is sort of a declaration of a set of principles about protecting authors' rights to share their research and institutions' rights to have the research they support be shared.

Why is it a framework for publisher contracts? Because university libraries have much stronger positions of agency and efficacy in the complex landscape of scholarly publishing than individual academics do. So a library can say, "Here's what we're going to stand up for. Here's what we want," every time it renegotiates a contract. And the Framework for Publisher Contracts said, basically, in short, "We want better open access terms in a variety of specific ways."

For almost every publisher, it's led to major progress, and sometimes really, really exciting new models for publication – with the Association for Computing Machinery, with AAAS, which publishes Science, with the American Physical Society. And we're starting to get there with the major commercial scholarly publishers as well. But we have had tremendous difficulty with Elsevier. Elsevier is the odd one out in terms of its behavior. So, MIT has been out of contract with Elsevier since 2020. We're just about two and a half years in now.

I was not part of the negotiating team, but I was part of the advisory process. I was fully in support of MIT Libraries' decision to not renew the contract with Elsevier. And, I mean, I'm as affected as anybody else because of these major journals that our field uses. But it turns out thatm actually, you can get by just fine. Most of the articles, the older articles, are no longer under embargo, and there are funder mandates, so a lot of them, you can get on open access repositories, like at the NIH, the NSF now. You can even just email the author and say, "Hey, I don't have access because my library's taking a stand for open access and for equitable publication practices and scholarly publishing, can you send me the final published version of your article?" And the answer is yes, Elsevier's sharing policy allows you to do that.

So we're getting along just fine. I hope that what we're doing will serve as inspiration. We were inspired by the German university consortium Projekt DEAL. There have been other university consortia in other parts of Europe that have also taken very strong stands against Elsevier, Springer Nature, and other publishers, and made a lot of progress. And the University of California did this a couple of years ago as well. We're now sort of in a leadership position in the US in terms of open access publication, and I hope that's something that other institutions can build on and learn from. So, I'm very excited about that. That was a long answer to a simple question.

Chris Potts:This is encouraging, but you must have dear, beleaguered friends who are editors at Elsevier journals. What do you do when they come calling to you and say, "Roger, please, I'm having trouble finding a reviewer for this paper that is a response to your paper. Could you please make an exception for me and review the paper"? What do you do?

Roger Levy:Nobody's ever asked me to make an exception.

Chris Potts:Oh.

Roger Levy:No. I have a copy-paste.

Chris Potts:Ah.

Roger Levy:I have a template, and it lives in the root directory of my reviewing folder. I just pull it out, insert editor name, journal name, and then I add one to the tally that I have of the number of Elsevier papers that I have refused to review.

Chris Potts:Do you want to share the tally?

Roger Levy:I'd have to look it up, but, I mean, it's over a dozen at this point, certainly. Yeah. Yeah. People probably are getting wind of it. I mean, there are more and more alternatives. So, for example, I was also very fortunate to be involved in the launch of a wonderful new diamond open access journal. Diamond open access means that not only is the final version of the paper open access, but the author does not have to pay to publish if they don't have the funds. And so, basically, Fernanda Ferreira and Brian Dillon are the co-founding editors of Glossa Psycholinguistics, which is the first sister journal to Glossa – which, of course, is the groundbreaking open access journal in linguistics, led by the one and only Johan Rooryck.

Chris Potts:What about NLP? Is it good in this respect?

Roger Levy:NLP is a shining light of goodness!

Chris Potts:Oh, wait a second! But how much does it cost to register for the ACL conferences this year?

Roger Levy:That part, very tough. But I think that selling the proceedings would not be the right way to go. The economic models need to be worked out, and NLP absolutely has an issue with this. I think it's very manifest, for example, in the different amounts of funding that are available to different parts of the community – people in linguistics departments versus computer science departments, a lot of the time. But there's a longer conversation we could have about how to deal with that, and that's complex. In terms of actually opening up this research, though – not just in comparison to non-computer-science fields, but in comparison to other parts of computer science – NLP is way ahead a lot of the time. Computer vision is also excellent in this respect. So it's something that, if you're an NLP researcher, you may not even have to think about. But then, for example, I was very glad that when Springer Nature created Nature Machine Intelligence as a non-open-access AI journal, there was actually quite a bit of backlash from the community. I think it hasn't established itself that strongly, and it shouldn't, because the world needs no new non-open-access journals.

Chris Potts:Are you still doing triathlons?

Roger Levy:I was, and I injured my knee, and so, I'm not running anymore. But I'm still biking and swimming a lot. I'm actually doing my first triathlon relay with a couple of friends in just over two weeks up in the White Mountains in New Hampshire, so, that's going to be fun. I'm doing the swimming leg.

Chris Potts:Oh great. Oh, that sounds wonderful.

Roger Levy:Yeah, so...

Chris Potts:Final question. I think of you as a man of the American West, first Arizona, and then California. Now you're in Boston and Cambridge. Have you fallen in love with the East Coast?

Roger Levy:You know, I love urban density. You're asking me at the right time of year. This time of year, it's great here. I love urban density. I've lived here without a car for almost six years now, and that is really, really fun to have been able to do that. I think that's going to come to an end, because I now have a little human being to travel along with. But urban density's great. I miss California and Arizona quite a lot, but this is a really wonderful place. I've also been really pleasantly impressed with the beauty of New England. And I know that you have a soft spot for the Adirondacks.

Chris Potts:Sure. Yeah, that's right. Yeah.

Roger Levy:We will need to connect outside of this conversation on where I should go. I've gone once, and it was beautiful, but there's so much beautiful nature up here, too. It feels a little smaller-scale and more intimate a lot of the time, but it's really beautiful.

Chris Potts:Well, I'm happy to be a booster for the Adirondacks, but since I'm now a Californian, let me close by just asking – so surely, there is something that you miss about California. What is it?

Roger Levy:Oh, there's so many things I miss about California. The last place I lived before I moved here was a six-minute walk from La Jolla Cove, which is a prime place for swimming. This is in San Diego. There's a giant swim lane half a mile long in the ocean, and I would just get up first thing in the morning, walk out to La Jolla Cove, and jump into the ocean and swim for half an hour to an hour and come back, and that was an amazing way to start a day. That's one thing I miss. There's many other things too.

Chris Potts:That sounds outstanding. Thank you so much for doing this, Roger. This was great fun, very rewarding.

Roger Levy:Likewise here. It's a super fun conversation. Thank you, Chris.