Neuroscientists find the inner workings of next-word prediction models resemble those of language-processing centers in the brain


Over the past few years, artificial intelligence language models have become very good at certain tasks. In particular, they are good at predicting the next word in a text string; this technology helps search engines and texting applications predict the next word you are about to type.

The most recent generation of predictive language models also seems to be learning something about the underlying meaning of language. These models can not only predict the next word, but also perform tasks that seem to require some degree of genuine understanding, such as answering questions, summarizing documents, and completing a story.

Such models have been designed to optimize performance for the specific function of text prediction, without attempting to mimic anything about how the human brain performs this task or understands language. But a new study from neuroscientists at MIT suggests that the underlying function of these models resembles the function of language processing centers in the human brain.

Computer models that perform well on other types of language tasks do not show this similarity to the human brain, offering evidence that the human brain may use next-word prediction to drive language processing.

“The better the model is at predicting the next word, the better it matches the human brain,” says Nancy Kanwisher, the Walter A. Rosenblith Professor of Cognitive Neuroscience, a member of the McGovern Institute for Brain Research and MIT’s Center for Brains, Minds, and Machines (CBMM), and an author of the new study. “It’s amazing that the models fit so well, and it suggests very indirectly that maybe what the human language system is doing is predicting what’s going to happen next.”

Joshua Tenenbaum, a professor of computational cognitive science at MIT and a member of CBMM and MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); and Evelina Fedorenko, the Frederick A. and Carole J. Middleton Career Development Associate Professor of Neuroscience and a member of the McGovern Institute, are the senior authors of the study, which appears this week in the Proceedings of the National Academy of Sciences. Martin Schrimpf, an MIT graduate student who works in CBMM, is the first author of the paper.

Making predictions

The new, high-performing next-word prediction models belong to a class of models called deep neural networks. These networks contain computational “nodes” that form connections of varying strength, and layers that pass information between each other in prescribed ways.

Over the past decade, scientists have used deep neural networks to create vision models that can recognize objects as well as the primate brain does. Research at MIT has also shown that the underlying function of visual object recognition models matches the organization of the primate visual cortex, even though these computer models were not specifically designed to mimic the brain.

In the new study, the MIT team used a similar approach to compare language processing centers in the human brain with models of language processing. The researchers analyzed 43 different language models, including several optimized for next-word prediction. These include a model called GPT-3 (Generative Pre-trained Transformer 3), which, given a prompt, can generate text similar to what a human would produce. Other models were designed to perform different language tasks, such as filling in a blank in a sentence.

As each model was presented with a string of words, the researchers measured the activity of the nodes that make up the network. They then compared these patterns to activity in the human brain, measured in subjects performing three language tasks: listening to stories, reading sentences one at a time, and reading sentences in which one word is revealed at a time. These human data sets included functional magnetic resonance imaging (fMRI) data and intracranial electrocorticographic measurements taken in people undergoing brain surgery for epilepsy.

They found that the best-performing next-word prediction models had activity patterns very similar to those seen in the human brain. Activity in these same models was also strongly correlated with measures of human behavior such as how quickly people were able to read text.
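The comparison described above can be illustrated with a small sketch. The article does not spell out the study's actual analysis, so this assumes one common approach: fit a regularized linear map from a model's node activations to a recorded response, then score the map by how well it predicts held-out data. The function names, the ridge penalty, and the synthetic data are all hypothetical and stand in for real activations and recordings.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def brain_score(activations, responses, n_train, ridge=1.0):
    """Hypothetical similarity score, not the study's actual pipeline.

    Fits a ridge regression from model activations (stimuli x units) to one
    recorded response per stimulus using the first n_train stimuli, then
    returns the correlation between predicted and actual responses on the
    remaining, held-out stimuli.
    """
    X_train, X_test = activations[:n_train], activations[n_train:]
    y_train, y_test = responses[:n_train], responses[n_train:]

    # Center with training statistics only, so the test set stays unseen.
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean

    # Closed-form ridge solution: (X^T X + ridge * I)^-1 X^T y
    n_units = Xc.shape[1]
    weights = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(n_units),
                              Xc.T @ (y_train - y_mean))

    predicted = (X_test - x_mean) @ weights + y_mean
    return pearson(predicted, y_test)

# Synthetic stand-ins: 200 stimuli, a model layer with 20 units.
rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 20))

# One "recording" that depends on the activations, and one that does not.
related = activations @ rng.normal(size=20) + 0.5 * rng.normal(size=200)
unrelated = rng.normal(size=200)

score_related = brain_score(activations, related, n_train=150)
score_unrelated = brain_score(activations, unrelated, n_train=150)
print("related recording:", round(score_related, 2))
print("unrelated recording:", round(score_unrelated, 2))
```

Scoring on held-out stimuli matters here: a flexible enough map can fit any training data, so only predictions on unseen stimuli say something about whether the model and the recording share structure.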

“We found that models that predict neural responses well also tend to better predict human behavioral responses, in the form of reading times. And then both of these are explained by the model’s performance on next-word prediction. This triangle really connects everything together,” says Schrimpf.
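The “triangle” Schrimpf describes amounts to three pairwise relationships measured across models. As a rough sketch, assuming each model has been given one score per measure, the three correlations could be computed as below. The scores are made-up placeholders for five hypothetical models, not results from the study, and the measure names are invented for illustration.

```python
import numpy as np

def pairwise_correlations(scores):
    """Return the Pearson correlation for every pair of named score lists."""
    names = list(scores)
    matrix = np.corrcoef([scores[name] for name in names])
    return {
        (a, b): float(matrix[i, j])
        for i, a in enumerate(names)
        for j, b in enumerate(names)
        if i < j
    }

# Made-up illustrative scores for five hypothetical models (NOT study data).
scores = {
    "next_word_prediction": [0.20, 0.35, 0.50, 0.65, 0.80],
    "neural_fit":           [0.25, 0.30, 0.55, 0.60, 0.85],
    "reading_time_fit":     [0.10, 0.30, 0.35, 0.55, 0.60],
}

triangle = pairwise_correlations(scores)
for (a, b), r in triangle.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

If all three correlations are high, as in this toy data, each side of the triangle supports the others: models that predict the next word well also fit the neural data and the reading times.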

Game changer

One of the key computational features of predictive models such as GPT-3 is an element known as a forward unidirectional predictive transformer. This kind of transformer makes predictions about what will come next, based on the sequence that came before. An important feature of this transformer is that it can make predictions based on a very long prior context (hundreds of words), not just the last few words.
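The one-way property can be sketched in a few lines. The article does not describe how these models are implemented, so this assumes the standard technique of causal masking in self-attention, stripped down to a single attention head with no learned weights. The function name and the toy input are hypothetical.

```python
import numpy as np

def causal_self_attention(x):
    """Simplified single-head self-attention with a causal (one-way) mask.

    x has shape (sequence_length, dimension), one row per word. Real
    transformers add learned projections and many heads; they are omitted
    here so the masking step stays visible.
    """
    seq_len, dim = x.shape
    scores = x @ x.T / np.sqrt(dim)

    # Hide every position that comes later in the sequence.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax; masked positions end up with weight exactly 0.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))            # six "words", four features each
out, weights = causal_self_attention(x)

# Replace the last word: outputs for the earlier words must not change.
x_changed = x.copy()
x_changed[5] = rng.normal(size=4)
out_changed, _ = causal_self_attention(x_changed)

print("earlier outputs unchanged:", np.allclose(out[:5], out_changed[:5]))
```

Because each position can draw on every earlier position, the usable context grows with the length of the input, which is how such a model can condition on hundreds of prior words instead of only the last few.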

Scientists have not found any brain circuits or learning mechanisms that correspond to this type of processing, Tenenbaum says. However, the new findings are consistent with previously proposed hypotheses that prediction is one of the key functions in language processing, he says.

“One of the challenges of language processing is the real-time aspect of it,” he says. “Language comes in, and you have to keep up with it and be able to make sense of it in real time.”

The researchers now plan to build variants of these language processing models to see how small changes in their architecture affect their performance and their ability to fit human neural data.

“For me, this result was a game changer,” Fedorenko said. “It totally transforms my research agenda, because I wouldn’t have predicted that in my lifetime we would come up with these computationally explicit models that capture the brain enough that we can actually harness them to understand how the brain works.”

The researchers also plan to try to combine these high-performing language models with some computer models that Tenenbaum’s lab has already developed that can perform other types of tasks such as building perceptual representations of the physical world.

“If we are able to understand what these language models do and how they can connect to models that do things that are more like perceiving and thinking, then that can give us more integrative models of how things work in the brain,” says Tenenbaum. “This could lead us to better models of artificial intelligence, as well as give us better models of how more of the brain works and how general intelligence emerges, than we have had in the past.”

The research was funded by a Takeda grant; the MIT Shoemaker scholarship; the Semiconductor Research Corporation; the MIT Media Lab consortia; the MIT Singleton Fellowship; the MIT Presidential Graduate Fellowship; the Friends of the McGovern Institute scholarship; the MIT Center for Brains, Minds, and Machines, through the National Science Foundation; the National Institutes of Health; the Department of Brain and Cognitive Sciences at MIT; and the McGovern Institute.

The other authors of the article are Idan Blank PhD ’16 and graduate students Greta Tuckute, Carina Kauf and Eghbal Hosseini.
