Neural networks see the world differently than humans

Human sensory systems are very good at recognizing objects we see or words we hear, even if the object is upside down or the word is spoken by a voice we’ve never heard.

Computational models known as deep neural networks can be trained to do the same, correctly recognizing a picture of a dog regardless of its coat color, or a word regardless of the speaker’s voice pitch. But a new study by MIT neuroscientists found that these models often also respond in the same way to images or words that bear no resemblance to the target.

When these neural networks were used to generate an image or word that they responded to in the same way as a specific natural input, such as a picture of a bear, most of them generated images or sounds that were unrecognizable to human observers. This suggests that these models develop their own distinctive “invariances” — meaning they respond in the same way to stimuli with very different properties.

The findings give researchers a new way to assess how well these models mimic the organization of human sensory perception, says Josh McDermott, an associate professor of brain and cognitive sciences at MIT and a member of MIT’s McGovern Institute for Brain Research and Center for Brains, Minds, and Machines.

“This paper shows that you can use these models to derive unnatural signals that end up being very diagnostic of the representations in the model,” says McDermott, the study’s senior author. “This test should become part of a battery of tests that we as a field use to evaluate models.”

Jenelle Feather PhD ’22, who is now a research associate at the Flatiron Institute’s Center for Computational Neuroscience, is the lead author of the open-access paper, which appears today in Nature Neuroscience. The paper is also co-authored by Guillaume Leclerc, a graduate student at MIT, and Aleksander Mądry, the Cadence Design Systems Professor of Computing at MIT.

Different perceptions

In recent years, researchers have developed deep neural networks that can analyze millions of inputs (sounds or images) and learn common features that allow them to classify a target word or object with roughly the same accuracy as humans. These models are currently regarded as the leading models of biological sensory systems.

It is thought that when the human sensory system performs this kind of classification, it learns to disregard features that are not relevant to an object’s basic identity, such as the lighting conditions or the angle from which the object is viewed. This is known as invariance: objects are perceived as the same even when their less important properties differ.

“Classically, we think of sensory systems as building up invariances to all the sources of variation that different examples of the same thing can have,” says Feather. “The organism has to recognize that they are the same thing, even if they appear as very different sensory signals.”

The researchers wondered whether deep neural networks trained to perform classification tasks develop similar invariances. To answer this question, they used the models to generate stimuli that produce the same response within the model as an example stimulus the researchers gave to the model.

They call these stimuli “model metamers,” reviving an idea from classical perception research: stimuli that are indistinguishable to a system can be used to diagnose its invariances. The concept of metamers was originally developed in the study of human perception to describe colors that look identical even though they are made up of different wavelengths of light.
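
In practice, such a stimulus can be synthesized by starting from noise and optimizing the input until the model’s internal responses match those evoked by a natural reference. Below is a minimal sketch of that idea in PyTorch; it is not the paper’s exact procedure, and the choice of network, layer (layer3), optimizer, matching loss, and iteration count are illustrative assumptions.

```python
# A hedged sketch of model-metamer synthesis: optimize a noise image
# until its activations at one layer match a natural reference's.
import torch
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input image is optimized

# Capture activations from an intermediate layer via a forward hook.
acts = {}
def hook(_, __, out):
    acts["feat"] = out
model.layer3.register_forward_hook(hook)  # layer choice is an assumption

def activations(x):
    model(x)
    return acts["feat"]

def synthesize_metamer(reference, steps=2000, lr=0.05):
    """Return an image whose layer activations match the reference's.
    `reference` is assumed to be a (1, 3, 224, 224) tensor in [0, 1];
    input normalization is omitted here for brevity."""
    with torch.no_grad():
        target = activations(reference)
    metamer = torch.rand_like(reference, requires_grad=True)  # noise init
    opt = torch.optim.Adam([metamer], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(activations(metamer), target)
        loss.backward()
        opt.step()
        metamer.data.clamp_(0, 1)  # keep pixels in a valid range
    return metamer.detach()
```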

To their surprise, the researchers found that most of the images and sounds generated this way looked and sounded nothing like the examples the models were originally given. Most of the images were a jumble of random-looking pixels, and the sounds resembled unintelligible noise. When the researchers showed the images to human observers, in most cases the observers did not place the model-synthesized images in the same category as the original target example.

“People don’t really recognize them. They don’t look or sound natural, and they don’t have interpretable features by which a person can classify an object or word,” says Feather.

The findings suggest that the models have somehow developed their own invariances, which differ from those found in human perceptual systems. As a result, the models perceive pairs of stimuli as the same even though they look or sound very different to humans.

Idiosyncratic invariances

The researchers found the same effect across many different models of vision and hearing. However, each model appeared to develop its own unique invariances. When metamers generated by one model were shown to a second model, the metamers were just as unrecognizable to the second model as they were to human observers.

“The main takeaway from this is that these models seem to have what we call idiosyncratic invariances,” says McDermott. “They’ve learned to be invariant to these particular dimensions in the stimulus space, and that’s specific to the model, so other models don’t have the same invariants.”
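
The cross-model comparison described above amounts to a simple evaluation loop: metamers synthesized for one model are classified by a second model, and agreement with the natural reference is measured. The sketch below illustrates that loop; the second model (model_b) and the agreement metric are assumptions for illustration, not the paper’s exact protocol.

```python
# A hedged sketch of the cross-model test: does a second model assign
# another model's metamers to the same class as the natural references?
import torch
import torchvision.models as models

# A second model with a different architecture than the synthesis model.
model_b = models.vgg16(weights="IMAGENET1K_V1").eval()

def cross_model_recognition(references, metamers):
    """Fraction of metamers that model B classifies the same way as the
    corresponding natural reference images (a proxy for whether model B
    'recognizes' another model's metamers)."""
    with torch.no_grad():
        ref_labels = model_b(references).argmax(dim=1)
        met_labels = model_b(metamers).argmax(dim=1)
    return (ref_labels == met_labels).float().mean().item()
```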

The researchers also found that they could make a model’s metamers more recognizable to humans using a technique called adversarial training. This approach was originally developed to combat another limitation of object recognition models: introducing small, almost imperceptible changes to an image can cause the model to misclassify it.

The researchers found that adversarial training, which involves including some of these slightly altered images in the training data, produced models whose metamers were more recognizable to humans, although they were still not as recognizable as the original stimuli. This improvement appears to be independent of the training’s effect on the models’ ability to resist adversarial attacks, the researchers say.
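
Adversarial training of this kind typically works by generating perturbed examples on the fly and training on them. The sketch below shows a common PGD-style variant in PyTorch; the perturbation budget (eps), step size, and number of attack steps are illustrative assumptions, not the settings used in the study.

```python
# A minimal sketch of PGD-style adversarial training, assuming a
# standard image classifier with inputs in [0, 1].
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Find small perturbations of x (within an L-infinity ball of
    radius eps) that increase the model's classification loss."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # step up the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the ball
            x_adv = x_adv.clamp(0, 1)                 # keep pixels valid
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training update on adversarially perturbed inputs."""
    model.eval()                    # fixed batch-norm stats while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```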

“This particular form of training has a big effect, but we don’t really know why it has that effect,” says Feather. “This is an area for future research.”

Analyzing the metamers generated by a computational model can be a useful tool for assessing how closely the model mimics the underlying organization of human sensory perception systems, the researchers say.

“It’s a behavioral test that you can run on a given model to see whether the invariances are shared between the model and human observers,” says Feather. “It could also be used to evaluate how idiosyncratic the invariances are within a given model, which could help reveal potential ways to improve our models in the future.”

Reference: Feather J, Leclerc G, Mądry A, McDermott JH. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat Neurosci. 2023. doi: 10.1038/s41593-023-01442-0

This article has been republished from the following materials. Note: Material may be edited for length and content. Please contact the source for more information.
