
The more AI evolves, the more it resembles the human brain!

As soon as the research from Meta AI and other institutions was published, it blew up on social media, drawing a wave of neuroscientists and AI researchers.

AI learns to work like a human brain
In brief, the study focuses on speech processing, comparing the self-supervised model Wav2Vec 2.0 with the brain activity of 412 volunteers.

Of the 412 volunteers, 351 spoke English, 28 spoke French, and 33 spoke Chinese. Each of them listened to an audiobook for about an hour while the researchers recorded their brain activity with fMRI.
On the model side, the researchers trained Wav2Vec 2.0 with over 600 hours of unlabeled speech.

Matching the volunteers' native languages, the models also come in three variants, trained on English, French, and Chinese speech respectively, plus a fourth model trained on a non-speech acoustic-scene dataset.

These models were then fed the same audiobooks the volunteers had heard, and the researchers extracted the models' activations.
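To make that step concrete, here is a minimal sketch of activation extraction using the Hugging Face transformers and torchaudio libraries; the checkpoint name and audio file are illustrative placeholders, not the exact models or stimuli used in the paper.

```python
# Sketch: extract layer-wise activations from a pretrained Wav2Vec 2.0 model
# for one speech clip. Checkpoint and file path are placeholders.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-base"  # placeholder, not the paper's exact checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Load the audio, resample to 16 kHz, and collapse to mono
waveform, sr = torchaudio.load("audiobook_clip.wav")  # hypothetical file
waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (num_layers + 1) tensors, each [1, n_frames, hidden_dim];
# these per-layer activations are what get compared against the fMRI recordings.
activations = torch.stack(outputs.hidden_states)
print(activations.shape)
```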

The results show that self-supervised learning does allow Wav2Vec 2.0 to generate brain-like representations of speech.

As the figure shows, the model's activations predicted brain activity in almost all cortical regions, with the closest match in the primary and secondary auditory cortex.
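Comparisons like this are usually made with a linear encoding model: a regularized regression maps the model's activations, aligned to the fMRI sampling rate, onto each voxel's time course, and the correlation on held-out data serves as the "brain score". Below is a minimal sketch of that idea with scikit-learn, using random arrays as stand-ins for real activations and recordings; it is not the paper's exact pipeline.

```python
# Sketch of a linear encoding model ("brain score"): ridge regression from model
# activations to voxel time courses, scored by correlation on held-out data.
# X and Y are random stand-ins for real activations and fMRI responses.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trs, n_features, n_voxels = 500, 768, 2000  # assumed sizes, for illustration only
X = rng.standard_normal((n_trs, n_features))  # activations aligned to fMRI samples
Y = rng.standard_normal((n_trs, n_voxels))    # voxel responses

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)
ridge = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
Y_pred = ridge.predict(X_te)

# Pearson correlation per voxel between predicted and measured responses
Y_pred_c = Y_pred - Y_pred.mean(axis=0)
Y_te_c = Y_te - Y_te.mean(axis=0)
brain_score = (Y_pred_c * Y_te_c).sum(axis=0) / (
    np.linalg.norm(Y_pred_c, axis=0) * np.linalg.norm(Y_te_c, axis=0)
)
print("mean brain score:", brain_score.mean())
```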

The researchers went further and identified which layers of the model play the roles of the brain's "auditory cortex" and "prefrontal cortex".

The auditory cortex is best matched by the Transformer's first layer (blue), while the prefrontal cortex is best matched by its deepest layer (red).
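One common way to produce this kind of layer-to-region map is to fit the same encoding model separately on each layer's activations and, for each brain region, keep the best-scoring layer. A sketch of that bookkeeping, where fit_and_score is a hypothetical helper along the lines of the ridge example above:

```python
# Sketch: per-region layer preference. For each region, score every Transformer
# layer with an encoding model and record which layer predicts it best.
import numpy as np

def best_layer_per_region(layer_activations, region_responses, fit_and_score):
    """layer_activations: list of [n_samples, dim] arrays, one per layer.
    region_responses: dict mapping region name -> [n_samples, n_voxels] array.
    fit_and_score: hypothetical helper returning per-voxel held-out correlations."""
    preference = {}
    for region, Y in region_responses.items():
        scores = [fit_and_score(X, Y).mean() for X in layer_activations]
        preference[region] = int(np.argmax(scores))  # index of the best-fitting layer
    return preference
```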

In addition, the researchers quantified the differences in human ability to perceive native and non-native phonemes and compared them with the Wav2Vec 2.0 model.

They found that, like humans, the AI discriminates its "native language" better: a French-trained model, for example, perceives French stimuli more readily than an English-trained model does.
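This kind of comparison is typically framed as an ABX-style discrimination test: a model separates two phoneme categories well when a token's representation sits closer to a same-category token than to a different-category one. A toy sketch of that distance criterion, with the representation arrays as placeholders:

```python
# Sketch of an ABX-style phoneme discrimination check on model representations:
# A and X share a phoneme category, B does not; a trial counts as correct when
# X is closer to A than to B. Inputs are placeholder arrays.
import numpy as np

def abx_accuracy(reps_a, reps_b, reps_x):
    """Each argument: [n_trials, dim] representations, e.g. mean-pooled
    Wav2Vec 2.0 frames for one phoneme token per trial."""
    d_ax = np.linalg.norm(reps_x - reps_a, axis=1)
    d_bx = np.linalg.norm(reps_x - reps_b, axis=1)
    return float(np.mean(d_ax < d_bx))
```

Under a criterion like this, a French-trained model would be expected to score higher on French contrasts than an English-trained model, which is the pattern described above.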

These results show that 600 hours of self-supervised learning is enough for Wav2Vec 2.0 to learn language-specific representations, a volume of speech comparable to what babies are exposed to while learning to talk.


Reigniting discussion in the neuroscience and AI communities
Some scholars believe the research does make some genuine breakthroughs.

For example, Jesse Engel of Google Brain thinks the research takes filter visualization to a new level.

Now you can not only see what the filters look like in "pixel space", but also simulate what they look like in "brain-like space".

Not everyone is convinced, however. Patrick Mineault argues that the study doesn't really prove that it measures "speech processing".

Compared with the rate of human speech, fMRI samples brain signals very slowly, so in his view it is unscientific to conclude that "Wav2Vec 2.0 has learned the behavior of the brain".
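To get a feel for the mismatch he is pointing at: Wav2Vec 2.0 emits a representation roughly every 20 ms, while a typical fMRI volume is acquired every one to two seconds, so dozens of model frames collapse into each brain sample. A back-of-envelope illustration (the repetition time below is a typical value, not necessarily the study's):

```python
# Back-of-envelope: how many Wav2Vec 2.0 frames fall into a single fMRI sample.
frame_stride_s = 0.02  # Wav2Vec 2.0 outputs roughly 50 representations per second
fmri_tr_s = 1.5        # assumed repetition time; typical for fMRI, not the study's exact value
print(fmri_tr_s / frame_stride_s, "model frames per fMRI sample")  # -> 75.0
```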

Of course, Patrick Mineault says he is not rejecting the study's point of view, and that he is "one of the authors' fans", but the study should provide more convincing data.

In addition, some netizens point out that Wav2Vec 2.0 and the human brain do not receive the same input: one gets a processed waveform, while the other gets the original waveform.

What do you think?
