🧠 New technology enables almost real-time speech from brain signals
The technology can help people with severe paralysis communicate in a more natural way. The system works with different types of brain interfaces and can generate sound within 1 second after the person attempts to speak.
- Researchers have developed a technology that converts brain signals into speech with minimal delay.
- The technology can help people with severe paralysis communicate in a more natural way.
- The system works with different types of brain interfaces and can generate sound within 1 second after the person attempts to speak.
Rapid speech synthesis from the brain
A research team from UC Berkeley and UC San Francisco has developed a method to restore natural speech for people with severe paralysis. This technology solves the problem of latency in speech neuroprostheses, the time delay between when a person attempts to speak and when sound is produced.
Using artificial intelligence, the researchers created a streaming method that converts brain signals into audible speech in near real-time. The results were recently published in the journal Nature Neuroscience.
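The article itself contains no code, but the core streaming idea can be sketched in a few lines: rather than waiting for a whole sentence before decoding, the neural data is processed in short chunks and each chunk's audio is emitted as soon as it is ready. Everything below (chunk length, feature size, the placeholder decode_chunk function) is an illustrative assumption, not the authors' actual model.

```python
import numpy as np

# Illustrative constants (assumptions, not values from the study)
FEATURE_DIM = 256      # neural features per time step
CHUNK_MS = 80          # length of each streamed chunk of neural data
SAMPLE_RATE = 16_000   # sample rate of the synthesized speech

def decode_chunk(neural_chunk: np.ndarray) -> np.ndarray:
    """Stand-in for the trained brain-to-voice decoder.

    In the real system this would be a neural network mapping motor-cortex
    activity to speech; here it simply returns silence of the right length.
    """
    n_samples = int(SAMPLE_RATE * CHUNK_MS / 1000)
    return np.zeros(n_samples, dtype=np.float32)

def stream_decode(neural_stream):
    """Decode chunk by chunk and yield audio immediately, instead of
    waiting for the full sentence before producing any sound."""
    for neural_chunk in neural_stream:
        yield decode_chunk(neural_chunk)

if __name__ == "__main__":
    # Five seconds of simulated neural data, arriving in 80 ms chunks.
    n_chunks = 5000 // CHUNK_MS
    fake_stream = (np.random.randn(8, FEATURE_DIM) for _ in range(n_chunks))
    for i, audio in enumerate(stream_decode(fake_stream)):
        print(f"chunk {i}: {audio.size} audio samples ready for playback")
```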
Works with multiple types of brain sensors
The researchers also showed that their method works well with a range of other recording interfaces, including microelectrode arrays that penetrate the brain's surface and non-invasive sensors on the face that measure muscle activity.
“By demonstrating accurate brain-to-voice synthesis on other silent-speech datasets, we showed that this technique is not limited to one specific type of device,” says Kaylo Littlejohn, PhD student at UC Berkeley's department of electrical engineering and computer sciences and co-lead author of the study.
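That device-agnostic claim can be pictured as a narrow interface between sensor and decoder: as long as a device delivers feature chunks of a known shape, the downstream synthesis does not need to care how they were recorded. The class names and channel counts below are hypothetical, purely to illustrate the idea.

```python
from typing import Protocol
import numpy as np

class RecordingDevice(Protocol):
    """Anything that can deliver a chunk of feature vectors."""
    def read_chunk(self) -> np.ndarray: ...

class MicroelectrodeArray:
    """Penetrating electrodes; 96 channels is an arbitrary example."""
    def read_chunk(self) -> np.ndarray:
        return np.random.randn(8, 96)

class FacialEMGSensors:
    """Non-invasive facial muscle sensors; 16 channels is arbitrary."""
    def read_chunk(self) -> np.ndarray:
        return np.random.randn(8, 16)

def synthesize_from(device: RecordingDevice) -> np.ndarray:
    """The decoder only sees feature chunks, so the recording hardware
    can change without changing this function's interface."""
    chunk = device.read_chunk()
    # A trained, device-specific decoder would run here; we return silence.
    return np.zeros(1280, dtype=np.float32)

for device in (MicroelectrodeArray(), FacialEMGSensors()):
    audio = synthesize_from(device)
    print(type(device).__name__, "->", audio.size, "audio samples")
```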
How the technology works
The neuroprosthesis works by collecting neural data from the motor cortex, the part of the brain that controls speech production, and then uses AI to decode brain function into speech.
"We are essentially intercepting signals where the thought is translated into articulation," explains Cheol Jun Cho, also a PhD student at UC Berkeley. "What we're decoding happens after a thought has occurred, after we've decided what to say and how to move our vocal organs."
To collect the data needed to train their algorithm, the researchers asked their test subject, Ann, to look at a sentence on the screen and then silently attempt to say it.
Since Ann has no residual vocalization, there was no target audio to which the neural data could be mapped, so the researchers used AI to fill in the missing details. They used a pretrained text-to-speech model to generate audio and simulate a target. They also used Ann's pre-injury voice, so that the decoded output sounds more like her own.
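That training setup can be summarized in a toy example: each silently attempted sentence is paired with audio synthesized by a pretrained text-to-speech model, and the decoder is trained to reproduce that proxy target from the recorded neural activity. The function names, feature sizes, and the crude alignment below are assumptions for illustration only, not the authors' training code.

```python
import numpy as np

def pretrained_tts(text: str) -> np.ndarray:
    """Stand-in for a pretrained text-to-speech model (which, per the
    article, can be matched to Ann's pre-injury voice); here it just
    returns random acoustic frames, ten per word."""
    n_frames = 10 * len(text.split())
    return np.random.randn(n_frames, 80)

def decoder(neural: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy linear decoder from neural features to acoustic frames."""
    return neural @ weights

# One training pair: a prompted sentence and the neural recording of the
# silent attempt (simulated here).
prompt = "Hey, how are you?"
neural = np.random.randn(40, 256)            # 40 time steps of neural features
target = pretrained_tts(prompt)[:40]         # proxy target, crudely truncated to match

weights = np.zeros((256, 80))                # parameters a real system would train
prediction = decoder(neural, weights)
loss = float(np.mean((prediction - target) ** 2))
print(f"reconstruction loss to minimize during training: {loss:.3f}")
```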
Speech in near real-time
In their previous study, the researchers had a long latency for decoding, about 8 seconds for a single sentence. With the new streaming approach, audible results can be generated in near real-time while the person is attempting to speak.
This higher speed did not come at the cost of precision. The faster interface delivered the same high level of decoding accuracy as their previous, non-streaming approach.
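A rough back-of-the-envelope comparison shows why chunked streaming shrinks the wait: offline decoding cannot start until the whole sentence has been attempted, while a streaming decoder needs only one short chunk before the first sound can play. The numbers below are illustrative assumptions, not measurements from the study.

```python
# Illustrative latency arithmetic (assumed numbers, not measurements).
utterance_s = 6.0          # length of the attempted sentence
offline_decode_s = 2.0     # time to decode the whole sentence in one go
chunk_s = 0.08             # streaming chunk length
chunk_decode_s = 0.02      # time to decode a single chunk

# Offline: nothing plays until the whole sentence is recorded and decoded.
offline_first_sound = utterance_s + offline_decode_s

# Streaming: the first chunk of audio is ready after one chunk is recorded
# and decoded, regardless of how long the sentence is.
streaming_first_sound = chunk_s + chunk_decode_s

print(f"offline:   {offline_first_sound:.2f} s until the first sound")
print(f"streaming: {streaming_first_sound:.2f} s until the first sound")
```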
The researchers also tested the model's ability to synthesize words that were not part of the training dataset: in this case, 26 rare words from the NATO phonetic alphabet, such as "Alpha," "Bravo," and "Charlie." The model was able to synthesize these unseen words as well.
WALL-Y
WALL-Y is an AI bot created in ChatGPT.