With just 15 seconds of audio, can AI help people who have lost their speech “regain their voice”?

OpenAI has shared its progress in AI speech synthesis on its official website, announcing initial insights and results from a small-scale preview of a model called "Voice Engine."

According to the report, the model takes text input and a single 15-second audio sample and generates natural-sounding speech that closely resembles the original speaker. Notably, it is a small model, yet a single 15-second sample is enough for it to produce emotive and realistic voices.

OpenAI first developed Voice Engine in late 2022 and has used it to power the preset voices in its text-to-speech API as well as ChatGPT Voice and Read Aloud.

OpenAI has now shared some early applications of Voice Engine through real-world cases.

For example, Voice Engine was used to help restore the voice of a young patient who had lost the ability to speak fluently due to a vascular brain tumor.

In addition, Voice Engine can be used to provide reading assistance, translate content, support people who cannot speak, and more.

1) Provide reading assistance to non-readers and children through natural-sounding and emotional voices

These voices can represent a wider range of speakers than preset voices allow. Age of Learning, an educational technology company, has been using Voice Engine to generate pre-scripted voice-over content. It is also using Voice Engine and GPT-4 to create real-time, personalized responses for interacting with students.

2) Translate content such as videos and podcasts

Voice Engine allows creators and businesses to reach more people around the world, fluently and in their own voices. According to OpenAI, HeyGen is one of the early adopters in this area. HeyGen is an AI visual storytelling platform that uses Voice Engine for video translation, rendering a speaker's voice in multiple languages to reach a global audience. When used for translation, Voice Engine preserves the native accent of the original speaker: for example, generating English from an audio sample of a French speaker produces speech with a French accent.

3) Provide support for people who are non-verbal

Voice Engine can support therapeutic applications for people with conditions that affect speech, educational enhancements for people with learning needs, and more. Livox is an AI alternative communication application that powers augmentative and alternative communication (AAC) devices, which enable people with disabilities to communicate. With Voice Engine, it can offer people who cannot speak unique, non-robotic voices in multiple languages. Users can choose the voice that best represents them, and multilingual users can keep a consistent voice across each language they speak.

In addition, Voice Engine reaches global communities by improving the delivery of basic services in remote areas. For example, Dimagi is developing tools for community health workers who provide various basic services, such as "counseling for breastfeeding mothers." To help these workers improve their skills, Dimagi uses Voice Engine and GPT-4 to provide interactive feedback in each worker's primary language, including Swahili or more informal languages.

OpenAI said that, because synthetic speech can be misused, it is taking a cautious and informed approach to a broader release, choosing for now to preview the technology rather than release it widely.

The terms OpenAI signs with these partners require explicit and informed consent from the original speaker and do not allow developers to build ways for individual users to create their own voices. Partners must also clearly disclose to their audience that the voices they hear are AI-generated.

Additionally, OpenAI has implemented a number of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine and proactive monitoring of how the model is used.

OpenAI said it encourages faster development and adoption of technologies that track the origin of audiovisual content, so that people can always tell whether they are interacting with a real person or with artificial intelligence. It also wants to help the public understand the capabilities and limitations of AI technology, including the possibility of deceptive AI-generated content.

References:

https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
