With just 15 seconds of audio, can AI help people who have lost their speech “regain their voice”?

OpenAI has shared some of its progress in AI speech synthesis on its official website, announcing initial insights and results from a small-scale preview of a model called "Voice Engine."

According to the report, the model uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. Notably, it is a small model, yet a single 15-second sample is enough for it to produce emotive and realistic voices.

OpenAI first developed Voice Engine in late 2022 and has used it to power the preset voices in its text-to-speech API, as well as ChatGPT Voice and Read Aloud.

OpenAI has now shared several early applications of Voice Engine, illustrated with real-world cases.

For example, Voice Engine was used to help restore the voice of a young patient who had lost the ability to speak fluently because of a vascular brain tumor.

In addition, Voice Engine can be used to provide reading assistance, translate content, and support people who cannot speak, among other uses.

1) Provide reading assistance to non-readers and children through natural-sounding, emotive voices

These voices can represent a wider range of speakers than preset voices allow. Age of Learning, an educational technology company, has been using Voice Engine to generate pre-scripted voice-over content. It is also using Voice Engine and GPT-4 to create real-time, personalized responses for interacting with students.

2) Translate content such as videos and podcasts

Voice Engine allows creators and businesses to reach more people around the world, fluently and in their own voices. According to OpenAI, HeyGen is one of the early adopters in this area. HeyGen is an AI visual storytelling platform that uses Voice Engine for video translation, rendering the speaker's voice in multiple languages to reach a global audience. When used for translation, Voice Engine retains the native accent of the original speaker: for example, generating English from an audio sample of a French speaker will produce speech with a French accent.

3) Provide support for people who are non-verbal

Voice Engine can support therapeutic applications for people with conditions that affect speech, educational enhancements for people with learning needs, and more. Livox is an AI alternative communication application that powers augmentative and alternative communication (AAC) devices, which enable people with disabilities to communicate. With Voice Engine, it can offer people who cannot speak unique, non-robotic voices in multiple languages. Users can choose the voice that best represents them, and multilingual users can keep a consistent voice across each language they speak.

In addition, Voice Engine reaches global communities by improving the delivery of essential services in remote areas. For example, Dimagi is developing tools for community health workers, who provide a range of basic services such as counseling for breastfeeding mothers. To help these workers improve their skills, Dimagi uses Voice Engine and GPT-4 to give interactive feedback in each worker's primary language, including Swahili or more informal languages.

OpenAI said that, because of the potential for misuse of synthetic speech, it is taking a cautious and informed approach to a broader release, choosing to preview the technology rather than release it widely at this time.

The terms OpenAI has agreed with its preview partners require explicit and informed consent from the original speaker and do not allow developers to build ways for individual users to create their own voices. These partners must also clearly disclose to their audience that the voices they hear are generated by artificial intelligence.

Additionally, OpenAI has implemented a number of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine and proactive monitoring of how the model is being used.

OpenAI said it encourages the accelerated development and adoption of technologies that track the origin of audiovisual content, so that people can always tell whether they are interacting with a real person or with artificial intelligence. It also wants to help the public understand the capabilities and limitations of AI technology, including the possibility of deceptive AI-generated content.

References:

https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
