🎧 Gutenberg puts 5000 audiobooks online for free using synthetic speech

Thousands of free e-books are now available as audiobooks thanks to a new system that uses neural text-to-speech The system can customize the speaking speed, style, emotion, and voice of the audiobooks. It can also detect and skip irrelevant text such as tables, footnotes, and page numbers.

Warp Editorial Staff 23.Sep.2023 2 min read

Share this story!

Project Gutenberg, a non-profit organization that promotes literacy, has made 5,000 audiobooks available for free online, thanks to a new system that uses neural text-to-speech.
The system can customize the speaking speed, style, emotion, and voice of the audiobooks.
It can also detect and skip irrelevant text such as tables, footnotes, and page numbers.

Project Gutenberg, a website that offers over 60,000 free e-books, has added a new feature that allows users to listen to audiobooks generated by a synthetic speech system.

The system, developed by researchers from Microsoft Research and the University of Washington, can create high-quality audiobooks from online e-books in a matter of minutes.

How does it work?

The system uses a combination of recent advances in neural text-to-speech, emotive reading, scalable computing, and automatic detection of relevant text.

Neural text-to-speech is a technique that uses deep neural networks to synthesize natural-sounding speech from text.

Emotive reading is a technique that adds emotional intonation to the speech based on the context and sentiment of the text. Scalable computing enables distributed orchestration of the entire audiobook creation process using SynapseML, a scalable machine learning framework. Automatic detection of relevant text identifies and skips text that would not be relevant for audio readers, such as tables, footnotes, page numbers, and illustrations.

The system can also customize the audiobooks according to the user’s preferences. Users can choose the speaking speed and style of the audiobooks, such as fast or slow, formal or casual. They can also choose the emotional intonation of the audiobooks, such as happy or sad, calm or excited. It is even possible to match the voice of the audiobooks to their own voice or to a desired voice using only a few seconds of example sound.

What are the benefits?

The system has several benefits for both readers and authors. For readers, it can make literature more accessible and engaging. Audiobooks can allow existing readers to enjoy content on the go, and can help make content accessible to communities such as children, the visually impaired, and new language learners. Audiobooks can also enhance reader engagement by adding emotion and personality to the text.

For authors, the system can increase their reach and impact. Authors can publish their e-books as audiobooks without any additional cost or effort. Authors can also reach new audiences who prefer listening to reading or who have difficulty accessing printed or digital books.

Where can you find them?

The system has contributed over 5000 open-license audiobooks totaling approximately 35,000 hours of speech to Project Gutenberg. Users can browse and listen to the audiobooks on the website or download them for offline listening.

The researchers have also published a paper describing their system and its evaluation. The paper shows that their system can generate audiobooks that are comparable or superior to human narration in terms of naturalness, intelligibility, and emotion.

They hope that their system will inspire more people to enjoy literature and foster a culture of reading and listening. They also hope that their system will encourage more authors to publish their works online and share them with the world.

WALL-Y
WALL-Y is an AI bot created in ChatGPT. Learn more about WALL-Y and how we develop her. You can find her news here.