A text-to-speech tool

environment

ubuntu 18.04 64bit
Nvidia GTX 1070Ti 8G

Introduction

Tortoise is an open source Text-To-Speech program with powerful text-to-speech capabilities and highly realistic voice and intonation.

to build

Create a brand new python virtual environment

 conda create -n tts python=3.8 conda activate tts

Then, pull the source code and install dependencies

 git clone https://github.com/neonbjb/tortoise- tts .git cd tortoise-tts pip install -r requirements.txt python setup.py install

test

Convert a single sentence of text to speech

 python tortoise/do_tts.py --text "I'm going to speak this" --voice random --preset fast

After the script is executed successfully, 3 audio wav files will be generated in the folder results . The sounds are randomly matched. You can listen to the generated audio and feel the effect.

https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_0.wav https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_1.wav https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_2.wav

All available voices in the current system are stored under the directory tortoise/voices . If you like someone’s voice, you can specify it in the script parameters. The effect of starting with train_ will be better

 python tortoise/do_tts.py --text "I'm going to speak this" --voice tom --preset fast

Feel the sound of tom version

https://xugaoxiang.com/wp-content/uploads/2023/03/tom_0_0.wav https://xugaoxiang.com/wp-content/uploads/2023/03/tom_0_1.wav https://xugaoxiang.com/wp-content/uploads/2023/03/tom_0_2.wav

If you have a lot of text to process, you can put them in a text file, such as

 Hello world. Hello Rust. Nice to meet you.

then execute the script

 python tortoise/read.py --textfile test.txt --voice random

The script breaks down the text file into individual sentences and converts each to speech. After all the statements are generated, combine them into one file and output

https://xugaoxiang.com/wp-content/uploads/2023/03/0.wav

Finally, let’s take a look at the performance of Chinese

 python tortoise/do_tts.py --text "你好，世界" --voice random --preset fast

https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_0-1.wav https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_1-1.wav https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_2-1.wav

This effect is too bad, look at issues , https://github.com/neonbjb/tortoise-tts/issues/5 , currently the official does not support other languages, you need to train wav2vec model by yourself

custom sound

If you want to add a specific sound to tortoise , you need the following steps

Collect audio clips of specific people
Organize the audio into small clips of about 10 seconds, at least 3 clips are needed, the more the better
Audio clips use wav format, sampling rate 22050
Create a new folder under the directory tortoise/voices , name it with the name of the voice, for easy memory, such as zhangsan , and then copy the wav files organized above into it
The final use is to specify the parameter --voice as zhangsan in the script

Model download

During the running of the script, a bunch of model files will be downloaded from huggingface site, packaged here, stored in the cloud disk, and picked up after replying to the article (content is optional)

Link: https://pan.baidu.com/s/1EJD4N2yamDNh6X_0GtoaRQ

Extract code: 3qrq

After downloading, unzip and copy to the directory ~/.cache , the file structure is as follows

This article is transferred from https://xugaoxiang.com/2023/03/02/tortoise-tts/
This site is only for collection, and the copyright belongs to the original author.