environment
- ubuntu 18.04 64bit
- Nvidia GTX 1070Ti 8G
Introduction
Tortoise
is an open source Text-To-Speech
program with powerful text-to-speech capabilities and highly realistic voice and intonation.
to build
Create a brand new python
virtual environment
conda create -n tts python=3.8 conda activate tts
Then, pull the source code and install dependencies
git clone https://github.com/neonbjb/tortoise- tts .git cd tortoise-tts pip install -r requirements.txt python setup.py install
test
Convert a single sentence of text to speech
python tortoise/do_tts.py --text "I'm going to speak this" --voice random --preset fast
After the script is executed successfully, 3 audio wav
files will be generated in the folder results
. The sounds are randomly matched. You can listen to the generated audio and feel the effect.
All available voices in the current system are stored under the directory tortoise/voices
. If you like someone’s voice, you can specify it in the script parameters. The effect of starting with train_
will be better
python tortoise/do_tts.py --text "I'm going to speak this" --voice tom --preset fast
Feel the sound of tom
version
If you have a lot of text to process, you can put them in a text file, such as
Hello world. Hello Rust. Nice to meet you.
then execute the script
python tortoise/read.py --textfile test.txt --voice random
The script breaks down the text file into individual sentences and converts each to speech. After all the statements are generated, combine them into one file and output
https://xugaoxiang.com/wp-content/uploads/2023/03/0.wavFinally, let’s take a look at the performance of Chinese
python tortoise/do_tts.py --text "你好,世界" --voice random --preset fast
https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_0-1.wav https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_1-1.wav https://xugaoxiang.com/wp-content/uploads/2023/03/random_0_2-1.wav
This effect is too bad, look at issues
, https://github.com/neonbjb/tortoise-tts/issues/5 , currently the official does not support other languages, you need to train wav2vec
model by yourself
custom sound
If you want to add a specific sound to tortoise
, you need the following steps
- Collect audio clips of specific people
- Organize the audio into small clips of about 10 seconds, at least 3 clips are needed, the more the better
- Audio clips use
wav
format, sampling rate 22050 - Create a new folder under the directory
tortoise/voices
, name it with the name of the voice, for easy memory, such aszhangsan
, and then copy thewav
files organized above into it - The final use is to specify the parameter
--voice
aszhangsan
in the script
Model download
During the running of the script, a bunch of model files will be downloaded from huggingface
site, packaged here, stored in the cloud disk, and picked up after replying to the article (content is optional)
Link: https://pan.baidu.com/s/1EJD4N2yamDNh6X_0GtoaRQ
Extract code: 3qrq
After downloading, unzip and copy to the directory ~/.cache
, the file structure is as follows
This article is transferred from https://xugaoxiang.com/2023/03/02/tortoise-tts/
This site is only for collection, and the copyright belongs to the original author.