【Audio】05 TTS

1 fishspeech

python: 3.9.16 kernel: 5.4.0-42-generic

doc: https://speech.fish.audio/samples/

fishspeech requires Python 3.10. After I installed Python 3.10, set up the environment following the documentation, and started it, it reported that the NVIDIA driver version is too low. I don't know how to solve this yet, so I am skipping it for now.
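To get a first look at a "driver version is too low" message, it helps to see which driver is actually installed. Here is a minimal sketch that asks `nvidia-smi` for it. It assumes `nvidia-smi` is on the PATH; the helper name `nvidia_driver_version` is my own and is not part of fishspeech.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_driver_version() -> Optional[str]:
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        # nvidia-smi is not on PATH (no driver installed, or not visible here)
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed, e.g. it cannot talk to the driver
        return None
    lines = result.stdout.strip().splitlines()
    # One line per GPU; the driver version is the same for all of them
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("NVIDIA driver:", nvidia_driver_version() or "not detected")
```

The reported version can then be compared with the minimum driver required by the CUDA version of the installed PyTorch build. The usual ways out are probably upgrading the driver or installing a PyTorch build made for an older CUDA version, but I have not tried either here.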

2 sambert

2.1 inference directly

python: 3.8.19 kernel: 5.4.0-42-generic

https://www.modelscope.cn/models/iic/speech_sambert-hifigan_tts_zh-cn_16k

pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 "modelscope[framework]" matplotlib pytorch_wavelets tensorboardX
pip install kantts -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
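After the two pip commands, it can be worth confirming that the pinned versions really ended up in the environment. This is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_versions` is hypothetical, and the package list simply mirrors the names in the commands above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = ["torch", "torchvision", "torchaudio", "numpy", "modelscope", "kantts"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```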
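from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Text to synthesize; the placeholder below means "text to be synthesized".
# Keep the input in Chinese, since this is a zh-cn model.
text = '待合成文本'
model_id = 'damo/speech_sambert-hifigan_tts_zh-cn_16k'
# Build the text-to-speech pipeline (the model is downloaded on first use).
sambert_hifigan_tts = pipeline(task=Tasks.text_to_speech, model=model_id)
# Synthesize with the 'zhitian_emo' voice.
output = sambert_hifigan_tts(input=text, voice='zhitian_emo')
# The audio comes back as bytes under OutputKeys.OUTPUT_WAV.
wav = output[OutputKeys.OUTPUT_WAV]
with open('output.wav', 'wb') as f:
    f.write(wav)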
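To sanity-check the result without listening to it, the WAV header can be read with the standard library's `wave` module. This is a sketch under the assumption that the bytes are a complete PCM WAV file (which the script above implies, since it writes them straight to `output.wav`). The helpers `describe_wav` and `make_silence` are my own names, and `make_silence` only produces stand-in data so the example runs without the model.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Read the header of an in-memory WAV file and return its basic properties."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silence(rate: int = 16000, seconds: float = 0.5) -> bytes:
    """Build a short silent mono 16-bit WAV, used only as stand-in data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    # With the real model, pass the synthesized bytes instead: describe_wav(wav)
    print(describe_wav(make_silence()))
```

For the 16k model, I would expect `sample_rate_hz` to be 16000; a different value would suggest the wrong model or a resampling step somewhere.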
This post is licensed under CC BY 4.0 by the author.