【Audio】02 ASRT
1 some concepts
1.1 differences between models
Model | Regularization | Strengths | Best Use Scenario |
---|---|---|---|
SpeechModel24 | High Dropout | Very robust against overfitting, good for noisy data | Datasets with high variability and prone to overfitting |
SpeechModel25 | Moderate Dropout | Balance between complexity and performance, general use | General speech recognition tasks with standard datasets |
SpeechModel251 | Moderate to High Dropout | Higher complexity, slightly more prone to overfitting than 25 | Datasets with complex patterns but not much noise |
SpeechModel251BN | Batch Normalization | Fastest convergence, stable training, efficient learning | Large and complex datasets needing quick model training |
docs:
1.2 about asrt_config.json
here is one key part of it:
{
"name": "thchs30_train",
"data_list": "datalist/thchs30/train.wav.lst",
"data_path": "/data/speech_data",
"label_list": "datalist/thchs30/train.syllable.txt"
},
download_default_datalist.py:
- this script does not download the real data, only the supporting files.
- it should be run in the project root directory, because it stores the files in ./datalist.
real data:
- so we need to download the real data, which is here.
- and store it in /data/speech_data.
So that means:
- supporting files:
  - run download_default_datalist.py in the project root directory.
    - for me: the project root dir is /home/vayne/.a/ASRT_SpeechRecognition.
  - do not change the config keys data_list and label_list.
[vayne@localhost ~/.a/ASRT_SpeechRecognition]$ ls datalist/thchs30/
cv.syllable.txt cv.wav.lst test.syllable.txt test.wav.lst train.syllable.txt train.wav.lst
- real data:
  - run wget in the data directory.
    - for me: the data dir is /home/vayne/.a/ASRT_SpeechRecognition/datalist.
  - change the config key data_path.
    - for me: /home/vayne/.a/ASRT_SpeechRecognition/datalist
[vayne@localhost ~/.a/ASRT_SpeechRecognition]$ ls datalist/data_thchs30
data dev lm_phone lm_word README.TXT test train
And if you only download one dataset, keep only that dataset's entries in asrt_config.json.
Here is my asrt_config.json for the single dataset thchs30:
{
  "dict_filename": "dict.txt",
  "dataset": {
    "train": [
      {
        "name": "thchs30_train",
        "data_list": "datalist/thchs30/train.wav.lst",
        "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
        "label_list": "datalist/thchs30/train.syllable.txt"
      }
    ],
    "dev": [
      {
        "name": "thchs30_dev",
        "data_list": "datalist/thchs30/cv.wav.lst",
        "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
        "label_list": "datalist/thchs30/cv.syllable.txt"
      }
    ],
    "test": [
      {
        "name": "thchs30_test",
        "data_list": "datalist/thchs30/test.wav.lst",
        "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
        "label_list": "datalist/thchs30/test.syllable.txt"
      }
    ]
  }
}
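Before starting a long training run, it can help to check that every path in the config actually resolves. The following is a minimal sketch, not part of ASRT: the helper name check_asrt_config is hypothetical, and it assumes that data_list and label_list are relative to the project root and that data_path is a directory, as in the config above.

```python
import json
from pathlib import Path


def check_asrt_config(config_path, project_root="."):
    """Return a list of problems found in an asrt_config.json-style file.

    Hypothetical helper (not part of ASRT). Assumes data_list and label_list
    are relative to project_root and data_path is a directory.
    """
    root = Path(project_root)
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    problems = []
    for split, entries in config.get("dataset", {}).items():
        for entry in entries:
            name = entry.get("name", "<unnamed>")
            # The list files are the supporting files stored under ./datalist.
            for key in ("data_list", "label_list"):
                value = entry.get(key)
                if value is None or not (root / value).is_file():
                    problems.append(f"{split}/{name}: {key} not found: {value}")
            # data_path should point at the directory holding the real data.
            data_path = entry.get("data_path")
            if data_path is None or not Path(data_path).is_dir():
                problems.append(
                    f"{split}/{name}: data_path is not a directory: {data_path}"
                )
    return problems


# Usage, from the project root:
#   for problem in check_asrt_config("asrt_config.json"):
#       print(problem)
```

An empty list means every referenced file and directory was found; it says nothing about whether the contents are valid.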
2 steps
2.1 train one dataset: thchs30
my env:
- gpu card: 8 * NVIDIA TITAN RTX
- cuda: 11.5
record:
- use model: SpeechModel251BN
- dataset size: 6.1G
- steps per epoch: 625
- epochs: 50
- train time:
  - one epoch: 250 ms/step * 625 steps / (1000 ms/s) / (60 s/min) ≈ 2.6 min
  - all: 23:00 - 01:06
- end loss: about 6.9
- evaluate WER: 26.46877356550353 %
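The training-time estimate in the record above is plain unit conversion. A small sketch (the function names are made up for illustration) makes the units explicit and can be reused for other step times and epoch sizes:

```python
def epoch_minutes(ms_per_step, steps_per_epoch):
    """Convert a per-step time in milliseconds into minutes per epoch."""
    seconds = ms_per_step * steps_per_epoch / 1000  # 1000 ms per second
    return seconds / 60  # 60 seconds per minute


def total_hours(ms_per_step, steps_per_epoch, epochs):
    """Estimated wall-clock hours for the whole run."""
    return epoch_minutes(ms_per_step, steps_per_epoch) * epochs / 60


print(round(epoch_minutes(250, 625), 1))    # 2.6 (minutes per epoch)
print(round(total_hours(250, 625, 50), 1))  # 2.2 (hours for 50 epochs)
```

The estimate of about 2.2 hours is close to the measured run from 23:00 to 01:06.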
And here is the recognition result for this wav:
(asrt) [vayne@localhost ~/.a/ASRT_SpeechRecognition]$ python3 ./client_http.py
{'result': '何解样有无数五胜的激情与世间合然一体层金利时养和业均每一的民天', 'status_code': 200000, 'status_message': 'all level'}
time: 1.9976277351379395 s
2.2 add two datasets: primewords_md_2018_set1, ST-CMDS-20170001_1-OS
how to add two more datasets and continue training from the previously trained model? here is the doc.
my env:
- gpu card: 4 * NVIDIA TITAN RTX
- cuda: 11.5
record:
- use model: SpeechModel251BN
- dataset size: 6.1G + 8.5G + 7.7G = 22.3G
- steps per epoch: 9416
- epochs: 50
- train time:
  - one epoch: 250 ms/step * 9416 steps / (1000 ms/s) / (60 s/min) ≈ 39 min
  - all: 11:03 - 21:07 (+1d), about 34 h
- end loss: about 7.9232
- evaluate WER: 25.32841307383239 %
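WER is conventionally the edit distance between the reference and the recognized token sequence, divided by the reference length. The sketch below is a generic illustration of that definition, not ASRT's evaluation code, which may tokenize and average differently. The pinyin tokens in the usage lines are made up for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance divided by reference length, as a percentage."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Made-up example: one token of a four-token reference is missing.
reference = ["jin1", "tian1", "tian1", "qi4"]
hypothesis = ["jin1", "tian1", "qi4"]
print(error_rate(reference, hypothesis))  # 25.0
```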
well, one epoch takes too long. after asking for help, I learned that even though I configured 4 GPU cards, training still used only one of them…
so I would need to research how to train on multiple GPU cards.
but I have no time for that, so I will try “FunASR” instead.
by the way, after training on all 3 datasets, the client_http.py result is:
(asrt) [vayne@localhost ~/.a/ASRT_SpeechRecognition]$ python3 ./client_http.py
{'result': '何写一样有吴秀舞胸的基勤雨似见横染易体曾经历时养和有研与内衣的明天', 'status_code': 200000, 'status_message': 'all level'}
time: 1.8850913047790527 s
well, the result is still not good. fortunately, I found these:
- modelscope: looks a bit like HuggingFace.
- FunASR
and here is my test result using the FunASR pre-trained model directly; it is obviously better.
[{'key': 'D21_829', 'text': '和血一样有无数无声的激情与世界浑然一体曾经历史养活又孕育美丽的明天'}]
So I will try FunASR next.
3 docs
about fine-tuning:
100 problems
100.1 base env problem
100.1.1 libcudart.so.11.0 error
error:
2029-07-25 20:25:07.305613: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
I got this error when running python3 train_speech_model.py right after installing the pip packages.
fix (see https://blog.csdn.net/qq_44703886/article/details/112393149): make sure the CUDA paths are exported in ~/.bashrc:
$ grep -A1 PATH ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
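To confirm that the path change took effect without starting a full training run, one option is to ask the dynamic loader directly from Python. This is a small sketch using only the standard library. The helper name can_load is made up, and the library names are the ones from the TensorFlow warnings in this post.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# Library names taken from the TensorFlow warnings in this post.
for name in ("libcudart.so.11.0", "libcudnn.so.8"):
    print(name, "OK" if can_load(name) else "NOT FOUND")
```

LD_LIBRARY_PATH is read when the process starts, so run this in a new shell (or after source ~/.bashrc) for the check to reflect the edited file.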
100.1.2 install cuda record
installing cuda 11.5 on ubuntu 18.04.
apt install failed with the following error:
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.
doc:
- https://stackoverflow.com/questions/30768828/e-error-pkgproblemresolverresolve-generated-breaks-this-may-be-caused-by-he
tried aptitude install cuda: the solution it proposed was to uninstall multiple packages, which I can’t do.
so I installed cuda manually instead.
download cuda 11.5 from the nvidia website and run the installer:
wget https://developer.download.nvidia.cn/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sh ./cuda_11.5.0_495.29.05_linux.run --toolkit --silent --override
doc:
- https://developer.nvidia.com/cuda-11-5-0-download-archive
- https://askubuntu.com/questions/1211919
100.1.3 libcudnn.so.8 error
error:
2029-07-25 21:54:22.043634: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64
doc:
- https://stackoverflow.com/questions/66977227/could-not-load-dynamic-library-libcudnn-so-8-when-running-tensorflow-on-ubun
apt install libcudnn8 libcudnn8-dev