【Audio】02 ASRT

Posted Jul 25, 2024

By FWalkerVayne

5 min read

【Audio】02 ASRT

1 some concepts

1.1 difference of models

Model	Regularization	Strengths	Best Use Scenario
SpeechModel24	High Dropout	Very robust against overfitting, good for noisy data	Datasets with high variability and prone to overfitting
SpeechModel25	Moderate Dropout	Balance between complexity and performance, general use	General speech recognition tasks with standard datasets
SpeechModel251	Moderate to High Dropout	Higher complexity, slightly more prone to overfitting than 25	Datasets with complex patterns but not much noise
SpeechModel251BN	Batch Normalization	Fastest convergence, stable training, efficient learning	Large and complex datasets needing quick model training

docs:

1.2 about the `asrt_config.json`

here is one key part:

  
{
    "name": "thchs30_train",
    "data_list": "datalist/thchs30/train.wav.lst",
    "data_path": "/data/speech_data",
    "label_list": "datalist/thchs30/train.syllable.txt"
},

download_default_datalist.py:

this script not download the real data, only the supporting files.
this script should be run in the project root directory, because it will store to ./datalist.

real data:

so we need download the real data which in here.
and store it to /data/speech_data.

So that mean:

supporting files:
- run download_default_datalist.py in the project root directory.
  - for me: project root dir is /home/vayne/.a/ASRT_SpeechRecognition.
- do not change config: data_list, label_list.

[vayne@localhost ~/.a/ASRT_SpeechRecognition]$ ls datalist/thchs30/
cv.syllable.txt  cv.wav.lst  test.syllable.txt  test.wav.lst  train.syllable.txt  train.wav.lst

real data:
- run wget in the data directory.
  - for me: data dir is /home/vayne/.a/ASRT_SpeechRecognition/datalist.
- change the config: data_path.
  - for me: /home/vayne/.a/ASRT_SpeechRecognition/datalist

[vayne@localhost ~/.a/ASRT_SpeechRecognition]$ ls datalist/data_thchs30
data  dev  lm_phone  lm_word  README.TXT  test  train

And, if you only download one dataset, then asrt_config.json only keep one.

Here is the my asrt_config.json for only one dataset thchs30:

  
{
    "dict_filename": "dict.txt",

    "dataset":{
        "train":[
            {
                "name": "thchs30_train",
                "data_list": "datalist/thchs30/train.wav.lst",
                "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
                "label_list": "datalist/thchs30/train.syllable.txt"
            }
        ],

        "dev":[
            {
                "name": "thchs30_dev",
                "data_list": "datalist/thchs30/cv.wav.lst",
                "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
                "label_list": "datalist/thchs30/cv.syllable.txt"
            }
        ],

        "test":[
            {
                "name": "thchs30_test",
                "data_list": "datalist/thchs30/test.wav.lst",
                "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
                "label_list": "datalist/thchs30/test.syllable.txt"
            }
        ]
    }
}

2 steps

2.1 train one dataset: thchs30

my env:

gpu card: 8 * NVIDIA TITAN RTX
cuda: 11.5

record:

use model: SpeechModel251BN
dataset size: 6.1G
one epoch size: 625
epoch: 50
train time:
- one epoch: 250ms/step * 625step / 1000s/ms / 60min/s = 2min
- all: 23:00 - 01:06
end loss: about 6.9
evaluate WER: 26.46877356550353 %

And, here is the test result of this wav:

(asrt) [vayne@localhost ~/.a/ASRT_SpeechRecognition]$ python3 ./client_http.py
{'result': '何解样有无数五胜的激情与世间合然一体层金利时养和业均每一的民天', 'status_code': 200000, 'status_message': 'all level'}
time: 1.9976277351379395 s

2.2 add two datasets: primewords_md_2018_set1, ST-CMDS-20170001_1-OS

how to add two dataset, and train based on the before train model? here is the doc.

my env:

gpu card: 4 * NVIDIA TITAN RTX
cuda: 11.5

record:

use model: SpeechModel251BN
dataset size: 6.1G + 8.5G + 7.7G = 22.3G
one epoch size: 9416
epoch: 50
train time:
- one epoch: 250ms/step * 9416step / 1000s/ms / 60min/s = 36min
- all: 11.03 - 21.07(+1d) # about 34h
end loss: about 7.9232
evaluate WER: 25.32841307383239 %

well, one epoch time is too long, after ask for help, i know even i set 4 gpu card, it still use one gpu card…

so i need to do a research for how to use multi gpu card.

but no time to do that, so i try “FunASR”.

by the way, after training 3 dataset, the client_http.py result is:

(asrt) [vayne@localhost ~/.a/ASRT_SpeechRecognition]$ python3 ./client_http.py 
{'result': '何写一样有吴秀舞胸的基勤雨似见横染易体曾经历时养和有研与内衣的明天', 'status_code': 200000, 'status_message': 'all level'}
time: 1.8850913047790527 s

well, the result is also not good, and by fortunately, i found these:

modelscope: looks a bit like HuggingFace.
FunASR

and here is my test result use the FunASR pre-trained model directly, it obviously better.

[{'key': 'D21_829', 'text': '和血一样有无数无声的激情与世界浑然一体曾经历史养活又孕育美丽的明天'}] 

So, i will try to use FunASR.

3 docs

about fine-tuning:

100 problems

100.1 base env problem

100.1.1 libcudart.so.11.0 error

error:

2029-07-25 20:25:07.305613: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libc
udart.so.11.0: cannot open shared object file: No such file or directory 

after install pip packages, start run python3 train_speech_model.py, got this error.

https://blog.csdn.net/qq_44703886/article/details/112393149

$ grep -A1 PATH ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

100.1.2 install cuda record

install cuda 11.5 on ubuntu 18.04.

apt install got the follow error:

E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.

doc:

https://stackoverflow.com/questions/30768828/e-error-pkgproblemresolverresolve-generated-breaks-this-may-be-caused-by-he

try aptitude install cuda, the soluation it gave is uninstall multiple packages. I can’t do that.

then try to install cuda manually.

download and cuda 11.5 from nvidia website.

  
wget https://developer.download.nvidia.cn/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sh ./cuda_11.5.0_495.29.05_linux.run  --toolkit --silent --override

doc:

https://developer.nvidia.com/cuda-11-5-0-download-archive
https://askubuntu.com/questions/1211919

100.1.3 libcudnn.so.8 error

error:

2029-07-25 21:54:22.043634: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64

doc:

https://stackoverflow.com/questions/66977227/could-not-load-dynamic-library-libcudnn-so-8-when-running-tensorflow-on-ubun

apt install libcudnn8 libcudnn8-dev

AI, Audio

ASRT

This post is licensed under CC BY 4.0 by the author.

1 some concepts

1.1 difference of models

1.2 about the asrt_config.json

2 steps

2.1 train one dataset: thchs30

2.2 add two datasets: primewords_md_2018_set1, ST-CMDS-20170001_1-OS

3 docs

100 problems

100.1 base env problem

100.1.1 libcudart.so.11.0 error

100.1.2 install cuda record

100.1.3 libcudnn.so.8 error

Trending Tags

1.2 about the `asrt_config.json`