【Audio】02 ASRT

1 some concepts

1.1 differences between models

| Model | Regularization | Strengths | Best Use Scenario |
|---|---|---|---|
| SpeechModel24 | High Dropout | Very robust against overfitting, good for noisy data | Datasets with high variability and prone to overfitting |
| SpeechModel25 | Moderate Dropout | Balance between complexity and performance, general use | General speech recognition tasks with standard datasets |
| SpeechModel251 | Moderate to High Dropout | Higher complexity, slightly more prone to overfitting than 25 | Datasets with complex patterns but not much noise |
| SpeechModel251BN | Batch Normalization | Fastest convergence, stable training, efficient learning | Large and complex datasets needing quick model training |

docs:

1.2 about the asrt_config.json

here is one key part:

{
    "name": "thchs30_train",
    "data_list": "datalist/thchs30/train.wav.lst",
    "data_path": "/data/speech_data",
    "label_list": "datalist/thchs30/train.syllable.txt"
},

download_default_datalist.py:

  • this script does not download the real data, only the supporting list files.
  • this script should be run from the project root directory, because it stores its output in ./datalist.

real data:

  • so we need to download the real data, which is available here.
  • and store it in /data/speech_data.

So that means:

  • supporting files:
    • run download_default_datalist.py in the project root directory.
      • for me: project root dir is /home/vayne/.a/ASRT_SpeechRecognition.
    • do not change config: data_list, label_list.
[vayne@localhost ~/.a/ASRT_SpeechRecognition]$ ls datalist/thchs30/
cv.syllable.txt  cv.wav.lst  test.syllable.txt  test.wav.lst  train.syllable.txt  train.wav.lst
  • real data:
    • run wget in the data directory.
      • for me: data dir is /home/vayne/.a/ASRT_SpeechRecognition/datalist.
    • change the config: data_path.
      • for me: /home/vayne/.a/ASRT_SpeechRecognition/datalist
[vayne@localhost ~/.a/ASRT_SpeechRecognition]$ ls datalist/data_thchs30
data  dev  lm_phone  lm_word  README.TXT  test  train

If you download only one dataset, keep only that dataset's entries in asrt_config.json.

Here is my asrt_config.json for a single dataset, thchs30:

{
    "dict_filename": "dict.txt",

    "dataset":{
        "train":[
            {
                "name": "thchs30_train",
                "data_list": "datalist/thchs30/train.wav.lst",
                "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
                "label_list": "datalist/thchs30/train.syllable.txt"
            }
        ],

        "dev":[
            {
                "name": "thchs30_dev",
                "data_list": "datalist/thchs30/cv.wav.lst",
                "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
                "label_list": "datalist/thchs30/cv.syllable.txt"
            }
        ],

        "test":[
            {
                "name": "thchs30_test",
                "data_list": "datalist/thchs30/test.wav.lst",
                "data_path": "/home/vayne/.a/ASRT_SpeechRecognition/datalist",
                "label_list": "datalist/thchs30/test.syllable.txt"
            }
        ]
    }
}
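
Since a wrong data_list, label_list, or data_path only shows up once training starts, it can help to check the config first. The following is a minimal sketch, not part of ASRT: the helper name missing_paths is hypothetical, and it assumes data_list and label_list are resolved relative to the project root while data_path is used as given.

```python
import json
import os


def missing_paths(config, project_root="."):
    """Return a description of every path in the config that does not exist."""
    missing = []
    for split, entries in config.get("dataset", {}).items():
        for entry in entries:
            # Assumption: list files are relative to the project root.
            for key in ("data_list", "label_list"):
                path = os.path.join(project_root, entry[key])
                if not os.path.isfile(path):
                    missing.append(f"{split}/{entry['name']}: {key} -> {path}")
            # data_path is the directory that holds the downloaded audio.
            if not os.path.isdir(entry["data_path"]):
                missing.append(
                    f"{split}/{entry['name']}: data_path -> {entry['data_path']}"
                )
    return missing


# Run from the project root; does nothing if the config file is absent.
if __name__ == "__main__" and os.path.isfile("asrt_config.json"):
    with open("asrt_config.json", encoding="utf-8") as f:
        for problem in missing_paths(json.load(f)):
            print("missing:", problem)
```

If nothing is printed, every file and directory named in the config exists.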

2 steps

2.1 train one dataset: thchs30

my env:

  • gpu card: 8 * NVIDIA TITAN RTX
  • cuda: 11.5

record:

  • use model: SpeechModel251BN
  • dataset size: 6.1G
  • one epoch size: 625
  • epoch: 50
  • train time:
    • one epoch: 250 ms/step * 625 steps / 1000 ms/s / 60 s/min ≈ 2.6 min
    • all: 23:00 - 01:06 (about 2h)
  • end loss: about 6.9
  • evaluate WER: 26.46877356550353 %
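
For reference, WER is the total edit distance between reference and recognized token sequences, divided by the total number of reference tokens. The sketch below shows that calculation in general form; it is not ASRT's evaluation code, and the choice of pinyin syllables as tokens and the sample tokens themselves are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = cur
    return prev[-1]


def word_error_rate(pairs):
    """WER in percent over (reference, hypothesis) token-list pairs."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return 100.0 * errors / total


# Made-up pinyin tokens, for illustration only.
ref = ["ni3", "hao3", "shi4", "jie4"]
hyp = ["ni3", "hao4", "jie4"]
print(word_error_rate([(ref, hyp)]))  # 50.0 (one substitution + one deletion over 4 tokens)
```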

And, here is the test result of this wav:

(asrt) [vayne@localhost ~/.a/ASRT_SpeechRecognition]$ python3 ./client_http.py
{'result': '何解样有无数五胜的激情与世间合然一体层金利时养和业均每一的民天', 'status_code': 200000, 'status_message': 'all level'}
time: 1.9976277351379395 s

2.2 add two datasets: primewords_md_2018_set1, ST-CMDS-20170001_1-OS

How to add two datasets and continue training from the previously trained model? Here is the doc.

my env:

  • gpu card: 4 * NVIDIA TITAN RTX
  • cuda: 11.5

record:

  • use model: SpeechModel251BN
  • dataset size: 6.1G + 8.5G + 7.7G = 22.3G
  • one epoch size: 9416
  • epoch: 50
  • train time:
    • one epoch: 250 ms/step * 9416 steps / 1000 ms/s / 60 s/min ≈ 39 min
    • all: 11:03 - 21:07 (+1d), about 34h
  • end loss: about 7.9232
  • evaluate WER: 25.32841307383239 %
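
The per-epoch estimates in both training records follow from the same arithmetic: step time times steps per epoch, converted from milliseconds to minutes. A small sketch using the figures recorded above (250 ms/step, 625 and 9416 steps, 50 epochs); the function name is only illustrative.

```python
def epoch_minutes(ms_per_step, steps_per_epoch):
    """Estimated wall-clock minutes for one epoch."""
    seconds = ms_per_step * steps_per_epoch / 1000  # 1000 ms per second
    return seconds / 60                             # 60 s per minute


EPOCHS = 50
for name, steps in [("thchs30 only", 625), ("three datasets", 9416)]:
    per_epoch = epoch_minutes(250, steps)
    total_hours = per_epoch * EPOCHS / 60
    print(f"{name}: {per_epoch:.1f} min/epoch, {total_hours:.1f} h for {EPOCHS} epochs")
```

This gives roughly 2.6 min and 39.2 min per epoch, which is in line with the observed totals of about 2 h and 34 h.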

Well, one epoch takes too long. After asking for help, I learned that even though I configured 4 GPU cards, training still used only one.

So I need to research how to use multiple GPU cards.

But I have no time for that, so I will try “FunASR”.

By the way, after training on 3 datasets, the client_http.py result is:

(asrt) [vayne@localhost ~/.a/ASRT_SpeechRecognition]$ python3 ./client_http.py 
{'result': '何写一样有吴秀舞胸的基勤雨似见横染易体曾经历时养和有研与内衣的明天', 'status_code': 200000, 'status_message': 'all level'}
time: 1.8850913047790527 s

Well, the result is still not good. Fortunately, I found these:

And here is my test result using the FunASR pre-trained model directly; it is obviously better.

[{'key': 'D21_829', 'text': '和血一样有无数无声的激情与世界浑然一体曾经历史养活又孕育美丽的明天'}] 

So I will try FunASR.

3 docs

about fine-tuning:

100 problems

100.1 base env problem

100.1.1 libcudart.so.11.0 error

error:

2029-07-25 20:25:07.305613: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory

After installing the pip packages, running python3 train_speech_model.py produced this error.

Following https://blog.csdn.net/qq_44703886/article/details/112393149, the fix is to add the CUDA paths to ~/.bashrc:

$ grep -A1 PATH ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
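
To confirm that the loader can now find the CUDA libraries without starting a full training run, one option is to try opening them from Python. This is a minimal sketch using the standard ctypes module; the helper name can_load is hypothetical, and the library names are taken from the TensorFlow warnings in this post. Note that LD_LIBRARY_PATH must be set before the Python process starts, so open a new shell (or source ~/.bashrc) first.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be resolved.
        return False


for name in ("libcudart.so.11.0", "libcudnn.so.8"):
    print(name, "OK" if can_load(name) else "NOT FOUND")
```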

100.1.2 install cuda record

Installing CUDA 11.5 on Ubuntu 18.04.

apt install gave the following error:

E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.

doc:

  • https://stackoverflow.com/questions/30768828/e-error-pkgproblemresolverresolve-generated-breaks-this-may-be-caused-by-he

Tried aptitude install cuda, but the solution it proposed was to uninstall multiple packages, which I can’t do.

Then tried installing CUDA manually.

Download and install CUDA 11.5 from the NVIDIA website.

wget https://developer.download.nvidia.cn/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sh ./cuda_11.5.0_495.29.05_linux.run  --toolkit --silent --override

doc:

  • https://developer.nvidia.com/cuda-11-5-0-download-archive
  • https://askubuntu.com/questions/1211919

100.1.3 libcudnn.so.8 error

error:

2029-07-25 21:54:22.043634: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64

doc:

  • https://stackoverflow.com/questions/66977227/could-not-load-dynamic-library-libcudnn-so-8-when-running-tensorflow-on-ubun
apt install libcudnn8 libcudnn8-dev
This post is licensed under CC BY 4.0 by the author.