【Audio】04 KWS

1 first try kws

1.1 prepare env

python: 3.9.16 kernel: 5.4.0-42-generic

git clone https://www.modelscope.cn/iic/speech_charctc_kws_phone-xiaoyun.git
pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 simplejson python-dateutil sortedcontainers addict "modelscope[framework]" matplotlib
pip install kwsbp -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
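
Since the install pins exact versions, a quick check that the environment really has them can save debugging time later. The following is a minimal sketch, not part of the original setup: `PINS` and `find_mismatches` are hypothetical names, and the pins are copied from the pip command above.

```python
from importlib import metadata

# Pins copied from the pip command above; extend as needed.
PINS = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "torchaudio": "0.11.0",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: installed version or None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # A local build tag such as "1.11.0+cu113" still counts as a match.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(PINS) or "all pinned packages match")
```

An empty result means every pinned package is installed at the expected version.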

1.2 inference directly

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
kwsbp_16k_pipline = pipeline(task=Tasks.keyword_spotting, model='./speech_charctc_kws_phone-xiaoyun')
kws_result = kwsbp_16k_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/KWS/pos_testset/kws_xiaoyunxiaoyun.wav')
print(kws_result)
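
To act on the result instead of only printing it, the detections can be pulled out of the returned dict. This is a rough sketch under assumptions: the result layout is taken from the output printed in section 1.3.2 of this post, `detected_keywords` is a hypothetical helper, and treating a missing or empty `kws_list` as "nothing detected" is a guess rather than documented behaviour.

```python
def detected_keywords(kws_result, min_confidence=0.0):
    """Return (keyword, confidence) pairs at or above min_confidence."""
    hits = []
    # Assumption: a missing or empty 'kws_list' means nothing was detected.
    for entry in kws_result.get("kws_list", []):
        confidence = float(entry.get("confidence", 0.0))
        if confidence >= min_confidence:
            hits.append((entry.get("keyword"), confidence))
    return hits


# Result dict copied from the output shown in section 1.3.2.
sample = {
    "kws_type": "pcm",
    "kws_list": [
        {"keyword": "小云小云", "offset": 1.92, "length": 0.51,
         "confidence": 0.940286, "type": "wakeup"}
    ],
    "wav_count": 1,
}

print(detected_keywords(sample, min_confidence=0.9))
```

The threshold is a free parameter here; a suitable value depends on how many false wake-ups are acceptable.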

1.3 finetune demo

1.3.1 train a new version model

pip install tensorboardX matplotlib kaldiio
cd unittest
python ./example_kws.py

Training then writes these files to the test_kws_training dir:

(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest]$ ls test_kws_training/
0.pt    1.pt    2.pt    3.pt    4.pt    5.pt    6.pt    7.pt    8.pt    9.pt    avg_10.pt    config.yaml        dump_cv.txt     init.pt          tb_test
0.yaml  1.yaml  2.yaml  3.yaml  4.yaml  5.yaml  6.yaml  7.yaml  8.yaml  9.yaml  config.json  convert.kaldi.txt  dump_train.txt  origin.torch.pt  test_dir
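
The numbered files in this listing look like per-epoch checkpoints (0.pt through 9.pt), next to other files such as avg_10.pt and init.pt. That reading is an assumption based only on the file names. Under that assumption, a small hypothetical helper can list the numbered checkpoints in epoch order, which a plain alphabetical sort would get wrong once there are ten or more epochs:

```python
from pathlib import Path


def epoch_checkpoints(train_dir):
    """Return checkpoint paths named <number>.pt, sorted by that number."""
    # Skips avg_10.pt, init.pt, origin.torch.pt and all non-.pt files.
    paths = [p for p in Path(train_dir).glob("*.pt") if p.stem.isdigit()]
    return sorted(paths, key=lambda p: int(p.stem))


if __name__ == "__main__":
    for path in epoch_checkpoints("test_kws_training"):
        print(path.name)
```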

1.3.2 use new version model

After some digging, I found that using the newly trained model takes these steps:

  • package
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest]$ cp test_kws_training/convert.kaldi.txt ../runtime/
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest]$ cd ../runtime
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/runtime]$ ./run.sh

This produces the following:

(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/runtime]$ ls output/
convert_kaldi.net  kwsbp_resource  kwsbp_resource_quant16  kwsbp_resource_quant16.bin
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/runtime]$ ls output/kwsbp_resource_quant16
keywords.json  kwsr.ccl  kwsr.cfg  kwsr.gbg  kwsr.lex  kwsr.mdl  kwsr.mvn  kwsr.net  kwsr.phn  kwsr.prior  kwsr.tree
  • release the new model to a new dir
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ mkdir 20240730
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ cp ../../../runtime/output/kwsbp_resource_quant16/k* ./20240730/
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ cp ../20240728/configuration.json ./20240730/
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ cp ../20240728/config.yaml ./20240730/
  • test new model
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest/test_kws_training]$ python3
Python 3.9.16 (main, Jul 25 2024, 15:54:18) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelscope.outputs import OutputKeys
>>> from modelscope.pipelines import pipeline
>>> from modelscope.utils.constant import Tasks
>>> kwsbp_16k_pipline = pipeline(task=Tasks.keyword_spotting, model='./release/20240730')
2024-07-30 12:57:01,129 - modelscope - INFO - initiate model from ./release/20240730
2024-07-30 12:57:01,129 - modelscope - INFO - initiate model from location ./release/20240730.
2024-07-30 12:57:01,129 - modelscope - INFO - initialize model from ./release/20240730
2024-07-30 12:57:01,133 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-07-30 12:57:01,133 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'filter_conf': {'max_length': 1500, 'min_length': 10}, 'feature_extraction_conf': {'feature_type': 'fbank', 'num_mel_bins': 80, 'frame_shift': 10, 'frame_length': 25, 'dither': 1.0}, 'spec_aug': True, 'spec_aug_conf': {'num_t_mask': 2, 'num_f_mask': 2, 'max_t': 50, 'max_f': 30}, 'shuffle': True, 'shuffle_conf': {'shuffle_size': 1500}, 'context_expansion': True, 'context_expansion_conf': {'left': 2, 'right': 2}, 'frame_skip': 3, 'batch_conf': {'batch_size': 256}, 'model_dir': './release/20240730'}. trying to build by task and model information.
2024-07-30 12:57:01,133 - modelscope - WARNING - No preprocessor key ('speech_kws_fsmn_char_ctc_nearfield', 'keyword-spotting') found in PREPROCESSOR_MAP, skip building preprocessor.
>>> kws_result = kwsbp_16k_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/KWS/pos_testset/kws_xiaoyunxiaoyun.wav')
2024-07-30 12:57:07,719 - modelscope - INFO - Decoding with pcm mode ...
>>> print(kws_result)
{'kws_type': 'pcm', 'kws_list': [{'keyword': '小云小云', 'offset': 1.92, 'length': 0.51, 'confidence': 0.940286, 'type': 'wakeup'}], 'wav_count': 1}
>>> 
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest/test_kws_training]$

The confidence differs from the value the old model reports, which confirms that the pipeline is loading the new model.
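
The release step in this section (copy the k* resources, then reuse configuration.json and config.yaml from an earlier release) is easy to script, so that each retraining run gets its own release dir. The following is only a sketch of that copy logic: `assemble_release` is a hypothetical helper, and the paths in the example call are placeholders.

```python
import shutil
from pathlib import Path


def assemble_release(resource_dir, previous_release, new_release):
    """Copy the k* resources plus the two config files into new_release."""
    resource_dir = Path(resource_dir)
    previous_release = Path(previous_release)
    new_release = Path(new_release)
    new_release.mkdir(parents=True, exist_ok=True)

    copied = []
    # Same selection as the shell glob `k*` (keywords.json, kwsr.*).
    for src in sorted(resource_dir.glob("k*")):
        if src.is_file():
            shutil.copy2(src, new_release / src.name)
            copied.append(src.name)
    # Config files are reused unchanged from the earlier release.
    for name in ("configuration.json", "config.yaml"):
        shutil.copy2(previous_release / name, new_release / name)
        copied.append(name)
    return copied


# Example call; the paths are placeholders, adjust them to your checkout.
# assemble_release("runtime/output/kwsbp_resource_quant16",
#                  "release/20240728", "release/20240730")
```

The function returns the names of the copied files, which makes it easy to log or verify what ended up in the release dir.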

1.4 start server

Skipped for now, pending research into streaming audio.

This post is licensed under CC BY 4.0 by the author.