【Audio】04 KWS
1 First try: KWS
1.1 Prepare the environment
Python: 3.9.16, kernel: 5.4.0-42-generic
git clone https://www.modelscope.cn/iic/speech_charctc_kws_phone-xiaoyun.git
pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 simplejson python-dateutil sortedcontainers addict modelscope[framework] matplotlib
pip install kwsbp -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
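Before going further, a quick version check confirms the environment is wired up. This is just a convenience sketch; it only relies on the standard __version__ attributes of each package.
import numpy
import torch
import torchaudio
import torchvision
import modelscope

# print the installed versions to confirm they match the pinned ones above
for name, mod in [('torch', torch), ('torchaudio', torchaudio),
                  ('torchvision', torchvision), ('numpy', numpy),
                  ('modelscope', modelscope)]:
    print(f'{name}: {mod.__version__}')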
1.2 Run inference directly
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
kwsbp_16k_pipline = pipeline(task=Tasks.keyword_spotting, model='./speech_charctc_kws_phone-xiaoyun')
kws_result = kwsbp_16k_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/KWS/pos_testset/kws_xiaoyunxiaoyun.wav')
print(kws_result)
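The pipeline returns a plain dict; in my runs it contains a kws_list with one entry per detected keyword (see the full output printed in section 1.3.2 below). A small helper like the following can filter detections by confidence; the threshold here is an arbitrary value for illustration, not something from the model's documentation.
# Sketch of post-processing the pipeline output. The field names
# ('kws_list', 'keyword', 'confidence') match the dict printed later
# in this post; treat this as a convenience, not an official API.
def keep_confident_keywords(kws_result, threshold=0.5):
    hits = []
    for item in kws_result.get('kws_list', []):
        if item.get('confidence', 0.0) >= threshold:
            hits.append((item['keyword'], item['confidence']))
    return hits

print(keep_confident_keywords(kws_result))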
1.3 Finetune demo
1.3.1 Train a new version of the model
pip install tensorboardX matplotlib kaldiio
cd unittest
python ./example_kws.py
Then you will get the following files under the test_kws_training directory:
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest]$ ls test_kws_training/
0.pt 1.pt 2.pt 3.pt 4.pt 5.pt 6.pt 7.pt 8.pt 9.pt avg_10.pt config.yaml dump_cv.txt init.pt tb_test
0.yaml 1.yaml 2.yaml 3.yaml 4.yaml 5.yaml 6.yaml 7.yaml 8.yaml 9.yaml config.json convert.kaldi.txt dump_train.txt origin.torch.pt test_dir
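The numbered *.pt files are the per-epoch checkpoints, and avg_10.pt appears to be the averaged model. Assuming these are ordinary torch.save() artifacts (a state dict, or a dict wrapping one), a quick load is enough to sanity-check the training output:
# Inspect one checkpoint; the format assumption above may need adjusting.
import torch

ckpt = torch.load('test_kws_training/avg_10.pt', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    state = ckpt.get('state_dict', ckpt)
    # show the first few parameter names and shapes
    for name, value in list(state.items())[:5]:
        shape = tuple(value.shape) if hasattr(value, 'shape') else value
        print(name, shape)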
1.3.2 Use the new version of the model
After some digging, I found that using the newly trained model requires the following steps:
- Package the model
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest]$ cp test_kws_training/convert.kaldi.txt ../runtime/
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest]$ cd ../runtime
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/runtime]$ ./run.sh
Then you will get these files:
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/runtime]$ ls output/
convert_kaldi.net kwsbp_resource kwsbp_resource_quant16 kwsbp_resource_quant16.bin
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/runtime]$ ls output/kwsbp_resource_quant16
keywords.json kwsr.ccl kwsr.cfg kwsr.gbg kwsr.lex kwsr.mdl kwsr.mvn kwsr.net kwsr.phn kwsr.prior kwsr.tree
- Release the new model to a new directory
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ mkdir 20240730
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ cp ../../../runtime/output/kwsbp_resource_quant16/k* ./20240730/
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ cp ../20240728/configuration.json ./20240730/
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/release]$ cp ../20240728/config.yaml ./20240730/
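Before testing, it is worth checking that the new release directory contains everything the pipeline will read: the runtime resources copied from kwsbp_resource_quant16 plus the configuration.json and config.yaml taken from the previous release. The expected file list below is inferred from the listings above; adjust the path to your layout.
# Sanity-check the release directory (path relative to the model root).
import os

release_dir = 'release/20240730'
expected = [
    'keywords.json', 'kwsr.ccl', 'kwsr.cfg', 'kwsr.gbg', 'kwsr.lex',
    'kwsr.mdl', 'kwsr.mvn', 'kwsr.net', 'kwsr.phn', 'kwsr.prior',
    'kwsr.tree', 'configuration.json', 'config.yaml',
]
missing = [f for f in expected if not os.path.exists(os.path.join(release_dir, f))]
print('missing:', missing or 'none')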
- Test the new model
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest/test_kws_training]$ python3
Python 3.9.16 (main, Jul 25 2024, 15:54:18)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelscope.outputs import OutputKeys
>>> from modelscope.pipelines import pipeline
>>> from modelscope.utils.constant import Tasks
>>> kwsbp_16k_pipline = pipeline(task=Tasks.keyword_spotting, model='./release/20240730')
2024-07-30 12:57:01,129 - modelscope - INFO - initiate model from ./release/20240730
2024-07-30 12:57:01,129 - modelscope - INFO - initiate model from location ./release/20240730.
2024-07-30 12:57:01,129 - modelscope - INFO - initialize model from ./release/20240730
2024-07-30 12:57:01,133 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-07-30 12:57:01,133 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'filter_conf': {'max_length': 1500, 'min_length': 10}, 'feature_extraction_conf': {'feature_type': 'fbank', 'num_mel_bins': 80, 'frame_shift': 10, 'frame_length': 25, 'dither': 1.0}, 'spec_aug': True, 'spec_aug_conf': {'num_t_mask': 2, 'num_f_mask': 2, 'max_t': 50, 'max_f': 30}, 'shuffle': True, 'shuffle_conf': {'shuffle_size': 1500}, 'context_expansion': True, 'context_expansion_conf': {'left': 2, 'right': 2}, 'frame_skip': 3, 'batch_conf': {'batch_size': 256}, 'model_dir': './release/20240730'}. trying to build by task and model information.
2024-07-30 12:57:01,133 - modelscope - WARNING - No preprocessor key ('speech_kws_fsmn_char_ctc_nearfield', 'keyword-spotting') found in PREPROCESSOR_MAP, skip building preprocessor.
>>> kws_result = kwsbp_16k_pipline(audio_in='https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/KWS/pos_testset/kws_xiaoyunxiaoyun.wav')
2024-07-30 12:57:07,719 - modelscope - INFO - Decoding with pcm mode ...
>>> print(kws_result)
{'kws_type': 'pcm', 'kws_list': [{'keyword': '小云小云', 'offset': 1.92, 'length': 0.51, 'confidence': 0.940286, 'type': 'wakeup'}], 'wav_count': 1}
>>>
(kws) [vayne@localhost ~/.a/speech_charctc_kws_phone-xiaoyun/unittest/test_kws_training]$
The confidence value clearly differs from the old model's result, so the new model is the one being used.
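To make the comparison explicit, both releases can be run back to back on the same test wav. This sketch assumes release/20240728 holds the previous packaged model (as the cp commands above suggest) and is run from the model root.
# Compare the old and new releases on the same positive sample.
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

wav = 'https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/KWS/pos_testset/kws_xiaoyunxiaoyun.wav'

for tag, model_dir in [('old', './release/20240728'), ('new', './release/20240730')]:
    kws_pipeline = pipeline(task=Tasks.keyword_spotting, model=model_dir)
    result = kws_pipeline(audio_in=wav)
    for item in result.get('kws_list', []):
        print(tag, item['keyword'], item['confidence'])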
1.4 Start a server
Skipped for now; this has to wait until I have researched streaming audio.