【Audio】03 FunASR
1 prepare env
1.1 install funasr
python: 3.9.16 kernel: 5.4.0-42-generic
Here are the two key docs:
So, let's install funasr.
A plain pip install torch (latest version) does not work well here; CUDA is not usable with this driver:
(funasr) [vayne@localhost ~/.a/FunASR/release/20240727]$ python
Python 3.9.16 (main, Jul 25 2024, 15:54:18)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11050). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
With these docs:
So: pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 funasr modelscope
then,
(funasr) [vayne@localhost ~/.a/FunASR/release/20240727]$ python
Python 3.9.16 (main, Jul 25 2024, 15:54:18)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
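As an extra sanity check after installing the pinned versions, a small snippet like the following (a sketch; the values in the comments are what I expect for this setup) prints the torch build and the detected GPU:

import torch

print(torch.__version__)               # expect 1.11.0
print(torch.version.cuda)              # CUDA toolkit this torch build targets
print(torch.cuda.is_available())       # expect True
print(torch.cuda.get_device_name(0))   # name of the detected GPU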
2 use model
python: 3.9.16 kernel: 5.4.0-42-generic
2.1 export model paraformer-zh
Refer to this doc.
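The export call itself is roughly the following (a minimal sketch based on the FunASR AutoModel API; the exact arguments may differ from the doc):

from funasr import AutoModel

# Load the pre-trained model and export it (sketch; arguments may differ).
model = AutoModel(model="paraformer-zh")
res = model.export(quantize=False)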
But it did not succeed; here is the error:
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Other people hit the same problem; here is the issue.
So skip it for now, and try to train the model instead of exporting it.
2.2 infer directly with the pre-trained model
(funasr) [vayne@localhost ~/.a/FunASR/examples/industrial_data_pretraining/paraformer]$ ./infer_from_local.sh
Notice: ffmpeg is not installed. torchaudio is used to load audio
If you want to use ffmpeg backend to load audio, please install it by:
sudo apt install ffmpeg # ubuntu
# brew install ffmpeg # mac
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
/usr/local/src/Python-3.9.16/lib/python3.9/runpy.py:127: RuntimeWarning: 'funasr.bin.inference' found in sys.modules after import of package 'funasr.bin',
but prior to execution of 'funasr.bin.inference'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
You are using the latest version of funasr-1.1.4
[2024-07-27 23:32:51,751][root][INFO] - Loading pretrained params from /home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-07-27 23:32:51,756][root][INFO] - ckpt: /home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-07-27 23:32:52,293][root][INFO] - scope_map: ['module.', 'None']
[2024-07-27 23:32:52,294][root][INFO] - excludes: None
[2024-07-27 23:32:53,427][root][INFO] - Loading ckpt: /home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt, status: <All keys matched successfully>
rtf_avg: 0.009: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10.55it/s]
[{'key': 'D21_829', 'text': '和血一样有无数无声的激情与世界浑然一体曾经历史养活又孕育美丽的明天'}]
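The same inference can also be run directly from Python (a sketch assuming the paraformer-zh model and a placeholder wav path; infer_from_local.sh ultimately wraps funasr.bin.inference instead):

from funasr import AutoModel

# Load the pre-trained Paraformer model and recognize one local wav file.
model = AutoModel(model="paraformer-zh")
res = model.generate(input="path/to/your.wav")  # placeholder path
print(res)  # e.g. [{'key': ..., 'text': ...}]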
3 fine tune model
3.1 in old kernel, not ok
python: 3.9.16 kernel: 5.4.0-42-generic
All of the work above was done on kernel 5.4.0-42-generic, which limits the versions of many packages, such as pytorch, numpy, torchvision, and torchaudio.
For example: pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 funasr modelscope
(funasr) [vayne@localhost ~]$ uname -a
Linux localhost 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
And it causes the training error below when trying to run the sample finetune script.
(funasr) [vayne@localhost ~/.a/FunASR/examples/industrial_data_pretraining/paraformer]$ cat outputs/log.txt
Traceback (most recent call last):
File "/home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/../../../funasr/bin/train_ds.py", line 22, in <module>
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler
ModuleNotFoundError: No module named 'torch.distributed.fsdp.sharded_grad_scaler'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25734) of binary: /home/vayne/.e/funasr/bin/python
Traceback (most recent call last):
File "/home/vayne/.e/funasr/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
../../../funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-07-27_22:59:52
host : localhost
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 25734)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
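The failing import is torch.distributed.fsdp.sharded_grad_scaler, which simply does not exist in this torch 1.11 install; a quick check from Python (a small diagnostic sketch) makes that visible:

import importlib.util
import torch

print(torch.__version__)  # 1.11.0 on this machine
# find_spec returns None when the submodule is not present in this torch build.
print(importlib.util.find_spec("torch.distributed.fsdp.sharded_grad_scaler"))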
I don't know how to fix it on this old kernel; maybe I don't need to fix it now.
I just need to try to run the flow on the new kernel, and if that works, it's enough.
3.2 in new kernel, ok
python: 3.10.12 kernel: 5.15.0-105-generic
So I try to run the flow on the new kernel, as below.
root@localhost:~# uname -a
Linux localhost 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
and install the packages:
pip install torch numpy torchvision torchaudio funasr modelscope
Then run the sample finetune script.
It passed!!!
(funasr) vayne@localhost:~/.a/FunASR/examples/industrial_data_pretraining/paraformer$ ls outputs/
2024-07-28 log.txt model.pt.ep31 model.pt.ep34 model.pt.ep37 model.pt.ep40 model.pt.ep43 model.pt.ep46 model.pt.ep49
configuration.json model.pt model.pt.ep32 model.pt.ep35 model.pt.ep38 model.pt.ep41 model.pt.ep44 model.pt.ep47 model.pt.ep50
config.yaml model.pt.best model.pt.ep33 model.pt.ep36 model.pt.ep39 model.pt.ep42 model.pt.ep45 model.pt.ep48 tensorboard
So, transfer learning (fine-tuning) with this model works.
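To actually use the fine-tuned weights, loading the output directory back into AutoModel should work (a sketch; that AutoModel accepts a local output dir containing config.yaml / model.pt is an assumption, and the paths are placeholders):

from funasr import AutoModel

# Point AutoModel at the finetune output directory instead of a model name.
model = AutoModel(model="./outputs")             # assumed local-path support
res = model.generate(input="path/to/test.wav")   # placeholder wav
print(res)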
4 run server
python: 3.9.16 kernel: 5.4.0-42-generic
Refer to these scripts:
- runtime/python/http/start_server.sh
- runtime/python/http/client.py
Then the HTTP API can be called to recognize an audio file, roughly as in the sketch below.
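A minimal client sketch, with the server already started via start_server.sh (the port, route, and form-field name here are assumptions; check start_server.sh and client.py for the real values):

import requests

# Post one wav file to the FunASR HTTP server and print the recognition result.
# URL and field name are placeholders -- adjust to match start_server.sh / client.py.
with open("test.wav", "rb") as f:
    files = {"audio": ("test.wav", f, "audio/wav")}
    resp = requests.post("http://127.0.0.1:8000/recognition", files=files)
print(resp.text)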
So, for now, the ASR flow is done.
100 some concepts
100.1 Voice Activity Detection (VAD)
Voice Activity Detection (VAD) refers to the process of detecting pauses or lack of speech in a voice signal.
It is used in voice compression systems to reduce bandwidth by transmitting fewer or no packets during periods of silence.
VAD is implemented using a function called a voice activity detector, which helps in halving the bandwidth used by identifying periods of no voice activity.
doc: https://www.sciencedirect.com/topics/computer-science/voice-activity-detection
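In FunASR, VAD can be tried with the fsmn-vad model (a sketch; the model name is from the FunASR docs and the wav path is a placeholder):

from funasr import AutoModel

# Run voice activity detection; the result lists [start_ms, end_ms] speech segments.
model = AutoModel(model="fsmn-vad")
res = model.generate(input="path/to/your.wav")
print(res)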
100.2 punctuation restoration
Used to add punctuation to text, e.g. punctuation prediction for the text output of a speech recognition model.
doc: https://modelscope.cn/models/iic/punc_ct-transformer_cn-en-common-vocab471067-large/summary
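A quick way to try it (a sketch; the ct-punc model name and example sentence follow the FunASR docs):

from funasr import AutoModel

# Restore punctuation in raw, unpunctuated ASR output text.
model = AutoModel(model="ct-punc")
res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
print(res)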