【Audio】03 FunASR
1 prepare env
1.1 install funasr
python: 3.9.16 kernel: 5.4.0-42-generic
Here are the two key docs:
So, let's install funasr.
A plain pip install torch (latest version) does not work well here; CUDA is not usable with this driver:
(funasr) [vayne@localhost ~/.a/FunASR/release/20240727]$ python
Python 3.9.16 (main, Jul 25 2024, 15:54:18)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11050). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
With these docs:
So: pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 funasr modelscope
then,
(funasr) [vayne@localhost ~/.a/FunASR/release/20240727]$ python
Python 3.9.16 (main, Jul 25 2024, 15:54:18)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
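As an extra sanity check after installing the pinned versions, a small snippet like the following (a sketch; the values in the comments are what I expect for this setup) prints the torch build and the detected GPU:

import torch

print(torch.__version__)               # expect 1.11.0
print(torch.version.cuda)              # CUDA toolkit this torch build targets
print(torch.cuda.is_available())       # expect True
print(torch.cuda.get_device_name(0))   # name of the detected GPU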
2 use model
python: 3.9.16 kernel: 5.4.0-42-generic
2.1 export model paraformer-zh
Refer to this doc.
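The export call itself is roughly the following (a minimal sketch based on the FunASR AutoModel API; the exact arguments may differ from the doc):

from funasr import AutoModel

# Load the pre-trained model and export it (sketch; arguments may differ).
model = AutoModel(model="paraformer-zh")
res = model.export(quantize=False)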
But it did not succeed; here is the error:
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Other people hit the same problem; here is the issue.
So skip it for now, and try to train the model instead of exporting it.
2.2 infer directly with the pre-trained model
(funasr) [vayne@localhost ~/.a/FunASR/examples/industrial_data_pretraining/paraformer]$ ./infer_from_local.sh
Notice: ffmpeg is not installed. torchaudio is used to load audio
If you want to use ffmpeg backend to load audio, please install it by:
sudo apt install ffmpeg # ubuntu
# brew install ffmpeg # mac
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
If you want use mossformer, lease install rotary_embedding_torch by:
pip install -U rotary_embedding_torch
/usr/local/src/Python-3.9.16/lib/python3.9/runpy.py:127: RuntimeWarning: 'funasr.bin.inference' found in sys.modules after import of package 'funasr.bin',
but prior to execution of 'funasr.bin.inference'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
You are using the latest version of funasr-1.1.4
[2024-07-27 23:32:51,751][root][INFO] - Loading pretrained params from /home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-07-27 23:32:51,756][root][INFO] - ckpt: /home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt
[2024-07-27 23:32:52,293][root][INFO] - scope_map: ['module.', 'None']
[2024-07-27 23:32:52,294][root][INFO] - excludes: None
[2024-07-27 23:32:53,427][root][INFO] - Loading ckpt: /home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/modelscope_models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pt, status: <All keys matched successfully>
rtf_avg: 0.009: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 10.55it/s]
[{'key': 'D21_829', 'text': '和血一样有无数无声的激情与世界浑然一体曾经历史养活又孕育美丽的明天'}]
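The same inference can also be run directly from Python (a sketch assuming the paraformer-zh model and a placeholder wav path; infer_from_local.sh ultimately wraps funasr.bin.inference instead):

from funasr import AutoModel

# Load the pre-trained Paraformer model and recognize one local wav file.
model = AutoModel(model="paraformer-zh")
res = model.generate(input="path/to/your.wav")  # placeholder path
print(res)  # e.g. [{'key': ..., 'text': ...}]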
3 fine tune model
3.1 in old kernel, not ok
python: 3.9.16 kernel: 5.4.0-42-generic
All of the work above was done on kernel 5.4.0-42-generic, which limits the versions of many packages, such as pytorch, numpy, torchvision, and torchaudio.
For example: pip install torch==1.11.0 "numpy<2" torchvision==0.12.0 torchaudio==0.11.0 funasr modelscope
(funasr) [vayne@localhost ~]$ uname -a
Linux localhost 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
And it causes the training error below when trying to run the sample finetune script.
(funasr) [vayne@localhost ~/.a/FunASR/examples/industrial_data_pretraining/paraformer]$ cat outputs/log.txt
Traceback (most recent call last):
File "/home/vayne/.a/FunASR/examples/industrial_data_pretraining/paraformer/../../../funasr/bin/train_ds.py", line 22, in <module>
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler
ModuleNotFoundError: No module named 'torch.distributed.fsdp.sharded_grad_scaler'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25734) of binary: /home/vayne/.e/funasr/bin/python
Traceback (most recent call last):
File "/home/vayne/.e/funasr/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/vayne/.e/funasr/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
../../../funasr/bin/train_ds.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-07-27_22:59:52
host : localhost
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 25734)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
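The failing import is torch.distributed.fsdp.sharded_grad_scaler, which simply does not exist in this torch 1.11 install; a quick check from Python (a small diagnostic sketch) makes that visible:

import importlib.util
import torch

print(torch.__version__)  # 1.11.0 on this machine
# find_spec returns None when the submodule is not present in this torch build.
print(importlib.util.find_spec("torch.distributed.fsdp.sharded_grad_scaler"))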
I don't know how to fix it on this old kernel; maybe I don't need to fix it now.
I just need to try to run the flow on the new kernel, and if that works, it's enough.
3.2 in new kernel, ok
python: 3.10.12 kernel: 5.15.0-105-generic
So I try to run the flow on the new kernel, as below.
root@localhost:~# uname -a
Linux localhost 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
and install the packages:
pip install torch numpy torchvision torchaudio funasr modelscope
Then run the sample finetune script.
It passed!!!
(funasr) vayne@localhost:~/.a/FunASR/examples/industrial_data_pretraining/paraformer$ ls outputs/
2024-07-28 log.txt model.pt.ep31 model.pt.ep34 model.pt.ep37 model.pt.ep40 model.pt.ep43 model.pt.ep46 model.pt.ep49
configuration.json model.pt model.pt.ep32 model.pt.ep35 model.pt.ep38 model.pt.ep41 model.pt.ep44 model.pt.ep47 model.pt.ep50
config.yaml model.pt.best model.pt.ep33 model.pt.ep36 model.pt.ep39 model.pt.ep42 model.pt.ep45 model.pt.ep48 tensorboard
So, transfer learning (fine-tuning) with this model works.
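To actually use the fine-tuned weights, loading the output directory back into AutoModel should work (a sketch; that AutoModel accepts a local output dir containing config.yaml / model.pt is an assumption, and the paths are placeholders):

from funasr import AutoModel

# Point AutoModel at the finetune output directory instead of a model name.
model = AutoModel(model="./outputs")             # assumed local-path support
res = model.generate(input="path/to/test.wav")   # placeholder wav
print(res)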
4 run server
python: 3.9.16 kernel: 5.4.0-42-generic
Refer to these scripts:
- runtime/python/http/start_server.sh
- runtime/python/http/client.py
Then the HTTP API can be called to recognize an audio file, roughly as in the sketch below.
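A minimal client sketch, with the server already started via start_server.sh (the port, route, and form-field name here are assumptions; check start_server.sh and client.py for the real values):

import requests

# Post one wav file to the FunASR HTTP server and print the recognition result.
# URL and field name are placeholders -- adjust to match start_server.sh / client.py.
with open("test.wav", "rb") as f:
    files = {"audio": ("test.wav", f, "audio/wav")}
    resp = requests.post("http://127.0.0.1:8000/recognition", files=files)
print(resp.text)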
So, for now, the ASR flow is done.
100 some concepts
100.1 Voice Activity Detection (VAD)
Voice Activity Detection (VAD) refers to the process of detecting pauses or lack of speech in a voice signal.
It is used in voice compression systems to reduce bandwidth by transmitting fewer or no packets during periods of silence.
VAD is implemented using a function called a voice activity detector, which helps in halving the bandwidth used by identifying periods of no voice activity.
doc: https://www.sciencedirect.com/topics/computer-science/voice-activity-detection
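In FunASR, VAD can be tried with the fsmn-vad model (a sketch; the model name is from the FunASR docs and the wav path is a placeholder):

from funasr import AutoModel

# Run voice activity detection; the result lists [start_ms, end_ms] speech segments.
model = AutoModel(model="fsmn-vad")
res = model.generate(input="path/to/your.wav")
print(res)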
100.2 punctuation restoration
Used to add punctuation to text, e.g. punctuation prediction for the text output of a speech recognition model.
doc: https://modelscope.cn/models/iic/punc_ct-transformer_cn-en-common-vocab471067-large/summary
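A quick way to try it (a sketch; the ct-punc model name and example sentence follow the FunASR docs):

from funasr import AutoModel

# Restore punctuation in raw, unpunctuated ASR output text.
model = AutoModel(model="ct-punc")
res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
print(res)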