Prochain Science: Kaldi: Recognizing Speech with nnet2 Models via GStreamer

This post explains how to use GStreamer for speech decoding and how to bring a decoding server online.

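Decoding nnet2 Models in Kaldi

Kaldi supports training acoustic models with DNNs. Kaldi calls its DNN models nnet, and the DNN model actually used here is an nnet2 model.

Kaldi currently ships real-time decoders for ordinary GMM models (for example, src/onlinebin/online-gmm-decode-faster, which egs/voxforge/online_demo uses), but the maintainers have stated that they do not intend to support real-time decoding from a microphone [1]. Decoding therefore relies on GStreamer.

The GStreamer tool used in this post is the alumae/kaldi-gstreamer-server repo [2]. It receives audio over WebSocket for recognition, and it ships with its own client.

If you want to port this to Android for real-time recognition, see the truongdo/kaldi-gstreamer-client repo.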

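Speech Model

You need the nnet2 model files produced by Kaldi training. This post uses the tedlium (TED talks) model as its example; it can be downloaded from the phon.ioc.ee website (download link) and is about 1.4 GB.

After downloading, extract it into the local directory /media/kaldi_models. The archive is in tgz format, so extract it with tar -xvzf [filename].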
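If you would rather script the extraction step, the following is a minimal Python 3 sketch of the same operation using only the standard library. The function name extract_models is my own, and the default destination simply mirrors the directory used in this post.

```python
import tarfile
from pathlib import Path


def extract_models(archive_path, dest_dir="/media/kaldi_models"):
    """Extract a .tgz model archive into dest_dir; return its top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Newer Python versions offer extraction filters that refuse unsafe
        # member paths; use the "data" filter when it is available.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return sorted({name.split("/")[0] for name in names})
```

The return value lists the top-level directories in the archive, which is a quick way to confirm that an english directory ended up under the destination.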

If all goes well, the directory structure looks like this:

/media/kaldi_models/
├── english
│   └── tedlium_nnet_ms_sp_online
│       ├── conf
│       │   ├── ivector_extractor.conf
│       │   ├── mfcc.conf
│       │   ├── online_cmvn.conf
│       │   ├── online_nnet2_decoding.conf
│       │   └── splice.conf
│       ├── final.mdl
│       ├── G.carpa
│       ├── G.fst
│       ├── HCLG.fst
│       ├── ivector_extractor
│       │   ├── final.dubm
│       │   ├── final.ie
│       │   ├── final.mat
│       │   ├── global_cmvn.stats
│       │   ├── online_cmvn.conf
│       │   └── splice_opts
│       ├── phones.txt
│       ├── word_boundary.int
│       └── words.txt
└── nnet2.yaml (you will write this file yourself later)
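A missing file usually only shows up later as a server startup failure, so it can be worth checking the layout right after extraction. This is a small Python 3 sketch; the list of required files is my own selection of the files that the configuration in this post refers to, and missing_model_files is a hypothetical helper name.

```python
from pathlib import Path

# Files that the YAML and ivector configs in this post point at,
# relative to the tedlium model directory (a hand-picked list, not exhaustive).
REQUIRED_FILES = [
    "final.mdl",
    "words.txt",
    "HCLG.fst",
    "conf/mfcc.conf",
    "conf/ivector_extractor.conf",
    "conf/splice.conf",
    "conf/online_cmvn.conf",
    "ivector_extractor/final.mat",
    "ivector_extractor/global_cmvn.stats",
    "ivector_extractor/final.dubm",
    "ivector_extractor/final.ie",
]


def missing_model_files(root="/media/kaldi_models"):
    """Return the required files that are absent under the model directory."""
    model_dir = Path(root) / "english" / "tedlium_nnet_ms_sp_online"
    return [rel for rel in REQUIRED_FILES if not (model_dir / rel).is_file()]
```

An empty list means every file in the hand-picked list was found.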

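GStreamer Server Environment Setup

Because the server's environment requirements are complicated to set up, I decided to use Docker. The jcsilva/docker-kaldi-gstreamer-server repo [3] provides the environment that alumae/kaldi-gstreamer-server needs, so there is no need to recompile Kaldi or install the Python packages on a new server.

First, pull the Docker image: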

docker pull jcsilva/docker-kaldi-gstreamer-server

Then start the container with:

docker run --privileged -it -p 8080:80 -v /media/kaldi_models:/opt/models jcsilva/docker-kaldi-gstreamer-server:latest /bin/bash

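The --privileged flag is there because, without it, the container did not have permission to access /opt/models in my setup, which complicates debugging; the flag runs the container with extended privileges.

The -v option maps /media/kaldi_models on the host to /opt/models inside the container, and -p 8080:80 publishes the container's port 80 on host port 8080.

After running the command you are dropped into a shell in the container's /opt directory. The server is not running yet; it has to be configured first so that it can run properly.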
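To make the role of each flag easier to see, here is a Python 3 sketch that assembles the same docker run command from its parts. The helper name docker_run_command and its parameters are my own; the flags and image name are the ones used in this post.

```python
import shlex

IMAGE = "jcsilva/docker-kaldi-gstreamer-server:latest"


def docker_run_command(host_models="/media/kaldi_models", host_port=8080):
    """Build the argument list for the docker run command used in this post."""
    return [
        "docker", "run",
        "--privileged",                      # extended privileges, as discussed above
        "-it",                               # interactive terminal
        "-p", f"{host_port}:80",             # host port -> container port 80
        "-v", f"{host_models}:/opt/models",  # host model dir -> /opt/models
        IMAGE,
        "/bin/bash",                         # start a shell instead of the server
    ]


if __name__ == "__main__":
    # Print the command as a shell-ready string instead of running it.
    print(shlex.join(docker_run_command()))
```

Changing host_port here also means changing the port in the client URL used later for testing.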

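GStreamer Server Config

Before GStreamer can start, the repo requires you to specify the speech model's file locations and parameters so that it can load them. The example uses the Tedlium model, and its configuration has two main steps:

Global YAML File

You can edit the file either inside the container (/opt/models) or outside it (/media/kaldi_models). I recommend editing outside, because the container does not ship with vim; you would have to install it with apt-get yourself.

First, create a file named nnet2.yaml in that directory and enter the parameters used by the Tedlium model (source: [4]):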

# You have to download TEDLIUM "online nnet2" models in order to use this sample
# Run download-tedlium-nnet2.sh in 'test/models' to download them.
use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    use-threaded-decoder:  true
    model : test/models/english/tedlium_nnet_ms_sp_online/final.mdl
    word-syms : test/models/english/tedlium_nnet_ms_sp_online/words.txt
    fst : test/models/english/tedlium_nnet_ms_sp_online/HCLG.fst
    mfcc-config : test/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf
    ivector-extraction-config : test/models/english/tedlium_nnet_ms_sp_online/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 0.083
    do-endpointing : true
    endpoint-silence-phones : "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    num-nbest: 10
    #Additional functionality that you can play with:
    #lm-fst:  test/models/english/tedlium_nnet_ms_sp_online/G.fst
    #big-lm-const-arpa: test/models/english/tedlium_nnet_ms_sp_online/G.carpa
    #phone-syms: test/models/english/tedlium_nnet_ms_sp_online/phones.txt
    #word-boundary-file: test/models/english/tedlium_nnet_ms_sp_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp

use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

# A sample full post processor that add a confidence score to 1-best hyp and deletes other n-best hyps
full-post-processor: ./sample_full_post_processor.py

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]

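If you paste this into vim, make sure the leading # comment character was not dropped during the paste, otherwise the YAML file will be invalid.

After pasting, I recommend changing the paths above to absolute paths, because in my tests the relative paths repeatedly failed to resolve. Replace the test/models prefix with /opt/models (and make the full-post-processor path absolute as well), which gives the following file: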

# You have to download TEDLIUM "online nnet2" models in order to use this sample
# Run download-tedlium-nnet2.sh in 'test/models' to download them.
use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    use-threaded-decoder:  true
    model : /opt/models/english/tedlium_nnet_ms_sp_online/final.mdl
    word-syms : /opt/models/english/tedlium_nnet_ms_sp_online/words.txt
    fst : /opt/models/english/tedlium_nnet_ms_sp_online/HCLG.fst
    mfcc-config : /opt/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf
    ivector-extraction-config : /opt/models/english/tedlium_nnet_ms_sp_online/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 0.083
    do-endpointing : true
    endpoint-silence-phones : "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    num-nbest: 10
    #Additional functionality that you can play with:
    #lm-fst:  /opt/models/english/tedlium_nnet_ms_sp_online/G.fst
    #big-lm-const-arpa: /opt/models/english/tedlium_nnet_ms_sp_online/G.carpa
    #phone-syms: /opt/models/english/tedlium_nnet_ms_sp_online/phones.txt
    #word-boundary-file: /opt/models/english/tedlium_nnet_ms_sp_online/word_boundary.int
    #do-phone-alignment: true
out-dir: tmp

use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

# A sample full post processor that add a confidence score to 1-best hyp and deletes other n-best hyps
full-post-processor: /opt/kaldi-gstreamer-server/sample_full_post_processor.py

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
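Editing every path by hand is error-prone, so the prefix change can also be scripted. The following Python 3 sketch rewrites a leading test/models/ prefix to /opt/models/; the function name absolutize_model_paths is my own. It deliberately leaves the full-post-processor line alone, so that line still needs a manual edit.

```python
import re


def absolutize_model_paths(config_text,
                           old_prefix="test/models/",
                           new_prefix="/opt/models/"):
    """Replace a relative model path prefix with an absolute one.

    The prefix is only rewritten where it starts a path, so values that are
    already absolute (for example /data/test/models/...) are left untouched
    and running the function twice gives the same result.
    """
    pattern = re.compile(r"(?<![\w/.-])" + re.escape(old_prefix))
    return pattern.sub(new_prefix, config_text)
```

Because the pattern requires the trailing slash, the comment that mentions 'test/models' at the top of the sample file is not modified, which matches the edited file shown above.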

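ivector Config File

The config files bundled with the tedlium model also point to the wrong locations: the ivector config uses paths of the form test/models/..., so the decoding server cannot find the model files after it starts. The ivector config therefore has to be changed to absolute paths as well [7]:

Edit /opt/models/english/tedlium_nnet_ms_sp_online/conf/ivector_extractor.conf and replace the test/models paths with /opt/models. The resulting file is: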

--splice-config=/opt/models/english/tedlium_nnet_ms_sp_online/conf/splice.conf
--cmvn-config=/opt/models/english/tedlium_nnet_ms_sp_online/conf/online_cmvn.conf
--lda-matrix=/opt/models/english/tedlium_nnet_ms_sp_online/ivector_extractor/final.mat
--global-cmvn-stats=/opt/models/english/tedlium_nnet_ms_sp_online/ivector_extractor/global_cmvn.stats
--diag-ubm=/opt/models/english/tedlium_nnet_ms_sp_online/ivector_extractor/final.dubm
--ivector-extractor=/opt/models/english/tedlium_nnet_ms_sp_online/ivector_extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100
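To confirm that no relative path was left behind in this file, a short check can parse the --key=value lines and list the path options that are not absolute. This is a Python 3 sketch; parse_kaldi_conf and non_absolute_paths are hypothetical helper names, and PATH_OPTIONS is simply the set of path-valued options visible in the file above.

```python
# Options in ivector_extractor.conf whose values are file paths.
PATH_OPTIONS = (
    "splice-config",
    "cmvn-config",
    "lda-matrix",
    "global-cmvn-stats",
    "diag-ubm",
    "ivector-extractor",
)


def parse_kaldi_conf(text):
    """Parse lines of the form --key=value into a dict of strings."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if not line.startswith("--") or "=" not in line:
            raise ValueError("unexpected config line: %r" % raw)
        key, value = line[2:].split("=", 1)
        options[key.strip()] = value.strip()
    return options


def non_absolute_paths(options):
    """Return the path options whose value does not start with '/'."""
    return {key: value for key, value in options.items()
            if key in PATH_OPTIONS and not value.startswith("/")}
```

An empty result from non_absolute_paths means every path option in the file is absolute; it does not check that the files exist.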

Starting the GStreamer Server

Once the config files are in place, start the server. From the /opt directory inside the container, run:

./start.sh -y /opt/models/nnet2.yaml

This command creates two files, worker.log and master.log; check them to see whether the server started correctly.
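Reading the logs by eye works, but a small filter makes problems easier to spot. The Python 3 sketch below assumes the log lines follow the format set in the logging section of nnet2.yaml ('%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'); that is an assumption on my part, master.log may use a different layout, and output written directly by Kaldi or GStreamer will not match. The name problem_lines is my own.

```python
import re

# Matches the " - <LEVEL>: " part of the format configured in nnet2.yaml,
# for the levels that usually indicate trouble.
LEVEL_RE = re.compile(r" - +(WARNING|ERROR|CRITICAL): ")


def problem_lines(log_text):
    """Return (line_number, line) pairs for warning-or-worse log lines."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if LEVEL_RE.search(line):
            hits.append((number, line))
    return hits


def problem_lines_in_file(path):
    """Convenience wrapper that reads a log file such as worker.log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return problem_lines(handle.read())
```

For example, problem_lines_in_file("worker.log") run from /opt would list the matching lines with their line numbers.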

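Testing from Outside the Server

Once the Docker server is confirmed to be running, you can test it from outside the container. The example uses the client that ships with alumae/kaldi-gstreamer-server, so clone that project if you have not already, then install the Python package ws4py: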

pip install ws4py==0.3.2

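The version is pinned to 0.3.2 because the server's author reports finding a ws4py bug that causes problems with the GStreamer server.

In addition, ws4py did not seem to run correctly on Windows for me; it may fail with: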

Traceback (most recent call last):
  File "client.py", line 6, in <module>
    from ws4py.websocket import WebSocket
ImportError: No module named ws4py.websocket

Running on Linux works normally. The whole client.py file must be run with Python 2.7.

The alumae/kaldi-gstreamer-server repo includes test audio, so you can test the recognizer by running the following from the repo directory:

python kaldigstserver/client.py -u ws://localhost:8080/client/ws/speech -r 8192 test/data/bill_gates-TED.mp3
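If the client cannot connect, it helps to first rule out the port mapping before digging into the server logs. This Python 3 sketch only checks that something accepts TCP connections on the published port (8080 in this post); it does not speak WebSocket or verify that the decoder is loaded. The function name port_is_open is my own.

```python
import socket


def port_is_open(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A False result usually points at the container not running, the server not having been started with start.sh, or a different -p mapping than the one shown earlier.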

References:
[1] https://groups.google.com/forum/#!topic/kaldi-help/iR40XhDJnC8
[2] https://github.com/alumae/kaldi-gstreamer-server
[3] https://github.com/jcsilva/docker-kaldi-gstreamer-server
[4] https://github.com/alumae/kaldi-gstreamer-server/blob/master/sample_english_nnet2.yaml
[5] https://github.com/luisivan/pebble-slides/issues/9
[6] http://blog.csdn.net/beyissi2020/article/details/48626329
[7] https://groups.google.com/forum/#!topic/kaldi-help/1feWYAoOO7A
[8] https://github.com/luisivan/pebble-slides/issues/9


Monday, March 5, 2018
