Saturday, February 3, 2018

HTK Tools (4): Monophone Modeling - Fixing the Silence Models

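The hidden Markov models trained in the previous chapters left out, from the very start, the short pauses between spoken words, which are modelled by "sp" (short pause). Training so far has only gone as far as "sil" (silence): the longer silences, such as the pause after the speaker finishes an utterance. This chapter describes how to adjust the current HMMs using silence models.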

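Concretely, a silence model is chained onto the existing HMMs. The centre state of the existing "sil" model is linked to a new short-pause model "sp", so the two models share that state.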

Figure: chaining the silence models, from the HTK Book (p.43, Fig. 3.9, Silence Models)

Preparing the silence model


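In the previous chapter the models were built up to the third set, hmm3. Follow these steps:

1. Copy everything in hmm3 into a new folder named hmm4.
2. Open the hmmdefs file in hmm4 with a text editor.
3. Copy the entire "sil" definition, including the ~h "sil" line itself, and paste it at the end of the file.
4. Rename the pasted copy from sil to sp.
5. End the file with a newline.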

Example (the `//` notes are annotations for the reader, not part of the file):

~h "sil"  // copy this, but do not delete it
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 39
 -1.498677e+001 4.093102e+000 4.470369e+000 9.550881e+000 5.525516e+000 7.084024e+000 5.142689e+000 6.013014e+000 4.645503e+000 5.539509e+000 6.325500e+000 6.309093e+000 5.749937e+001 -6.906049e-003 6.200859e-002 1.112403e-002 3.576368e-002 8.758731e-003 1.378206e-002 -2.123800e-002 -7.497694e-003 5.298624e-002 1.609821e-002 -5.841954e-003 -3.580781e-002 -5.639551e-002 -1.552888e-003 -1.564077e-002 6.936034e-003 -2.219586e-002 -2.337704e-003 -5.731292e-003 -4.651209e-003 8.145710e-003 1.182639e-003 -4.194723e-003 6.376632e-003 2.554819e-003 2.414685e-002
<VARIANCE> 39
 1.643824e+000 3.894905e+000 3.663061e+000 4.946204e+000 3.898954e+000 4.214682e+000 5.191493e+000 4.351866e+000 5.781250e+000 4.500092e+000 4.461145e+000 4.605597e+000 2.719475e+000 7.932547e-002 1.842212e-001 2.402062e-001 3.245045e-001 3.278915e-001 4.611880e-001 4.742737e-001 5.285383e-001 5.803781e-001 4.833222e-001 4.731556e-001 5.138031e-001 8.708426e-002 1.568178e-002 3.380568e-002 5.433224e-002 6.767875e-002 6.143730e-002 9.275308e-002 9.806089e-002 1.116426e-001 1.291828e-001 9.715064e-002 9.311121e-002 1.028403e-001 1.796106e-002
<GCONST> 3.857685e+001
<STATE> 3
<MEAN> 39
 -1.715477e+001 4.400246e+000 6.262936e+000 7.177123e+000 6.923150e+000 6.086653e+000 6.596479e+000 5.483385e+000 4.031187e+000 6.449147e+000 4.948737e+000 6.442664e+000 5.879964e+001 -3.977152e-001 4.400318e-002 4.941491e-001 -1.007472e-001 2.828232e-002 -5.974878e-003 2.506801e-001 -2.503631e-001 -3.735806e-002 -5.886459e-003 -1.815426e-001 9.265565e-002 1.569314e-001 -1.779113e-001 -6.644941e-002 1.010883e-001 -1.543404e-001 7.227802e-002 -7.839736e-002 4.901925e-002 -2.085987e-002 3.523882e-002 1.870989e-002 -6.068126e-002 4.070071e-002 1.934501e-001
<VARIANCE> 39
 1.453472e+001 3.282893e+000 1.309321e+001 1.138998e+001 6.731030e+000 5.037028e+000 1.200170e+001 1.046143e+001 6.047815e+000 1.058800e+001 6.869959e+000 4.896886e+000 6.478782e+000 3.302830e+000 2.376229e-001 2.080417e+000 1.771493e+000 1.024571e+000 1.037700e+000 2.329357e+000 1.914135e+000 1.042424e+000 1.704532e+000 8.336554e-001 5.220092e-001 1.452260e+000 5.798435e-001 4.099488e-002 3.843521e-001 3.120943e-001 2.126025e-001 1.956420e-001 4.509460e-001 3.321507e-001 1.947594e-001 2.625585e-001 1.538810e-001 1.324593e-001 2.504321e-001
<GCONST> 8.199227e+001
<STATE> 4
<MEAN> 39
 -1.704935e+001 8.153662e-003 6.034267e+000 4.172133e+000 7.304586e+000 2.624785e+000 3.512644e+000 4.518258e+000 2.936766e+000 3.133998e+000 4.160763e+000 4.870670e+000 6.802085e+001 -3.250551e-002 -2.811043e-002 1.102287e-002 -2.990296e-002 5.529329e-002 5.707249e-003 -7.093425e-003 1.310036e-002 -8.518350e-003 -3.768544e-002 -3.535427e-002 2.529903e-002 2.381820e-002 7.676242e-002 -8.619049e-003 -5.807225e-002 5.872397e-002 -1.823968e-003 4.815067e-003 -4.655287e-002 1.761478e-002 -2.065457e-002 -4.206782e-002 1.772777e-002 -2.844845e-003 -3.901509e-002
<VARIANCE> 39
 5.909045e+001 2.491252e+001 3.028853e+001 3.908095e+001 3.259219e+001 3.694969e+001 7.296276e+001 2.848757e+001 2.728870e+001 5.725907e+001 1.883418e+001 1.637582e+001 1.037448e+002 3.022137e+000 1.264405e+000 1.697435e+000 2.535551e+000 1.947650e+000 2.348791e+000 3.577286e+000 2.126656e+000 2.145338e+000 3.731926e+000 1.500666e+000 1.437876e+000 2.975770e+000 5.847414e-001 2.247958e-001 3.302156e-001 5.230563e-001 2.971367e-001 4.237480e-001 5.890771e-001 4.158721e-001 4.379262e-001 6.548804e-001 2.821163e-001 2.804883e-001 4.466936e-001
<GCONST> 1.169426e+002
<TRANSP> 5
 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
 0.000000e+000 9.657938e-001 3.420626e-002 0.000000e+000 0.000000e+000
 0.000000e+000 0.000000e+000 8.181013e-001 1.818986e-001 0.000000e+000
 0.000000e+000 0.000000e+000 0.000000e+000 9.399628e-001 6.003720e-002
 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
<ENDHMM>

~h "sp"   // rename the copy from sil to sp here
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 39
 -1.498677e+001 4.093102e+000 4.470369e+000 9.550881e+000 5.525516e+000 7.084024e+000 5.142689e+000 6.013014e+000 4.645503e+000 5.539509e+000 6.325500e+000 6.309093e+000 5.749937e+001 -6.906049e-003 6.200859e-002 1.112403e-002 3.576368e-002 8.758731e-003 1.378206e-002 -2.123800e-002 -7.497694e-003 5.298624e-002 1.609821e-002 -5.841954e-003 -3.580781e-002 -5.639551e-002 -1.552888e-003 -1.564077e-002 6.936034e-003 -2.219586e-002 -2.337704e-003 -5.731292e-003 -4.651209e-003 8.145710e-003 1.182639e-003 -4.194723e-003 6.376632e-003 2.554819e-003 2.414685e-002
<VARIANCE> 39
 1.643824e+000 3.894905e+000 3.663061e+000 4.946204e+000 3.898954e+000 4.214682e+000 5.191493e+000 4.351866e+000 5.781250e+000 4.500092e+000 4.461145e+000 4.605597e+000 2.719475e+000 7.932547e-002 1.842212e-001 2.402062e-001 3.245045e-001 3.278915e-001 4.611880e-001 4.742737e-001 5.285383e-001 5.803781e-001 4.833222e-001 4.731556e-001 5.138031e-001 8.708426e-002 1.568178e-002 3.380568e-002 5.433224e-002 6.767875e-002 6.143730e-002 9.275308e-002 9.806089e-002 1.116426e-001 1.291828e-001 9.715064e-002 9.311121e-002 1.028403e-001 1.796106e-002
<GCONST> 3.857685e+001
<STATE> 3
<MEAN> 39
 -1.715477e+001 4.400246e+000 6.262936e+000 7.177123e+000 6.923150e+000 6.086653e+000 6.596479e+000 5.483385e+000 4.031187e+000 6.449147e+000 4.948737e+000 6.442664e+000 5.879964e+001 -3.977152e-001 4.400318e-002 4.941491e-001 -1.007472e-001 2.828232e-002 -5.974878e-003 2.506801e-001 -2.503631e-001 -3.735806e-002 -5.886459e-003 -1.815426e-001 9.265565e-002 1.569314e-001 -1.779113e-001 -6.644941e-002 1.010883e-001 -1.543404e-001 7.227802e-002 -7.839736e-002 4.901925e-002 -2.085987e-002 3.523882e-002 1.870989e-002 -6.068126e-002 4.070071e-002 1.934501e-001
<VARIANCE> 39
 1.453472e+001 3.282893e+000 1.309321e+001 1.138998e+001 6.731030e+000 5.037028e+000 1.200170e+001 1.046143e+001 6.047815e+000 1.058800e+001 6.869959e+000 4.896886e+000 6.478782e+000 3.302830e+000 2.376229e-001 2.080417e+000 1.771493e+000 1.024571e+000 1.037700e+000 2.329357e+000 1.914135e+000 1.042424e+000 1.704532e+000 8.336554e-001 5.220092e-001 1.452260e+000 5.798435e-001 4.099488e-002 3.843521e-001 3.120943e-001 2.126025e-001 1.956420e-001 4.509460e-001 3.321507e-001 1.947594e-001 2.625585e-001 1.538810e-001 1.324593e-001 2.504321e-001
<GCONST> 8.199227e+001
<STATE> 4
<MEAN> 39
 -1.704935e+001 8.153662e-003 6.034267e+000 4.172133e+000 7.304586e+000 2.624785e+000 3.512644e+000 4.518258e+000 2.936766e+000 3.133998e+000 4.160763e+000 4.870670e+000 6.802085e+001 -3.250551e-002 -2.811043e-002 1.102287e-002 -2.990296e-002 5.529329e-002 5.707249e-003 -7.093425e-003 1.310036e-002 -8.518350e-003 -3.768544e-002 -3.535427e-002 2.529903e-002 2.381820e-002 7.676242e-002 -8.619049e-003 -5.807225e-002 5.872397e-002 -1.823968e-003 4.815067e-003 -4.655287e-002 1.761478e-002 -2.065457e-002 -4.206782e-002 1.772777e-002 -2.844845e-003 -3.901509e-002
<VARIANCE> 39
 5.909045e+001 2.491252e+001 3.028853e+001 3.908095e+001 3.259219e+001 3.694969e+001 7.296276e+001 2.848757e+001 2.728870e+001 5.725907e+001 1.883418e+001 1.637582e+001 1.037448e+002 3.022137e+000 1.264405e+000 1.697435e+000 2.535551e+000 1.947650e+000 2.348791e+000 3.577286e+000 2.126656e+000 2.145338e+000 3.731926e+000 1.500666e+000 1.437876e+000 2.975770e+000 5.847414e-001 2.247958e-001 3.302156e-001 5.230563e-001 2.971367e-001 4.237480e-001 5.890771e-001 4.158721e-001 4.379262e-001 6.548804e-001 2.821163e-001 2.804883e-001 4.466936e-001
<GCONST> 1.169426e+002
<TRANSP> 5
 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
 0.000000e+000 9.657938e-001 3.420626e-002 0.000000e+000 0.000000e+000
 0.000000e+000 0.000000e+000 8.181013e-001 1.818986e-001 0.000000e+000
 0.000000e+000 0.000000e+000 0.000000e+000 9.399628e-001 6.003720e-002
 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
<ENDHMM>
// end the file with a newline
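The copy-and-rename step can also be scripted. Below is a minimal Python sketch. It assumes hmmdefs uses the plain-text layout shown above, where each model begins with a `~h "name"` line and ends with `<ENDHMM>`. It also assumes the file contains none of the `//` annotations. `duplicate_sil_as_sp` and `prepare_hmm4` are hypothetical helpers written for this example, not HTK tools. At this point the sp copy still has five states.

```python
import re
import shutil
from pathlib import Path

# One complete "sil" definition: from its ~h header up to its <ENDHMM>.
SIL_BLOCK = re.compile(r'~h "sil"\s*\n.*?<ENDHMM>', re.S)


def duplicate_sil_as_sp(hmmdefs: str) -> str:
    """Return hmmdefs with a copy of the "sil" model appended as "sp".

    The original "sil" model is left in place; only the copy is renamed.
    """
    if '~h "sp"' in hmmdefs:
        raise ValueError('hmmdefs already contains an "sp" model')
    m = SIL_BLOCK.search(hmmdefs)
    if m is None:
        raise ValueError('no ~h "sil" model found in hmmdefs')
    sp_block = m.group(0).replace('~h "sil"', '~h "sp"', 1)
    if not hmmdefs.endswith("\n"):
        hmmdefs += "\n"
    # Append the renamed copy and finish the file with a newline.
    return hmmdefs + sp_block + "\n"


def prepare_hmm4(src: str = "hmm3", dst: str = "hmm4") -> None:
    """Copy the src model directory to dst and add the sp copy to dst/hmmdefs."""
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    path = Path(dst) / "hmmdefs"
    path.write_text(duplicate_sil_as_sp(path.read_text()))
```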

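Next, delete the parameters of STATE 2 and STATE 4 in the sp model, and keep only STATE 3. This leaves sp with a single emitting state. The edit script below ties that state to the centre state of sil, so training updates the shared state and refines the silence model.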

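The steps are:

  1. Delete STATE 2 and STATE 4, keeping STATE 3.
  2. Change NUMSTATES to 3.
  3. Renumber the remaining STATE from 3 to 2.
  4. Change the size after the final TRANSP to 3.
  5. Trim the TRANSP matrix from 5x5 to 3x3 (copy the values below).
TRANSP 3x3 matrix: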
 0.0 1.0 0.0
 0.0 0.9 0.1
 0.0 0.0 0.0


The corrected sp model:

~h "sp"
<BEGINHMM>
<NUMSTATES> 3
<STATE> 2
<MEAN> 39
 -1.715477e+001 4.400246e+000 6.262936e+000 7.177123e+000 6.923150e+000 6.086653e+000 6.596479e+000 5.483385e+000 4.031187e+000 6.449147e+000 4.948737e+000 6.442664e+000 5.879964e+001 -3.977152e-001 4.400318e-002 4.941491e-001 -1.007472e-001 2.828232e-002 -5.974878e-003 2.506801e-001 -2.503631e-001 -3.735806e-002 -5.886459e-003 -1.815426e-001 9.265565e-002 1.569314e-001 -1.779113e-001 -6.644941e-002 1.010883e-001 -1.543404e-001 7.227802e-002 -7.839736e-002 4.901925e-002 -2.085987e-002 3.523882e-002 1.870989e-002 -6.068126e-002 4.070071e-002 1.934501e-001
<VARIANCE> 39
 1.453472e+001 3.282893e+000 1.309321e+001 1.138998e+001 6.731030e+000 5.037028e+000 1.200170e+001 1.046143e+001 6.047815e+000 1.058800e+001 6.869959e+000 4.896886e+000 6.478782e+000 3.302830e+000 2.376229e-001 2.080417e+000 1.771493e+000 1.024571e+000 1.037700e+000 2.329357e+000 1.914135e+000 1.042424e+000 1.704532e+000 8.336554e-001 5.220092e-001 1.452260e+000 5.798435e-001 4.099488e-002 3.843521e-001 3.120943e-001 2.126025e-001 1.956420e-001 4.509460e-001 3.321507e-001 1.947594e-001 2.625585e-001 1.538810e-001 1.324593e-001 2.504321e-001
<GCONST> 8.199227e+001
<TRANSP> 3
 0.0 1.0 0.0
 0.0 0.9 0.1
 0.0 0.0 0.0
<ENDHMM>
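The trimming can be sketched the same way. This Python function assumes the sp block is the unmodified 5-state copy of sil shown earlier. It also assumes each `<STATE>` block runs until the next `<STATE>` or `<TRANSP>` keyword. `shrink_sp` is a hypothetical helper, and its output follows the structure of the corrected model above.

```python
import re

# 3x3 transition matrix for the one-emitting-state sp model (values from above).
SP_TRANSP = (
    "<TRANSP> 3\n"
    " 0.0 1.0 0.0\n"
    " 0.0 0.9 0.1\n"
    " 0.0 0.0 0.0\n"
)


def shrink_sp(sp_def: str) -> str:
    """Turn the 5-state copy of sil named "sp" into a 3-state sp model.

    Keeps only the parameters of the original <STATE> 3, renumbers it as
    <STATE> 2, sets <NUMSTATES> to 3 and replaces <TRANSP> with SP_TRANSP.
    """
    # Body of <STATE> 3: everything up to the next <STATE> or <TRANSP>.
    m = re.search(r"<STATE> 3\s*\n(.*?)(?=<STATE> |<TRANSP>)", sp_def, re.S)
    if m is None:
        raise ValueError("no <STATE> 3 block found")
    state3 = m.group(1)
    if not state3.endswith("\n"):
        state3 += "\n"
    return (
        '~h "sp"\n'
        "<BEGINHMM>\n"
        "<NUMSTATES> 3\n"
        "<STATE> 2\n"
        f"{state3}"
        f"{SP_TRANSP}"
        "<ENDHMM>\n"
    )
```

The function only transforms text. Reading the sp block out of hmmdefs and writing the result back is left to the reader.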


Creating the HHEd edit script


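Create a file named sil.hed and paste the following commands into it. The commands come from the HTK Book (p.34, section 3.2.2, Step 7); see the top of p.35 for a detailed explanation of each command: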

AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}

Remember to end the file with a newline.
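To see what the `AT` lines do, the sketch below mimics them on plain Python lists. It assumes the behaviour described in the HTK Book's HHEd reference. `AT i j prob` sets the transition from state i to state j to `prob`, then rescales the other transitions out of state i so the row still sums to one. HTK numbers states from 1: state 1 is the non-emitting entry state and the last state is the non-emitting exit state. This means `AT 1 3 0.3` gives the 3-state sp model a direct entry-to-exit skip, so sp can be passed over entirely. `add_transition` is a hypothetical illustration, not HTK code. The `TI silst {...}` line is not simulated; it makes sil's centre state and sp's only emitting state share one set of parameters under the macro name silst.

```python
def add_transition(transp, i, j, prob):
    """Return a copy of transp with a_ij set to prob (i, j are 1-based, as in HTK).

    The other entries in row i are scaled by (1 - prob) / (their sum),
    so the row still sums to one.
    """
    mat = [row[:] for row in transp]
    row = mat[i - 1]
    others = sum(p for k, p in enumerate(row) if k != j - 1)
    if others == 0:
        raise ValueError(f"state {i} has no outgoing transitions to rescale")
    scale = (1.0 - prob) / others
    for k in range(len(row)):
        row[k] = prob if k == j - 1 else row[k] * scale
    return mat


# Transition matrices from the hmmdefs examples above.
sil = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.9657938, 0.03420626, 0.0, 0.0],
    [0.0, 0.0, 0.8181013, 0.1818986, 0.0],
    [0.0, 0.0, 0.0, 0.9399628, 0.06003720],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
sp = [
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0],
]

sil = add_transition(sil, 2, 4, 0.2)  # AT 2 4 0.2 {sil.transP}
sil = add_transition(sil, 4, 2, 0.2)  # AT 4 2 0.2 {sil.transP}
sp = add_transition(sp, 1, 3, 0.3)    # AT 1 3 0.3 {sp.transP}
print([round(p, 4) for p in sp[0]])   # [0.0, 0.7, 0.3]
```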

Using a monophones file that includes the short pause


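The monophones0 file used in the previous chapter was derived from the dictionary, and that chapter deliberately kept "sp" out of it. This chapter needs a list that does include "sp", called monophones1. Copy monophones0, rename the copy to monophones1, open it, and add sp as the last line. While you are at it, check monophones0: if it already contains sp, you may need to go back to the previous chapter and retrain with a monophones0 that has no sp.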
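A minimal Python sketch of this step, assuming one phone name per line. `with_sp` and `make_monophones1` are hypothetical helpers. Following the note above, the sketch refuses to continue if monophones0 already lists sp.

```python
from pathlib import Path


def with_sp(phones):
    """Return the non-blank phone names with "sp" appended at the end."""
    names = [p.strip() for p in phones if p.strip()]
    if "sp" in names:
        raise ValueError("monophones0 already contains sp; retrain without it")
    return names + ["sp"]


def make_monophones1(src="monophones0", dst="monophones1"):
    """Write dst as a copy of src plus a final "sp" line."""
    phones = Path(src).read_text().splitlines()
    # One name per line, and the file ends with a newline.
    Path(dst).write_text("\n".join(with_sp(phones)) + "\n")
```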

Running the edit


Create a directory named hmm5. Before running the command, make sure the following steps are done:

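  • Copy hmm3 to hmm4
  • Edit the hmmdefs file in hmm4 so that it includes the silence model "sp" (without deleting "sil")
  • Copy monophones0 to monophones1; monophones1 must contain sp (add it yourself if it is missing)
  • Create the directory hmm5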
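The checklist above can be verified with a short Python script before calling HHEd. This sketch follows the conventions of this series: the macros file is named proto, and all files sit in one working directory. `check_before_hhed` is a hypothetical helper. It also checks for sil.hed, the edit script HHEd reads.

```python
from pathlib import Path


def check_before_hhed(root="."):
    """Return a list of problems; an empty list means HHEd can be run."""
    base = Path(root)
    problems = []
    hmmdefs = base / "hmm4" / "hmmdefs"
    if not hmmdefs.is_file():
        problems.append("hmm4/hmmdefs is missing (copy hmm3 to hmm4)")
    else:
        text = hmmdefs.read_text()
        for name in ("sil", "sp"):
            if f'~h "{name}"' not in text:
                problems.append(f'hmm4/hmmdefs has no ~h "{name}" model')
    if not (base / "hmm4" / "proto").is_file():
        problems.append("hmm4/proto is missing")
    monophones1 = base / "monophones1"
    if not monophones1.is_file():
        problems.append("monophones1 is missing")
    elif "sp" not in monophones1.read_text().split():
        problems.append("monophones1 does not list sp")
    if not (base / "hmm5").is_dir():
        problems.append("directory hmm5 does not exist")
    if not (base / "sil.hed").is_file():
        problems.append("sil.hed is missing")
    return problems
```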
Once everything is confirmed, run:

HHEd -H .\hmm4\proto -H .\hmm4\hmmdefs -M hmm5 sil.hed monophones1

(The file the HTK Book calls macros is named proto in every chapter of this series.)

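Next, re-estimate the silence models twice with HERest, the same command used in the previous chapter. In the HTK Book's Step 7, these two passes use phones1.mlf and monophones1, the transcriptions and model list that include sp. First create the directories hmm6 and hmm7, then run: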

HMM6
HERest -A -D -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H .\hmm5\proto -H .\hmm5\hmmdefs -M hmm6 monophones0


HMM7
HERest -A -D -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H .\hmm6\proto -H .\hmm6\hmmdefs -M hmm7 monophones0
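If you prefer to drive the two passes from a script, the sketch below builds the same HERest argument lists as the commands above, with `-A` written as a single flag. `herest_args` is a hypothetical helper. The transcription file and model list are parameters, defaulting to the phones0.mlf and monophones0 used here.

```python
from pathlib import Path


def herest_args(src, dst, mlf="phones0.mlf", hmmlist="monophones0"):
    """Argument list for one HERest pass reading from src and writing to dst."""
    return [
        "HERest", "-A", "-D", "-T", "1",
        "-C", "config",
        "-I", mlf,
        "-t", "250.0", "150.0", "1000.0",
        "-S", "train.scp",
        "-H", str(Path(src) / "proto"),
        "-H", str(Path(src) / "hmmdefs"),
        "-M", dst,
        hmmlist,
    ]


# The two re-estimation passes: hmm5 -> hmm6, then hmm6 -> hmm7.
for src, dst in (("hmm5", "hmm6"), ("hmm6", "hmm7")):
    print(" ".join(herest_args(src, dst)))
```

To actually run a pass, first create the output directory, for example with `Path(dst).mkdir(exist_ok=True)`. Then call `subprocess.run(herest_args(src, dst), check=True)`. This requires HERest to be on the PATH.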



Reference:
http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/monophones/step-7/comments/problem-eith-step-7____any-help/3?layout=flat
https://github.com/ibillxia/htk_3_4_1/blob/master/HTKTools/HHEd.c
https://stackoverflow.com/questions/36838610/error-3219-hvite-hmm-list-file-name-expected
http://blog.sina.com.cn/s/blog_52e3868f0100vrvu.html
http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/monophones/step-7/comments/hhed
http://www.voxforge.org/home/dev/acousticmodels/linux/create/htkjulius/tutorial/monophones/step-7

© ERIC RILEY. Free to repost without notice.