[Exam] 109-1 李琳山 Introduction to Digital Speech Processing, Midterm

Author: unmolk (UJ)   2021-06-27 07:27:09
Course name: Introduction to Digital Speech Processing
Course type: Elective (EE / CS departments)
Instructor: 李琳山
College: College of Electrical Engineering and Computer Science
Department: Electrical Engineering
Exam date (ROC calendar, y.m.d): 109.11.11
Time limit (minutes): 120
Exam questions:
Note: some mathematical expressions are written in LaTeX syntax.
1. (8 pts) What is a GMM? How is it usually used in HMMs for speech recognition?
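
For reference, a minimal NumPy sketch of evaluating a GMM log-likelihood of the
kind used for HMM observation probabilities; the diagonal-covariance layout and
the toy parameter values are illustrative assumptions, not part of the exam:

    import numpy as np

    def gmm_log_likelihood(x, weights, means, variances):
        # Log-likelihood of one observation x (shape (D,)) under a
        # diagonal-covariance GMM with M components:
        #   weights (M,), means (M, D), variances (M, D).
        D = x.shape[0]
        log_norm = -0.5 * (D * np.log(2 * np.pi)
                           + np.sum(np.log(variances), axis=1))
        log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
        log_comp = np.log(weights) + log_norm + log_exp
        # Log-sum-exp over the M components for numerical stability.
        m = np.max(log_comp)
        return m + np.log(np.sum(np.exp(log_comp - m)))

    # Illustrative 2-component, 2-dimensional mixture.
    w = np.array([0.4, 0.6])
    mu = np.array([[0.0, 0.0], [1.0, 1.0]])
    var = np.array([[1.0, 1.0], [0.5, 0.5]])
    print(gmm_log_likelihood(np.array([0.5, 0.5]), w, mu, var))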
2. (8 pts) What is the K-means algorithm? How is it used in speech recognition?
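
For reference, a minimal NumPy sketch of the two alternating K-means steps; the
random initialization and the convergence test are assumptions of this sketch:

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        # Plain K-means: alternate nearest-center assignment and mean updates.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: label each point with its nearest center.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: move each center to the mean of its points.
            new_centers = np.array([X[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels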
3. In HMM, the Viterbi algorithm is used to find the single best state sequence.
The variable \delta_t(i) is defined as:
\delta_t(i) = \max_{q_1,...,q_{t-1}} P[q_1,...,q_{t-1}, q_t = i, o_1,...,o_t | \lambda]
The induction step of the algorithm is:
\delta_{t+1}(j) = (\max_i [\delta_t(i)a_{ij}])b_j(o_{t+1})
(a) (3 pts) Can we change the induction step into the equation shown below?
Please explain why.
\delta_{t+1}(j) = \max_i [\delta_t(i)a_{ij}b_j(o_{t+1})]
(b) (5 pts) Can we change the induction step into the equation shown below?
Please explain why.
\delta_{t+1}(j) = (\max_i [\delta_t(i)]) a_{\sigma j} b_j(o_{t+1})
where \sigma = \arg\max_i \delta_t(i).
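
For reference, a minimal NumPy sketch of the Viterbi recursion defined above,
computed in the log domain with backpointers; the array layout is an assumption
of this sketch:

    import numpy as np

    def viterbi(log_pi, log_A, log_B):
        # log_pi: (N,)   log initial probabilities
        # log_A:  (N, N) log transition probabilities a_ij
        # log_B:  (T, N) log observation probabilities b_j(o_t) per frame
        T, N = log_B.shape
        delta = np.empty((T, N))
        psi = np.zeros((T, N), dtype=int)        # backpointers
        delta[0] = log_pi + log_B[0]
        for t in range(1, T):
            # Induction: delta[t, j] = max_i(delta[t-1, i] + a_ij) + b_j(o_t).
            scores = delta[t - 1][:, None] + log_A
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_B[t]
        # Backtrace the single best state sequence.
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[t][path[-1]]))
        return path[::-1]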
4. (9 pts) We wish to calculate the accuracy for some speech recognition results.
Please list the insertions, deletions, and substitutions, and calculate the
accuracy with the formula taught in class (insertions, deletions, and
substitutions have the same penalty weight).
reference: the dog sat on the mat
recognized: the dogs on the mat are
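
For reference, a minimal sketch of the Levenshtein alignment that yields these
counts, using the standard accuracy formula (N - S - D - I) / N with N the
number of reference words (assumed here to match the equal-penalty formula
taught in class):

    def align_counts(ref, hyp):
        # Levenshtein alignment with unit penalties; dp[i][j] is the minimum
        # edit cost aligning ref[:i] with hyp[:j].
        R, H = len(ref), len(hyp)
        dp = [[0] * (H + 1) for _ in range(R + 1)]
        for i in range(1, R + 1):
            dp[i][0] = i
        for j in range(1, H + 1):
            dp[0][j] = j
        for i in range(1, R + 1):
            for j in range(1, H + 1):
                sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
        # Backtrace one optimal alignment and classify the errors.
        i, j, S, D, I = R, H, 0, 0, 0
        while i > 0 or j > 0:
            if (i > 0 and j > 0
                    and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
                S += ref[i - 1] != hyp[j - 1]
                i, j = i - 1, j - 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                D += 1           # deletion: a reference word was dropped
                i -= 1
            else:
                I += 1           # insertion: an extra word was recognized
                j -= 1
        return S, D, I

    ref = "the dog sat on the mat".split()
    hyp = "the dogs on the mat are".split()
    S, D, I = align_counts(ref, hyp)
    N = len(ref)
    print(S, D, I, (N - S - D - I) / N)   # accuracy = (N - S - D - I) / N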
5. Below is a dataset for training a bi-gram language model:
<sos> I am Sam <eos>
<sos> I am legend <eos>
<sos> Bob I am <eos>
(a) (4 pts) Calculate the probabilities: P(I|<sos>), P(am|I), P(Sam|am),
P(<eos>|Sam).
(b) (3 pts) Calculate the probability P(<sos> I am Sam <eos>) using unigrams
plus bi-grams only.
(c) (3 pts) With the bi-grams trained above, for a given sentence "<sos> I
am Bob <eos>", the probability P("<sos> I am Bob <eos>") = 0 (note that this
given sentence is not in the training set). However, this sentence is a
reasonable sentence and should not have zero probability. Propose a method to
fix this problem (you do not have to explain your method in detail).
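
For reference, a minimal sketch of maximum-likelihood bi-gram estimation on the
dataset above; the function names are assumptions of this sketch:

    from collections import Counter

    corpus = ["<sos> I am Sam <eos>",
              "<sos> I am legend <eos>",
              "<sos> Bob I am <eos>"]

    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        toks = line.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))

    def p_bigram(w, prev):
        # Maximum-likelihood estimate P(w | prev) = C(prev, w) / C(prev).
        return bigrams[(prev, w)] / unigrams[prev]

    print(p_bigram("I", "<sos>"))   # C(<sos>, I) / C(<sos>)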
6. In language modeling, perplexity is a very useful parameter.
(a) (4 pts) What is the perplexity of a language model with respect to a testing
corpus?
(b) (3 pts) A training corpus consists of only a single sentence:
<sos> dsp so easy <eos>
The testing corpus also consists of only a single sentence:
<sos> so easy dsp <eos>
We use the training corpus to train a bi-gram language model (bi-grams that
do not exist in the training corpus have probabilities equal to 0). What is
the perplexity on the testing corpus?
(c) (3 pts) Following the previous question, what is the perplexity on the
testing corpus if the testing corpus consists of only a single sentence:
<sos> dsp so easy <eos>
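
For reference, a minimal sketch of bi-gram perplexity, under the convention that
only the bi-gram transitions actually scored enter the exponent (conventions on
counting <sos>/<eos> vary, so this normalization is an assumption):

    import math

    def bigram_perplexity(sentence, p_bigram):
        # PP = P(w_1 ... w_n)^(-1/n), computed over the n scored bi-grams.
        toks = sentence.split()
        pairs = list(zip(toks, toks[1:]))
        logp = 0.0
        for prev, w in pairs:
            p = p_bigram(w, prev)
            if p == 0.0:
                return math.inf  # a zero-probability bi-gram makes PP infinite
            logp += math.log(p)
        return math.exp(-logp / len(pairs))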
7. (a) (5 pts) Speech signals are roughly categorized into voiced and unvoiced.
Explain the distinction between the two.
(b) (5 pts) Explain how the derivatives of the 13 MFCC parameters (the
14th-26th and 27th-39th parameters) are actually calculated.
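
For reference, the deltas are commonly computed by linear regression over a
short window of static frames; in the sketch below the half-window N = 2 and
the edge-padding policy are assumptions that may differ from the exact formula
taught in class:

    import numpy as np

    def deltas(c, N=2):
        # Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2),
        # with the first/last frame repeated N times to handle the edges.
        T = c.shape[0]
        padded = np.concatenate([np.repeat(c[:1], N, axis=0), c,
                                 np.repeat(c[-1:], N, axis=0)], axis=0)
        denom = 2 * sum(n * n for n in range(1, N + 1))
        d = np.zeros_like(c, dtype=float)
        for n in range(1, N + 1):
            d += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
        return d / denom

    # mfcc: (T, 13) static features; deltas give parameters 14-26, and
    # applying deltas() again to the result gives parameters 27-39.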
8. There are many different strategies for search or decoding.
- Exhaustive Search: Exhaustively enumerate all possible output sequences with
their probabilities, then output the one with the highest probability.
- Beam Search (beam width k): At the first time index, we select k tokens with
the highest probabilities. At each subsequent time index, we continue to select
k tokens with the highest probabilities.
- Greedy Search: At any time index, we search for and output the token with the
highest probability. (You can view this as beam search with k = 1.)
Given a tree with tokens as nodes and the edge weights representing the bi-gram
probabilities (e.g., P(停 | 要) = 0.3):
https://imgur.com/VtE29I4
(a) (3 pts) What is the decoding output with exhaustive search?
(b) (3 pts) What is the decoding output with greedy search?
(c) (3 pts) What is the decoding output of beam search with k = 2?
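
For reference, a minimal sketch of beam search over such a tree; the
children-map representation and node identifiers are assumptions of this
sketch, not the tree in the figure. With k = 1 it reduces to greedy search,
and with k at least the number of leaves it matches exhaustive search:

    def beam_search(children, root, k):
        # children maps a node id to a list of (child_id, probability) pairs.
        # Paths that reach a leaf are set aside; the best finished path wins.
        beams = [([root], 1.0)]
        done = []
        while beams:
            expansions = []
            for path, prob in beams:
                succ = children.get(path[-1], [])
                if not succ:
                    done.append((path, prob))    # reached a leaf
                    continue
                for child, p in succ:
                    expansions.append((path + [child], prob * p))
            # Keep only the k highest-probability partial paths.
            expansions.sort(key=lambda e: e[1], reverse=True)
            beams = expansions[:k]
        return max(done, key=lambda e: e[1])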
9. (7 pts) Bob is a hard-working student. There are many courses for the new
semester. He made a table as below listing the attributes of the courses, and
then decided whether to take each course or not, as listed in the rightmost
column of the table. You are to analyze how he made the decision using a
decision tree.
https://imgur.com/UpQ2mPT
Construct a decision tree so that each leaf node of the tree clearly indicates
whether he decided to take a course or not. (It is fine not to use all the
attributes, and you only have to provide one solution if there are multiple
solutions.)
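
For reference, one standard way to pick the splitting attribute at each node is
by information gain; a minimal sketch with hypothetical rows (the exam's actual
table is in the image above):

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a label list, in bits.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(labels).values())

    def info_gain(rows, labels, attr):
        # Gain of splitting on `attr`: H(labels) minus the weighted entropy
        # of each subset sharing a value of `attr`.
        by_value = {}
        for row, lab in zip(rows, labels):
            by_value.setdefault(row[attr], []).append(lab)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in by_value.values())
        return entropy(labels) - remainder

    # Hypothetical rows, not the exam's table:
    rows = [{"time": "morning"}, {"time": "evening"}]
    labels = ["take", "drop"]
    print(info_gain(rows, labels, "time"))   # 1.0 bit for this toy split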
10. (8 pts) What is the context dependency when we try to train HMMs for small
sound units?
11. Below are two signals: the reference signal [x_i, i=1,...,6] and the test
signal [y_j, j=1,...,7], respectively.
https://imgur.com/Iq14IeF
We want to find an optimal path for matching two signals with Dynamic Time War-
ping (DTW). Define D(i,j) to be the accumulated minimum distance up to (i,j).
- endpoint constraints: the optimal path must begin at (i,j) = (1,1) and end at
(i,j) = (6,7).
- local constraints: only the three moves shown in Fig. 1 are allowed.
- recursive relationship:
D(i,j) = \min (D(i,j-1) + d(i,j), D(i-1,j-1) + d(i,j)/2, D(i-1,j) + d(i,j))
for i = {2,...,6}, j = {2,...,7}, where d(i,j) = (x_i - y_j)^2.
(a) (9 pts) Finish the dynamic programming table (D(i,j)) shown in Fig. 2.
(The first row and column are done for you.)
https://imgur.com/KA7Zj0f
(b) (4 pts) Find an optimal path for matching the two signals (remember that
this path should begin at (1,1) and end at (6,7)).
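
For reference, a minimal NumPy sketch that fills the table with the recursion
above; the 0-based indexing and the border initialization (only one predecessor
is reachable along the first row and column under the three allowed moves) are
assumptions of this sketch:

    import numpy as np

    def dtw_table(x, y):
        # Fills D per the recursion above (0-based here, so the exam's (1,1)
        # is D[0, 0], assumed to equal d(1,1)).
        I, J = len(x), len(y)
        D = np.empty((I, J))
        D[0, 0] = (x[0] - y[0]) ** 2
        for j in range(1, J):
            D[0, j] = D[0, j - 1] + (x[0] - y[j]) ** 2
        for i in range(1, I):
            D[i, 0] = D[i - 1, 0] + (x[i] - y[0]) ** 2
        for i in range(1, I):
            for j in range(1, J):
                d = (x[i] - y[j]) ** 2
                D[i, j] = min(D[i, j - 1] + d,
                              D[i - 1, j - 1] + d / 2,
                              D[i - 1, j] + d)
        return D   # backtrace from D[-1, -1] by re-testing which move won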
