[Exam] 95-1 (Fall 2006) 李琳山 Introduction to Digital Speech Processing - Midterm

Posted by: rod24574575 (天然呆)   2014-04-20 19:14:13
Course title: Introduction to Digital Speech Processing (数位语音处理概论)
Course type: Elective
Instructor: 李琳山
College: College of Electrical Engineering and Computer Science
Departments: Electrical Engineering, Computer Science
Exam date (Y/M/D): 2006.12.15
Time allowed (minutes): 120
Reward requested: Yes
(If not explicitly stated, no reward will be given.)
Exam content:
Digital Speech Processing, Midterm
Dec. 15, 2006, 10:10-12:10
● OPEN EVERYTHING
● Except for technical terms, which may be written in English, all explanations must be in Chinese; answers not written in Chinese receive no credit.
● Total points: 165
● Note that you don't need to be able to answer all the questions.
───────────────────────────────────────
1. (10) Explain the concept of "Corpus-based Text-to-Speech Synthesis", how it
works and why it is good.
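
A minimal sketch (an illustration added here, not part of the original exam) of
the unit-selection idea commonly behind corpus-based synthesis: every target
unit has several recorded candidates, and dynamic programming picks the
candidate sequence minimizing target cost (mismatch to the desired unit) plus
concatenation cost (discontinuity between adjacent units). The costs below are
random placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    T, K = 5, 3                              # 5 target units, 3 candidates each
    target = rng.random((T, K))              # target cost of each candidate
    concat = rng.random((T - 1, K, K))       # concat[t, i, j]: cost of joining i -> j

    best = target[0].copy()                  # dynamic programming, Viterbi-style
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = best[:, None] + concat[t - 1] + target[t][None, :]
        back[t] = scores.argmin(axis=0)
        best = scores.min(axis=0)

    path = [int(best.argmin())]              # backtrack the cheapest unit sequence
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    print("selected candidates:", path[::-1], "total cost:", best.min())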

2. (25) Given an HMM λ = (A, B, π), an observation sequence O = o_1 o_2 ...
o_t ... o_T and a state sequence q̅ = q_1 q_2 ... q_t ... q_T:

(a) (10) Formulate and describe the forward algorithm to evaluate P(O│λ).
Explain how it works.
(b) (10) Formulate and describe the Viterbi algorithm to find the best
state sequence q̅* = q_1* q_2* ... q_t* ... q_T* giving the highest
probability Prob(q̅*, O̅│λ). Explain how it works.
(c) (5) Now, in order to recognize L words w_1, w_2, ..., w_L, each with an
HMM λ_1, λ_2, ..., λ_L respectively, it is well known that one can use
either the forward algorithm or the Viterbi algorithm,

    arg max_k P(O̅│λ_k)    or    arg max_k P(q̅*, O̅│λ_k).

Explain why, and discuss the difference between them.
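
A minimal worked sketch (not part of the original exam; toy parameters of my
own) of the two recursions asked about in question 2: the forward algorithm
sums P(O│λ) over all state paths, while Viterbi keeps only the single best
path and its backpointers.

    import numpy as np

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])            # transition probabilities a_ij
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emission probabilities b_j(k)
    pi = np.array([0.6, 0.4])                          # initial state distribution
    O  = [0, 2, 1]                                     # observation indices o_1 ... o_T

    def forward(A, B, pi, O):
        """P(O | lambda) via the forward variables alpha_t(i)."""
        N, T = A.shape[0], len(O)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                     # initialization
        for t in range(1, T):                          # induction over time
            alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        return alpha[-1].sum()                         # termination: sum over final states

    def viterbi(A, B, pi, O):
        """Best state sequence q* and its probability P(q*, O | lambda)."""
        N, T = A.shape[0], len(O)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A         # delta_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)             # backpointers
            delta[t] = scores.max(axis=0) * B[:, O[t]]
        q = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):                  # backtracking
            q.append(int(psi[t][q[-1]]))
        return q[::-1], delta[-1].max()

    print(forward(A, B, pi, O))   # P(O | lambda), summed over all paths
    print(viterbi(A, B, pi, O))   # the single best path and its joint probability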
3. (10) Write down the procedures of the LBG algorithm and discuss why and how
it is better than the K-means algorithm.
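
A compact sketch (my own illustration with random 2-D data, not from the exam)
of the binary-splitting flavour of LBG asked about in question 3: the codebook
grows 1 -> 2 -> 4 -> ... by perturbing every existing codeword into a pair, and
each stage is refined with K-means, so the initialization problem of plain
K-means is largely avoided.

    import numpy as np

    def kmeans_refine(data, codebook, iters=20):
        for _ in range(iters):
            # assign every vector to its nearest codeword
            idx = np.argmin(((data[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
            # move every codeword to the centroid of its cell
            for k in range(len(codebook)):
                if np.any(idx == k):
                    codebook[k] = data[idx == k].mean(axis=0)
        return codebook

    def lbg(data, target_size, eps=1e-3):
        codebook = data.mean(axis=0, keepdims=True)        # start from one centroid
        while len(codebook) < target_size:
            codebook = np.vstack([codebook * (1 + eps),    # split every codeword
                                  codebook * (1 - eps)])   # into a perturbed pair
            codebook = kmeans_refine(data, codebook)
        return codebook

    data = np.random.default_rng(0).normal(size=(500, 2))
    print(lbg(data, 4))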
4. (10) Explain how, in designing the decision tree to train tri-phone models,
information theory is used to split a node n into two nodes a and b.
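
A much-simplified illustration (my own, using plain class counts) of the split
criterion in question 4: among the candidate questions, pick the one whose
yes/no split of node n into nodes a and b gives the largest reduction in
count-weighted entropy. Actual tri-phone decision trees typically measure the
gain in log-likelihood of Gaussian models instead, but the information-theoretic
idea is the same.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def split_gain(node_labels, answers):
        """Entropy reduction when `answers` sends samples to node a (True) or b (False)."""
        a = [y for y, yes in zip(node_labels, answers) if yes]
        b = [y for y, yes in zip(node_labels, answers) if not yes]
        n = len(node_labels)
        return entropy(node_labels) - (len(a) / n) * entropy(a) - (len(b) / n) * entropy(b)

    labels  = ['x', 'x', 'y', 'y', 'y', 'z']            # toy class labels at node n
    answers = [True, True, False, False, False, False]  # answers to one candidate question
    print(split_gain(labels, answers))                  # pick the question maximizing this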
5. (10) In Classification and Regression Trees (CART), one can use composite
questions instead of simple questions only. Write down what you know about
this.
6. (10) The perplexity of a language source S is

    PP(S) = 2^H(S),   H(S) = -Σ_i p(x_i) log[p(x_i)],

where x_i is a word in the language. Explain why PP(S) is the estimate of the
branching factor for the language, assuming a "virtual vocabulary".
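
A quick numeric check (my own example) of the branching-factor reading of
perplexity in question 6: a uniform source over 8 equally likely words has
H(S) = 3 bits and PP(S) = 8, i.e. it behaves like a virtual vocabulary of 8
words; a non-uniform source over 3 words has PP(S) below 3.

    import math

    def perplexity(p):
        H = -sum(pi * math.log2(pi) for pi in p if pi > 0)
        return 2 ** H

    print(perplexity([1 / 8] * 8))           # -> 8.0
    print(perplexity([0.5, 0.25, 0.25]))     # -> about 2.83, less than the 3-word vocabulary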
7. (10) Explain the detailed principles and process for Katz smoothing.
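
A deliberately simplified sketch (my own; a fixed absolute discount D stands in
for the Good-Turing discounting that real Katz smoothing uses) showing only the
back-off structure behind question 7: seen bigrams keep a discounted estimate,
and the probability mass freed by the discounting is redistributed over unseen
continuations in proportion to their unigram probabilities, so everything still
sums to one.

    from collections import Counter

    tokens  = "a b a b a c a b b c".split()
    unigram = Counter(tokens)
    bigram  = Counter(zip(tokens, tokens[1:]))
    V, N, D = sorted(unigram), len(tokens), 0.5     # vocabulary, token count, discount

    def p_uni(w):
        return unigram[w] / N

    def p_katz(w, h):
        """P(w | h): discounted seen bigram, else back off to the unigram."""
        c_h = unigram[h]
        if bigram[(h, w)] > 0:
            return (bigram[(h, w)] - D) / c_h        # discounted bigram estimate
        # mass freed from the seen bigrams following history h ...
        freed = sum(D / c_h for (a, b) in bigram if a == h)
        # ... spread over the unseen continuations, proportionally to unigrams
        unseen = sum(p_uni(u) for u in V if bigram[(h, u)] == 0)
        return freed * p_uni(w) / unseen

    print(sum(p_katz(w, 'a') for w in V))            # sanity check: approximately 1.0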
8. (10) Given a set of events {x_i, i = 1, 2, ..., M}, let {p(x_i), i = 1, 2,
..., M} and {q(x_i), i = 1, 2, ..., M} be two probability distributions. What
is the Kullback-Leibler (KL) distance between p(x_i) and q(x_i), and what does
it mean?
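
A small numeric illustration (my own) for question 8: with base-2 logarithms,
D(p||q) = Σ_i p(x_i) log[p(x_i)/q(x_i)] is the extra number of bits needed, on
average, to code events drawn from p using a code designed for q; it is always
non-negative, zero only when p = q, and not symmetric.

    import math

    def kl(p, q):
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    p = [0.5, 0.3, 0.2]
    q = [1 / 3, 1 / 3, 1 / 3]
    print(kl(p, q), kl(q, p))    # note the asymmetry: D(p||q) != D(q||p) in general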
9. (10)
(a) (5) What are the voiced/unvoiced speech signals and their time-domain
waveform characteristics?
(b) (5) What is pitch in speech signals and how is it related to the tones
in Mandarin Chinese?
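
A small sketch (my own, using a synthetic 200 Hz "voiced" frame rather than real
speech) of how pitch shows up in the time domain for question 9: a voiced frame
is quasi-periodic, so its autocorrelation peaks at a lag equal to the pitch
period T0, and the pitch is F0 = fs / T0.

    import numpy as np

    fs = 16000
    t = np.arange(int(0.03 * fs)) / fs                   # a 30 ms analysis frame
    frame = np.sign(np.sin(2 * np.pi * 200 * t))         # crude periodic waveform

    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]   # autocorrelation
    lo, hi = int(fs / 400), int(fs / 60)                 # search the 60-400 Hz range
    T0 = lo + int(np.argmax(ac[lo:hi]))                  # lag of the strongest peak
    print("estimated pitch:", fs / T0, "Hz")             # close to 200 Hz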
10. (10) The Hamming window has much lower sidelobes but wider mainlobe as
compared to the rectangular window. Why is it good for front-end feature
extraction for speech recognition?
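
A quick numeric comparison (my own illustration) for question 10: the strongest
sidelobe of the rectangular window sits only about 13 dB below the mainlobe,
while the Hamming window pushes it down to roughly -43 dB at the price of a
mainlobe about twice as wide, which is why it leaks far less energy across
frequencies during front-end feature extraction.

    import numpy as np

    N = 400                                      # e.g. a 25 ms frame at 16 kHz
    for name, w in [("rectangular", np.ones(N)), ("Hamming", np.hamming(N))]:
        spec = np.abs(np.fft.rfft(w, 64 * N))    # zero-padded magnitude response
        spec_db = 20 * np.log10(spec / spec.max() + 1e-12)
        i = 1                                    # walk down the mainlobe to its first null
        while spec_db[i + 1] < spec_db[i]:
            i += 1
        print(f"{name:12s} highest sidelobe {spec_db[i:].max():6.1f} dB, "
              f"mainlobe half-width {i} bins")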
11. (10) For large vocabulary continuous speech recognition, explain how the
Viterbi algorithm can be performed such that the knowledge from the acoustic
models, the lexicon and the language model can be efficiently integrated.
12. (15) Under what kind of condition is a heuristic search admissible? Show
or explain why.
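
A compact illustration (my own toy graph) of the standard answer to question 12:
a heuristic search of the A* type is admissible, i.e. guaranteed to return an
optimal path, when h(n) never overestimates the true remaining cost h*(n).
Here h underestimates everywhere, and the search indeed returns the cheaper of
the two routes.

    import heapq

    graph = {'S': [('A', 1), ('B', 4)], 'A': [('G', 5)], 'B': [('G', 1)], 'G': []}
    h = {'S': 4, 'A': 4, 'B': 1, 'G': 0}            # h(n) <= h*(n): admissible

    def astar(start, goal):
        frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
        best_g = {}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return g, path                      # first goal expansion is optimal
            if best_g.get(node, float('inf')) <= g:
                continue                            # already reached at least as cheaply
            best_g[node] = g
            for nxt, cost in graph[node]:
                heapq.heappush(frontier, (g + cost + h[nxt], g + cost, nxt, path + [nxt]))

    print(astar('S', 'G'))                          # -> (5, ['S', 'B', 'G'])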
13. (15)
(a) (8) Explain why Maximum Likelihood Linear Regression (MLLR) approaches
can adjust a set of speaker-independent acoustic models to a new
speaker with a very limited quantity of adaptation data, but the
performance saturates at a relatively low accuracy.
(b) (7) Explain why tree-structured classes can be helpful here.
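
A toy sketch (my own, greatly simplified: least squares stands in for the
maximum-likelihood estimate) of the point behind question 13(a): all Gaussian
means in one regression class share a single affine transform μ' = Aμ + b, so
only the few entries of (A, b) have to be estimated from the adaptation data.
That is why very little data suffices, and also why the adapted models can
never move beyond what one shared linear map allows.

    import numpy as np

    rng = np.random.default_rng(0)
    mu_si = rng.normal(size=(50, 3))                  # speaker-independent means, one class
    A_true, b_true = np.diag([1.2, 0.9, 1.1]), np.array([0.3, -0.2, 0.1])
    mu_obs = mu_si @ A_true.T + b_true + 0.05 * rng.normal(size=mu_si.shape)

    X = np.hstack([mu_si, np.ones((len(mu_si), 1))])  # extended mean vectors [mu, 1]
    W, *_ = np.linalg.lstsq(X, mu_obs, rcond=None)    # rows 0-2 of W hold A^T, row 3 holds b
    mu_adapted = X @ W                                # the whole class moves with one transform

    print(np.abs(mu_adapted - mu_obs).mean())         # small residual: one map fits the class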
14. (10) In Latent Semantic Analysis the elements w_ij of the word-document
matrix W̅ are

    w_ij = (1 - ε_i) (c_ij / n_j),

where c_ij is the number of times the word w_i occurs in the document d_j,
n_j is the total number of words in d_j, and

    ε_i = -(1/log N) Σ_{j=1}^{N} (c_ij / t_i) log(c_ij / t_i),
    t_i = Σ_{j=1}^{N} c_ij,

where N is the total number of documents. Explain the meaning of all these
parameters.
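
A direct numeric rendering (my own toy counts) of the definitions in question
14: c_ij are the word-document counts, n_j the document lengths, t_i the total
count of word w_i, ε_i the normalized entropy of word w_i across documents, and
w_ij the resulting entry of the word-document matrix.

    import numpy as np

    c = np.array([[3., 0., 1.],         # c_ij: counts of word w_i in document d_j
                  [1., 1., 1.]])
    N = c.shape[1]                      # N: total number of documents
    n = c.sum(axis=0)                   # n_j: total number of words in document d_j
    t = c.sum(axis=1, keepdims=True)    # t_i: total count of word w_i in the collection

    with np.errstate(divide='ignore', invalid='ignore'):
        p = c / t
        eps = -(np.where(p > 0, p * np.log(p), 0.0)).sum(axis=1) / np.log(N)

    w = (1 - eps)[:, None] * c / n      # w_ij = (1 - eps_i) * c_ij / n_j
    print(eps)                          # eps_i near 1: the word is spread evenly over documents
    print(w)                            # such a word gets (1 - eps_i) near 0 and is de-emphasized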
