[Exam] 102-2 Hsin-Hsi Chen (陈信希), Natural Language Processing, Midterm

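Author: a123zyx (小企)   2014-04-18 10:39:32
Course name: Natural Language Processing
Course type: Elective, Graduate Institute of Computer Science and Information Engineering
Instructor: Hsin-Hsi Chen (陈信希)
College: College of Electrical Engineering and Computer Science
Department: Graduate Institute of Computer Science and Information Engineering
Exam date (Y/M/D): 4/10
Time limit (minutes): 180 min
Reward requested: Yes
(If not explicitly indicated, no reward will be issued)
Exam questions: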
1.Opinion mining and sentiment analysis is a very important NLP application
nowadays. A review is usually composed of some aspects of an opinion
target and the opinion words expressing polarity toward those aspects. The
following review of Howard Civil Service International House (福华文教
会馆) is taken from TripAdvisor. Please indicate which explicit
aspects and opinion words appear in this review. (10 points)
“我们的房间非常棒。饭店员工很不错,而且总是会有会说英文的人
可以服务我们。他们提供非常好的用餐建议,并确认是否有优良计程
车司机可以为我们服务。地点很适合商务旅行,只要走一点路就可到
达餐厅、银行和服务业。饭店自助餐还不错,咖啡馆也是。总之,这
是一个不错的住宿经验。”
2.Machine translation (MT) is another important NLP application. It aims to
translate a document in one language into a document in another language.
There are many challenging issues in designing MT systems. The following
shows an English sentence and three Chinese translations produced by
Google Translate in 2008, 2012 and 2014, respectively. Please translate this
English sentence into Chinese and use this example to analyze why MT is
challenging. (10 points)
Source: Taiwan wins gold in woman's 75 kg powerlifting in Paralympics
2008 : 台湾胜金在妇女的75公斤 powerlifting 在残奥会
2012 : 台湾胜在残奥会举重女子75公斤黄金
2014 : 台湾胜金在女子75公斤级举重残奥会
3.Basically, an NLP system is a pipeline of four modules which deal with
different problems on different linguistic levels. Please explain the
functions of each module. (12 points)
4.A blog post may be composed of sentences with emoticons. These non-verbal
emotional expressions describe the author's feelings when s/he wrote
the post. The following shows some typical examples. Given a collection of
sentences, each of them containing an emoticon, we plan to learn an emotion
dictionary with mutual information. The dictionary keeps the emotion
tendency of each word. Please define mutual information (MI) first, and
then discuss how you would achieve the goal with MI. (10 points)
●今天跟你约吃饭 不知为什么特别紧张 :o
●谢谢你请我吃饭 还送我礼物:目
●但收到的时候还是很开心:P
(The emoticons above were images in the original exam; they are represented
here by similar-looking symbols.)
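
A minimal sketch of one way to approach question 4, not an official answer. It
assumes the pointwise form of mutual information,
I(w, e) = log2( P(w, e) / (P(w) P(e)) ), estimated from sentence-level
co-occurrence counts, and that each sentence has already been segmented into
words. The function name pmi_dictionary and the input format are hypothetical.

```python
import math
from collections import Counter


def pmi_dictionary(samples):
    """Build {word: {emoticon: PMI}} from (tokens, emoticon) pairs.

    Assumption: one emoticon per sentence, and probabilities are relative
    frequencies over sentences (a word counts once per sentence).
    """
    n = 0
    word_count = Counter()
    emo_count = Counter()
    pair_count = Counter()
    for tokens, emo in samples:
        n += 1
        emo_count[emo] += 1
        for w in set(tokens):
            word_count[w] += 1
            pair_count[(w, emo)] += 1

    dictionary = {}
    for (w, emo), c in pair_count.items():
        # PMI = log2( P(w, e) / (P(w) * P(e)) ) with P estimated as count / n
        pmi = math.log2(c * n / (word_count[w] * emo_count[emo]))
        dictionary.setdefault(w, {})[emo] = pmi
    return dictionary
```

A word's emotion tendency can then be read off as the emoticon with the
highest score in its entry. Low-frequency words get unreliable scores, so a
frequency cutoff would likely be needed on real data.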
5.The t-test is a useful hypothesis testing tool. It can be used to learn
multi-word expressions from a large corpus. Moreover, it can also be used
to tell whether the performance of two models differs significantly. Please
describe these TWO applications of the t-test in detail. (10 points)
6.A person found an old book inside a wall when restructuring a historical
building. He claimed that the book was written in the 16th century. Assume
you have several book corpora written in the 15th, 16th, ..., 20th century,
respectively. How would you verify that the claim is true based on the book
content? The person further claimed that the book was written by William
Shakespeare (1564-1616). Please design a method to verify whether the book
is fake based on the writing style of William Shakespeare. (10 points)
7.The following defines basic symbols for smoothing.
N:total occurrences of n-grams in a training dataset
B:total number of n-gram types
r:frequency of an n-gram
Nr:total number of n-grams of frequency r in a training dataset
Tr:total occurrences, in a further (held-out) dataset, of the n-grams that
   have frequency r in the training dataset
Please give a formula to estimate the probability of an unseen n-gram for
each of the following smoothing methods. (12 points)
(a) Add a small value λ to the count of every n-gram type.
(b) Subtract a constant δ from each non-zero count.
(c) Estimate using a held-out dataset.
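
A minimal sketch of the textbook forms of these three estimates for an
unseen n-gram (r = 0), not an official answer. N0 and T0 are Nr and Tr with
r = 0; the function names are hypothetical.

```python
def lidstone_unseen(lam, N, B):
    """(a) Add-lambda (Lidstone): P = (r + lam) / (N + B * lam) with r = 0."""
    return lam / (N + B * lam)


def absolute_discount_unseen(delta, N, B, N0):
    """(b) Absolute discounting.

    Seen n-grams get (r - delta) / N. The mass removed from the (B - N0)
    seen types, (B - N0) * delta / N, is shared equally by the N0 unseen types.
    """
    return (B - N0) * delta / (N0 * N)


def held_out_unseen(T0, N0, N):
    """(c) Held-out estimation: P = T_r / (N_r * N) with r = 0.

    Assumption: N is used as the normalizer, as in the symbol list above;
    the estimates sum to one only if the held-out data also has N occurrences.
    """
    return T0 / (N0 * N)
```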
8.What are the differences between deleted interpolation and the back-off
model? Please take the computation of P(Wn|Wn-3,Wn-2,Wn-1) as an example.
(10 points)
9.Given a model λ and an observation sequence O,
(a) find the probability of the sequence with the backward algorithm.
    (8 points)
(b) find the best path with the Viterbi algorithm. (8 points)
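
A minimal sketch of both computations, not an official answer. It assumes a
state-emission HMM with λ = (pi, A, B), where pi[i] is the initial
probability of state i, A[i][j] the transition probability from i to j, and
B[i][k] the probability of emitting symbol k in state i; observations are
integer symbol indices. The function names are hypothetical.

```python
def backward_prob(pi, A, B, obs):
    """P(O | model) computed with the backward algorithm."""
    n = len(pi)
    beta = [1.0] * n                        # beta at the last time step
    for t in range(len(obs) - 1, 0, -1):
        # beta_{t-1}(i) = sum_j A[i][j] * B[j][o_t] * beta_t(j)
        beta = [sum(A[i][j] * B[j][obs[t]] * beta[j] for j in range(n))
                for i in range(n)]
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))


def viterbi(pi, A, B, obs):
    """Most probable state path and its joint probability with O."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        prev = delta
        # best predecessor of each state j at this time step
        ptr = [max(range(n), key=lambda i: prev[i] * A[i][j])
               for j in range(n)]
        delta = [prev[ptr[j]] * A[ptr[j]][j] * B[j][o] for j in range(n)]
        backptrs.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):          # follow back-pointers to t = 1
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

For long sequences both routines would normally work with log probabilities
or scaling to avoid underflow; that is left out here to keep the recursions
readable.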
10.Forward probability and backward probability are often used to estimate
the parameters of an HMM. Please show how this works. (10 points)
