[Exam] 105-2 陈信希 Natural Language Processing Midterm Exam

Posted by: kevin1ptt (蚁姨椅yee)   2017-04-20 11:41:25
Course name: Natural Language Processing
Course type: Departmental elective
Instructor: 陈信希
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (yyyy/mm/dd, ROC calendar): 106/04/20
Time limit (minutes): 180
Questions:
1. The following questions concern the resources used in natural language
processing (NLP) research.
(a) The Annual Meeting of the Association for Computational Linguistics (ACL)
and the International Conference on Computational Linguistics (COLING)
are two top-tier/representative conferences in NLP. Please specify
the largest NLP archive in the world, which keeps the major
NLP conference proceedings. (5 points)
(b) If we need an English treebank to train a parser, please suggest
an organization where we can purchase the required treebank.
(5 points)
(c) If we need a balanced Chinese corpus to develop a Chinese word segmentation
system, please suggest an organization where we can get the required
corpus. (5 points)
2. A pipelined NLP system can be composed of a morphological processing
module, a syntactic analysis module, a semantic interpretation module,
and a discourse analysis module. Please use the following sentence
to describe any 5 operations in the pipelined system. The operations
can be selected from the same module or from different modules. Please
also indicate to which module each mentioned operation belongs.
(20 points)
英国首相今天宣布提前大选,英镑转贬,但随后重升。
(The British Prime Minister announced a snap election today; the pound
fell, but then rebounded.)
3. Labelling/tagging operations play an important role in natural
language processing. Different labels are proposed at different
analysis levels. For example, a set of part-of-speech (POS) tags
is defined at the lexical level. A POS tagger aims at labelling each
word in a sentence with a POS tag. Here tagging is a labelling operation.
Please specify 3 other labelling (tagging) operations in NLP.
(15 points)
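
For reference, a toy illustration of tagging as a labelling operation:
each token receives one label, and different label sets are used at
different analysis levels (POS tags at the lexical level, BIO entity tags
as one example of another labelling operation). A small Python sketch
with hand-assigned tags, purely for illustration:

# Each token gets exactly one label per labelling operation.
sentence = ["The", "UK", "announced", "an", "early", "election", "."]
pos_tags = ["DT", "NNP", "VBD", "DT", "JJ", "NN", "."]   # lexical level
bio_tags = ["O", "B-GPE", "O", "O", "O", "O", "O"]       # entity level
for token, pos, bio in zip(sentence, pos_tags, bio_tags):
    print(f"{token}\t{pos}\t{bio}")
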
4. The following shows a review of a hotel:
客房古老,面积不大,不过景观很好,可以看见秦淮河。
(The guest room is old and not large, but the view is very nice; you can
see the Qinhuai River.)
The words "客房" (guest room), "面积" (size), and "景观" (view) are aspect
terms. In contrast, the words "古老" (old), "大" (large), and "好" (nice)
are opinion words, which modify aspect terms and express the polarity
toward the aspect. In some cases, only opinion words are used in a review
and aspect terms are absent (i.e., implicit aspects). In the sentence
"这是千万画素里最便宜的一台" (This is the cheapest one among the
ten-megapixel models), we know the opinion word "便宜" (cheap) modifies an
implicit aspect term "价钱" (price). Given a hotel review corpus, please
propose a method to find collocations of opinion words and aspect terms,
and use the findings to deal with the implicit aspect problem. (10 points)
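
For question 4, one possible (not the only) approach is to mine
opinion-word/aspect-term co-occurrence statistics from the review corpus
and map an opinion word that appears without an explicit aspect to its
most strongly associated aspect term. A minimal Python sketch, assuming a
word-segmented corpus plus aspect-term and opinion-word lexicons are
available (corpus, aspect_terms, and opinion_words are hypothetical inputs):

import math
from collections import Counter
from itertools import product

def build_associations(corpus, aspect_terms, opinion_words):
    """corpus: iterable of word-segmented sentences (lists of words)."""
    pair_count, aspect_count, opinion_count, n = Counter(), Counter(), Counter(), 0
    for sent in corpus:
        aspects = [w for w in sent if w in aspect_terms]
        opinions = [w for w in sent if w in opinion_words]
        aspect_count.update(aspects)
        opinion_count.update(opinions)
        for a, o in product(aspects, opinions):
            pair_count[(o, a)] += 1
        n += 1
    # Pointwise mutual information between each opinion word and aspect term.
    return {(o, a): math.log(c * n / (opinion_count[o] * aspect_count[a]))
            for (o, a), c in pair_count.items()}

def guess_implicit_aspect(opinion_word, pmi):
    """Pick the aspect term most strongly collocated with the opinion word."""
    candidates = [(a, s) for (o, a), s in pmi.items() if o == opinion_word]
    return max(candidates, key=lambda x: x[1])[0] if candidates else None
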
5. One of the applications of a language model is to estimate the
probability of the next word given the previous n-1 words. Please compare
how a traditional language model and a neural probabilistic language
model deal with this problem. (10 points)
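
For question 5, the contrast is between count-based estimation over
discrete histories and a neural parameterization over dense word
embeddings. A rough Python/numpy sketch (the neural part is a simplified,
untrained Bengio-style model shown only to illustrate the
parameterization, not a trained system):

import numpy as np
from collections import Counter

def ngram_prob(corpus, history, word, n=3):
    """Traditional LM: MLE estimate of P(word | history) from raw counts."""
    hist_counts, full_counts = Counter(), Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            hist = tuple(sent[i:i + n - 1])
            hist_counts[hist] += 1
            full_counts[hist + (sent[i + n - 1],)] += 1
    h = tuple(history[-(n - 1):])
    return full_counts[h + (word,)] / hist_counts[h] if hist_counts[h] else 0.0

def neural_lm_prob(history_ids, vocab_size, dim=16, seed=0):
    """Neural LM: one forward pass mapping history embeddings to a softmax."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=(vocab_size, dim))                     # embedding table
    W = rng.normal(size=(len(history_ids) * dim, vocab_size))  # output weights
    x = C[history_ids].reshape(-1)            # concatenated history embeddings
    logits = np.tanh(x) @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                    # distribution over the next word
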
6. In training an HMM, we need to compute the number of times each
individual arc (link) is passed for a training instance. How can we
compute this number for each arc efficiently without enumerating all the
paths? (10 points)
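
For question 6, the usual technique is the forward-backward computation:
the expected number of times each transition arc i -> j is used can be
read off from the forward and backward probabilities, without enumerating
paths. A minimal numpy sketch, assuming a transition matrix A (N x N),
emission matrix B (N x V), initial distribution pi, and an observation
index sequence obs:

import numpy as np

def expected_arc_counts(A, B, pi, obs):
    N, T = A.shape[0], len(obs)
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[T - 1].sum()
    xi = np.zeros((N, N))                      # expected count for each arc
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / likelihood
    return xi
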
7.
(a) What is the zero probability problem? (5 points)
(b) What is the major problem of Laplace smoothing? (5 points)
(c) How does Kneser-Ney smoothing work to deal with the zero probability
problem? (10 points)
(d) In traditional language modeling, a smoothing technique is introduced
to avoid the zero probability problem. In distributed representations, we
associate each word in the vocabulary with a dense distributed vector.
Similar words (semantically and syntactically) will be close in the
embedding space. Is it necessary to introduce a smoothing technique into
a neural probabilistic language model? Please explain why. You can use
the following examples to explain your answer. (10 points)
The cat is walking in the bedroom.
A dog was running in a room.
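
For parts (a)-(c), a compact Python sketch of three bigram estimators may
help: unsmoothed MLE (which gives unseen bigrams zero probability),
Laplace/add-one smoothing (which shifts too much mass to unseen events
when the vocabulary is large), and interpolated Kneser-Ney (which
discounts seen counts and backs off to a continuation probability based
on how many distinct contexts a word follows). This is a sketch of the
general techniques, not the official solution:

from collections import Counter, defaultdict

def bigram_models(corpus, d=0.75):
    """corpus: iterable of token lists. Returns three P(w | h) estimators."""
    bigrams, unigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    V = len(unigrams)
    followers = defaultdict(set)   # distinct words following each history
    contexts = defaultdict(set)    # distinct histories preceding each word
    for (h, w) in bigrams:
        followers[h].add(w)
        contexts[w].add(h)
    bigram_types = len(bigrams)

    def mle(w, h):
        return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

    def laplace(w, h):
        return (bigrams[(h, w)] + 1) / (unigrams[h] + V)

    def kneser_ney(w, h):
        cont = len(contexts[w]) / bigram_types        # continuation probability
        if unigrams[h] == 0:
            return cont
        discounted = max(bigrams[(h, w)] - d, 0) / unigrams[h]
        lam = d * len(followers[h]) / unigrams[h]     # back-off weight
        return discounted + lam * cont

    return mle, laplace, kneser_ney
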
8. In analogy analysis, two pairs of words which share a relation are
given. We aim at identifying a hidden word based on the three other words.
Word embeddings have been shown to be powerful in this application. Please
present three similarity computation methods to find the hidden word.
(10 points)
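
For question 8, three similarity computations commonly used to pick the
hidden word d in an analogy a : b :: c : d are 3CosAdd (vector offset),
3CosMul, and PairDirection. A minimal numpy sketch, where emb is a
hypothetical dict mapping words to embedding vectors:

import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(emb, a, b, c, method="3cosadd", eps=1e-8):
    """Return the word d maximizing the chosen similarity score."""
    va, vb, vc = emb[a], emb[b], emb[c]
    best_word, best_score = None, -np.inf
    for w, vw in emb.items():
        if w in (a, b, c):
            continue
        if method == "3cosadd":          # maximize cos(d, b - a + c)
            score = cos(vw, vb - va + vc)
        elif method == "3cosmul":        # multiplicative combination,
            s = lambda x: (x + 1) / 2    # cosines shifted to [0, 1]
            score = s(cos(vw, vb)) * s(cos(vw, vc)) / (s(cos(vw, va)) + eps)
        else:                            # pairdirection: cos(d - c, b - a)
            score = cos(vw - vc, vb - va)
        if score > best_score:
            best_word, best_score = w, score
    return best_word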
