[Exam] 109-2 陈信希 Natural Language Processing Final Exam

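Original poster: eayaps1788 (River1Z)   2021-07-09 15:32:18
Course name: Natural Language Processing
Course type: Department elective
Instructor: 陈信希
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (Y/M/D): 2021/6/24
Time limit (minutes): 180
*Held online because of the pandemic; consulting online resources was allowed
Exam questions: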
1. Given the sentence “在 夫子庙 入口 遍布 我 喜欢 的 小吃店”, please show
the results of applying (a) a constituency parser, (b) a noun phrase chunker,
and (c) a dependency parser. (15 points)
2. Assume an arc-standard dependency parser is adopted. Please show the actions
needed to parse the sentence “在 夫子庙 入口 遍布 我 喜欢 的 小吃店”. (10 points)
3. Assume we have a set of four discourse relations, namely temporal,
contingency, comparison, and expansion, as defined in PDTB. Please judge whether
“而” in each of the following sentences is a discourse connective.
If yes, please specify the relation it signals. (20 points)
(a) 1997 年发达国家经济形势的特点是[美国增长强劲]而[日本经济疲弱]。
(b) 开放起了[积极]而[关键]的作用。
(c) [这当然不是历史的巧合],而[是历史的累积和转接]。
(d) [水东开发区是适应乙烯工程的需要]而[建立的一个后继加工基地]。
4. In recent years, there have been important advances in the quality of
state-of-the-art models, but those models are often less interpretable.
Nowadays “explainable NLP” is an emerging research topic in model development.
The attention mechanism is a widely used operation for enabling explanations.
Please explain how it achieves "explanation." (10 points)
5. Nowadays newspapers are becoming more partisan. Some research proposes a slant
index that measures the frequency of phrases used to sway readers to the left or
the right in a media outlet. Other research investigates the demographic
characteristics and political attitudes of newspaper readers in Taiwan from
1992 to 2004. These studies conclude that media are biased, i.e.,
left-wing vs. right-wing in the US and pan-green vs. pan-blue in Taiwan. Now you
are asked to design an NN model to transform pan-green content into pan-blue
content. Please show your idea. (10 points)
6. There are several ways to achieve semantic analysis. One possibility is a
sequence-to-sequence model that transforms an NL sentence into a semantic form.
Another possibility is to extract the most important parts of an NL sentence,
such as Arg0, Arg1, and so on. Please explain the ideas behind these two
possible solutions. (15 points)
7. One major disadvantage of skip-gram and CBOW is that they assign the same
representation to different senses of a word. Do you have any ideas for
capturing a suitable sense of a word based on its context? (10 points)
8. To automatically interpret the semantics of written language, the
analysis and understanding of causal relationships between facts is a
key point. The following shows three examples. The 2nd column shows a passage.
The cause and the effect extracted from the passage are shown in the 3rd and
4th columns, respectively.

Assume you are given a cause-effect corpus consisting of passages with
annotated cause and effect segments. You are asked to design a system to
identify the cause and effect segments from the given passage. (10 points)
9. For privacy and security reasons, electronic medical records (EMRs)
have to be de-identified before being released for potential applications.
According to HIPAA, 18 types of identifiable data must be removed, including
names, telephone numbers, email addresses, IP addresses, social security numbers,
medical record numbers, and so on. Do you have any ideas to deal with this
problem? (10 points)
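
For Question 9, one possible starting point (not an official answer) is a rule-based pass over the identifier types that have a regular surface form. The sketch below is a minimal illustration under assumptions of my own: the pattern set, the US-style phone and SSN formats, and the function name deidentify are all hypothetical. Names and other free-form identifiers would need something beyond regexes, such as a trained NER tagger, which is not shown here.

```python
import re

# Hypothetical patterns for a few identifier types with a regular surface form.
# They are deliberately simple and would miss many real-world variants.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed type label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Made-up example record, not taken from any real EMR.
note = "Contact jdoe@example.com or 555-123-4567; SSN 123-45-6789; host 192.168.0.1."
masked = deidentify(note)
print(masked)
```

Replacing each match with a type label rather than deleting it keeps the sentence readable for downstream use; whether that is acceptable depends on the release policy.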

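For Question 2, the arc-standard transition system itself can be sketched as below. This is a minimal illustration, not an answer to the question: it uses a toy English sentence and an action sequence that I chose by hand, and the function name arc_standard is hypothetical. Words are used directly as node labels, so a sentence with repeated words would need indices instead.

```python
from typing import List, Tuple


def arc_standard(words: List[str], actions: List[str]) -> List[Tuple[str, str]]:
    """Apply arc-standard actions and return the (head, dependent) arcs."""
    stack = ["ROOT"]
    buffer = list(words)
    arcs: List[Tuple[str, str]] = []
    for action in actions:
        if action == "SHIFT":
            # Move the next word from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The top of the stack becomes the head of the item below it,
            # and that lower item is removed. ROOT can never be a dependent.
            if len(stack) < 2 or stack[-2] == "ROOT":
                raise ValueError("LEFT-ARC is not allowed in this state")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # The item below the top becomes the head of the top,
            # and the top is removed.
            if len(stack) < 2:
                raise ValueError("RIGHT-ARC is not allowed in this state")
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


# Toy example (an assumption for illustration, not the exam sentence).
words = ["She", "eats", "fish"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
arcs = arc_standard(words, actions)
print(arcs)
```

A parse is complete when the buffer is empty and only ROOT remains on the stack, which is the state the toy action sequence ends in.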