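Course name: Labor Economics I
Course type: Economics department elective
Instructor: 樊家忠
College: College of Social Sciences
Department: Department of Economics
Exam date (Y/M/D): 2017/11/08 (Wed.)
Exam time: 14:20~16:20, 2 hours in total
Exam questions:
Total: 180 points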
Question 1 (50 points)
Elliott is contemplating an estimation of the causal effect of the 90-day
operation of sobriety checkpoints, starting on June 1, 2012, on traffic
incidents caused by driving under the influence (DUI). Elliott's data report the
daily number of DUI accidents in each country in 2011 and 2012. The sample
spans 150 days prior to June 1 and 180 days after June 1, for both years. No
other anti-DUI policy was carried out during the two years. The regression is
specified as:
DUI(c,d,t) = α + βT + γI + ρ(I * T) + X(c,d,t)π + ε(c,d,t)     (1)
In equation (1), DUI(c,d,t) is the number of DUI accidents in country c on day d
of year t. I is a dummy variable indicating the post-intervention time period.
T is the treatment indicator, set to 1 for the treatment year. X(c,d,t)
refers to other control variables.
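
The following is a minimal sketch of how equation (1) could be fitted by OLS.
It is not part of the exam data: the panel is synthetic, the coefficient
values, the number of units, and the day index at which I switches on are all
assumptions made for illustration, and the controls X(c,d,t) are left out. It
only shows that, in this specification, the coefficient on I * T equals the
difference in differences of the four cell means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic panel: 20 units (c), 330 days (d), 2 years (t).
# 330 = 150 days before June 1 plus 180 days after, as in the sample description.
n_units, n_days = 20, 330
true = {"alpha": 10.0, "beta": 1.0, "gamma": 2.0, "rho": -3.0}  # assumed values

unit, day, year = np.meshgrid(
    np.arange(n_units), np.arange(n_days), [2011, 2012], indexing="ij"
)
T = (year == 2012).astype(float).ravel()  # treatment-year dummy
I = (day >= 150).astype(float).ravel()    # post-June-1 dummy (assumed coding)
noise = rng.normal(0.0, 1.0, size=T.size)
dui = (true["alpha"] + true["beta"] * T + true["gamma"] * I
       + true["rho"] * I * T + noise)

# Design matrix: constant, T, I, I*T (no extra controls in this sketch).
design = np.column_stack([np.ones_like(T), T, I, I * T])
coef, *_ = np.linalg.lstsq(design, dui, rcond=None)
alpha_hat, beta_hat, gamma_hat, rho_hat = coef


def cell_mean(t_val, i_val):
    """Mean outcome in one (year, period) cell."""
    return dui[(T == t_val) & (I == i_val)].mean()


# Difference in differences computed directly from the four cell means.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("rho_hat:", round(rho_hat, 3), "DID from cell means:", round(did, 3))
```
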
(a) What is the economic meaning of the coefficient α?
(b) What is the control group in this setting? Why is it a proper control
    group? [ explain in 30 words ]
(c) Figure 1 presents DUI accidents for 2012 (black curve) and 2011 (gray
    curve). The 30-day period that begins with June 1 is called period 1; the
    next 30 days is period 2, whereas the 30 days before June 1 is period 0,
    and so on. Given the information provided by Figure 1, what do you expect
    the signs of β, γ, and ρ to be? Your answers can be positive, negative,
    or zero. Briefly explain each of your answers.
(d) Given the information provided by Figure 1, which of the following do you
    think X(c,d,t) should include? Why? [explain in 30 words]
    (1) A trend variable
    (2) A squared term of the trend variable
    (3) Both a trend variable and its squared term
    (4) None of the above
(e) Design a regression to test the parallel trends assumption. Following the
    form of equation (1), specify the regression equation and explain the
    dependent variable, the independent variables, and the sample. In your
    regression, which coefficients should be used to test the parallel trends
    assumption?
    DUI(c,d,t) = α + βT + γTrend + δ(Trend * T) + X(c,d,t)π + ε(c,d,t)
Question 2: (40 points)
Lee (2008) estimated the causal effect of party incumbency on re-election
probabilities, using U.S. data. His interest is whether the Democratic candidate
for the seat in the U.S. House of Representatives has an advantage if his
party won the seat last time. The widely-noted success of House incumbents
raises the question of whether representatives use the privileges and
resources of their office to gain advantage for themselves or their parties.
Lee applied a regression discontinuity design (RDD).
Figure 2 plots the probability that a Democrat wins against the difference
between the Democratic and Republican vote shares in the previous election. The
dots in the figure are local averages (the average win rate in non-overlapping
windows of share margins that are .005 wide). The probability of a Democratic
win at election t + 1 is an increasing function of the vote share won by the
Democratic candidate minus the vote share won by the Republican candidate at
election t. The most important feature of the plot is the dramatic jump in win
rates at the 0 mark, the point where the two candidates get the same vote share.
Based on the size of jump, incumbency appears to raise party re-election
probabilities by about 40 percentage points.
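
The local averages described above can be sketched as follows. This is an
illustration on synthetic data, not Lee's data: the sample size, the assumed
win-probability function, and the 0.025 window used for the rough jump are all
hypothetical choices. Only the bin width of .005 comes from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: margin = Democratic minus Republican vote share at election t.
n = 20000
margin = rng.uniform(-0.25, 0.25, size=n)
# Assumed win probability at t + 1: rises with the margin and jumps by 0.4 at zero.
p_win = 0.3 + 0.4 * (margin > 0) + 0.4 * margin
win_next = (rng.uniform(size=n) < p_win).astype(float)

# Non-overlapping windows of width .005; negative margins get negative indices.
width = 0.005
bin_index = np.floor(margin / width).astype(int)
bins = np.unique(bin_index)
centers = (bins + 0.5) * width
local_avg = np.array([win_next[bin_index == b].mean() for b in bins])

# Rough size of the jump: mean win rate just right of zero minus just left of zero.
near = 0.025
right = win_next[(margin > 0) & (margin < near)].mean()
left = win_next[(margin < 0) & (margin > -near)].mean()
jump = right - left
print("bins:", len(bins), "approximate jump at zero:", round(jump, 3))
```

Plotting local_avg against centers would give a figure of the same kind as
Figure 2; the raw comparison of means near zero is only a rough check and is
not a substitute for a regression-based estimate.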
(a) Define the treatment variable and outcome variable in this study.
(b) Is this a sharp or fuzzy RDD? Why?
(c) The two validity requirements for an RDD are (1) smoothness of density
    distribution of observations across the cutoff point; and (2) smoothness
    of the means of all observables at the cutoff point. To examine the
    validity of his RDD, Lee presented Figure 3 as shown above. The variable
    in the vertical axis refers to the number of Democratic victories in the
    elections before election t. Is Figure 3 useful to support requirement (1),
    (2), neither, or both? Why?    
(d) Design and lay out the regression equation for the estimation of Lee's RDD.
    Carefully define the dependent variable and independent variables used in
    the equation. Specify the coefficient that indicates the jump in win rates
    at the cutoff point.
Question 3: (50 points)
Dr. John Snow is regarded as one of the founding fathers of modern epidemiology.
As London suffered a series of cholera outbreaks during the mid-19th
century, Snow theorized that cholera reproduced in the human body and was
spread through contaminated water. This contradicted the prevailing theory that
diseases were spread by "miasma" in the air. This question concerns his
research design.
During the 1854 outbreak, there were only two water suppliers in London: the
Southwark & Vauxhall Water (SVW) Company and the Lambeth Water (LW) Company.
Note that SVW pumped water from a part of the River Thames that was contaminated
with sewage, while LW pumped its water from further upstream, where the River
Thames was clean. All of London can be categorized into three areas:
● Area A was supplied only by SVW
● Area B was supplied only by LW, and
● Area C by both. In area C, a household was supplied by either SVW or LW, and
   adjacent houses often had different water suppliers and did not know who
   their supplier was.
(a) With death data for all three areas in 1854, Snow intends to
    estimate the causal effect of contaminated water on the hazard of cholera,
    defined as deaths caused by cholera per 10,000 houses. There are two
    simple-difference approaches to estimating the effect: (1) comparing area A
    as the treatment group and area B as the control group; and (2) comparing
    households supplied by SVW in area C as the treatment group, and those
    supplied by LW in area C as the control group. Which approach is more
    desirable? Why?
Snow obtained a list of all cholera deaths in areas A and B during the first
seven weeks of the 1854 outbreak. For each death, he determined the water
supplier to the house of the deceased. The following table presents his data.
______________________________________________________________________________
               │  Number of houses    Deaths from cholera    Deaths per 10,000
               │                                                   houses
______________________________________________________________________________
SVW (area A)   │       40,048                 1,263                 315
LW (area B)    │       26,107                    98                  38
Other Cities   │      256,483                 1,488                  58
______________________________________________________________________________
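
As a quick check on the table, the last column can be recomputed from the
first two. The sketch below assumes the rate is deaths divided by houses,
multiplied by 10,000 and rounded to the nearest integer; the function and
variable names are illustrative only.

```python
# (number of houses, deaths from cholera), copied from the table above.
rows = {
    "SVW (area A)": (40_048, 1_263),
    "LW (area B)": (26_107, 98),
    "Other Cities": (256_483, 1_488),
}


def deaths_per_10000(houses, deaths):
    """Cholera deaths per 10,000 houses, rounded to the nearest integer."""
    return round(deaths / houses * 10_000)


rates = {name: deaths_per_10000(h, d) for name, (h, d) in rows.items()}
for name, rate in rates.items():
    print(f"{name:<14} {rate}")
```
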
(b) The treatment effect can be expressed using conditional expectations:
    E(Y,dirty | SVW) - E(Y,clean | SVW), where SVW indicates the treatment
    (water supplied by SVW) and Y is the potential hazard if treated (Y,dirty)
    or not (Y,clean). Since E(Y,clean | SVW) cannot be observed, we can only
    measure E(Y,dirty | SVW) - E(Y,clean | LW) as a proxy. Please use the
    figures in the above table to calculate the treatment effect.
(c) In 30 words, explain why your answer in part (a) may suffer from selection
    bias.
(d) Present the selection bias using conditional expectation functions.
(e) Suppose that in 1849, LW still sourced dirty Thames water, before the
    company moved its source upstream in 1853. Now Snow has also collected
    data from the 1849 outbreak in London. Please construct a
    difference-in-differences regression to estimate the treatment effect using
    the following two dummy variables:
    1. SVW: a dummy variable indicating water being supplied by Southwark &
            Vauxhall Water Company (as opposed to LW).
    2. Year1854: a dummy variable indicating year 1854 (as opposed to year
            1849).
Question 4: (40 points)
Elliott designs the following experiment. He picks a large primary school in
Taipei that has 24 year-one classes every year. At the beginning of the school
year, he randomly assigns 50% of year-one students to the
treatment group and the remaining 50% to the control group. Students in the
treatment group are assigned to a class according to their months of birth.
That is, January-born students are assigned to one class, February-born
students to another class, etc. In total, there will be 12 classes in the
treatment group. For the control group, all students are randomly assigned to
another 12 classes without considering their months of birth. Teachers are
randomly assigned to the 24 classes. One year later, Elliott organizes a
mathematics test for all the students in the experiment. Assume that no student
or teacher quits the experiment or switches to another group.
(a) Students born in which month are the oldest at school entry in Taiwan? Why?
(b) If Elliott finds that August-born students in the treatment group perform
    better than August-born students in the control group, provide a reason for
    such a difference. [ In 30 words ]
(c) If Elliott finds that September-born students in the control group perform
    better on the math test than September-born students in the treatment
    group, provide a reason for such a difference. [ In 30 words ]
(d) Provide a reason why the comparison made in part (b) is biased because of
    the Hawthorne Effect. [ In 30 words ]