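[Exam] 104-2 陈宏 Multivariate Statistical Analysis Final Exam

Posted by: SamBetty (sam)   2017-02-20 22:06:49
Course name: Multivariate Statistical Analysis
Course type: Required course, Mathematical Statistics track, Graduate Institute of Applied Mathematics
Instructor: 陈宏
College: College of Science
Department: Department of Mathematics
Exam date (Y/M/D): 2016/6/15
Exam time: 15:30~17:50
Questions: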
1. (40%) Kernel functions implicitly define some mapping function $\psi(\cdot)$
that transforms an input instance $x \in \mathbb{R}^p$ to a high-dimensional
space $Q$ by giving the form of the inner product in $Q$:
$K(x_i, x_j) = \langle \psi(x_i), \psi(x_j) \rangle$.
Assume we use the radial basis kernel function
$K(x_i, x_j) = \exp\left(-\frac{1}{2}\|x_i - x_j\|^2\right)$.
Prove that for any two input instances $x_i$ and $x_j$, the squared Euclidean
distance between their corresponding points in the feature space $Q$ is at
most 2, i.e. prove that $\|\psi(x_i) - \psi(x_j)\|^2 \le 2$.
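A quick numerical sanity check for Question 1 (not a proof) can be sketched in Python with NumPy. It relies only on the identity $\|\psi(x)-\psi(y)\|^2 = K(x,x) + K(y,y) - 2K(x,y)$, which follows from expanding the inner product. The helper names, the random test points, and the seed are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(x, y):
    """Radial basis kernel K(x, y) = exp(-||x - y||^2 / 2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-0.5 * np.dot(diff, diff)))

def feature_space_sq_dist(x, y, kernel=rbf_kernel):
    """Squared distance ||psi(x) - psi(y)||^2 written with kernel values only."""
    return kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)

# Hypothetical test data: 200 random points in R^5.
rng = np.random.default_rng(0)
points = rng.normal(scale=3.0, size=(200, 5))
max_dist = max(feature_space_sq_dist(a, b) for a in points for b in points)
print(max_dist)  # never exceeds 2 in this experiment
```

Since $K(x,x)=1$ for this kernel, the computed quantity is $2 - 2K(x,y)$, which suggests where the bound comes from.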
2. (30%) Given $n$ training examples $x_1, \ldots, x_n$, the kernel matrix $A$
is an $n \times n$ square matrix, where $A(i,j) = K(x_i, x_j)$ for
$i, j = 1, \ldots, n$. Prove that the kernel matrix $A$ is positive
semidefinite. (Hint: Recall that
$K(x_i, x_j) = \langle \psi(x_i), \psi(x_j) \rangle$ as stated in Question 1.)
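The claim in Question 2 can be checked numerically for the radial basis kernel of Question 1. The sketch below is only an experiment on random data, not the requested proof. The function name, sample size, and seed are assumptions.

```python
import numpy as np

def rbf_kernel_matrix(X):
    """Kernel matrix A(i, j) = exp(-||x_i - x_j||^2 / 2) for the rows of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0))

# Hypothetical data: 30 training examples in R^4.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
A = rbf_kernel_matrix(X)

# A symmetric matrix is positive semidefinite iff all eigenvalues are >= 0.
min_eig = np.linalg.eigvalsh(A).min()

# Equivalent check on one random vector c: c^T A c should be nonnegative.
c = rng.normal(size=30)
quad_form = c @ A @ c
print(min_eig, quad_form)
```

The second check mirrors the hint: $c^T A c = \|\sum_i c_i \psi(x_i)\|^2$, which cannot be negative.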
Questions 1 and 2 examine whether you know what the kernel trick is.
3. (40%) Consider the set of all closed balls in $\mathbb{R}^p$, that is, sets
of the form $\{x \in \mathbb{R}^p : \|x - x_0\|^2 \le r^2\}$ for some
$x_0 \in \mathbb{R}^p$ and $r \ge 0$. Prove that the VC dimension of this class
is less than or equal to $p+2$.
Hint: Convert it into the form of a hyperplane in terms of
$x_1, x_2, \ldots, x_p$ and $\sum_i x_i^2$.
4. (30%) When $p=2$, consider a square that assigns points inside it to one
class and points outside it to another class. Draw a scenario where this
classifier shatters all points for the VC dimension you have proposed.
Questions 3 and 4 examine whether you know how to find the complexity of your
learner.
5. (50%) (Chernoff Bound) Let $X_1, X_2, \ldots, X_n$ be independent random
variables, each taking the values $-1$ and $1$ with probability 1/2. Define
$S_n = \sum_{i=1}^{n} X_i$. Show that, for any real number $t > 0$,
$P(S_n \ge t) \le \exp\left(-\frac{t^2}{2n}\right)$.
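The bound in Question 5 can be compared against the exact tail probability, since $S_n = 2B - n$ with $B \sim \mathrm{Binomial}(n, 1/2)$. The following is a minimal sketch using only the Python standard library. The function names and the grid of $(n, t)$ values are assumptions chosen for illustration.

```python
import math

def tail_prob(n, t):
    """Exact P(S_n >= t) for S_n a sum of n independent fair +/-1 signs."""
    # S_n = 2*B - n with B ~ Binomial(n, 1/2), so S_n >= t iff B >= (n + t) / 2.
    k_min = max(0, math.ceil((n + t) / 2))
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

def chernoff_bound(n, t):
    """Right-hand side exp(-t^2 / (2n)) of the inequality."""
    return math.exp(-t**2 / (2 * n))

# Compare the exact tail with the bound on a small hypothetical grid.
for n in (10, 50):
    for t in (1, 5, 10):
        print(n, t, tail_prob(n, t), chernoff_bound(n, t))
```

The exact tail stays below the bound on this grid. The proof itself usually goes through Markov's inequality applied to $e^{\lambda S_n}$ together with $\cosh(\lambda) \le e^{\lambda^2/2}$.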
6. (40%) Consider logistic regression with $(x_i, y_i) \in \mathbb{R}^2$,
$1 \le i \le n$, in which $\log \frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x$.
Note that both $|\beta_0|$ and $|\beta_1|$ are bounded above by 1. For
simplicity, consider $x_i = i/n$. Prove that the MLE of $(\beta_0, \beta_1)$ is
consistent under the setting $x_{in} = i/n$ as $n$ goes to infinity.
Questions 5 and 6 examine whether you know the trick for finding a probability
error bound on your learner and can derive its theoretical properties.
7. (60%) Suppose that $X$ follows a bivariate distribution with
$E_1[X] = \mu_1 = (1,1)^T$ in group 1, $E_2[X] = \mu_2 = (0,0)^T$ in group 2,
and common covariance matrix
$\Psi_1 = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $-1 < \rho < 1$.
(a)(40%) Find $a$ which maximizes the following ratio:
$\frac{[a^T(\mu_1 - \mu_2)]^2}{a^T \Psi_1 a}$.
(b)(20%) Find the total probability of misclassification using Fisher's
linear discriminant function with equal prior probability on group 1 and
group 2.
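For Question 7, a numerical experiment can help check a candidate answer. The sketch below uses the standard fact that the ratio is maximized by any $a$ proportional to $\Psi_1^{-1}(\mu_1 - \mu_2)$, and compares that direction with random ones. The value $\rho = 0.3$ is a hypothetical choice, and the misclassification rate at the end additionally assumes both groups are bivariate normal, which the question does not state explicitly.

```python
import math
import numpy as np

def fisher_direction(mu1, mu2, cov):
    """Direction proportional to cov^{-1} (mu1 - mu2)."""
    return np.linalg.solve(cov, mu1 - mu2)

def fisher_ratio(a, mu1, mu2, cov):
    """The ratio [a^T (mu1 - mu2)]^2 / (a^T cov a)."""
    return float((a @ (mu1 - mu2)) ** 2 / (a @ cov @ a))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rho = 0.3  # hypothetical value in (-1, 1)
mu1, mu2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
cov = np.array([[1.0, rho], [rho, 1.0]])

a_star = fisher_direction(mu1, mu2, cov)
best_ratio = fisher_ratio(a_star, mu1, mu2, cov)

# No random direction should beat the candidate maximizer.
rng = np.random.default_rng(2)
random_ratios = [fisher_ratio(a, mu1, mu2, cov) for a in rng.normal(size=(1000, 2))]

# Assumption: normal groups with equal priors, so the total error is Phi(-Delta / 2),
# where Delta^2 is the maximized ratio (the squared Mahalanobis distance).
error_rate = normal_cdf(-math.sqrt(best_ratio) / 2.0)
print(a_star, best_ratio, max(random_ratios), error_rate)
```

Repeating the run for several values of `rho` shows how correlation between the two coordinates changes the separation between the groups.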
8. (50%) Consider a two-class classification problem on $\mathbb{R}^p$ with
densities $f_1 = N(\mu_1, \Sigma)$, $f_2 = N(\mu_2, \Sigma)$, and class
membership probabilities $\pi = P(\text{class 1}) = 1 - P(\text{class 2})$.
This model can be constructed hierarchically:
   1. generate $L \sim \mathrm{Bernoulli}(\pi)$
   2. if $L = 1$: then generate $X \sim N(\mu_1, \Sigma)$
   3. else: generate $X \sim N(\mu_2, \Sigma)$.
(a) Compute $P(L=1|X)$ and show that $P(L=1|X)/P(L=0|X)$ has a logistic form.
(b) Suppose now that the covariance matrix is not the same in each group
($\Sigma_i$). Does the probability $P(L=1|X)$ still have a logistic form? (You
can answer this question with $p=2$.)
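A candidate answer to Question 8(a) can be checked numerically. The sketch below compares the posterior from Bayes' rule with a logistic function of a linear score, using $\beta = \Sigma^{-1}(\mu_1 - \mu_2)$ and $\beta_0 = \log\frac{\pi}{1-\pi} - \frac{1}{2}(\mu_1^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2)$. All parameter values, function names, and test points are hypothetical.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """Multivariate normal density N(mean, cov) evaluated at x."""
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm_const)

def posterior_bayes(x, pi, mu1, mu2, cov):
    """P(L = 1 | X = x) computed directly from Bayes' rule."""
    num = pi * gaussian_density(x, mu1, cov)
    return num / (num + (1.0 - pi) * gaussian_density(x, mu2, cov))

def posterior_logistic(x, pi, mu1, mu2, cov):
    """The same posterior written as a logistic function of a linear score."""
    beta = np.linalg.solve(cov, mu1 - mu2)
    beta0 = np.log(pi / (1.0 - pi)) - 0.5 * (
        mu1 @ np.linalg.solve(cov, mu1) - mu2 @ np.linalg.solve(cov, mu2)
    )
    return float(1.0 / (1.0 + np.exp(-(beta0 + beta @ x))))

# Hypothetical parameters with p = 2.
pi = 0.3
mu1, mu2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])
cov = np.array([[2.0, 0.6], [0.6, 1.0]])

rng = np.random.default_rng(3)
xs = rng.normal(scale=2.0, size=(100, 2))
max_gap = max(
    abs(posterior_bayes(x, pi, mu1, mu2, cov) - posterior_logistic(x, pi, mu1, mu2, cov))
    for x in xs
)
print(max_gap)  # agreement up to floating-point error
```

With unequal covariance matrices the quadratic terms in $x$ no longer cancel in the log odds, which is the point part (b) probes.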
9. (50%) I have three coins in my pocket. Coin 0 has probability $\lambda$ of
heads; Coin 1 has probability $p_1$ of heads; Coin 2 has probability $p_2$ of
heads. For each trial I do the following:
First I toss Coin 0
If Coin 0 turns up heads, I toss Coin 1 three times
If Coin 0 turns up tails, I toss Coin 2 three times
I do not tell you whether Coin 0 came up heads or tails, or whether Coin 1
or 2 was tossed three times, but I do tell you how many heads/tails are
seen at each trial. You see the following sequence: (H,H,H),(T,T,T),(H,H,H),
(T,T,T),(H,H,H).
(a) How do we find the maximum likelihood parameters $\lambda$, $p_1$, and
$p_2$?
(b) How do you use EM to find the solution?
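One way to approach Question 9(b) is the usual EM recipe for a two-component mixture of binomials. In the E-step, compute for each trial the posterior probability that Coin 1 was used. In the M-step, re-estimate $\lambda$, $p_1$, and $p_2$ from those weights. The sketch below is a minimal version under assumptions: the function name, the starting values (0.5, 0.6, 0.4), and the fixed iteration count are illustrative choices, not part of the exam.

```python
def em_three_coins(heads, tosses=3, lam=0.5, p1=0.6, p2=0.4, n_iter=100):
    """EM for the three-coin model; heads[i] is the number of heads in trial i."""
    for _ in range(n_iter):
        # E-step: posterior probability that Coin 1 produced each trial.
        # The binomial coefficient is the same for both coins and cancels.
        resp = []
        for h in heads:
            like1 = lam * p1**h * (1 - p1) ** (tosses - h)
            like2 = (1 - lam) * p2**h * (1 - p2) ** (tosses - h)
            resp.append(like1 / (like1 + like2))
        # M-step: weighted maximum likelihood updates.
        total1 = sum(resp)
        total2 = len(heads) - total1
        lam = total1 / len(heads)
        p1 = sum(r * h for r, h in zip(resp, heads)) / (tosses * total1)
        p2 = sum((1 - r) * h for r, h in zip(resp, heads)) / (tosses * total2)
    return lam, p1, p2

# Observed sequence (H,H,H),(T,T,T),(H,H,H),(T,T,T),(H,H,H) as head counts.
heads_per_trial = [3, 0, 3, 0, 3]
lam_hat, p1_hat, p2_hat = em_three_coins(heads_per_trial)
print(lam_hat, p1_hat, p2_hat)
```

From this starting point the iterates move toward $\lambda \approx 0.6$, $p_1 \approx 1$, $p_2 \approx 0$. Starting with $p_1 < p_2$ gives the label-swapped solution with the same likelihood, and starting with $p_1 = p_2$ leaves EM stuck, so the initialization matters.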
