Estimating Parameters from i.i.d. Data
Suppose we want to estimate a parameter $\theta$ from i.i.d. random variables $D=\{x_1,\dots,x_N\}$.
Maximum likelihood estimation:
$$L(\theta)=P(D;\theta)=P(x_1,\dots,x_N;\theta)=P(x_1;\theta)P(x_2;\theta)\cdots P(x_N;\theta)=\prod_{i=1}^N P(x_i;\theta)$$
$$\hat{\theta}=\arg\max_{\theta}L(\theta)=\arg\max_{\theta}\log L(\theta)$$
Example 1: the Bernoulli model
$$P(x)=\theta^x(1-\theta)^{1-x}$$
Then
$$L(\theta)=\prod_{i=1}^N P(x_i;\theta)=\prod_{i=1}^N\theta^{x_i}(1-\theta)^{1-x_i}=\theta^{\sum_{i=1}^N x_i}(1-\theta)^{\sum_{i=1}^N(1-x_i)}$$
Writing $n_h=\sum_{i=1}^N x_i$ for the number of heads:
$$\log L(\theta)=\log\left[\theta^{\sum_{i=1}^N x_i}(1-\theta)^{\sum_{i=1}^N(1-x_i)}\right]=n_h\log\theta+(N-n_h)\log(1-\theta)$$
$$\frac{\partial\log L(\theta)}{\partial\theta}=\frac{n_h}{\theta}-\frac{N-n_h}{1-\theta}=0$$
$$(1-\theta)n_h=\theta(N-n_h)\Rightarrow\hat{\theta}_{MLE}=\frac{n_h}{N}$$
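The closed-form result $\hat{\theta}_{MLE}=n_h/N$ can be checked numerically against a grid search over the log-likelihood. A minimal sketch (variable names such as `log_likelihood` are my own, not from the notes):

```python
import math

def log_likelihood(theta, xs):
    """Bernoulli log-likelihood: n_h*log(theta) + (N - n_h)*log(1 - theta)."""
    n_h = sum(xs)
    N = len(xs)
    return n_h * math.log(theta) + (N - n_h) * math.log(1 - theta)

xs = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # n_h = 7 heads out of N = 10 tosses
theta_mle = sum(xs) / len(xs)          # closed form: n_h / N = 0.7

# The grid argmax should agree with the closed form (up to grid resolution).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, xs))
print(theta_mle, theta_grid)  # 0.7 0.7
```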
Example 2: the univariate Gaussian model
$$P(x)=(2\pi\sigma^2)^{-\frac{1}{2}}\exp\{-(x-\mu)^2/2\sigma^2\}$$
$$L(\theta)=\prod_{i=1}^N P(x_i)=\prod_{i=1}^N(2\pi\sigma^2)^{-\frac{1}{2}}\exp\{-(x_i-\mu)^2/2\sigma^2\}$$
$$\log L(\theta)=-\frac{N}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^N(x_i-\mu)^2$$
$$\frac{\partial\log L(\theta)}{\partial\mu}=\frac{1}{\sigma^2}\sum_{i=1}^N(x_i-\mu)=0\Rightarrow\mu_{MLE}=\frac{1}{N}\sum_{i=1}^N x_i$$
$$\frac{\partial\log L(\theta)}{\partial\sigma^2}=-\frac{N}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^N(x_i-\mu)^2=0\Rightarrow\sigma^2_{MLE}=\frac{1}{N}\sum_{i=1}^N(x_i-\mu_{MLE})^2$$
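The two Gaussian estimators can be computed directly from a sample. A minimal sketch (function name is illustrative):

```python
def gaussian_mle(xs):
    """Gaussian MLEs: mu = (1/N)*sum(x_i), sigma^2 = (1/N)*sum((x_i - mu)^2)."""
    N = len(xs)
    mu = sum(xs) / N
    sigma2 = sum((x - mu) ** 2 for x in xs) / N  # divides by N, not N - 1
    return mu, sigma2

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = gaussian_mle(xs)
print(mu, sigma2)  # 5.0 4.0
```

Note that $\sigma^2_{MLE}$ divides by $N$, so it is the biased variance estimator, unlike the $N-1$ of the usual sample variance.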
To guard against overfitting (the solution above tracks the observed sample exactly), we can rewrite the estimate as $\hat{\theta}_{ML}=\frac{n_h}{n_h+n_t}$, where $n_t=N-n_h$ is the number of tails.
Bayesian Learning
$$P(\theta|D)=\frac{P(D|\theta)P(\theta)}{P(D)}$$
where $P(\theta|D)$ is the posterior, $P(D|\theta)$ is the likelihood computed above, $P(\theta)$ is a prior over $\theta$, and $P(D)$ acts as a normalizing constant.
Therefore,
$$P(\theta|D)\propto P(D|\theta)P(\theta).$$
The maximum a posteriori (MAP) estimate is
$$\hat{\theta}_{MAP}=\arg\max_\theta P(\theta|D).$$
When the prior on $\theta$ is uniform, $P(\theta)$ is also a constant, so $P(\theta|D)\propto P(D|\theta)$, i.e. MLE = MAP.
Example 1: Beta prior, Bernoulli likelihood
Give $\theta$ a Beta prior:
$$P(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}$$
Then the posterior of $\theta$ is
$$P(\theta|D)\propto P(D|\theta)P(\theta)\propto\theta^{n_h}(1-\theta)^{N-n_h}\times\theta^{\alpha-1}(1-\theta)^{\beta-1}=\theta^{n_h+\alpha-1}(1-\theta)^{N-n_h+\beta-1}\propto \mathrm{Beta}(\alpha+n_h,\beta+n_t)$$
In other words, the prior and posterior are conjugate: the posterior is again a Beta distribution.
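Because of conjugacy, the Bayesian update is just count bookkeeping. A minimal sketch (function name is illustrative):

```python
def beta_bernoulli_update(alpha, beta, xs):
    """Beta(alpha, beta) prior + observed tosses -> Beta(alpha + n_h, beta + n_t)."""
    n_h = sum(xs)         # heads
    n_t = len(xs) - n_h   # tails
    return alpha + n_h, beta + n_t

# Beta(2, 2) prior, then observe 7 heads and 3 tails.
a, b = beta_bernoulli_update(2, 2, [1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
post_mean = a / (a + b)            # posterior mean: alpha' / (alpha' + beta')
post_mode = (a - 1) / (a + b - 2)  # posterior mode = the MAP estimate
print(a, b)  # 9 5
```

The posterior mode $(\alpha+n_h-1)/(\alpha+\beta+N-2)$ shrinks the MLE $n_h/N$ toward the prior; with the uniform prior $\alpha=\beta=1$ it reduces exactly to the MLE, matching the MLE = MAP remark above.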
Example 2: Beta prior, binomial likelihood
Again give $\theta$ a Beta prior:
$$P(\theta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}$$
$$P(D|\theta)=C_n^k\,\theta^k(1-\theta)^{n-k}$$
Then the posterior is
$$P(\theta|D)\propto P(D|\theta)P(\theta)\propto\theta^{\alpha-1}(1-\theta)^{\beta-1}\theta^k(1-\theta)^{n-k}\propto \mathrm{Beta}(\alpha+k,\beta+n-k)$$
Example 3: Dirichlet prior, multinomial likelihood
The multinomial distribution:
$$P(x_1,\dots,x_k;n,p_1,\dots,p_k)=\frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}$$
The Dirichlet distribution:
$$P(\theta_1,\dots,\theta_K)=\frac{1}{B(\alpha)}\prod_{i=1}^K\theta_i^{\alpha_i-1}$$
By the same argument as in the Beta examples, the Dirichlet prior is conjugate to the multinomial likelihood: the posterior is $\mathrm{Dir}(\alpha_1+x_1,\dots,\alpha_K+x_K)$.
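As with the Beta-Bernoulli case, the Dirichlet-multinomial update only adds observed counts to the prior pseudo-counts. A minimal sketch (names are illustrative):

```python
def dirichlet_update(alphas, counts):
    """Dir(alpha_1..alpha_K) prior + category counts -> Dir(alpha_i + x_i)."""
    return [a + x for a, x in zip(alphas, counts)]

# Symmetric Dir(1, 1, 1) prior (uniform over the simplex) and category
# counts from n = 10 rolls of a 3-sided die.
post = dirichlet_update([1, 1, 1], [2, 3, 5])
total = sum(post)
post_mean = [a / total for a in post]  # E[theta_i] = alpha_i / sum(alpha)
print(post)  # [3, 4, 6]
```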