
Notes on "Mutual Information Neural Estimator"



1. Sketch

Objective: use a neural network to estimate mutual information:
$$I(X,Z) = D_{KL}\big(\mathbb{P}_{XZ} \,\|\, \mathbb{P}_X \otimes \mathbb{P}_Z\big)$$
Approach: the Donsker-Varadhan representation:
$$D_{KL}(\mathbb{P} \,\|\, \mathbb{Q}) = \sup_{T:\, \Omega \rightarrow \mathbb{R}} \mathbb{E}_{\mathbb{P}}[T] - \log\big(\mathbb{E}_{\mathbb{Q}}[e^{T}]\big)$$
Mutual information neural estimator: maximize
$$\widehat{I(X;Z)}_n = \sup_{\theta \in \Theta} \mathbb{E}_{\mathbb{P}_{XZ}^{(n)}}[T_{\theta}] - \log\big(\mathbb{E}_{\mathbb{P}_X^{(n)} \otimes \hat{\mathbb{P}}_Z^{(n)}}[e^{T_{\theta}}]\big)$$


Q: Why use the Donsker-Varadhan representation?

It turns neural estimation of mutual information into an optimization problem: a gradient-based algorithm can then be used to maximize the lower bound given by the Donsker-Varadhan representation.
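Below is a minimal PyTorch sketch of this idea on toy correlated Gaussians. The statistics network `StatisticsNetwork`, its size, the learning rate, and the data are my own illustrative assumptions, not the paper's exact setup.

```python
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T_theta: maps a pair (x, z) to a scalar score."""
    def __init__(self, x_dim, z_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1)).squeeze(1)

def dv_lower_bound(T, x, z):
    """E_P[T(x,z)] - log E_Q[exp T(x,z')], with z' a shuffled batch approximating P_X ⊗ P_Z."""
    joint = T(x, z).mean()
    z_shuffled = z[torch.randperm(z.size(0))]
    marginal = torch.logsumexp(T(x, z_shuffled), dim=0) - math.log(x.size(0))
    return joint - marginal

# Toy data: X and Z are correlated Gaussians, so I(X;Z) > 0.
torch.manual_seed(0)
T = StatisticsNetwork(x_dim=1, z_dim=1)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.randn(256, 1)
    z = x + 0.5 * torch.randn(256, 1)
    loss = -dv_lower_bound(T, x, z)      # maximize the lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"estimated I(X;Z) ≈ {-loss.item():.3f}")
```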

2. Algorithm

Existing problem: the SGD gradients of MINE are biased in the mini-batch setting:
$$\hat{G}_B = \mathbb{E}_B[\nabla_{\theta} T_{\theta}] - \frac{\mathbb{E}_B[\nabla_{\theta} T_{\theta}\, e^{T_{\theta}}]}{\mathbb{E}_B[e^{T_{\theta}}]}$$
Solution: replace the estimate in the denominator with an exponential moving average, and use a small learning rate.
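A common way to implement this correction (my own sketch, reusing the `StatisticsNetwork` T from the previous snippet; the decay rate 0.99 is an illustrative choice) is to keep the loss value equal to the negative DV bound while routing the gradient of the log term through the moving average:

```python
import torch

class UnbiasedDVLoss:
    """Negative DV bound whose gradient uses an EMA of E_Q[e^T] in the denominator."""
    def __init__(self, decay=0.99):
        self.decay = decay
        self.ema = None          # running estimate of E_Q[exp(T)]

    def __call__(self, T, x, z):
        joint = T(x, z).mean()
        z_shuffled = z[torch.randperm(z.size(0))]
        exp_marg = torch.exp(T(x, z_shuffled)).mean()
        # Moving average is kept outside the computation graph.
        self.ema = exp_marg.detach() if self.ema is None else \
            self.decay * self.ema + (1.0 - self.decay) * exp_marg.detach()
        # Same value as log(exp_marg), but its gradient divides by the EMA
        # (a constant w.r.t. theta) rather than the current batch estimate.
        log_marg = exp_marg / self.ema \
            - (exp_marg / self.ema).detach() \
            + torch.log(exp_marg).detach()
        return -(joint - log_marg)    # minimize the negative lower bound
```

In the training loop above, this would simply replace `loss = -dv_lower_bound(T, x, z)` with `loss = unbiased_loss(T, x, z)` for an instance `unbiased_loss = UnbiasedDVLoss()`.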

3. Other

The authors also provide another lower bound, the f-divergence representation:
$$D_{KL}(\mathbb{P} \,\|\, \mathbb{Q}) = \sup_{T:\, \Omega \rightarrow \mathbb{R}} \mathbb{E}_{\mathbb{P}}[T] - \mathbb{E}_{\mathbb{Q}}[e^{T-1}]$$
However, the Donsker-Varadhan bound is stronger than the f-divergence representation. The experiments in the paper not only indicate that the Donsker-Varadhan bound is tighter, but also show that the lower-bound-based approach outperforms non-parametric approaches to mutual information estimation.
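As a sanity check on the "tighter" claim (my own toy example, not from the paper): for any fixed critic $T$ and the same samples, $\log \mathbb{E}_{\mathbb{Q}}[e^T] \le \mathbb{E}_{\mathbb{Q}}[e^{T-1}]$ because $\log u \le u/e$, so the Donsker-Varadhan estimate is never below the f-divergence estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
z_joint = x + 0.5 * rng.normal(size=10_000)    # samples from P_XZ
z_marginal = rng.permutation(z_joint)          # samples approximating P_X ⊗ P_Z

def critic(x, z):
    # An arbitrary fixed critic T(x, z); any function works for this check.
    return 0.8 * x * z - 0.1 * z ** 2

t_joint = critic(x, z_joint)
t_marg = critic(x, z_marginal)

dv_bound = t_joint.mean() - np.log(np.exp(t_marg).mean())
f_bound = t_joint.mean() - np.exp(t_marg - 1.0).mean()
print(dv_bound, f_bound, dv_bound >= f_bound)  # the last value is always True
```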

Correlation only reflects linear relationships between variables; mutual information also captures non-linear relationships.
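A quick illustration (my own example, using scikit-learn's k-NN based estimator rather than MINE): $Y = X^2$ is essentially uncorrelated with $X$ yet clearly dependent on it, and a mutual information estimate picks that up.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5_000)
y = x ** 2                                   # purely non-linear dependence

print("Pearson correlation:", np.corrcoef(x, y)[0, 1])                         # ≈ 0
print("estimated MI (nats):", mutual_info_regression(x.reshape(-1, 1), y)[0])  # > 0
```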

4. Application

Mutual-information-regularized GAN to alleviate mode dropping:
$$\arg\max_{G}\; \mathbb{E}[\log(D(G([\epsilon, c])))] + \beta\, I(G([\epsilon, c]); c)$$
My personal understanding is that, in mode dropping, the generator learns one pattern that fools the discriminator, so it produces data that look real but only of that one kind. The approach here is to split the latent code $z$ into two parts, $z = [\epsilon, c]$, and to simultaneously maximize the mutual information between the generated image and $c$. This corresponds to requiring that the generated images reflect as much information about $c$ as possible.
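A compact sketch of how that objective could be wired up (my own toy setup on 2-D data; the network sizes, $\beta = 0.5$, the Gaussian code $c$, and the choice to update the MINE network $T$ jointly with $G$ are illustrative assumptions, not the paper's experimental configuration):

```python
import math
import torch
import torch.nn as nn

data_dim, eps_dim, c_dim, beta = 2, 4, 2, 0.5

G = nn.Sequential(nn.Linear(eps_dim + c_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
T = nn.Sequential(nn.Linear(data_dim + c_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_G = torch.optim.Adam(list(G.parameters()) + list(T.parameters()), lr=1e-4)

def generator_step(batch_size=128):
    eps = torch.randn(batch_size, eps_dim)
    c = torch.randn(batch_size, c_dim)             # code the samples should reflect
    fake = G(torch.cat([eps, c], dim=1))

    gan_term = torch.log(D(fake) + 1e-8).mean()    # non-saturating generator objective

    # DV estimate of I(G([eps, c]); c): joint pairs vs. shuffled codes.
    joint = T(torch.cat([fake, c], dim=1)).mean()
    c_shuffled = c[torch.randperm(batch_size)]
    marg = T(torch.cat([fake, c_shuffled], dim=1)).squeeze(1)
    mi_term = joint - (torch.logsumexp(marg, dim=0) - math.log(batch_size))

    loss = -(gan_term + beta * mi_term)            # maximize both terms
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()

print(generator_step())   # a full loop would alternate this with discriminator updates
```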


