Variational Information
On variational lower bounds of mutual information
In brief: a summary of current methods for estimating mutual information.
Energy-based bound (EB)
This method gives the lower bound:
$$
I(x,y) \geq \mathbb E_{p(x,y)}[\log f(x,y)] - \mathbb E_{p(y)}[\frac{\mathbb E_{p(x)}[f(x,y)]}{a}+\log a -1]
$$
Derivation
$$
\mathbb E_{p(y)}[\mathrm{KL}(p(x|y)\,\|\,q(x|y))] \geq 0
$$
$$
I(x,y) = \mathbb E_{p(y)}\left[\mathbb E_{p(x|y)}\left[\log \frac{p(x|y)}{p(x)}\right]\right]
$$
Combining the two gives the Barber–Agakov (BA) bound:
$$
I(x,y) \geq \mathbb E_{p(x,y)} [\log q(x|y) - \log p(x)]
$$
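To see the step, note that the mutual information decomposes exactly as
$$
I(x,y) = \mathbb E_{p(x,y)}\left[\log \frac{q(x|y)}{p(x)}\right] + \mathbb E_{p(y)}[\mathrm{KL}(p(x|y)\,\|\,q(x|y))]
$$
and dropping the nonnegative KL term yields the inequality.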
The energy-based approach enters here: choose the variational distribution $q(x|y)=p(x)\frac{f(x,y)}{Z(y)}$ with $f(x,y)>0$ and partition function $Z(y)=\mathbb E_{p(x)}[f(x,y)]$. Substituting into the BA bound, the $\log p(x)$ terms cancel:
$$
I(x,y) \geq \mathbb E_{p(x,y)} [\log f(x,y)] - \mathbb E_{p(y)}[\log \mathbb E_{p(x)}[f(x,y)]]
$$
Since, for any $a>0$,
$$
\log x \leq \frac{x}{a} + \log a - 1
$$
with equality when $x=a$ (the right-hand side is the tangent line of $\log$ at $x=a$), applying this to $\log Z(y)$ in the second term gives
$$
I(x,y) \geq I_{EB}(x,y) := \mathbb E_{p(x,y)} [\log f(x,y)] - \mathbb E_{p(y)}\left[\frac{\mathbb E_{p(x)}[f(x,y)]}{a}+\log a -1\right]
$$
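As a concrete illustration, here is a minimal numpy sketch of a minibatch Monte Carlo estimate of $I_{EB}$; the score-matrix layout and the name `eb_bound` are assumptions for illustration, not from the paper:

```python
import numpy as np

def eb_bound(scores, a=1.0):
    """Minibatch estimate of I_EB from critic values.

    scores[i, j] ~ f(x_i, y_j) > 0 for K paired samples (x_i, y_i):
    the diagonal holds joint samples; for each y_j the off-diagonal
    column entries stand in for independent draws x ~ p(x).
    """
    K = scores.shape[0]
    joint = np.log(np.diag(scores)).mean()           # E_{p(x,y)}[log f(x,y)]
    mask = ~np.eye(K, dtype=bool)
    z = (scores * mask).sum(axis=0) / (K - 1)        # Z(y_j) = E_{p(x)}[f(x, y_j)]
    return joint - np.mean(z / a + np.log(a) - 1.0)  # minus E_{p(y)}[Z/a + log a - 1]
```

Choosing $a$ per value of $y$ rather than as a global constant recovers the tighter variants below.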
MINE
Take $a=\mathbb E_{p(x)}[f(x,y)]$, which makes the inequality above tight, and write $T=\log f(x,y)$:
$$
I(x,y) \geq I_{MINE}(x,y) := \mathbb E_{p(x,y)} [T] - \mathbb E_{p(y)}[\log \mathbb E_{p(x)}[\exp(T)]]
$$
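A matching sketch for $I_{MINE}$, with the per-$y$ log-mean-exp written out stably (matrix layout and names are again assumptions):

```python
import numpy as np

def mine_bound(T):
    """Minibatch estimate of I_MINE from a critic matrix.

    T[i, j] = T(x_i, y_j); diagonal entries are the joint samples.
    """
    K = T.shape[0]
    joint = np.diag(T).mean()                          # E_{p(x,y)}[T]
    Tm = np.where(~np.eye(K, dtype=bool), T, -np.inf)  # keep off-diagonal only
    m = Tm.max(axis=0)
    # per-y log E_{p(x)}[exp(T)], estimated over the off-diagonal x's
    lme = m + np.log(np.exp(Tm - m).sum(axis=0) / (K - 1))
    return joint - lme.mean()                          # minus E_{p(y)}[log E_{p(x)}[e^T]]
```

On small batches the gradient of the log-mean-exp term is biased; the MINE paper mitigates this with an exponential moving average of the denominator.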
NWJ
Take $a=e$, so that $\log a - 1 = 0$:
$$
I(x,y) \geq I_{NWJ}(x,y) := \mathbb E_{p(x,y)} [\log f(x,y)] - \frac{1}{e}\mathbb E_{p(x)p(y)}[f(x,y)]
$$
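Because no $\log$ sits outside the marginal expectation, an NWJ sketch is a one-liner; `f_joint` and `f_marginal` (critic values on paired and on independently shuffled samples) are assumed inputs:

```python
import numpy as np

def nwj_bound(f_joint, f_marginal):
    # I_NWJ = E_{p(x,y)}[log f] - (1/e) E_{p(x)p(y)}[f], with f > 0
    return np.log(f_joint).mean() - f_marginal.mean() / np.e
```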
JS
Starting from the NWJ bound, set $f(x,y)=\exp(V(x,y)+1)$:
$$
I(x,y) \geq I_{JS}(x,y) := 1 + \mathbb E_{p(x,y)} [V(x,y)] - \mathbb E_{p(x)p(y)}[\exp(V(x,y))]
$$
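The corresponding sketch, parameterized directly by the critic $V$ (input names assumed as above):

```python
import numpy as np

def js_bound(V_joint, V_marginal):
    # I_JS = 1 + E_{p(x,y)}[V] - E_{p(x)p(y)}[exp(V)]
    return 1.0 + V_joint.mean() - np.exp(V_marginal).mean()
```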
TCPC
This method accounts for the effect of the minibatch size and assumes the conditional $p(y|x)$ is known in closed form.
$$
I(x,y) = \mathbb E_{x_{1:K}\sim \mathcal D}\left[\frac{1}{K} \sum_{i=1}^K \mathrm{KL}(p(y|x_i)\,\|\,p(y))\right]
$$
The marginal $p(y)$ is estimated by the minibatch mixture:
$$
m(y) = \frac{1}{K} \sum_{i=1}^K p(y|x_i)
$$
which gives the exact decomposition
$$
\frac{1}{K} \sum_{i=1}^K \mathrm{KL}(p(y|x_i)\,\|\,p(y)) = \frac{1}{K} \sum_{i=1}^K \mathrm{KL}(p(y|x_i)\,\|\,m(y)) + \mathrm{KL}(m(y)\,\|\,p(y))
$$
Since $\mathrm{KL}(m(y)\,\|\,p(y)) \geq 0$, dropping it yields the lower bound
$$
I(x,y) \geq I_{TCPC}(x,y) := \mathbb E_{x_{1:K}\sim \mathcal D}\left[\frac{1}{K} \sum_{i=1}^K \mathrm{KL}(p(y|x_i)\,\|\,m(y))\right]
$$
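For a discrete $y$ with $p(y|x)$ available in closed form, here is a minimal sketch of the per-minibatch $I_{TCPC}$ estimate; the row-stochastic input `cond` is an assumed format:

```python
import numpy as np

def tcpc_bound(cond):
    """One-minibatch estimate of I_TCPC for discrete y.

    cond[i, :] = p(y | x_i) for x_1..x_K; rows sum to 1 and are
    assumed strictly positive so all logs are finite.
    """
    m = cond.mean(axis=0)                       # m(y) = (1/K) sum_i p(y|x_i)
    kl = (cond * np.log(cond / m)).sum(axis=1)  # KL(p(y|x_i) || m(y))
    return kl.mean()                            # (1/K) sum over i
```

Averaging over minibatches approximates the outer expectation; as $K$ grows, $m(y) \to p(y)$ and the bound tightens.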

