# Correcting a proof in the InfoGAN paper

The InfoGAN paper states the following lemma:

> **Lemma 5.1.** For random variables $X, Y$ and function $f(x, y)$, under suitable regularity conditions: $\mathbb{E}_{x \sim X, y \sim Y|x}[f(x, y)] = \mathbb{E}_{x \sim X, y \sim Y|x, x' \sim X|y}[f(x', y)]$.

The statement is correct, but the proof in the paper is confused – here's a step where $x$ mysteriously becomes $x'$:
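A corrected derivation might go as follows (a sketch; $P$ denotes the relevant densities, and the variable renaming $x \to x'$ happens only inside an integral over the joint, which is the step the paper's proof obscures):

```latex
\begin{aligned}
\mathbb{E}_{x \sim X,\, y \sim Y|x}[f(x, y)]
  &= \int_x P(x) \int_y P(y|x)\, f(x, y)\, dy\, dx \\
  &= \int_x \int_y P(x, y)\, f(x, y)\, dy\, dx \\
  &= \int_y P(y) \int_{x'} P(x'|y)\, f(x', y)\, dx'\, dy
     && \text{(rename $x \to x'$, factor the joint the other way)} \\
  &= \int_x P(x) \int_y P(y|x) \int_{x'} P(x'|y)\, f(x', y)\, dx'\, dy\, dx
     && \text{(since $\textstyle\int_x P(x) P(y|x)\, dx = P(y)$)} \\
  &= \mathbb{E}_{x \sim X,\, y \sim Y|x,\, x' \sim X|y}[f(x', y)].
\end{aligned}
```

The crucial point is that $f(x, y)$ becomes $f(x', y)$ only as a relabeling of the integration variable over the same joint distribution, after which the outer expectation over $x$ can be reintroduced because $f(x', y)$ no longer depends on $x$.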

# Why Mean Squared Error and L2 regularization? A probabilistic justification.

When you solve a regression problem with gradient descent, you’re minimizing some differentiable loss function. The most commonly used loss function is mean squared error (aka MSE, $\ell_2$ loss). Why? Here is a simple probabilistic justification, which can also be used to explain $\ell_1$ loss, as well as $\ell_1$ and $\ell_2$ regularization.
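One standard version of this argument (sketched here under an assumed noise model $y_i = f_\theta(x_i) + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d.) is that minimizing MSE is equivalent to maximizing the likelihood of the data:

```latex
-\log p(y_{1:n} \mid x_{1:n}, \theta)
  = \sum_{i=1}^{n} \frac{\bigl(y_i - f_\theta(x_i)\bigr)^2}{2\sigma^2}
    + \frac{n}{2}\log\bigl(2\pi\sigma^2\bigr),
```

so the negative log-likelihood is, up to a constant, proportional to $\sum_i (y_i - f_\theta(x_i))^2$. Swapping Gaussian noise for Laplace noise yields the $\ell_1$ loss by the same calculation, and placing a Gaussian or Laplace prior on $\theta$ turns maximum likelihood into MAP estimation with an $\ell_2$ or $\ell_1$ regularization term, respectively.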