
Correcting a proof in the InfoGAN paper
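The InfoGAN paper has the following lemma: Lemma 5.1. For random variables $X, Y$ and function $f(x, y)$ under suitable regularity conditions: $\mathbb{E}_{x \sim X,\, y \sim Y|x}[f(x, y)] = \mathbb{E}_{x \sim X,\, y \sim Y|x,\, x' \sim X|y}[f(x', y)]$. The statement is correct, but the proof in the paper is confused: at one step, $x$ mysteriously becomes $x'$.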
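As a quick sanity check that the lemma's statement holds, here is a minimal numerical sketch for the discrete case. The table sizes, the random joint distribution, and the random `f` are all assumptions made up for illustration; none of them come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 4, 3  # hypothetical sizes of the supports of X and Y

# Random joint distribution P(X=x, Y=y) and an arbitrary function f(x, y).
joint = rng.random((nx, ny))
joint /= joint.sum()
f = rng.normal(size=(nx, ny))

# Marginal P(Y=y) and conditional P(X=x' | Y=y); column y holds P(X | Y=y).
p_y = joint.sum(axis=0)
p_x_given_y = joint / p_y

# Left-hand side: E_{x ~ X, y ~ Y|x}[f(x, y)], an expectation under the joint.
lhs = np.sum(joint * f)

# Right-hand side: sample (x, y) from the joint, then resample x' ~ X|y,
# and evaluate f at (x', y). Index z plays the role of x'.
rhs = np.einsum("xy,zy,zy->", joint, p_x_given_y, f)

print(lhs, rhs)
```

The two values agree up to floating-point error, because summing the joint over $x$ leaves the marginal of $y$, and $P(y)\,P(x' \mid y)$ is the joint again.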
Why Mean Squared Error and L2 regularization? A probabilistic justification.
When you solve a regression problem with gradient descent, you’re minimizing some differentiable loss function. The most commonly used loss function is mean squared error (aka MSE, $L_2$ loss). Why? Here is a simple probabilistic justification, which can also be used to explain $L_1$ loss, as well as $L_1$ and $L_2$ regularization.
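One common version of this justification assumes the targets are the model's prediction plus i.i.d. Gaussian noise with a fixed variance; the negative log-likelihood is then an increasing affine function of MSE, so both have the same minimizer. The sketch below checks this numerically. The linear model, the noise level `sigma`, and the synthetic data are assumptions chosen for illustration, and the full post's derivation may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 0.5  # assumed sample size and known noise standard deviation

# Synthetic data: y = 2x + 1 + Gaussian noise.
x = rng.uniform(-1.0, 1.0, size=n)
y = 2.0 * x + 1.0 + rng.normal(scale=sigma, size=n)

def mse(w, b):
    """Mean squared error of the linear model w * x + b."""
    return np.mean((y - (w * x + b)) ** 2)

def gaussian_nll(w, b):
    """Negative log-likelihood of y under N(w * x + b, sigma^2)."""
    resid = y - (w * x + b)
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + resid**2 / (2 * sigma**2))

# For any parameters: NLL = (n / 2) * log(2 * pi * sigma^2) + n * MSE / (2 * sigma^2).
w, b = 1.3, -0.2
expected = 0.5 * n * np.log(2 * np.pi * sigma**2) + n * mse(w, b) / (2 * sigma**2)
print(gaussian_nll(w, b), expected)

# The least-squares fit (the MSE minimizer) therefore also minimizes the NLL.
w_hat, b_hat = np.polyfit(x, y, deg=1)
print(gaussian_nll(w_hat, b_hat) <= gaussian_nll(w_hat + 0.1, b_hat - 0.1))
```

Swapping the Gaussian for a Laplace noise model turns the squared residual into an absolute residual, which is the analogous route to $L_1$ loss.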