# Correcting a proof in the InfoGAN paper

The InfoGAN paper states the following lemma:

> **Lemma 5.1.** For random variables $X, Y$ and function $f(x, y)$, under suitable regularity conditions: $\mathbb{E}_{x \sim X, y \sim Y|x}[f(x, y)] = \mathbb{E}_{x \sim X, y \sim Y|x, x' \sim X|y}[f(x', y)]$.

The statement is correct, but the proof in the paper is confused – here's a step where $x$ mysteriously becomes $x'$:
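A corrected derivation might go as follows (a sketch; $P$ denotes the relevant densities, and the variable renaming $x \to x'$ happens only inside an integral over the joint, which is the step the paper's proof obscures):

```latex
\begin{aligned}
\mathbb{E}_{x \sim X,\, y \sim Y|x}[f(x, y)]
  &= \int_x P(x) \int_y P(y|x)\, f(x, y)\, dy\, dx \\
  &= \int_x \int_y P(x, y)\, f(x, y)\, dy\, dx \\
  &= \int_y P(y) \int_{x'} P(x'|y)\, f(x', y)\, dx'\, dy
     && \text{(rename $x \to x'$, factor the joint the other way)} \\
  &= \int_x P(x) \int_y P(y|x) \int_{x'} P(x'|y)\, f(x', y)\, dx'\, dy\, dx
     && \text{(since $\textstyle\int_x P(x) P(y|x)\, dx = P(y)$)} \\
  &= \mathbb{E}_{x \sim X,\, y \sim Y|x,\, x' \sim X|y}[f(x', y)].
\end{aligned}
```

The crucial point is that $f(x, y)$ becomes $f(x', y)$ only as a relabeling of the integration variable over the same joint distribution, after which the outer expectation over $x$ can be reintroduced because $f(x', y)$ no longer depends on $x$.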

# Why Mean Squared Error and L2 regularization? A probabilistic justification.

When you solve a regression problem with gradient descent, you’re minimizing some differentiable loss function. The most commonly used loss function is mean squared error (aka MSE, $\ell_2$ loss). Why? Here is a simple probabilistic justification, which can also be used to explain $\ell_1$ loss, as well as $\ell_1$ and $\ell_2$ regularization.
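One standard version of this argument (sketched here under an assumed noise model $y_i = f_\theta(x_i) + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d.) is that minimizing MSE is equivalent to maximizing the likelihood of the data:

```latex
-\log p(y_{1:n} \mid x_{1:n}, \theta)
  = \sum_{i=1}^{n} \frac{\bigl(y_i - f_\theta(x_i)\bigr)^2}{2\sigma^2}
    + \frac{n}{2}\log\bigl(2\pi\sigma^2\bigr),
```

so the negative log-likelihood is, up to a constant, proportional to $\sum_i (y_i - f_\theta(x_i))^2$. Swapping Gaussian noise for Laplace noise yields the $\ell_1$ loss by the same calculation, and placing a Gaussian or Laplace prior on $\theta$ turns maximum likelihood into MAP estimation with an $\ell_2$ or $\ell_1$ regularization term, respectively.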