  1. Why Mean Squared Error and L2 regularization? A probabilistic justification.

    When you solve a regression problem with gradient descent, you’re minimizing some differentiable loss function. The most commonly used loss function is mean squared error (aka MSE, or L2 loss). Why? Here is a simple probabilistic justification, which can also be used to explain L1 loss, as well as L1 and L2 regularization.
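
To make the setup concrete, here is a minimal sketch of gradient descent minimizing MSE for a one-variable linear model. Everything in it is an illustrative assumption rather than part of the original text: the function names `mse` and `fit_linear_gd`, the learning rate and step count, and the synthetic data (slope 3.0, intercept 0.5, Gaussian noise with standard deviation 0.1).

```python
import random


def mse(w, b, xs, ys):
    """Mean squared error of the linear model y_hat = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear_gd(xs, ys, lr=0.1, steps=500):
    """Fit w and b by plain (full-batch) gradient descent on the MSE."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data (assumed for illustration): y = 3.0 * x + 0.5 + Gaussian noise.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + 0.5 + random.gauss(0.0, 0.1) for x in xs]

w_hat, b_hat = fit_linear_gd(xs, ys)
print("fitted slope and intercept:", w_hat, b_hat)
print("final MSE:", mse(w_hat, b_hat, xs, ys))
```

The noise here is drawn from a Gaussian on purpose: that is the assumption under which the probabilistic argument motivates MSE.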
