Loss Functions

The Intuition behind the choice of Loss Function

I am going to walk you through this blog by answering a series of questions that I had while learning how to choose a loss function for a problem.

Introduction:

Loss functions are a fundamental concept in machine learning, used to measure the difference between the predicted output of a model and the true output.

In general, researchers try to choose a loss function that is well suited to the particular problem they are trying to solve. For example, in a classification problem, where the goal is to predict the correct category for each input, researchers might choose a loss function like cross-entropy. On the other hand, for a regression problem, where the goal is to predict a continuous value, a loss function like mean squared error might be more appropriate. So the question I had while learning is...
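To make these two losses concrete, here is a minimal NumPy sketch of what each one computes (the function names are my own, not from any particular library):

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    # Classification: y_true is a one-hot vector, y_prob the predicted
    # class probabilities. Clipping avoids log(0).
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob))

def mean_squared_error(y_true, y_pred):
    # Regression: both arguments are continuous values.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Classification: the true class is index 1, predicted with probability 0.7.
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))  # ≈ 0.357

# Regression: average of the squared errors 0.5^2 and 1.0^2.
print(mean_squared_error([3.0, 5.0], [2.5, 6.0]))  # 0.625
```

Note that cross-entropy only looks at the probability assigned to the true class, while MSE penalizes the distance of every prediction from its target.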

Why mean squared error? Why not just take the difference between the predicted output and the true output and minimize that?

Yeah, you could certainly use the absolute difference between the predicted output and the true output as a loss function and minimize it to train a machine learning model. This is known as the L1 loss or mean absolute error (MAE). However, mean squared error (MSE) is the more commonly used loss function in many machine learning applications, for several reasons:

  1. Differentiability:

    One of the key advantages of MSE over the L1 loss is that it is differentiable everywhere, which makes it easy to use in optimization algorithms like gradient descent. In contrast, the L1 loss is not differentiable at zero, which can cause numerical instabilities when using gradient-based optimization algorithms.

    So what is meant by "differentiable everywhere", and why does it matter? A function is said to be differentiable everywhere if it is differentiable at every point in its domain.

    In the context of machine learning, differentiability is important because many optimization algorithms used to train machine learning models rely on calculating the derivatives of the loss function with respect to the model parameters. If the loss function is not differentiable at some points, then these algorithms may fail to converge to a good solution, or they may encounter numerical instabilities that make the optimization process unstable.
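We can see the difference by writing out the derivative of each loss with respect to a single prediction (a small illustrative sketch; the helper names are my own):

```python
import numpy as np

def mse_grad(y_pred, y_true):
    # Derivative of (y_pred - y_true)^2 w.r.t. y_pred: smooth everywhere,
    # and it shrinks toward zero as the prediction approaches the target.
    return 2.0 * (y_pred - y_true)

def mae_grad(y_pred, y_true):
    # Derivative of |y_pred - y_true| w.r.t. y_pred: a constant +1 or -1,
    # and undefined at y_pred == y_true (np.sign returns 0 there by convention).
    return np.sign(y_pred - y_true)

for e in [-1.0, -0.1, 0.0, 0.1, 1.0]:
    print(f"error={e:+.1f}  mse_grad={mse_grad(e, 0.0):+.2f}  mae_grad={mae_grad(e, 0.0):+.2f}")
```

The MSE gradient decreases smoothly as the error shrinks, so gradient descent naturally slows down near the optimum. The MAE gradient jumps abruptly from -1 to +1 at zero error, which is exactly the non-differentiable kink that can make gradient-based training oscillate around the solution.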

  2. Sensitivity to outliers:

    The sensitivity of a loss function to outliers refers to the degree to which the presence of outliers in the data affects the value of the loss function. Outliers are data points that are significantly different from the rest of the data, and they can have a large impact on the value of the loss function.

    Comparison between MSE and MAE:

    Mean squared error (MSE) is more sensitive to outliers than mean absolute error (MAE), because of the way it squares the differences between predicted and true values. Consider a case where the true value is y_true = 10 and the predicted value is y_pred = 100. The absolute difference between these values is |y_true - y_pred| = 90, which is exactly what this point contributes to the MAE loss. The squared difference, however, is (y_true - y_pred)^2 = 8100, so the same point contributes vastly more to the MSE loss.

    Because of this squaring effect, a single outlier can have a disproportionately large impact on the value of the MSE loss, and a model trained with MSE will be pulled toward fitting such points. In contrast, the MAE loss grows only linearly with the size of the error, so it treats an outlier no differently from any other point of the same magnitude. This makes MAE more robust to outliers in the data, and it can be the better choice when the data contains outliers you do not want the model to chase.
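The asymmetry is easy to demonstrate numerically. In this sketch (the data is made up for illustration), a single badly wrong prediction inflates the MSE by a factor of roughly 10,000 but the MAE by only about 60:

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 9.0, 10.0])
y_good = np.array([10.5, 11.5, 11.0, 9.5, 10.0])   # all predictions close
y_outlier = y_good.copy()
y_outlier[-1] = 100.0                               # one wildly wrong prediction

def mse(t, p):
    return np.mean((t - p) ** 2)

def mae(t, p):
    return np.mean(np.abs(t - p))

print(mse(y_true, y_good), mae(y_true, y_good))        # 0.15, 0.3
print(mse(y_true, y_outlier), mae(y_true, y_outlier))  # 1620.15, 18.3
```

Because the outlier's error of 90 becomes 8100 after squaring, it completely dominates the MSE over the five points, while under MAE it contributes only its face value.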

Conclusion:

Loss functions are an essential aspect of machine learning algorithms, as they allow us to measure the difference between the predicted output and the true output. Researchers choose the appropriate loss function based on the specific problem they are trying to solve. In this blog, we discussed the reasons why the mean squared error (MSE) loss function is commonly used over the L1 loss, also known as mean absolute error (MAE). We also explained the importance of differentiability in the context of machine learning and the sensitivity of loss functions to outliers. Understanding loss functions and their properties is crucial for selecting an appropriate loss function and optimizing machine learning models effectively.