Cost function | Types of cost functions in machine learning


This article was published as part of the Data Science Blogathon.


Credit: https://gifer.com/en/GxlE

The two main questions that arose in my mind while working on this article were "Why am I writing this article?" and "How is my article different from other articles?" Well, the cost function is an important concept to understand in the field of data science, but while pursuing my graduate studies I realized that the resources available online are too general and did not cover my needs completely.

I had to consult many articles and watch some videos on YouTube to get a feel for cost functions. As a result, I wanted to gather the "what", "when", "how", and "why" of cost functions in one place that explains this topic clearly. I hope my article acts as a one-stop shop for cost functions!!

A dummy's guide to cost functions 🤷‍♀️

Loss function: used when we refer to the error for a single training example.
Cost function: used to refer to the average of the loss function over the complete training dataset.
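
To make the distinction concrete, here is a minimal sketch in Python (the numbers and the choice of squared error as the per-example loss are purely illustrative):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5])   # actual values
y_pred = np.array([2.5, 5.0, 4.0])   # model predictions

# Loss function: the error of a single training example (here, squared error)
loss_first_example = (y_true[0] - y_pred[0]) ** 2   # 0.25

# Cost function: the average of the losses over the complete training set
cost = np.mean((y_true - y_pred) ** 2)              # (0.25 + 0.0 + 2.25) / 3 = 0.83
print(loss_first_example, cost)
```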

But *why* use a cost function?

Why do we need a cost function at all? Consider a scenario in which we want to classify data. Suppose we have the height and weight details of some dogs and cats, and we use these two features to classify them correctly. If we plot these records, we obtain the following scatter plot:


Fig 1: Scatterplot for the height and weight of various cats and dogs

The blue dots are cats and the red dots are dogs. Below are some solutions to the classification problem above.


Fig 2: Possible solutions to our classification problem

Essentially, all three classifiers have very high accuracy, but the third solution is the best because it does not misclassify any point. The reason it classifies all the points perfectly is that the line sits almost exactly between the two groups, not closer to either of them. This is where the concept of the cost function comes in. The cost function helps us reach the optimal solution; it is the technique for evaluating the performance of our algorithm/model.

It takes both the outputs predicted by the model and the actual outputs and calculates how wrong the model was in its prediction. It produces a higher number if our predictions differ greatly from the actual values. As we adjust our model to improve predictions, the cost function acts as an indicator of how the model has improved. This is essentially an optimization problem: optimization strategies always aim to minimize the cost function.

Types of cost functions

There are many cost functions in machine learning, and each has its use cases depending on whether the problem is regression or classification.

  1. Regression cost function
  2. Binary classification cost functions
  3. Multi-class classification cost functions

1. Regression cost function:

Regression models try to predict a continuous value, for instance, the salary of an employee, the price of a car, a loan amount, etc. A cost function used in a regression problem is called a "regression cost function". Regression cost functions are calculated from the distance-based error as follows:

Error = y − y′

Where,

y – actual value

y′ – predicted value

The most commonly used regression cost functions are listed below.

1.1 Mean error (ME)

  • In this cost function, the error is calculated for each training example, and then the mean of all these errors is computed.
  • Calculating the mean of the errors is the simplest and most intuitive approach possible.
  • Errors can be both negative and positive, so they can cancel each other out during summation, giving a mean error of zero for the model (see the sketch below).
  • Therefore, this is not a recommended cost function, but it lays the foundation for the other regression cost functions.
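
Here is a quick sketch of that cancellation effect (toy numbers, purely illustrative):

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 30.0])   # one error of -2, one of +2

errors = y_true - y_pred                # [-2.0, +2.0, 0.0]
mean_error = np.mean(errors)            # 0.0 -- the errors cancel out
print(mean_error)                       # the model looks "perfect" despite being wrong
```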

1.2 Mean squared error (MSE)

  • This improves on the drawback of the mean error above. Here the square of the difference between the actual and predicted value is calculated, which avoids any possibility of a negative error.
  • It is measured as the average of the squared differences between the predictions and the actual observations.

MSE = (sum of squared errors) / n

  • Also known as L2 loss.
  • In MSE, since each error is squared, even small deviations in prediction are penalized more heavily than with MAE. But if our dataset has outliers that contribute large prediction errors, squaring magnifies those errors many times over and leads to a much higher MSE.
  • Therefore, we can say that MSE is less robust to outliers, as the sketch below shows.
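
A minimal MSE sketch on toy data, showing how a single outlier error dominates (the numbers are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 30.0])
print(mse(y_true, y_pred))                      # (4 + 4 + 0) / 3 = 2.67

# A single outlier error is magnified by the squaring
y_pred_outlier = np.array([12.0, 18.0, 60.0])   # one large error of 30
print(mse(y_true, y_pred_outlier))              # (4 + 4 + 900) / 3 = 302.67
```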

1.3 Mean absolute error (MAE)


MAE = (sum of absolute errors) / n

  • Also known as L1 loss.
  • Since the errors are not squared, a single outlier influences MAE far less than it does MSE, so MAE is more robust to outliers.
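
Continuing with the same toy data, a minimal MAE sketch shows this relative robustness to the outlier:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error: average of the absolute differences
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([10.0, 20.0, 30.0])
y_pred_outlier = np.array([12.0, 18.0, 60.0])   # same outlier as in the MSE example

print(mae(y_true, y_pred_outlier))              # (2 + 2 + 30) / 3 = 11.33, vs. 302.67 for MSE
```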

2. Cost functions for classification problems

The cost functions used in classification problems are different from the ones we use in regression problems. A commonly used loss function for classification is the cross-entropy loss. Let's understand cross entropy with a small example. Consider a classification problem with 3 classes as follows.

Classes: (Orange, Apple, Tomato)

The machine learning model will output a probability distribution over these 3 classes for a given input. The class with the highest probability is considered the winning class (the prediction).

Output = [P(Orange),P(Apple),P(Tomato)]

The actual probability distribution for each class is shown below.

Orange = [1,0,0]

Apple = [0,1,0]

Tomato = [0,0,1]

If, during the training phase, the input class is Tomato, the predicted probability distribution should tend towards Tomato's actual probability distribution. If the predicted probability distribution is not close to the actual one, the model must adjust its weights. This is where cross entropy becomes a tool: it calculates how far the predicted probability distribution is from the actual one. In other words, cross entropy can be thought of as a way to measure the distance between two probability distributions. The following image illustrates the intuition behind cross entropy:


Fig 3: Intuition behind cross-entropy (credit – machinelearningknowledge.ai)

This was just the intuition behind cross entropy; it has its origins in information theory. With this understanding, let's now look at the classification cost functions.

2.1 Multi-class classification cost functions

This cost function is used in classification problems where there are multiple classes and each input belongs to a single class. Now let's understand how cross entropy is calculated. Suppose the model gives the probability distribution shown below over n classes for a particular input datum D.

p = [p1, p2, …, pn]

And the actual or target probability distribution of the datum D is

Y = [y1, y2, …, yn]

Then the cross entropy for that particular datum D is calculated as

Cross-entropy loss (Y, p) = − Yᵀ log(p)

= − (y1 log(p1) + y2 log(p2) + … + yn log(pn))


Let's now compute the cost function using the earlier example (see the cross-entropy image, Fig 3):

p (tomato) = [0.1, 0.3, 0.6]

Y (tomato) = [0, 0, 1]

Cross entropy (Y, p) = − (0 × log(0.1) + 0 × log(0.3) + 1 × log(0.6)) = − log(0.6) = 0.51

The above formula measures the cross entropy for a single observation or input datum only. The classification error of the complete model is given by the categorical cross entropy, which is simply the mean of the cross entropy over all N training examples.

Categorical cross entropy = (Cross-entropy sum for N data) / N
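
A minimal sketch of both calculations, using the tomato example from Fig 3 (natural log, as in the worked example above):

```python
import numpy as np

def cross_entropy(y_true, p_pred):
    # Cross-entropy for a single observation: -sum(y * log(p))
    return -np.sum(y_true * np.log(p_pred))

p = np.array([0.1, 0.3, 0.6])   # predicted [P(Orange), P(Apple), P(Tomato)]
y = np.array([0.0, 0.0, 1.0])   # actual distribution for Tomato
print(cross_entropy(y, p))      # -log(0.6) = 0.51

def categorical_cross_entropy(Y_true, P_pred):
    # Mean of the per-observation cross-entropies over all N training data
    return np.mean([cross_entropy(yi, pi) for yi, pi in zip(Y_true, P_pred)])
```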

2.2 Binary cross entropy cost function

Binary cross entropy is a special case of categorical cross entropy where there is only one output, which takes a binary value of 0 or 1 to denote the negative and positive class respectively. For instance, classification between cat and dog.

Suppose the actual output is denoted by a single variable y. Then the cross entropy for a particular datum D can be simplified as follows:

Cross entropy (D) = − log(p) when y = 1

Cross entropy (D) = − log(1 − p) when y = 0

The two cases combine into one formula: Cross entropy (D) = − [y log(p) + (1 − y) log(1 − p)].

The binary classification error of the complete model is given by the binary cross entropy, which is simply the mean of the cross entropy over all N training examples.

Binary cross entropy = (Cross-entropy sum for N data) / N
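
A minimal sketch of the binary case (toy labels and probabilities; the small epsilon clip guards against log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Mean of -[y*log(p) + (1-y)*log(1-p)] over all N examples
    p = np.clip(p_pred, eps, 1 - eps)   # avoid log(0)
    return np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1, 0, 1, 0])             # e.g. 1 = dog, 0 = cat
p_pred = np.array([0.9, 0.2, 0.7, 0.4])     # predicted P(class = 1)
print(binary_cross_entropy(y_true, p_pred)) # ~0.30
```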

Conclusion

I hope this article has been helpful to you!! Let me know what you think, especially if there are suggestions for improvement. You can connect with me on LinkedIn: https://www.linkedin.com/in/saily-shah/ and here is my GitHub profile: https://github.com/sailyshah

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
