Confusion matrix: Not so confusing!
Have you ever been in a situation where you expected your machine learning model to perform really well, but it delivered poor accuracy? You've done all the hard work, so where did the classification model go wrong? And how can you correct it?
There are many ways to measure the performance of your classification model, but none has stood the test of time like the confusion matrix. It helps us examine how our model performed, shows where it went wrong, and offers guidance for correcting our course.
In this post, we'll explore how a confusion matrix provides a holistic view of your model's performance. And despite its name, you will see that a confusion matrix is a fairly simple yet powerful concept. So let's unravel the mystery around the confusion matrix!
This is what we will cover:
- What is a confusion matrix?
- True positive
- True negative
- False positive: Type I error
- False negative: Type II error
- Why do you need a confusion matrix?
- Precision vs. Recall
- F1 score
- Confusion matrix in Scikit-learn
- Confusion matrix for multi-class classification
What is a confusion matrix?
The million dollar question: What is, after all, a confusion matrix?
A confusion matrix is an N x N matrix used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
For a binary classification problem, we would have a 2 x 2 matrix, as shown below, with 4 values:
Let's decipher the matrix:
- The target variable has two values: Positive or Negative
- The columns represent the actual values of the target variable
- The rows represent the predicted values of the target variable
But wait, what are TP, FP, FN and TN here? That is the crucial part of a confusion matrix. Let's understand each term below.
Understanding True Positive, True Negative, False Positive and False Negative in a confusion matrix
True positive (TP)
- Predicted value matches actual value
- The true value was positive and the model predicted a positive value.
True negative (TN)
- Predicted value matches actual value
- The actual value was negative and the model predicted a negative value.
False positive (FP): Type I error
- The predicted value does not match the actual value
- The actual value was negative but the model predicted a positive value
- Also known as a Type I error
False negative (FN): Type II error
- The predicted value does not match the actual value
- The actual value was positive but the model predicted a negative value
- Also known as a Type II error
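The four definitions above can be sketched in a few lines of plain Python. The label lists below are made-up illustrative data, not taken from this article:

```python
# Counting TP, TN, FP, FN by hand for a binary problem (1 = positive class).
actual    = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # hit
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # correct rejection
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # Type I error
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # Type II error

print(tp, tn, fp, fn)  # 4 4 1 1
```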
Let me give you an example to understand this better. Suppose we have a classification dataset with 1000 data points. We fit a classifier on it and get the following confusion matrix:
The different values of the confusion matrix would be the following:
- True positive (TP) = 560, meaning the model correctly classified 560 data points of the positive class
- True negative (TN) = 330, meaning the model correctly classified 330 data points of the negative class
- False positive (FP) = 60, meaning the model incorrectly classified 60 data points of the negative class as belonging to the positive class
- False negative (FN) = 50, meaning the model incorrectly classified 50 data points of the positive class as belonging to the negative class
This turned out to be a pretty decent classifier for our dataset, considering the relatively larger number of true positive and true negative values.
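As a quick sanity check on "pretty decent", we can plug the four counts from this example into the standard accuracy formula (correct predictions divided by all predictions):

```python
# The counts from the 1000-point example above.
tp, tn, fp, fn = 560, 330, 60, 50

total = tp + tn + fp + fn          # 1000 data points
accuracy = (tp + tn) / total       # fraction classified correctly
print(accuracy)  # 0.89
```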
Remember Type I and Type II errors. Interviewers love to ask the difference between these two! You can prepare for all of this in our online machine learning course.
Why do we need a confusion matrix?
Before answering this question, let's think about a hypothetical classification problem.
Suppose you want to predict how many people are infected with a contagious virus before they show symptoms, so they can be isolated from the healthy population (sound familiar? 😷). The two values of our target variable would be Sick and Not Sick.
Now, you must be wondering: why do we need a confusion matrix when we have our all-weather friend, Accuracy? Well, let's see where Accuracy fails.
Our dataset is an example of an imbalanced dataset: 960 data points belong to the negative class and only 40 to the positive class. Here is how we calculate Accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Let's see how our model worked:
The resulting outcome values are:
TP = 30, TN = 930, FP = 30, FN = 10
Then the Accuracy of our model turns out to be:

(30 + 930) / 1000 = 96%! Not bad!
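The same arithmetic, spelled out in Python for the imbalanced example:

```python
# Outcome values for the imbalanced virus example.
tp, tn, fp, fn = 30, 930, 30, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.96
```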
But Accuracy is giving us the wrong idea about the result. Think about it: our model is effectively saying, "I can predict sick people 96% of the time." In reality, it is doing the opposite. It is predicting the people who will NOT get sick with 96% accuracy, while the sick keep spreading the virus!
Do you think this is a correct metric for our model, given the severity of the problem? Shouldn't we be measuring how many of the actual positive cases we can predict correctly, to stop the spread of the contagious virus? Or, out of the cases predicted as positive, how many are actually positive, to verify the reliability of our model?
This is where we meet the dual concepts of Precision and Recall.
Precision vs. Recall
Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Here is how to calculate Precision:

Precision = TP / (TP + FP)

This determines whether our model is reliable or not.
Recall tells us how many of the actual positive cases we were able to predict correctly with our model.

And this is how we calculate Recall:

Recall = TP / (TP + FN)
We can easily calculate Precision and Recall for our model by plugging the values into the formulas above:

Precision = 30 / (30 + 30) = 50%, and Recall = 30 / (30 + 10) = 75%.

50% of the cases our model predicted as positive turned out to be actually positive, while the model successfully caught 75% of the actual positives. Impressive!
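Those two percentages can be reproduced directly from the outcome values above:

```python
# Outcome values for the imbalanced virus example.
tp, tn, fp, fn = 30, 930, 30, 10

precision = tp / (tp + fp)  # 30 / 60 = 0.5
recall    = tp / (tp + fn)  # 30 / 40 = 0.75
print(precision, recall)  # 0.5 0.75
```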
Precision is a useful metric in cases where false positives are a greater concern than false negatives.

Precision is essential in music or video recommendation systems, e-commerce websites, and so on, where incorrect results can lead to customer churn and be detrimental to the business.

Recall is a useful metric in cases where the false negative is a greater concern than the false positive.
Recall is essential in medical cases, where it does not matter if we raise a false alarm, but actual positive cases should not go unnoticed!
In our example, Recall would be a better metric because we do not want to accidentally discharge an infected person and let them mix with the healthy population, spreading the contagious virus. Now you can see why Accuracy was a bad metric for our model.
But there will be cases with no clear distinction between whether Precision or Recall is more important. What should we do then? We combine them!
F1 score
In practice, when we try to increase the Precision of our model, Recall decreases, and vice versa. The F1 score captures both trends in a single value:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)

The F1 score is the harmonic mean of Precision and Recall, so it gives a combined picture of these two metrics. It is maximized when Precision equals Recall.
But there is a catch here. The interpretability of the F1 score is poor: on its own, it does not tell us what our classifier is maximizing, Precision or Recall. So we use it in combination with other evaluation metrics, which gives us a complete picture of the result.
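Continuing the running example, here is the F1 computation for the Precision and Recall values obtained above:

```python
# Precision and Recall from the virus example above.
precision, recall = 0.5, 0.75

# Harmonic mean of Precision and Recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.6
```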
Confusion matrix using scikit-learn in Python
You already know the theory; now let's put it into practice. Let's code a confusion matrix with the Scikit-learn (sklearn) library in Python.
Sklearn has two great functions: confusion_matrix() and classification_report().

- Sklearn's confusion_matrix() returns the values of the confusion matrix. However, the output is slightly different from what we have studied so far: it takes the rows as the actual values and the columns as the predicted values. The rest of the concept remains the same.
- Sklearn's classification_report() generates the precision, recall and F1 score for each target class. In addition, it reports some extra values: micro average, macro average and weighted average.
Micro average is the precision/recall/F1 score calculated globally over all classes.
Macro average is the unweighted mean of the per-class precision/recall/F1 scores.
Weighted average is the mean of the per-class precision/recall/F1 scores, weighted by each class's support.
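Here is a minimal, runnable sketch of both functions; the labels are made-up toy data, not from the article:

```python
from sklearn.metrics import confusion_matrix, classification_report

# Toy binary labels, invented for illustration.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# Note: sklearn places ACTUAL values in rows and PREDICTED values in columns.
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]

# Per-class precision, recall and F1, plus the averages discussed above.
print(classification_report(y_true, y_pred))
```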
Confusion matrix for multi-class classification

How would a confusion matrix work for a multi-class classification problem? Well, don't scratch your head! We will take a look at that here.

Let's draw a confusion matrix for a multi-class problem where we have to predict whether a person loves Facebook, Instagram or Snapchat. The confusion matrix would be a 3 x 3 matrix like this:
The true positive, true negative, false positive and false negative values for each class would be calculated by summing the appropriate cell values, as follows:
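One way to sketch that cell-summing in code, assuming the sklearn convention of actual classes in rows and predicted classes in columns, and an illustrative 3 x 3 matrix of made-up counts:

```python
import numpy as np

# Illustrative 3x3 confusion matrix (rows = actual, columns = predicted).
cm = np.array([[25,  4,  1],   # actual Facebook
               [ 3, 30,  2],   # actual Instagram
               [ 2,  5, 28]])  # actual Snapchat

for i, name in enumerate(["Facebook", "Instagram", "Snapchat"]):
    tp = cm[i, i]                 # diagonal cell for this class
    fn = cm[i, :].sum() - tp      # rest of the class's row
    fp = cm[:, i].sum() - tp      # rest of the class's column
    tn = cm.sum() - tp - fn - fp  # everything else
    print(f"{name}: TP={tp} FN={fn} FP={fp} TN={tn}")
```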
That is all! You are ready to decipher any N x N confusion matrix!
Final notes
And suddenly, the confusion matrix is no longer so confusing! This post should give you a solid foundation for interpreting and using a confusion matrix with classification algorithms in machine learning.
Soon we will publish a post about the AUC-ROC curve and continue our discussion there. Until next time, don't lose hope in your classification model; you may just be using the wrong evaluation metric!