This article was published as part of the Data Science Blogathon.
Everyone knows the basics of the frequently used classification metrics, but when it comes to choosing the right one to evaluate the performance of your classification model, very few are confident about the next step to take.
These situations can be addressed by understanding the use case of each metric.
Supervised learning usually falls into regression (continuous targets) or classification (discrete targets). In this article, I will focus on a very small but very important part of machine learning. It is a favorite topic of interviewers, and it can also help you get your concepts right about classification models and, finally, about any business problem. This article will help you know which questions to ask when someone tells you that an ML model is giving 94% accuracy, so you can find out whether the model is actually working as required.
So, how do we decide which questions will help?
Now, that's food for thought.
We will answer this by learning how to evaluate a classification model the correct way.
We will review the following topics in this article:
Metrics based on the confusion matrix
After reading this article, you will know:
What a confusion matrix is and why you need to use it
How to calculate a confusion matrix for a 2-class classification problem
Metrics based on the confusion matrix and how to use them
Accuracy and its flaws:
Accuracy (ACC) measures the fraction of correct predictions. It is defined as "the ratio of correct predictions to total predictions made".
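As a minimal sketch of this definition, accuracy can be computed by counting matching labels and dividing by the total. The function name `accuracy` and the label lists are made up for illustration (1 = spam, 0 = not spam); they are not from any real data set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration only: 1 = spam, 0 = not spam
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

acc = accuracy(y_true, y_pred)
print(acc)  # 8 of the 10 predictions match, so this prints 0.8
```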
Problem with accuracy:
It hides the details you need to better understand the performance of your classification model. The examples below illustrate the problem:
Multiple class target variable: when your data has more than 2 classes. With 3 or more classes, you can get a classification accuracy of 80%, but you don't know whether that is because all classes are predicted equally well or because the model is neglecting one or two classes.
Imbalanced data: a typical example is an email classification problem where emails are classified as spam or not spam. Here, the spam count is much lower (less than 10%) than the number of relevant (non-spam) emails (more than 90%). So, the original two-class distribution leads to an imbalanced data set.
If we take two classes, balanced data would mean that each class has 50% of the points. Moreover, if there are 60-65% of the points for one class and 40% f
Classification accuracy does not highlight the details you need to diagnose your model's performance. This can be highlighted using a confusion matrix.
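To make the problem concrete, here is a small sketch with a made-up inbox of 95 legitimate emails and 5 spam emails. The "model" is a hypothetical baseline that always predicts "not spam". It scores high on accuracy while catching no spam at all.

```python
# Hypothetical inbox: 95 legitimate emails (0) and 5 spam emails (1)
y_true = [0] * 95 + [1] * 5

# A baseline "model" that always predicts "not spam"
y_pred = [0] * 100

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
baseline_accuracy = correct / len(y_true)

# How many of the actual spam emails were flagged as spam?
spam_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(baseline_accuracy)  # 0.95
print(spam_caught)        # 0
```

An accuracy of 95% looks good on paper, yet this baseline is useless as a spam filter. That is the kind of detail accuracy alone does not show.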
Wikipedia defines the term as follows: "a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm".
Below is a confusion matrix for two classes (+, -).
There are four quadrants in the confusion matrix, which are described below.
- True positive (TP): The number of instances that were positive (+) and were correctly classified as positive (+).
- Predicted that an email is spam and it actually is.
- False negative (FN): The number of instances that were positive (+) and were incorrectly classified as negative (-). It is also known as a Type II error.
- Predicted that an email is not spam and it actually is.
- True negative (TN): The number of instances that were negative (-) and were correctly classified as negative (-).
- Predicted that an email is not spam and in fact it is not.
- False positive (FP): The number of instances that were negative (-) and were incorrectly classified as positive (+). It is also known as a Type I error.
- Predicted that an email is spam and it actually is not.
To add a little clarity:
Top left: true positives for correctly predicted event values.
Top right: false positives for incorrectly predicted event values.
Bottom left: false negatives for incorrectly predicted non-event values.
Bottom right: true negatives for correctly predicted non-event values.
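The four quadrants can be counted directly from the true and predicted labels. The sketch below is only an illustration: the helper `confusion_counts` and the label lists are hypothetical, and the 2 x 2 layout follows the description above (rows are predicted classes, columns are actual classes).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a two-class problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        elif t == positive:
            fn += 1  # predicted negative, actually positive (Type II error)
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels for illustration only: 1 = spam, 0 = not spam
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Rows = predicted class, columns = actual class
matrix = [[tp, fp],
          [fn, tn]]
print(matrix)  # [[2, 1], [1, 6]]
```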
Metrics based on confusion matrix:
Precision calculates the ability of a classifier not to label a true negative observation as positive.
We use precision when working on a problem similar to spam detection, where a false positive (a legitimate email labeled as spam) is costly. Recall, by contrast, calculates how many of the actual positives our model captures by labeling them positive.
Recall calculates the ability of a classifier to find positive observations in the data set. If you wanted to be sure to find all the positive observations, you could maximize recall.
We tend to use recall when we need to correctly identify positive scenarios, as in a cancer screening dataset or a fraud detection case. Accuracy or precision won't be that helpful here.
To compare any two models, we use the F1 score. It is difficult to compare two models with low precision and high recall or vice versa. The F1 score helps measure recall and precision at the same time. It uses the harmonic mean instead of the arithmetic mean, which punishes extreme values more.
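In terms of the four quadrants, the standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). The sketch below applies them to made-up counts for a model that catches every positive but raises many false alarms. Returning 0.0 when a denominator is zero is an assumption made here for convenience, not a universal convention.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on empty denominator is an assumption


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: every positive is found, but with many false alarms
tp, fp, fn = 10, 90, 0

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
arithmetic_mean = (p + r) / 2

print(round(p, 3))                # 0.1
print(round(r, 3))                # 1.0
print(round(f1, 3))               # 0.182
print(round(arithmetic_mean, 3))  # 0.55
```

The arithmetic mean of 0.55 would make this model look acceptable, while the harmonic mean stays close to the weaker of the two values.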
Understanding the Confusion Matrix
Let's say we have a binary classification problem in which we want to predict whether a patient has cancer or not, depending on the symptoms (the features) fed into the machine learning model (the classifier).
As discussed earlier, the left side of the confusion matrix shows the class predicted by the classifier. Meanwhile, the top row of the matrix shows the actual class labels of the examples.
If the problem has more than two classes, the confusion matrix just grows by the respective number of classes. For instance, if there are four classes, it would be a 4 x 4 matrix.
In simple words, the number of classes does not matter; the principle remains the same: the left side of the matrix holds the predicted values and the top holds the actual values. What we have to check is where they intersect, to see the number of predicted examples for any given class versus the actual number of examples for that class.
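The same idea can be sketched for three classes. The helper `confusion_matrix_pred_rows`, the class names, and the labels below are all hypothetical. The layout follows the description above, with predicted classes on the rows and actual classes on the columns.

```python
def confusion_matrix_pred_rows(y_true, y_pred, labels):
    """Build a k x k matrix: rows = predicted class, columns = actual class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[p]][index[t]] += 1
    return matrix


# Hypothetical three-class labels for illustration only
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird", "bird", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat", "bird", "cat"]

m = confusion_matrix_pred_rows(y_true, y_pred, labels)
for label, row in zip(labels, m):
    print(label, row)

# Per-class recall: diagonal cell divided by the column total (actual count)
per_class_recall = {}
for j, label in enumerate(labels):
    actual_total = sum(row[j] for row in m)
    per_class_recall[label] = m[j][j] / actual_total
print(per_class_recall)
```

The diagonal cells hold the correct predictions for each class. Everything off the diagonal shows which classes the model confuses with each other, which a single accuracy number cannot tell you.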
While you can manually calculate metrics like the confusion matrix, precision and recall, most machine learning libraries, such as Scikit-learn for Python, have built-in methods to get these metrics.
Generating a confusion matrix in Scikit-learn
We have already covered the theory of how the confusion matrix works. Here we will share the Python commands to get the output of any classifier as a matrix.
To get the confusion matrix for our classifier, we call the confusion_matrix function that we imported from Sklearn and pass it the relevant arguments: the true values and our predictions.
from sklearn.metrics import confusion_matrix
c_matrix = confusion_matrix(test_y, predictions)
print(c_matrix)
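For a version you can run end to end, the sketch below fills `test_y` and `predictions` with made-up labels (1 = positive, 0 = negative) and also calls Scikit-learn's `precision_score`, `recall_score` and `f1_score`. Note that Scikit-learn puts the actual classes on the rows and the predicted classes on the columns, with labels in sorted order, so for 0/1 labels the output reads [[TN, FP], [FN, TP]]. That is a different layout from the one described earlier in this article.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels for illustration only: 1 = positive, 0 = negative
test_y = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predictions = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

c_matrix = confusion_matrix(test_y, predictions)
print(c_matrix)

# Scikit-learn layout for 0/1 labels: rows = actual, columns = predicted
tn, fp, fn, tp = c_matrix.ravel()
print(tn, fp, fn, tp)

prec = precision_score(test_y, predictions)
rec = recall_score(test_y, predictions)
f1 = f1_score(test_y, predictions)
print(prec, rec, f1)
```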
In short, we covered:
accuracy and the problems it can bring to the table
the confusion matrix, to better understand the classification model
precision and recall, and the scenarios where to use them
We lean towards using accuracy because everyone has an idea of what it means. We need to make more use of better-suited metrics, like recall and precision, even if they seem unfamiliar at first. Now you have an intuitive idea of why they work better for some problems, like imbalanced classification tasks.
Statistics provides us with formal definitions to evaluate these measures. Our job as data scientists is to know the right tools for the right job, and this means going beyond accuracy when working with classification models.
Using recall, precision and the F1 score (the harmonic mean of precision and recall) allows us to evaluate classification models, and it also makes us think twice about using only the accuracy of a model, especially for imbalanced problems. As we have learned, accuracy is not a useful evaluation tool for many problems, so let's add these other measures to our arsenal for evaluating models.