Decision tree algorithm for classification: machine learning 101


This article was published as part of the Data Science Blogathon.

Overview

  • Learn about the decision tree algorithm in machine learning for classification problems.
  • Here we cover entropy, information gain, and Gini impurity.

Decision tree algorithm

The decision tree is a supervised machine learning algorithm that can be used for both classification problems and regression problems.

The goal of this algorithm is to create a model that predicts the value of a target variable. To do so, the decision tree uses a tree representation of the problem, in which each leaf node corresponds to a class label and the attributes are represented by the internal nodes of the tree.

Let's take a sample data set to go further.

[Figure: a sample data set of 14 patients with attributes such as Age and Cholesterol and the target Drug]

Suppose we have a sample data set of 14 patients, and we have to predict which drug, A or B, to suggest for a patient.

Let's say we choose cholesterol as the first attribute to split the data on.

[Figure: the data split into High and Normal branches on Cholesterol]

It will divide our data into two branches, High and Normal, according to cholesterol, as you can see in the figure above.

Suppose our new patient has high cholesterol. From the above split of our data, we cannot tell whether Drug A or Drug B will be appropriate for the patient.

What's more, if the patient's cholesterol is normal, we still do not have enough information to determine whether Drug A or Drug B is suitable.

Let's take another attribute, age. As we can see, age has three categories: Young, Middle-aged, and Senior. Let's try to split on it.

[Figure: the data split on Age into Young, Middle-aged, and Senior branches]

From the figure above, we can now easily predict which drug to administer to a patient based on their reports.
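To make this concrete, here is a minimal sketch of fitting a decision tree to a toy drug data set like the one above, using scikit-learn (the library this article links to at the end). The rows below are invented for illustration; only the Age and Cholesterol attributes and the Drug target come from the example.

```python
# A minimal sketch: fitting a decision tree to a hypothetical drug data set.
# The rows below are invented to mirror the article's example.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Age":         ["Young", "Young", "Middle-aged", "Senior", "Senior", "Middle-aged"],
    "Cholesterol": ["High", "Normal", "High", "Normal", "High", "Normal"],
    "Drug":        ["A", "B", "B", "B", "A", "B"],
})

# scikit-learn trees need numeric inputs, so one-hot encode the
# categorical attributes first.
X = pd.get_dummies(data[["Age", "Cholesterol"]])
y = data["Drug"]

# criterion="entropy" makes the tree choose splits by information gain.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Print the learned splits as indented text rules.
print(export_text(clf, feature_names=list(X.columns)))
```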

Assumptions we make when using the decision tree:

– In the beginning, we consider the entire training set as the root.

– Feature values are preferred to be categorical; if the values are continuous, they are discretized before building the model (see the sketch after this list).

– Records are distributed recursively on the basis of attribute values.

– We use a statistical method to decide which attribute to place as the root node or as an internal node.
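As a quick illustration of the discretization point in the list above, a continuous attribute such as a numeric age can be binned into categories before training. The bin edges and labels below are arbitrary choices for the sketch:

```python
# A minimal sketch of discretizing a continuous attribute before building
# the tree. Bin edges and labels are arbitrary, for illustration only.
import pandas as pd

ages = pd.Series([23, 35, 47, 52, 61, 19, 44])

# pd.cut maps each numeric age into one of three categorical buckets.
age_groups = pd.cut(ages, bins=[0, 30, 50, 120],
                    labels=["Young", "Middle-aged", "Senior"])
print(age_groups.tolist())
# ['Young', 'Middle-aged', 'Middle-aged', 'Senior', 'Senior', 'Young', 'Middle-aged']
```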

Math behind the decision tree algorithm: Before moving on to information gain, we first have to understand entropy.

Entropy: entropy is the measure of impurity, disorder, or uncertainty in a set of examples.

Purpose of entropy:

Entropy controls how a decision tree decides to split the data. It affects how a decision tree draws its boundaries.

"Entropy values ​​range from 0 a 1”, less the entropy value is more reliable.

The entropy formula:

$$E(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$$

where $p_{+}$ is the proportion of Yes (positive) examples and $p_{-}$ the proportion of No (negative) examples in the sample S.

[Figure: a tree with F1 as the root node splitting into child nodes F2 and F3]

Suppose we have characteristics F1, F2, and F3, and we select characteristic F1 as our root node.

F1 contains 9 Yes labels and 5 No labels. After splitting on F1, we get F2, which has 6 Yes / 2 No, and F3, which has 3 Yes / 3 No.

Now, let's calculate the entropy of F2 using the entropy formula.

Putting the values into the formula:

$$E(F2) = -\frac{6}{8} \log_2 \frac{6}{8} - \frac{2}{8} \log_2 \frac{2}{8} \approx 0.811$$

Here, 6 is the number of Yes labels, taken as positive since we are calculating the probability of a positive example, and 8 is the total number of rows present in F2.

In the same way, if we calculate the entropy for F3, we obtain 1 bit, the worst case for an attribute, since F3 contains 50% Yes and 50% No.
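As a sanity check on these numbers, here is a small sketch of the entropy calculation in code (the function name is my own):

```python
# A minimal sketch of the entropy formula used above.
from math import log2

def entropy(yes: int, no: int) -> float:
    """Entropy of a node containing `yes` positive and `no` negative examples."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        p = count / total
        if p > 0:  # treat 0 * log2(0) as 0
            result -= p * log2(p)
    return result

print(entropy(6, 2))  # F2: ~0.811 bits
print(entropy(3, 3))  # F3: exactly 1 bit, the 50/50 worst case
```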

This splitting will continue until we obtain a pure subset.

What is a pure subset?

A pure subset is a situation in which we get all Yes labels or all No labels in a node.

We have done this with respect to one node. But what if, after splitting on F2, we still require some other attribute to reach the leaf node? We then also have to take the entropy of those nodes and aggregate all those entropy values. For that, we have the concept of information gain.

Information gain: information gain is used to decide which feature to split on at each step of building the tree. Simplicity is best, so we want to keep our tree small. To do so, at each step we should choose the split that results in the purest child nodes. A commonly used measure of purity is called information.

For each node of the tree, the information value measures how much information a feature gives us about the class. The split with the highest information gain will be taken first, and the process will continue until all child nodes are pure or until the information gain is 0.

$$Gain(S, A) = E(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, E(S_v)$$

The algorithm calculates the information gain for each split, and the split that gives the highest information gain value is selected.

We can say that in information gain we calculate the weighted average of the entropies of the subsets produced by a specific split.

Sv = total sample in a subset after the split, as in F2 = 6 + 2 = 8

S = total sample before the split, as in F1 = 9 + 5 = 14

Now calculating the information gain:

$$E(S) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14} \approx 0.940$$

$$Gain(S, F1) = 0.940 - \frac{8}{14}(0.811) - \frac{6}{14}(1.0) \approx 0.048$$

In this way, the algorithm computes the information gain for n possible splits, and the split with the highest information gain is taken to build the decision tree.

The higher the information gain value of a split, the greater the probability that it will be selected.
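Putting the pieces together, here is a small sketch of the same gain calculation in code (function names are my own):

```python
# A minimal sketch of information gain for the F1 -> (F2, F3) split above.
from math import log2

def entropy(yes: int, no: int) -> float:
    total = yes + no
    return -sum(c / total * log2(c / total) for c in (yes, no) if c > 0)

def information_gain(parent, children):
    """parent: (yes, no) counts of the node; children: list of (yes, no) counts."""
    total = sum(parent)
    weighted = sum((y + n) / total * entropy(y, n) for y, n in children)
    return entropy(*parent) - weighted

# F1 holds 9 Yes / 5 No; its children F2 and F3 hold 6/2 and 3/3.
print(information_gain((9, 5), [(6, 2), (3, 3)]))  # ~0.048
```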

Gini impurity:

Gini impurity is a measure used when building decision trees to determine how the features of a data set should split the nodes that form the tree. More precisely, the Gini impurity of a data set is a number between 0 and 0.5 that indicates the probability that new, random data would be misclassified if it were given a random class label according to the class distribution in the data set.

Entropy vs. Gini impurity

The maximum entropy value is 1, while the maximum Gini impurity value is 0.5.
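To see the difference between the two ranges, here is a quick sketch comparing both measures on a binary node (function names are my own):

```python
# A minimal sketch comparing entropy and Gini impurity on a binary node.
from math import log2

def entropy(p: float) -> float:
    """Entropy of a node where a fraction p of the examples is positive."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gini(p: float) -> float:
    """Gini impurity of the same node: 1 - p^2 - (1 - p)^2."""
    return 1 - p**2 - (1 - p)**2

for p in (0.0, 0.25, 0.5):
    print(f"p={p}: entropy={entropy(p):.3f}, gini={gini(p):.3f}")
# Both peak at p=0.5: entropy reaches 1.0 while Gini reaches 0.5.
```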

Since the Gini impurity does not involve logarithms, it is also slightly cheaper to compute than entropy, which is why scikit-learn uses it as its default splitting criterion.

In this article, we have covered a lot of detail about the decision tree: how it works, the math behind it, and attribute selection measures such as entropy, information gain, and Gini impurity, along with their formulas and how the machine learning algorithm uses them.

At this stage, I hope you have got an idea of the decision tree, one of the best machine learning algorithms for solving classification problems.

If you are new to this, I advise you to learn these techniques, understand their implementation, and then use them in your models.

For a better understanding, see https://scikit-learn.org/stable/modules/tree.html.

The media shown in this article is not the property of Analytics Vidhya and is used at the author's discretion.
