Boosting algorithms in machine learning


Introduction

Many analysts misunderstand the term "boosting" used in data science. Let me give you an interesting explanation of this term: boosting empowers machine learning models to improve their prediction accuracy.

Boosting algorithms are among the most widely used algorithms in data science competitions. The winners of our latest hackathons agree that they rely on boosting algorithms to improve the accuracy of their models.

In this article, I will explain how boosting works in a very simple way. I have also shared the Python code below. I have skipped the intimidating mathematical derivations behind boosting, because they would not have allowed me to explain the concept in simple terms.

Let us begin.


What is boosting?

Definition: The term "boosting" refers to a family of algorithms that converts weak learners into strong learners.

Let's understand this definition in detail by solving a spam identification problem:

How would you classify an email as SPAM or not? Like everyone else, our initial approach would be to identify "spam" and "not spam" emails using the following criteria:

  1. The email has only one image file (a promotional image): it's SPAM
  2. The email has only link(s): it's SPAM
  3. The email body consists of a sentence like "You won a cash prize of $ xxxxxx": it's SPAM
  4. The email is from our official domain "Analyticsvidhya.com": it's not SPAM
  5. The email is from a known source: it's not SPAM

Above, we have defined several rules to classify an email as 'spam' or 'not spam'. But do you think these rules individually are strong enough to successfully classify an email? No.

Individually, these rules are not powerful enough to classify an email as 'spam' or 'not spam'. Therefore, these rules are called weak learners.

To convert a weak learner into a strong learner, we combine the predictions of each weak learner using methods such as:
• Using an average / weighted average
• Considering the prediction with the higher vote

For instance: above, we have defined 5 weak learners. Of these 5, 3 vote 'SPAM' and 2 vote 'Not SPAM'. In this case, by default, we will consider the email as SPAM because we have a higher vote count (3) for 'SPAM'.
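As a hedged illustration of this voting idea (not code from the original article), here is a tiny Python sketch that encodes the five rules above as weak learners and combines them with a simple majority vote; the example email, and the way rules 4 and 5 are inverted so that every rule votes, are assumptions made for the illustration:

# Each weak rule votes True for "SPAM" and False for "not SPAM"
rules = [
    lambda e: e["only_image"],                               # rule 1
    lambda e: e["only_links"],                               # rule 2
    lambda e: "you won a cash prize" in e["body"].lower(),   # rule 3
    lambda e: not e["from_official_domain"],                 # rule 4, inverted for illustration
    lambda e: not e["from_known_source"],                    # rule 5, inverted for illustration
]

email = {"only_image": True, "only_links": True,
         "body": "You won a cash prize of $ xxxxxx",
         "from_official_domain": False, "from_known_source": False}

spam_votes = sum(rule(email) for rule in rules)
print("SPAM" if spam_votes > len(rules) / 2 else "NOT SPAM")  # majority vote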

How do boosting algorithms work?

We now know that boosting combines weak learners, also known as base learners, to form a strong rule. An immediate question that should arise in your mind is: 'How does boosting identify the weak rules?'

To find a weak rule, we apply base learning (ML) algorithms with a different distribution each time. Every time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process. After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.

Here's another question that might haunt you: 'How do we choose a different distribution for each round?'

To choose the right distribution, these are the steps:

Step 1: The base learner takes all the distributions and assigns equal weight or attention to each observation.

Step 2: If there is any prediction error caused by the first base learning algorithm, we pay higher attention to the observations with a prediction error. Then we apply the next base learning algorithm.

Step 3: Repeat Step 2 until the limit on the number of base learning algorithms is reached or higher accuracy is achieved.

Finally, it combines the outputs of the weak learners and creates a strong learner that ultimately improves the predictive power of the model. Boosting pays more attention to the examples that are misclassified or have higher errors under the preceding weak rules.
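A minimal sketch of these three steps (an illustration only, not a specific library's algorithm), assuming base learners that accept per-observation sample weights; the re-weighting factor is an illustrative assumption:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_boost(X, y, n_rounds=10):
    """Illustrative boosting loop: re-weight misclassified observations each round."""
    weights = np.full(len(y), 1.0 / len(y))     # Step 1: equal weight for every observation
    learners = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)  # fit the base learner on the current distribution
        wrong = stump.predict(X) != y
        weights[wrong] *= 2.0                   # Step 2: pay more attention to errors (factor is illustrative)
        weights /= weights.sum()                # keep the weights a valid distribution
        learners.append(stump)                  # Step 3: repeat until the limit is reached
    return learners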

Types of boosting algorithms

The underlying engine used in boosting algorithms can be anything: a decision stump, a margin-maximizing classification algorithm, and so on. There are many boosting algorithms built on these different engines, such as:

  1. AdaBoost (Adaptive Boosting)
  2. Gradient Tree Boosting
  3. XGBoost

In this article, we will focus on AdaBoost and Gradient Boosting, followed by their respective Python code, and we will cover XGBoost in the next article.

Boosting algorithm: AdaBoost

[Figure: AdaBoost illustrated across four boxes]

This diagram aptly explains AdaBoost. Let's understand it step by step:

Box 1: You can see that we have assigned equal weights to each data point and applied a decision stump to classify them as + (plus) or – (minus). The decision stump (D1) has generated a vertical line on the left side to classify the data points. We see that this vertical line has incorrectly predicted three + (plus) as – (minus). In that case, we will assign higher weights to these three + (plus) and apply another decision stump.

[Figure: Box 1 — decision stump D1]

Box 2: Here you can see that the size of the three incorrectly predicted + (plus) is bigger compared to the rest of the data points. In this case, the second decision stump (D2) will try to predict them correctly. Now, a vertical line (D2) on the right side of this box has correctly classified the three misclassified + (plus). But again, it has caused misclassification errors, this time with three – (minus). Again, we will assign higher weights to the three – (minus) and apply another decision stump.

[Figure: Box 2 — decision stump D2]

Box 3: Here, the three – (minus) receive higher weights. A decision stump (D3) is applied to predict these misclassified observations correctly. This time a horizontal line is generated to classify + (plus) and – (minus) based on the higher weights of the misclassified observations.

[Figure: Box 3 — decision stump D3]

Box 4: Here, we have combined D1, D2 and D3 to form a strong prediction with a more complex rule than any individual weak learner. You can see that this algorithm has classified these observations quite well compared to any of the individual weak learners.

[Figure: Box 4 — combined strong learner]
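To make Box 4 concrete, here is a minimal sketch (not the article's original code) of how the final strong learner combines the three stumps: each stump gets a weight (alpha), and the final prediction is the sign of the weighted vote. The stump outputs and alpha values below are illustrative assumptions; in AdaBoost the alphas come from each stump's weighted error.

import numpy as np

# Illustrative +1 / -1 outputs of the three stumps for five data points (assumed values)
D1 = np.array([+1, -1, -1, +1, -1])
D2 = np.array([+1, +1, -1, +1, -1])
D3 = np.array([+1, -1, +1, +1, -1])

alphas = [0.6, 0.4, 0.5]  # illustrative stump weights

# Strong learner: sign of the weighted vote of the weak learners
strong = np.sign(alphas[0] * D1 + alphas[1] * D2 + alphas[2] * D3)
print(strong)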

AdaBoost (Adaptive Boosting): It works with a method similar to the one described above. It fits a sequence of weak learners on repeatedly re-weighted versions of the training data. It starts by predicting on the original data set, giving each observation equal weight. If the prediction is wrong with the first learner, the incorrectly predicted observations are given higher weight. Being an iterative process, it keeps adding learners until a limit on the number of models or on accuracy is reached.

Mostly, we use decision stumps with AdaBoost. But we can use any machine learning algorithm as the base learner if it accepts weights on the training data set. We can use AdaBoost algorithms for both classification and regression problems.

You can refer to the article "How to be smart with machine learning: AdaBoost" to understand the AdaBoost algorithm in more detail.

Python code

Here is a short example to get you started. You can run the code and inspect the result:
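A minimal sketch, assuming a synthetic dataset and scikit-learn's AdaBoostClassifier; the dataset and parameter values are illustrative, not the original exercise:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic data; replace with your own training set
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost with decision stumps (scikit-learn's default base learner)
clf = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))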

You can tune the parameters to optimize the performance of the algorithm. I have mentioned the key tuning parameters below:

  • n_estimators: Controls the number of weak learners.
  • learning_rate: Controls the contribution of the weak learners in the final combination. There is a trade-off between learning_rate and n_estimators.
  • base_estimator: Helps specify a different machine learning algorithm as the base learner.

You can also tune the parameters of the base learner to optimize its performance.
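For example, here is a hedged sketch of swapping in a slightly deeper decision tree as the base learner (the keyword is estimator on newer scikit-learn releases and base_estimator on older ones, so check your installed version):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Base learner with depth 2 instead of the default depth-1 stump
base = DecisionTreeClassifier(max_depth=2)
clf = AdaBoostClassifier(estimator=base, n_estimators=200, learning_rate=0.5, random_state=0)
clf.fit(X_train, y_train)  # X_train, y_train as in the example above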

Boosting algorithm: Gradient Boosting

In gradient boosting, many models are trained sequentially. Each new model gradually minimizes the loss function (y = ax + b + e, where e is the error term and needs special attention) of the whole system using the gradient descent method. The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.

The main idea behind this algorithm is to build new base learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble. You can refer to the article "Learn the gradient boosting algorithm" to understand this concept with an example.
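As a hedged illustration of that idea (not code from the cited article): with a squared-error loss, the negative gradient is simply the residual, so each new base learner is fit to the residuals of the current ensemble. A minimal sketch, assuming shallow regression trees as base learners:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_estimators=100, learning_rate=0.1):
    """Toy gradient boosting for regression with squared-error loss."""
    init = y.mean()                              # initial constant model
    prediction = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        residuals = y - prediction               # negative gradient of the squared-error loss
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, residuals)                   # new learner fits (correlates with) the negative gradient
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees

def gradient_boost_predict(X, init, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred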

In the Python sklearn library, we use Gradient Tree Boosting, or GBRT. It is a generalization of boosting to arbitrary differentiable loss functions. It can be used for both regression and classification problems.

Python code

from sklearn.ensemble import GradientBoostingClassifier  # For Classification
from sklearn.ensemble import GradientBoostingRegressor   # For Regression

# X_train and y_train are assumed to be your training features and labels
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
clf.fit(X_train, y_train)
  • n_estimators: Controls the number of weak learners.
  • learning_rate: Controls the contribution of the weak learners in the final combination. There is a trade-off between learning_rate and n_estimators.
  • max_depth: Maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for the best performance; the best value depends on the interaction of the input variables.

You can adjust the loss function for better performance.
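For instance, here is a hedged sketch of switching the regression loss to Huber, which is more robust to outliers (the exact loss names vary across scikit-learn versions, so check the documentation for your installed version):

from sklearn.ensemble import GradientBoostingRegressor

# Huber loss instead of the default squared-error loss
reg = GradientBoostingRegressor(loss="huber", n_estimators=100, learning_rate=0.1, max_depth=3)
reg.fit(X_train, y_train)  # X_train, y_train: your regression training data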

Final note

In this article, we analyzed boosting, one of the ensemble modeling methods for improving predictive power. We discussed the science behind boosting and two of its types: AdaBoost and Gradient Boosting. We also studied their respective Python code.

In my next article, I will discuss another type of boosting algorithm, XGBoost, which is nowadays the secret to winning data science contests.

Did you find this article helpful? Share your opinions / thoughts in the comments section below.

If you liked what you just read and want to continue learning about analytics, subscribe to our emails, follow us on Twitter, or like our Facebook page.
