Introduction
Many analysts misunderstand the term “boosting” as it is used in data science. Let me give you a clear explanation of this term: boosting empowers machine learning models to improve their prediction accuracy.
Boosting algorithms are among the most widely used algorithms in data science competitions. The winners of our recent hackathons agree that they rely on boosting algorithms to improve the accuracy of their models.
In this article, I will explain how boosting algorithms work in a very simple way. I have also shared the Python code below. I have skipped the intimidating mathematical derivations used in boosting, because they would not have allowed me to explain the concept in simple terms.
Let us begin.
What is boosting?
Definition: The term “boosting” refers to a family of algorithms that convert weak learners into strong learners.
Let's understand this definition in detail by solving a spam identification problem:
How would you classify an email as SPAM or not? Like everyone else, our initial approach would be to identify “spam” and “not spam” emails using criteria like the following:
- If the email has only one image file (a promotional image), it's SPAM
- If the email only has link(s), it's SPAM
- If the email body consists of a sentence like “You won a cash prize of $ xxxxxx”, it's SPAM
- If the email is from our official domain “Analyticsvidhya.com”, it's not SPAM
- If the email is from a known source, it's not SPAM
Above, we have defined several rules to classify an email as “spam” or “not spam”. But do you think these rules individually are strong enough to classify an email successfully? No.
Individually, these rules are not powerful enough to classify an email as “spam” or “not spam”. Therefore, these rules are called weak learners.
To convert the weak learners into a strong learner, we combine the prediction of each weak learner using methods such as:
• Using average / weighted average
• Considering the prediction with the higher vote
For instance: above, we have defined 5 weak learners. Of these 5, 3 vote “SPAM” and 2 vote “Not SPAM”. In this case, by default, we will consider the email as SPAM because we have a higher vote count (3) for “SPAM”.
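To make the voting idea concrete, here is a minimal Python sketch. The rule functions and the email fields are hypothetical stand-ins for the five weak rules above, not part of any library:

from collections import Counter

# Hypothetical weak rules; each returns "SPAM" or "NOT SPAM" for an email dict.
def rule_only_image(email): return "SPAM" if email["only_image"] else "NOT SPAM"
def rule_only_links(email): return "SPAM" if email["only_links"] else "NOT SPAM"
def rule_prize_text(email): return "SPAM" if "cash prize" in email["body"] else "NOT SPAM"
def rule_official_domain(email): return "NOT SPAM" if email["from_domain"] == "analyticsvidhya.com" else "SPAM"
def rule_known_source(email): return "NOT SPAM" if email["known_source"] else "SPAM"

def majority_vote(email, rules):
    votes = Counter(rule(email) for rule in rules)
    return votes.most_common(1)[0][0]  # the label with the most votes wins

# Here 3 rules vote SPAM and 2 vote NOT SPAM, so the combined prediction is SPAM.
email = {"only_image": True, "only_links": True,
         "body": "You won a cash prize of $ xxxxxx",
         "from_domain": "analyticsvidhya.com", "known_source": True}
rules = [rule_only_image, rule_only_links, rule_prize_text,
         rule_official_domain, rule_known_source]
print(majority_vote(email, rules))  # -> SPAM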
How do boosting algorithms work?
We now know that boosting combines weak learners, also known as base learners, to form a strong rule. An immediate question that should arise in your mind is: “How does boosting identify weak rules?”
To find a weak rule, we apply base (machine) learning algorithms with a different distribution each time. Each time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process. After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.
Here's another question that might haunt you: “How do we choose a different distribution for each round?”
To choose the right distribution, these are the steps:
Step 1: The base learner takes all the distributions and assigns equal weight or attention to each observation.
Step 2: If the first base learning algorithm causes any prediction errors, we pay more attention to the observations with prediction errors. Then we apply the next base learning algorithm.
Step 3: Repeat Step 2 until the limit of the base learning algorithm is reached or higher accuracy is achieved.
Finally, it combines the outputs of the weak learners and creates a strong learner that ultimately improves the predictive power of the model. Boosting pays more attention to examples that are misclassified or have higher errors under the preceding weak rules.
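To make these steps concrete, here is a simplified AdaBoost-style reweighting loop in Python. It is a sketch for intuition only, assuming labels coded as -1/+1, not a production implementation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_boost(X, y, n_rounds=5):
    y = np.asarray(y)                        # labels assumed to be -1 / +1
    n = len(y)
    w = np.full(n, 1.0 / n)                  # Step 1: equal weight for every observation
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # fit the base learner on the current distribution
        pred = stump.predict(X)
        err = w[pred != y].sum()             # weighted error of this round
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)       # Step 2: upweight the misclassified points
        w /= w.sum()                         # renormalize into a distribution
        learners.append(stump)
        alphas.append(alpha)                 # Step 3: repeat until the round limit is hit
    return learners, alphas

def strong_predict(X, learners, alphas):
    # Combine the weak learners into one strong prediction (weighted vote).
    scores = sum(a * l.predict(X) for l, a in zip(learners, alphas))
    return np.sign(scores)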
Types of boosting algorithms
The underlying engine used in boosting algorithms can be anything: a decision stump, a margin-maximizing classification algorithm, and so on. There are many boosting algorithms that use these different types of engines, such as:
- AdaBoost (Adaptive Boosting)
- Gradient Tree Boosting
- XGBoost
In this article, we will focus on AdaBoost and Gradient Boosting, followed by their respective Python codes, and we will focus on XGBoost in the next article.
Boosting algorithm: AdaBoost
This diagram aptly explains AdaBoost. Let's walk through it closely:
Box 1: You can see that we have assigned equal weights to each data point and applied a decision stump to classify them as + (plus) or – (minus). The decision stump (D1) has generated a vertical line on the left side to classify the data points. We see that this vertical line has incorrectly predicted three + (plus) as – (minus). In that case, we will assign higher weights to these three + (plus) and apply another decision stump.
Box 2: Here, you can see that the size of the three incorrectly predicted + (plus) is bigger compared to the rest of the data points. In this case, the second decision stump (D2) will try to predict them correctly. Now, a vertical line (D2) on the right side of this chart has correctly classified the three misclassified + (plus). But again, it has caused misclassification errors, this time on three – (minus). Again, we will assign higher weights to the three – (minus) and apply another decision stump.
Box 3: Here, the three – (minus) receive higher weights. A decision stump (D3) is applied to predict these misclassified observations correctly. This time a horizontal line is generated to classify + (plus) and – (minus), based on the higher weights of the misclassified observations.
Box 4: Here, we have combined D1, D2 and D3 to form a strong prediction with a more complex rule than any individual weak learner. You can see that this algorithm has classified the observations quite well compared to any of the individual weak learners.
AdaBoost (Adaptive Boosting): It works with a method similar to the one described above. It fits a sequence of weak learners on repeatedly reweighted training data. It starts by predicting on the original data set and gives each observation equal weight. If the prediction is wrong using the first learner, it gives the incorrectly predicted observations higher weight. Being an iterative process, it continues adding learners until a limit on the number of models or on accuracy is reached.
Mostly, we use decision stumps with AdaBoost. But we can use any machine learning algorithm as the base learner if it accepts weights on the training data set. We can use AdaBoost algorithms for both classification and regression problems.
You can refer to the article “How to be smart with machine learning: AdaBoost” to understand the AdaBoost algorithms in more detail.
Python code
Here is a short example to get you started; you can run the code and inspect the result:
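A minimal sketch using scikit-learn; X_train, y_train and X_test are assumed to be your own data and are not defined here:

from sklearn.ensemble import AdaBoostClassifier  # For Classification
from sklearn.ensemble import AdaBoostRegressor   # For Regression

# By default, AdaBoostClassifier uses a decision stump
# (a depth-1 decision tree) as its base learner.
clf = AdaBoostClassifier(n_estimators=100, learning_rate=1.0)
clf.fit(X_train, y_train)    # X_train, y_train: your training data (assumed)
pred = clf.predict(X_test)   # X_test: your test data (assumed)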
You can tune the parameters to optimize the performance of the algorithms. I have mentioned the key parameters for tuning below:
- n_estimators: Controls the number of weak learners.
- learning_rate: Controls the contribution of the weak learners to the final combination. There is a trade-off between learning_rate and n_estimators.
- base_estimator: Helps specify a different machine learning algorithm as the base learner. (In recent scikit-learn versions this parameter is named estimator.)
You can also tune the parameters of the base learner to optimize its performance, as the sketch below shows.
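For example, here is a sketch that swaps in a slightly deeper, regularized tree as the base learner; the estimator argument name assumes scikit-learn 1.2 or newer (older releases call it base_estimator):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Replace the default stump with a depth-2 tree and tune its parameters too.
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2, min_samples_leaf=5),
    n_estimators=200,
    learning_rate=0.5,
)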
Boosting algorithm: Gradient Boosting
In gradient boosting, many models are trained sequentially. Each new model gradually minimizes the loss function (y = ax + b + e, where the error term e needs special attention) of the whole system using the gradient descent method. The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.
The main idea behind this algorithm is to build the new base learners so that they are maximally correlated with the negative gradient of the loss function associated with the whole ensemble. You can refer to the article “Learn Gradient Boosting Algorithm” to understand this concept with an example.
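To see this idea in code: for regression with squared-error loss, the negative gradient is simply the residual y − F(x). Here is a simplified sketch for intuition only, not the full algorithm:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_sketch(X, y, n_rounds=100, lr=0.1):
    y = np.asarray(y, dtype=float)
    f0 = y.mean()                      # start from a constant model (the mean)
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred           # negative gradient of the squared-error loss
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, residuals)         # the new base learner fits the residuals
        pred += lr * tree.predict(X)   # take a small step along the correction
        trees.append(tree)
    return f0, trees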
In the Python sklearn library, we use Gradient Tree Boosting or GBRT. It is a generalization of boosting to arbitrary differentiable loss functions. It can be used for both regression and classification problems.
Python code
from sklearn.ensemble import GradientBoostingClassifier  # For Classification
from sklearn.ensemble import GradientBoostingRegressor   # For Regression

clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1)
clf.fit(X_train, y_train)
- n_estimators: Controls the number of weak learners.
- learning_rate: Controls the contribution of the weak learners to the final combination. There is a trade-off between learning_rate and n_estimators.
- max_depth: The maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for the best performance; the best value depends on the interaction of the input variables.
You can also tune the loss function for better performance.
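For instance, here is a sketch of a regressor with a robust loss; the option name assumes a recent scikit-learn version (older releases used “ls” and “lad” instead of “squared_error” and “absolute_error”):

from sklearn.ensemble import GradientBoostingRegressor

# Huber loss is less sensitive to outliers than plain squared error.
reg = GradientBoostingRegressor(loss="huber", n_estimators=100,
                                learning_rate=0.1, max_depth=3)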
Final note
In this article, we looked at boosting, one of the ensemble modeling methods for improving predictive power. Here, we discussed the science behind boosting and two of its types: AdaBoost and Gradient Boosting. We also studied their respective Python codes.
In my next article, I will discuss another type of boosting algorithm that is nowadays the secret to winning data science contests: XGBoost.
Did you find this article helpful? Share your opinions / thoughts in the comments section below.