We use various techniques to achieve greater accuracy in machine learning models. One such method is ensemble learning. In this article, we will discuss the basics of ensemble learning. We will discuss the various ensembling techniques and the differences between them.
- What is Ensemble Learning?
- What Are The Different Ensembling Techniques in Machine Learning?
- Bagging in Machine Learning
- Boosting in Machine Learning
- What Are The Different Types of Boosting Techniques?
- Stacking in Machine Learning
- Bagging vs Boosting: Differences Between The Ensembling Techniques
What is Ensemble Learning?
Ensemble learning is a technique for building machine learning applications using multiple ML models instead of a single model. Here, an ensemble consists of several machine learning models that jointly decide the output of the application. Each model in the ensemble is calibrated to reduce the bias and variance of the application's predictions.
When we train a machine learning model, it might run into overfitting or underfitting. In both cases, the predictions of the ML model aren’t very accurate. When we create an ensemble of multiple models and assign them weights to predict the final output, the combination can reduce the bias and variance leading to better performance of the machine learning application.
Some examples of ensemble learning algorithms include random forests, AdaBoost, gradient boosting, and XGBoost.
What Are The Different Ensembling Techniques in Machine Learning?
Based on the implementation details, we can divide ensemble learning techniques into three categories: bagging, boosting, and stacking.
Let us discuss each technique separately.
Bagging in Machine Learning
Bagging is short for Bootstrap Aggregation. It is primarily used in supervised machine learning tasks like classification and regression. In bagging, a machine learning model consists of several smaller models. The overall model is termed the primary model, and the smaller models are termed base models. All the base models are trained on different samples of the training data. In bagging, each base model works independently and is not affected by the other base models.
While predicting the output for new data points, the predictions of all the base models are aggregated, and the final output of the primary model is decided by assigning weights to the outputs of the base models.
As the name suggests, the Bagging technique consists of two steps.
- Bootstrapping: In bootstrapping, we create random samples from the data with replacement. The data samples are then fed to the machine learning model. Different base models inside the primary machine learning model are then trained on the data samples. Any new data point is processed by all the base models to predict the output.
- Aggregation: In aggregation, predictions from all the base models are aggregated. Then, the final output is generated by calculations on the weights of the base models and their outputs.
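The two steps above can be sketched in a few lines of code. This is a minimal, illustrative implementation assuming scikit-learn and NumPy are available, with decision trees as the base models; it is not a production implementation.

```python
# Sketch of bagging: bootstrap sampling followed by aggregation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
n_models = 10
models = []

# Bootstrapping: each base model is trained on a random sample
# drawn from the data with replacement.
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor(max_depth=4, random_state=0)
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Aggregation: average the base-model predictions for a new data point.
x_new = X[:1]
prediction = np.mean([m.predict(x_new)[0] for m in models])
```

In practice, scikit-learn's `BaggingRegressor` and `BaggingClassifier` wrap exactly this bootstrap-then-aggregate pattern for you.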
To understand Bagging in Machine learning, let us take the example of the Random forest algorithm.
In the random forest algorithm, we use multiple decision trees to generate the final output. While training, different decision trees are trained using samples of the input data.
- For regression tasks, we can predict the output of each decision tree. Then, we can take the mean, median, or weighted average of the outputs of the decision trees to generate the final output for the random forest regression.
- Similarly, if we are using a random forest algorithm for classification, we can use the majority or weighted average of classification scores of each decision tree to classify any new data point fed to the random forest classifier.
Bagging helps us reduce variance for our machine learning model.
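As a concrete example, the random forest classifier described above is available directly in scikit-learn; the snippet below (using a synthetic dataset for illustration) trains 100 decision trees and lets them vote on each test point.

```python
# Random forest classification: 100 decision trees vote on each prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each tree is trained on a bootstrap sample; the majority class wins.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

For regression, `RandomForestRegressor` works the same way but averages the trees' numeric outputs instead of taking a majority vote.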
Boosting in Machine Learning
Boosting is another ensembling technique that we use to train machine learning models for better performance. In boosting, we combine a set of weak learners into a strong learner. Here, the base models are dependent on each other for predictions. Boosting optimizes the loss function of the weak learners. By iteratively improving the weak learners, boosting helps us reduce the bias in our machine learning model.
In boosting, we use the following steps.
- First, a base model is trained on the input data by assigning equal weights to each data point.
- Then, the incorrect predictions made by the base model are identified. After identifying the data points for which the predictions are wrong, we assign higher weights to the data points.
- Next, the weighted data is fed to the next base model. Again, the predictions are analyzed, and the data points with incorrect predictions are given higher weights before being passed as input to the following base model.
This process of sequentially passing outputs of one base model to another base model creates an ensemble that performs better than all the base models. Hence, the weak learners are combined to create the final machine learning model with better performance.
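The reweighting loop described in the steps above can be sketched as follows. This is a simplified, AdaBoost-style illustration (assuming scikit-learn and NumPy), using depth-1 decision trees as the weak learners; real implementations add more safeguards.

```python
# Sketch of boosting: sequentially reweight misclassified points.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=1)
y_signed = np.where(y == 1, 1, -1)  # encode classes as -1 / +1

n_rounds = 5
weights = np.full(len(X), 1 / len(X))  # start with equal weights
learners, alphas = [], []

for _ in range(n_rounds):
    # Train a weak learner (a decision stump) on the weighted data.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)

    # Identify incorrect predictions and compute the weighted error.
    err = weights[pred != y_signed].sum()
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))

    # Misclassified points get higher weights for the next base model.
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()

    learners.append(stump)
    alphas.append(alpha)

# The ensemble output is a weighted vote of all the weak learners.
scores = sum(a * s.predict(X) for a, s in zip(alphas, learners))
ensemble_pred = np.sign(scores)
```

The key contrast with bagging is visible here: each learner's training depends on the previous learner's mistakes, so the loop cannot be parallelized.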
What Are The Different Types of Boosting Techniques?
Although all the machine learning algorithms using boosting combine weak learners to create a strong learner for building high-performance classification and regression models, they can differ in how they create and aggregate weak learners during the sequential learning process. Based on the differences, we use the following boosting techniques.
- Adaptive Boosting (AdaBoost)
- Gradient Boosting
- Extreme Gradient Boosting (XGBoost)
Adaptive Boosting (AdaBoost)
AdaBoost is primarily used for training classification models. In adaptive boosting, each weak learner considers a single feature and creates a single-split decision tree called a decision stump. While creating the decision stump, each observation in the input data is weighted equally. Once we create the decision stumps in the first iteration, we analyze the prediction results. If any observations get incorrect outputs, we assign them higher weights. Then, new decision stumps are created, treating the observations with higher weights as more significant.
The above process is executed iteratively, identifying incorrect predictions and adjusting the weights of the data points until the predictions on the training data are as accurate as possible.
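scikit-learn ships a ready-made implementation of this algorithm; the snippet below trains it on a synthetic dataset for illustration. By default, `AdaBoostClassifier` uses a depth-1 decision tree, i.e., exactly the decision stump described above.

```python
# AdaBoost with decision stumps via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, random_state=0)

# The default weak learner is a decision stump (a depth-1 tree);
# n_estimators controls how many reweighting iterations are run.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
train_accuracy = clf.score(X, y)
```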
Gradient Boosting
The gradient boosting algorithm is also based on sequential learning and can be used for both classification and regression tasks. However, it differs from adaptive boosting: in the gradient boosting algorithm, the focus is on minimizing the error of the base model instead of assigning higher weights to input data points with incorrect predictions. For this, we use the following steps.
- First, a weak learner is trained on a data sample and the predictions are obtained.
- Then, we add a new weak learner sequentially after the previous base model. The new base model tries to optimize the loss function. In gradient boosting, we don't add weights to the incorrectly predicted data points. Instead, we try to minimize the loss function of the weak learners that we are using.
- After each iteration, we add a new base model with an optimized loss function until we get satisfactory results.
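These steps are implemented in scikit-learn's `GradientBoostingRegressor`, shown below on a synthetic regression problem for illustration. Each new tree is fit to the residual errors of the current ensemble, which is how the loss function gets minimized iteratively.

```python
# Gradient boosting for regression: each tree fits the current residuals.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# learning_rate scales each tree's contribution; smaller values need
# more estimators but usually generalize better.
reg = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
reg.fit(X_train, y_train)
r2 = reg.score(X_test, y_test)
```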
Extreme Gradient Boosting (XGBoost)
As the name suggests, XGBoost is an advanced version of the gradient boosting algorithm. The XGBoost algorithm was designed to increase the speed and accuracy of the gradient boosting algorithm. It uses parallel processing, cross-validation, regularization, and cache optimization to increase the computational speed and model efficiency.
Other ensembling algorithms that use boosting techniques include LightGBM (Light Gradient Boosting Machine) and CatBoost.
Disadvantages of Boosting
Apart from its advantages, using boosting techniques to train machine learning models also poses some challenges, as discussed below.
- Boosting sometimes can lead to overfitting.
- We sequentially train the weak learners in all the boosting ensembling techniques. Since each learner is built on its predecessor, boosting can be computationally expensive and is hard to scale up. However, algorithms like XGBoost tackle the issue of scalability to a great extent.
- Boosting algorithms are sensitive to outliers. As each model attempts to predict the target values correctly for the data points in the training set, outliers can skew the loss functions of the base learners significantly.
Stacking in Machine Learning
Stacking uses different levels of machine learning models to create classification or regression models. In stacking, we first train multiple weak learners in parallel. The predictions of the weak learners are then fed as input features to another machine learning model, often called the meta-model, which is trained to make the final predictions.
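scikit-learn provides this two-level structure out of the box via `StackingClassifier`; the example below (on a synthetic dataset, with an arbitrary choice of base learners) uses a random forest and an SVM at level 0 and a logistic regression as the meta-model.

```python
# Stacking: level-0 learners feed their predictions to a level-1 meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# The meta-model (logistic regression) is trained on the base learners'
# out-of-fold predictions, produced via internal cross-validation.
clf = StackingClassifier(estimators=base_learners,
                         final_estimator=LogisticRegression())
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```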
Bagging vs Boosting: Differences Between The Ensembling Techniques
Of the three ensembling techniques, bagging and boosting receive the most attention due to their wider adoption for building classification and regression models. Hence, it is important to discuss the similarities and differences between the bagging and boosting ensembling techniques. The following table summarizes the differences between bagging and boosting.
| Bagging | Boosting |
| --- | --- |
| Bagging focuses on minimizing the variance. | Boosting focuses on minimizing the bias. |
| The base models are independent of each other. | The base models are dependent on each other, as they are trained sequentially. |
| The base models are typically weighted equally when aggregating their outputs. | Each base model is weighted according to its performance. |
| The base models work in parallel. | The base models work sequentially. |
| The training samples are created using row sampling, i.e., random sampling with replacement. | Each new sample emphasizes the data points that were misclassified by the previous models. |
| Bagging trains faster. | Boosting is slower to train compared to bagging. |
In this article, we discussed various ensembling techniques in machine learning that you can use to build classification and regression models. To learn more about machine learning, you can read this article on overfitting and underfitting in machine learning. You might also like this article on Naive Bayes classification numerical example.
I hope you enjoyed reading this article. Stay tuned for more informative articles.