Ensemble learning

Bagging and boosting are the two major families of ensemble techniques. Most other methods (such as stacking, blending, or random subspaces) can be thought of as variations or combinations of these two core approaches.


1. Bagging (Bootstrap Aggregating):

  • Key Idea: Reduce variance by training multiple models independently and averaging their outputs (for regression) or taking a majority vote (for classification).
  • Characteristics:
    • Uses bootstrapped datasets (sampling with replacement).
    • Models are trained in parallel, so there's no dependency between them.
    • Works best with models that have high variance (e.g., decision trees).
  • Example:
    • Random Forest: Combines decision trees by training each tree on a random subset of data and features, then aggregates their predictions (see the sketch after this list).
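
A minimal sketch of bagging in Python, assuming scikit-learn is available; the synthetic dataset, model choices, and hyperparameters are illustrative assumptions, not part of the original explanation.

```python
# Illustrative sketch: plain bagging vs. Random Forest (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Plain bagging: many deep decision trees, each fit on a bootstrap sample
# (sampling with replacement); trees are independent, so they train in parallel.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # high-variance base learner
    n_estimators=100,
    bootstrap=True,
    n_jobs=-1,
    random_state=42,
)
bagging.fit(X_train, y_train)

# Random Forest: bagging plus a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
forest.fit(X_train, y_train)

print("Bagging accuracy:      ", bagging.score(X_test, y_test))
print("Random Forest accuracy:", forest.score(X_test, y_test))
```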

2. Boosting:

  • Key Idea: Reduce bias by sequentially training models, where each model corrects the mistakes of its predecessor.
  • Characteristics:
    • Models are trained in sequence, with each one focusing on the errors of the previous models.
    • Weights are updated to give more importance to misclassified examples.
    • Works best with weak learners (e.g., shallow decision trees).
  • Examples:
    • AdaBoost: Increases weights on misclassified samples.
    • Gradient Boosting: Optimizes a loss function by fitting each new model to the residual errors of the current ensemble (see the sketch after this list).
    • XGBoost, LightGBM, CatBoost: Variants of Gradient Boosting with improved speed and accuracy.
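
A minimal sketch of boosting in Python, again assuming scikit-learn; the dataset and hyperparameters are illustrative assumptions and would need tuning in practice.

```python
# Illustrative sketch: AdaBoost and Gradient Boosting (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost: reweights training samples so later weak learners (stumps by
# default) focus on previously misclassified examples.
ada = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)

# Gradient Boosting: each shallow tree is fit to the residual errors
# (gradients of the loss) left by the models trained so far.
gbm = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,  # weak learners: shallow trees
    random_state=0,
)
gbm.fit(X_train, y_train)

print("AdaBoost accuracy:         ", ada.score(X_test, y_test))
print("Gradient Boosting accuracy:", gbm.score(X_test, y_test))
```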

Comparison Between Bagging and Boosting:

Feature            Bagging                          Boosting
Goal               Reduce variance                  Reduce bias
Training           Parallel (independent models)    Sequential (dependent models)
Focus              Equal treatment of all samples   Focus on hard-to-predict samples
Overfitting Risk   Lower                            Higher if not tuned well
Examples           Random Forest, Bagged Trees      AdaBoost, Gradient Boosting

Why Some People Focus on Just Bagging and Boosting

  • These two are the foundation of most ensemble methods.
  • Stacking, blending, and others often combine aspects of bagging and boosting (see the sketch below).
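
A minimal sketch of stacking in Python, assuming scikit-learn, shown only to illustrate how a meta-model is layered on top of base learners; the particular base models and meta-model here are assumptions.

```python
# Illustrative sketch: stacking a bagging-style and a boosting-style model.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Base learners: one bagging-style model and one boosting-style model.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ("gbm", GradientBoostingClassifier(random_state=1)),
]

# The meta-model (final_estimator) learns how to combine the base learners'
# predictions, using out-of-fold predictions from internal cross-validation.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```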