This blog is a brief summary of common ensemble methods.
1. Basic ensemble methods
1.1 Voting
Voting obtains the ensemble result of a classification problem by gathering the predictions from every classifier and taking the label with the most votes as the final result.
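A minimal hard-voting sketch, assuming scikit-learn; the dataset and base classifiers are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder synthetic data standing in for a real classification dataset.
X, y = make_classification(n_samples=200, random_state=0)

# Each base classifier casts a vote; the majority label is the final result.
clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote on predicted labels
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```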
1.2 Averaging
Averaging takes the mean of all the regressors' outputs as the final result of a regression problem.
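A minimal averaging sketch under the same assumptions (scikit-learn, synthetic placeholder data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Placeholder synthetic data standing in for a real regression dataset.
X, y = make_regression(n_samples=200, random_state=0)

models = [Ridge(), DecisionTreeRegressor(random_state=0)]
for m in models:
    m.fit(X, y)

# The final prediction is the plain mean of the individual outputs.
preds = np.mean([m.predict(X[:5]) for m in models], axis=0)
print(preds)
```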
2. Bagging
The main idea of Bagging is sampling with replacement (bootstrap sampling). After sampling, we use each bootstrap sample to build and train a base model. Repeat this process many times, then combine the models by voting or averaging (a minimal sketch follows the list below).
Characteristic:
- Reduces variance and increases accuracy;
- Robust against outliers and noisy data;
- Often used with decision trees (e.g., Random Forest).
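A minimal bagging sketch, assuming scikit-learn; the data and hyperparameters are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder synthetic data.
X, y = make_classification(n_samples=300, random_state=0)

# Each tree is trained on a bootstrap sample (sampling with replacement);
# the predictions are then combined by majority vote.
bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,  # sample with replacement
    random_state=0,
)
bag.fit(X, y)
print(bag.score(X, y))
```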
3. Boosting
A sequential, iterative algorithm (each round depends on the previous one, so the rounds cannot run in parallel, unlike Bagging). Each iteration focuses on the samples that were misclassified and assigns them higher weights so that they are classified more easily in the next iteration. At the end, we take a weighted sum of the weak classifiers as the final result (see the sketch after the list below).
Characteristic:
- Mainly reduces bias and increases accuracy;
- Not robust against outliers or noisy data;
- Flexible - can be used with any loss function.
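A minimal boosting sketch, again assuming scikit-learn and placeholder data; AdaBoost is used here as a representative boosting algorithm:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder synthetic data.
X, y = make_classification(n_samples=300, random_state=0)

# Each round fits a weak learner on the reweighted data; misclassified
# samples get more weight. The final prediction is a weighted vote of
# all the weak classifiers.
boost = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=100,
    random_state=0,
)
boost.fit(X, y)
print(boost.score(X, y))
```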
4. Stacking
First train several strong learners with cross-validation, then use each learner to predict on the training data (out of fold) and on the test data. Concatenate the training-set predictions to form a new feature matrix for the second-level learning, and concatenate the test-set predictions to form the new test data. Train the second-level learner on the new training set (using the original labels), then predict with this model on the new test set; that prediction is the final result (a sketch follows the list below).
Characteristic:
- Used to ensemble a diverse group of strong learners;
- Involves training a second-level machine learning algorithm called a “meta-learner” to learn the optimal combination of the base learners.
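A minimal stacking sketch, assuming scikit-learn (whose `StackingClassifier` handles the cross-validation bookkeeping internally); the model choices are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Placeholder synthetic data.
X, y = make_classification(n_samples=300, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the "meta-learner"
    cv=5,  # base-learner predictions on the train set come from CV folds
)
stack.fit(X, y)
print(stack.score(X, y))
```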
### Update 2018.1.14
Note that before the ensemble procedure, we can run some extra experiments for a deeper insight into the dataset:
- For decision tree models, we can visualize the decision path;
- Display the scores of all the models;
- Calculate the correlation coefficients between the base models (highly correlated errors will accumulate);
- Show the correlation matrix (a rough sketch follows).
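A rough sketch of the correlation check, assuming scikit-learn and pandas; the base models here are arbitrary placeholders:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Placeholder synthetic data.
X, y = make_classification(n_samples=300, random_state=0)

models = {
    "lr": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
}
# Out-of-fold predictions avoid trivially correlated in-sample outputs.
preds = pd.DataFrame(
    {name: cross_val_predict(m, X, y, cv=5) for name, m in models.items()}
)
print(preds.corr())  # highly correlated base models add little diversity
```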
Details in Ref [2].