How to Do Backward Elimination in Machine Learning

Sep 26, 2020

The steps involved in building a model with backward elimination are as follows.

Backward Elimination in Machine Learning

Step 1: Select a significance level (SL) for a predictor to stay in the model, for example 0.05. The significance level is the threshold for how strong the evidence in your sample data-set must be before you reject the null hypothesis. A significance level of 0.05 means you accept a 5% risk of concluding that an effect exists when there is no actual effect.

Step 2: Fit the model with all possible predictors, that is, all the attribute columns.

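Step 3: Compute the p-value of every predictor from the data-set. The p-value is the probability of obtaining an effect at least as extreme as the one observed in your sample data-set, assuming the null hypothesis (the predictor has no effect) is true.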

Step 4: If a column's p-value is less than the significance level, the null hypothesis is rejected, meaning the attribute (column) is significant for the model. Otherwise, remove the column with the highest p-value above the significance level.

Step 5: Fit the model without that column.

Step 6: Repeat Steps 3, 4 and 5 until the p-values of all the remaining columns are less than the significance level.
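The six steps can be condensed into a short loop. This is a minimal sketch, not the author's code: `pvalues_fn` is a hypothetical callable assumed to refit the model on the given columns and return one p-value per column, so any regression routine that reports p-values could be plugged in.

```python
from typing import Callable, Dict, List


def backward_elimination(
    columns: List[str],
    pvalues_fn: Callable[[List[str]], Dict[str, float]],
    sl: float = 0.05,
) -> List[str]:
    """Return the columns that survive backward elimination at level sl.

    pvalues_fn (hypothetical) must fit the model on the given columns and
    return a mapping from each column name to its p-value.
    """
    remaining = list(columns)  # Step 2: start with every predictor
    while remaining:
        pvalues = pvalues_fn(remaining)  # Steps 3 and 5: (re)fit, get p-values
        worst = max(remaining, key=lambda c: pvalues[c])
        if pvalues[worst] < sl:  # Step 6: every p-value is below SL, stop
            break
        remaining.remove(worst)  # Step 4: drop the column with the highest p-value
    return remaining


# Fixed, made-up p-values, used only to show the control flow.
fixed = {"const": 0.0, "x1": 0.001, "x2": 0.99, "x3": 0.6}
kept = backward_elimination(list(fixed), lambda cols: {c: fixed[c] for c in cols})
print(kept)  # ['const', 'x1']
```

With a real model the p-values change after every refit, which is why Step 5 refits before the next comparison; the fixed dictionary above only keeps the example short.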

Now let's look at the practical implementation.

Step 1. Import the Libraries

Step 2. Importing the Dataset
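The code for Steps 1 and 2 is not reproduced here. As a stand-in, the minimal sketch below loads a tiny hypothetical CSV with Python's standard `csv` module. Apart from ‘State’, the column names and all values are made up. With pandas, the usual call is `pandas.read_csv` on the file path.

```python
import csv
import io

# Hypothetical toy data; only the 'State' column name comes from the article.
RAW_CSV = """feature_a,feature_b,State,target
1.0,2.0,New York,10.0
2.0,1.5,California,12.5
3.0,3.5,Florida,17.0
4.0,2.5,New York,18.5
"""

# Each row becomes a dict keyed by the header names (values stay strings).
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
print(len(rows), rows[0]["State"])  # 4 New York
```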

Step 3. Split the data into a matrix of features (X) and the dependent variable (y).
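A minimal sketch of this split on hypothetical rows, assuming the dependent variable is the last column. With a pandas DataFrame the same split is commonly written `df.iloc[:, :-1]` and `df.iloc[:, -1]`.

```python
# Hypothetical rows: every column except the last is a feature,
# and the last column is the dependent variable.
rows = [
    [1.0, 2.0, "New York", 10.0],
    [2.0, 1.5, "California", 12.5],
    [3.0, 3.5, "Florida", 17.0],
]

X = [row[:-1] for row in rows]  # matrix of features
y = [row[-1] for row in rows]   # dependent variable
print(X[0], y)  # [1.0, 2.0, 'New York'] [10.0, 12.5, 17.0]
```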

Step 4. Encode the categorical ‘State’ column as dummy variables.

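Step 5. Remove one dummy variable to avoid the dummy variable trap.

Drop any one of the dummy columns created in Step 4; the remaining ones still identify every state, because the dropped state is the row where all the other dummies are zero.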
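Steps 4 and 5 can be sketched with a small standard-library helper. `one_hot` is a hypothetical name: it sorts the categories and, with `drop_first=True`, drops the first one. If every dummy were kept, the dummies would add up to one in every row, duplicating the constant column of ones, so the coefficients could not be estimated uniquely. In pandas, `pandas.get_dummies(df, columns=["State"], drop_first=True)` performs both steps at once.

```python
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str], drop_first: bool = True) -> Tuple[List[str], List[List[int]]]:
    """Encode a categorical column as 0/1 dummy columns.

    Categories are sorted so the encoding is deterministic; with
    drop_first=True the first category is dropped (dummy variable trap).
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    dummies = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, dummies


states = ["New York", "California", "Florida", "New York"]  # hypothetical
names, dummies = one_hot(states)
print(names)    # ['Florida', 'New York']
print(dummies)  # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

'California' is the dropped category here, so it is encoded as all zeros.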

Step 6. Split the matrix of features (X) and the dependent variable (y) into a training set and a test set.
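A minimal standard-library sketch of this step, assuming an 80/20 split. `split_train_test` is a hypothetical helper, and the seed only makes the shuffle reproducible. scikit-learn provides the same operation as `sklearn.model_selection.train_test_split`, which returns the four parts in the same order.

```python
import random
from typing import Sequence, Tuple


def split_train_test(X: Sequence, y: Sequence, test_size: float = 0.2,
                     seed: int = 0) -> Tuple[list, list, list, list]:
    """Shuffle row indices, then split X and y with the same indices."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i, i * 10] for i in range(10)]  # hypothetical features
y = [i * 2 for i in range(10)]        # hypothetical target
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 8 2
```

Using the same shuffled indices for X and y keeps each row aligned with its target.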

Step 7. Fit a multiple linear regression model to the training set.

Step 8. Predict the test set results.
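Steps 7 and 8 are usually done with a library such as scikit-learn's `LinearRegression` (`fit` on the training set, `predict` on the test set). To show what fitting means, the sketch below solves the least-squares normal equations X^T X b = X^T y with the standard library only. The helper names are hypothetical, and X^T X is assumed to be invertible.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X_train: Sequence[Sequence[float]], y_train: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    rows = [[1.0, *map(float, r)] for r in X_train]  # x0 = 1 for the intercept b0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y_train)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """y_hat = b0 + b1*x1 + ... + bn*xn for every row of X."""
    return [coef[0] + sum(b * float(v) for b, v in zip(coef[1:], r)) for r in X]


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
X_train = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_train = [1, 3, 4, 6, 8]
coef = fit_linear(X_train, y_train)
y_pred = predict(coef, [[3, 2]])
print([round(c, 6) for c in coef], [round(p, 6) for p in y_pred])  # [1.0, 2.0, 3.0] [13.0]
```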

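Step 10: Add an extra column of ones to the matrix of features. The reason is in the equation of multiple linear regression:

y = b0*x0 + b1*x1 + b2*x2 + ... + bn*xn

Here, the constant ‘b0’ is multiplied by ‘x0’, whose value is always one (1). For ‘b0’ to be estimated, the matrix of features therefore needs a column in which every value is one.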

The added column, containing the value one (1).
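A minimal sketch of this step, using hypothetical helper names: prepending x0 = 1 to every row lets a plain dot product of the coefficients with the row reproduce b0 + b1*x1 + ... + bn*xn. With NumPy the same column is often added with `np.append(arr=np.ones((n, 1)), values=X, axis=1)`, and statsmodels offers `statsmodels.api.add_constant`.

```python
from typing import List, Sequence


def add_ones_column(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Prepend x0 = 1 to every row so the constant b0 has a column to multiply."""
    return [[1.0, *map(float, row)] for row in X]


def linear_prediction(b: Sequence[float], row: Sequence[float]) -> float:
    """y = b0*x0 + b1*x1 + ... + bn*xn for one row that already contains x0."""
    return sum(bi * xi for bi, xi in zip(b, row))


X = [[2.0, 3.0], [4.0, 5.0]]  # hypothetical features
X_ones = add_ones_column(X)
print(X_ones)                                          # [[1.0, 2.0, 3.0], [1.0, 4.0, 5.0]]
print(linear_prediction([10.0, 1.0, 2.0], X_ones[0]))  # 18.0
```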

Step 11: Apply the backward elimination steps explained above.

1st Iteration: Include all the columns in the ‘X_op’ variable, fit the model and check the p-values of all the columns.

Comparing these p-values with the significance level, we eliminate the column with the highest p-value, which is the variable ‘x2’.

2nd Iteration: After eliminating the variable ‘x2’, fit the model again and make the same comparison as in the first iteration.
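The iterations can be reproduced end to end without third-party packages. The sketch below is a stand-in under stated assumptions, not the article's code. It fits ordinary least squares with the normal equations and computes each coefficient's two-sided t-test p-value through the regularized incomplete beta function. It then repeatedly drops the column with the highest p-value while that p-value is not below the significance level. The toy data are made up so that `x2` is unrelated to `y` by construction. In practice a library such as statsmodels would do the fitting: `statsmodels.api.OLS(y, X_op).fit()` returns a result whose `pvalues` and `summary()` show the same information.

```python
import math
from typing import List, Sequence, Tuple


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even and odd terms of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h


def _betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t: float, df: int) -> float:
    """Two-sided p-value P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan matrix inverse with partial pivoting (m must be non-singular)."""
    n = len(m)
    a = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols_pvalues(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """OLS fit of y on the columns of X; returns (coefficients, p-values)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(X, y)]
    df = n - k
    sigma2 = sum(e * e for e in resid) / df  # residual variance (assumes an imperfect fit)
    pvals = [t_pvalue(beta[i] / math.sqrt(sigma2 * inv[i][i]), df) for i in range(k)]
    return beta, pvals


def backward_elimination(X, y, sl: float = 0.05) -> List[int]:
    """Indices of the columns of X that survive backward elimination at level sl."""
    cols = list(range(len(X[0])))
    while cols:
        X_op = [[row[j] for j in cols] for row in X]  # current candidate columns
        _, pvals = ols_pvalues(X_op, y)
        worst = max(range(len(cols)), key=lambda i: pvals[i])
        if pvals[worst] < sl:  # all remaining p-values are below SL: stop
            break
        del cols[worst]  # eliminate the highest p-value column, then refit
    return cols


# Made-up data: y depends on x1, while x2 is built to be uncorrelated with y.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 1, -1, -1, -1, -1, 1, 1]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [3 + 2 * a + e for a, e in zip(x1, noise)]
X = [[1.0, a, b] for a, b in zip(x1, x2)]  # column 0 is the column of ones
print(backward_elimination(X, y))  # [0, 1]: column 2 (x2) is eliminated
```

In the first iteration the coefficient of column 2 is essentially zero, so its p-value is close to 1 and it is removed; in the second iteration both remaining p-values are far below 0.05, so the loop stops.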