How to do Backward Elimination in Machine Learning?

Manik Soni
4 min read · Sep 26, 2020


The steps involved in backward elimination for a machine learning model.

Backward Elimination in Machine Learning

Step 1: Select a significance level to stay in the model. The significance level is the threshold a predictor's p-value must fall below to be kept; it sets how strong the evidence in your sample data set must be. A significance level of 0.05 means accepting a 5% risk of concluding that an effect exists when there is actually none.

Step 2: Fit the model with all possible predictors, that is, with every feature column of the data set.

Step 3: Derive the p-value of each predictor from the fitted model. The p-value is the probability of obtaining an effect at least as large as the one observed in your sample data set if the predictor actually had no effect.

Step 4: If a column's p-value is less than your significance level, you can reject the null hypothesis, that is, the column is significant for the model. Otherwise, remove the column with the highest p-value above the significance level.

Step 5: Fit the model without that column.

Step 6: Repeat steps 3, 4 and 5 until the p-values of all the remaining columns are less than the significance level.

Now let's see the practical implementation of the machine learning model.

Step 1. Import the Libraries

Import Libraries

Step 2. Importing the Dataset

Import Dataset

Step 3. Split the data into a matrix of features (X) and the dependent variable (y).

X and y splitting
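Steps 1 to 3 might look roughly like the sketch below. The data file is not named in the text, so a tiny made-up table is built in memory instead; every column name except 'State' and every value is hypothetical. It also assumes the dependent variable is the last column.

```python
import numpy as np
import pandas as pd

# Step 2 stand-in: a small made-up table instead of the real file.
# With a real file this would be something like pd.read_csv("your_file.csv").
dataset = pd.DataFrame({
    "Feature_1": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],   # hypothetical column
    "Feature_2": [5.0, 3.0, 8.0, 1.0, 9.0, 4.0],         # hypothetical column
    "State": ["A", "B", "C", "A", "C", "B"],             # hypothetical labels
    "Target": [15.0, 24.0, 37.0, 42.0, 58.0, 65.0],      # hypothetical column
})

# Step 3: every column except the last is a feature; the last one is the target.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X.shape, y.shape)
```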

Step 4. Encode the categorical ‘State’ column as dummy variables.

Categorization
Table containing data

Step 5. Remove one dummy variable to avoid the dummy variable trap (perfect collinearity among the dummy columns).

use this code to remove the column
Removal from dataset
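One possible way to do Steps 4 and 5 is sketched below with scikit-learn's `ColumnTransformer` and `OneHotEncoder`. The post's own code is not visible in the text, so this is an assumption about the approach rather than the exact code used; the small feature matrix and the position of the 'State' column (index 2) are also made up.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature matrix: two numeric columns and a 'State' column at index 2.
X = np.array([
    [10.0, 5.0, "A"],
    [20.0, 3.0, "B"],
    [30.0, 8.0, "C"],
    [40.0, 1.0, "A"],
    [50.0, 9.0, "C"],
    [60.0, 4.0, "B"],
], dtype=object)

# Step 4: replace the 'State' column with one dummy column per category.
# The encoded columns come first, the untouched columns follow.
# sparse_threshold=0 forces a dense array as the result.
encoder = ColumnTransformer(
    transformers=[("state", OneHotEncoder(), [2])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = np.asarray(encoder.fit_transform(X), dtype=float)

# Step 5: drop the first dummy column; the remaining dummies still identify every state.
X_reduced = X_encoded[:, 1:]
print(X_reduced)
```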

Step 6. Split the matrix of features (X) and the dependent variable (y) into a training set and a test set.

Splitting of Test and Training set

Step 7. Fit a linear model to the training set.

Fitting the linear model

Step 8. Predict the test set results.

Predict the model
y_test and y_pred results
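Steps 6 to 8 could be sketched as follows with scikit-learn. The data here is synthetic, and the 20% test size and `random_state=0` are assumptions chosen only to make the example reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 rows, 4 features, a linear target with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + 10.0 + rng.normal(scale=0.1, size=50)

# Step 6: hold out 20% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 7: fit the linear model on the training set only.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Step 8: predict on the held-out rows and put y_test next to y_pred.
y_pred = regressor.predict(X_test)
print(np.column_stack((y_test, y_pred))[:5])
```

The test set is never used for fitting, so the comparison of y_test and y_pred shows how the model behaves on rows it has not seen.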

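Step 10: Add an extra column containing ones. To see why, look at the equation of multiple linear regression:

Multiple Linear Regression: y = b0 + b1*x1 + b2*x2 + ... + bn*xn

Here, the constant ‘b0’ can be read as being multiplied by ‘x0’, whose value is always equal to one (1). Adding this column of ones to the matrix of features lets the model estimate ‘b0’ like any other coefficient.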

The column containing values one(1)
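A minimal sketch of adding that column with NumPy, assuming the ones should sit in front as column 0 (the 'x0' of the equation). The three-row matrix is made up.

```python
import numpy as np

# Hypothetical matrix of features with two columns.
X = np.array([[1.5, 2.0],
              [0.5, 3.0],
              [2.5, 1.0]])

# Put the column of ones first, so it becomes column 0 of the new matrix.
ones = np.ones((X.shape[0], 1))
X = np.append(arr=ones, values=X, axis=1)
print(X)
```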

Step 11: Apply the backward elimination steps explained above.

1st Iteration: We include all the columns in the ‘X_op’ variable, fit the model and look at the p-values of all the columns.

Comparing the p-values of the columns with the significance level, we eliminate the column with the highest p-value, which is variable ‘x2’.

2nd Iteration: After eliminating the variable ‘x2’, we again fit the model and do the same comparison as we did in the first iteration.

Now, eliminate the ‘x1’ column.

3rd Iteration: After eliminating the variable ‘x1’, we again fit the model and do the same comparison as we did in the first iteration.

Now, eliminate the ‘x2’ column.

4th Iteration: After eliminating the variable ‘x2’, we again fit the model and do the same comparison as we did in the first iteration.

Now, eliminate the ‘x2’ column.

5th Iteration: After eliminating the variable ‘x2’, we again fit the model and do the same comparison as we did in the first iteration.

Now our Model is Ready.
