What is Multiple Linear Regression?

Manik Soni
4 min read · Sep 25, 2020
Multiple Linear Regression

Multiple Linear Regression is a statistical technique that uses two or more independent variables to predict the value of a single dependent variable.

Dependent and independent variables

The dependent variable is the outcome we want to predict, that is, the result of the prediction.

The independent variables are the features that are used to make that prediction.

Constant and coefficients
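
In standard notation, a multiple linear regression equation with n independent variables can be written as below. The symbols are the conventional ones and are an assumption, not taken from the original figure: y is the dependent variable, x_1 to x_n are the independent variables, b_0 is the constant, and b_1 to b_n are the coefficients.

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
```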

In the regression equation, the ‘Constant’ (the intercept) is the value the dependent variable takes when every independent variable is zero. Each ‘Coefficient’ is a number that multiplies one independent variable and so determines how strongly that variable affects the dependent variable.

A Multiple Linear Regression model relies on the following assumptions:

  1. Linearity
  2. Homoscedasticity
  3. Multivariate normality
  4. Independence of errors
  5. Lack of multicollinearity

For now, we will not focus on these assumptions. To build a multiple linear regression model, we first need a dataset that helps us understand the concept.

Our dataset covers 50 startup companies and records how much each one spent in areas such as R&D and marketing. The dependent variable here is the ‘Profit’, which is the actual outcome of that spending.

DataSet to understand the concept

But before we start building the model, let’s first discuss the concept of dummy variables.

Dataset

The ‘State’ column is categorical, so it must be converted into numeric dummy variables (one 0/1 column per state) before it can be used for prediction.

categorization of column

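But we need to remove one dummy variable, because a ‘0’ in the ‘New York’ column already tells us that the state is ‘California’. Keeping both columns would store the same information twice and put us in the dummy variable trap, which violates the "lack of multicollinearity" assumption.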
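
The dummy variable idea can be sketched with pandas. This is a minimal illustration, assuming a made-up four-row ‘State’ column with only the two states mentioned above; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the two states used in the example above.
df = pd.DataFrame({"State": ["New York", "California", "California", "New York"]})

# Create one 0/1 column per category (columns come out in sorted order).
dummies = pd.get_dummies(df["State"], dtype=int)

# Every row has exactly one 1, so each column can be derived from the other.
assert (dummies.sum(axis=1) == 1).all()

# Keep only one column to stay out of the dummy variable trap.
reduced = dummies.drop(columns=["California"])
print(reduced)
```

A ‘1’ in the remaining ‘New York’ column means New York and a ‘0’ means California, so no information is lost.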

Dataset

That completes the data preprocessing. Now, to build the multiple linear regression model, we follow these steps.

Step 1. Import the Libraries

Import Libraries

Step 2. Importing the Dataset

Import Dataset

Step 3. Split the data into a matrix of features (X) and the dependent variable (y).

X and y splitting
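
Steps 1 to 3 might look like the sketch below. It assumes pandas is used; the CSV text is a hypothetical stand-in for the real file, with invented values and column names guessed from the description of the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the startup CSV file. The values are invented
# and the column names are assumed from the description in the text.
csv_text = """R&D Spend,Marketing Spend,State,Profit
165000,470000,New York,192000
162000,440000,California,191000
153000,400000,California,190000
144000,380000,New York,182000
142000,360000,California,166000
131000,362000,New York,156000
"""

# Step 2: load the dataset (a real file path would replace the StringIO).
dataset = pd.read_csv(io.StringIO(csv_text))

# Step 3: X is every column except the last, y is the last column (Profit).
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

print(X.shape, y.shape)
```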

Step 4. Categorization of ‘State’ Column.

Categorization
Table containing data

Step 5. Removing one dummy variable.

use this code to remove the column
Removal from dataset
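
Steps 4 and 5 could be sketched as follows, assuming scikit-learn's `OneHotEncoder` and `ColumnTransformer` are used for the encoding. The small feature matrix and its column order are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature matrix: [R&D Spend, Marketing Spend, State].
X = np.array(
    [
        [165000, 470000, "New York"],
        [162000, 440000, "California"],
        [153000, 400000, "California"],
        [144000, 380000, "New York"],
    ],
    dtype=object,
)

# Step 4: replace the State column (index 2) with one 0/1 column per state.
# The encoded columns are placed first, the untouched columns follow.
ct = ColumnTransformer(
    transformers=[("state", OneHotEncoder(), [2])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X_encoded = ct.fit_transform(X)

# Step 5: drop the first dummy column (California) to avoid the trap.
X_reduced = X_encoded[:, 1:].astype(float)

print(X_reduced)
```

Passing `drop="first"` to `OneHotEncoder` would combine both steps into one; the two-step form is shown because it mirrors the steps in the text.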

Step 6. Splitting the matrix of features (X) and the dependent variable (y) into a training set and a test set.

Splitting of Test and Training set
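
A minimal sketch of the split, assuming scikit-learn's `train_test_split`. The random data only stands in for the 50 startups, and the 80/20 ratio and `random_state` value are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 rows (one per startup) and 3 numeric features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
y = rng.uniform(0, 100, size=50)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```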

Step 7. Fitting a linear model to the training set.

Fitting the linear model

Step 8. Predicting the Test result.

Predict the model
y_test and y_pred results
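
Steps 7 and 8 can be sketched as below, assuming scikit-learn's `LinearRegression`. The data is synthetic: it is generated from a known constant and known coefficients (both hypothetical) so that the fitted values can be checked against them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data built from a known constant (5.0) and known coefficients,
# plus a little noise. These numbers are invented for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
true_coefficients = np.array([2.0, 0.5, -1.0])
y = 5.0 + X @ true_coefficients + rng.normal(0, 0.1, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 7: fit the model on the training set only.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Step 8: predict the dependent variable for the unseen test set.
y_pred = regressor.predict(X_test)

print("constant:", regressor.intercept_)
print("coefficients:", regressor.coef_)
```

The fitted constant and coefficients should land close to the values used to generate the data, which is a quick sanity check that the model has learned the relationship.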

Step 9. Making the Confusion Matrix to evaluate the predictions.

confusion matrix
Confusion Matrix

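A Confusion Matrix summarizes the accuracy of a model by counting its true and false predictions. Note that it is defined for classification labels; for a continuous target such as ‘Profit’, the fit is usually measured with metrics like R² or the mean absolute error.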

Accuracy of our model
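
Because ‘Profit’ is a continuous value, a regression metric is a more natural way to score the predictions than a confusion matrix. The sketch below swaps in R² and the mean absolute error from scikit-learn's `metrics` module; the `y_test` and `y_pred` values are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical actual and predicted profits for four test rows.
y_test = np.array([100.0, 150.0, 200.0, 250.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

# R² is 1.0 for a perfect fit and drops as the predictions get worse.
r2 = r2_score(y_test, y_pred)

# Mean absolute error: the average size of the prediction error.
mae = mean_absolute_error(y_test, y_pred)

print("R2:", r2)
print("MAE:", mae)
```

Here every prediction is off by 10, so the mean absolute error is 10 and R² works out to about 0.968.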
