What is Multiple Linear Regression?
Multiple Linear Regression is a statistical technique that uses multiple independent variables to do predictions.
The Dependent Variable is a variable that is the outcome we say the result which is the actual outcome of the prediction.
The Independent Variable is a variable that helps in the prediction or these are those features that will do the prediction.
In the above equation, ‘Constant’ is a predefined requirement that affects the situation and ‘Coefficient’ is a variable constant that multiple with the independent variable to give the impact or the significance to the dependent variable.
There are some assumptions that a Multiple Linear Regression Model will follow:
- Linearity
- Homoscedasticity
- Multivariate normality
- Independence of errors
- Lack of multicollinearity
But for now on we are not focusing on these assumptions. So in order to build a model of multilinear regression we just first need to have a dataset that helps us to understand the concept.
So we are having a dataset of 50 startup companies who have invested in various types of domains like R&D Spend, Marketing. etc. So our dependent variable here is the ‘Profit’ that is the actual outcome of the investment.
But before we begin our discussion on building a model let’s first discuss the Dummy variables concept.
Our state column should be categorical to do predictions.
But we need to remove one dummy variable because it is self-explanatory that ‘0’ in the ‘New York’ column is ‘California’ and we cannot be in the Dummy variable trap.
So we did the data preprocessing successfully. Now to build the model of multiple linear regression, we should follow the following steps.
Step 1. Import the Libraries
Step 2. Importing the Dataset
Step 3. Split the data into a matrix of features(X) and the dependent variable(y).
Step 4. Categorization of ‘State’ Column.
Step 5. Removing one dummy variable.
Step 6. Splitting the matrix of features(X) and dependent variable(y) into training and test set.
Step 7. Fitting a linear model to test and training dataset.
Step 8. Predicting the Test result.
Step 9. Making the Confusion Matrix to do predictions.
Confusion Matrix helps to give the accuracy of our model that is the number of false values and true values.