What is Multiple Linear Regression?

Sep 25, 2020

Multiple Linear Regression is a statistical technique that uses several independent variables to predict the value of a single dependent variable.

The Dependent Variable is the outcome we want to predict, i.e. the result the model is trying to estimate.

The Independent Variables are the features (inputs) the model uses to make that prediction.

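A multiple linear regression model has the form y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ, where y is the dependent variable and x₁ … xₙ are the independent variables. In this equation, the ‘Constant’ b₀ is the intercept: the predicted value of y when every independent variable is zero. Each ‘Coefficient’ bᵢ multiplies its independent variable xᵢ and expresses that variable's impact on the dependent variable, i.e. how much y changes when xᵢ increases by one unit while the other variables are held fixed.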
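To make the roles of the constant and the coefficients concrete, here is a minimal sketch that estimates them for the model y = b0 + b1·x1 + … + bn·xn using ordinary least squares in NumPy. The data is synthetic and generated from hypothetical "true" values (not the startup dataset used later in this article), so we can check that the fit recovers them.

```python
import numpy as np

# Synthetic, illustrative data: 20 observations of 3 independent variables.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 3))

# Hypothetical "true" constant and coefficients used to generate y.
true_b0 = 5.0
true_b = np.array([2.0, -1.0, 0.5])
y = true_b0 + X @ true_b  # no noise added, so the fit should recover them

# Prepend a column of ones so the constant b0 is estimated with b1..bn.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: the b that minimizes ||X_design @ b - y||^2.
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, coefs = b[0], b[1:]

print("constant b0:", round(float(b0), 3))
print("coefficients:", np.round(coefs, 3))

# Predict y for a new observation: y = b0 + b1*x1 + b2*x2 + b3*x3
x_new = np.array([1.0, 2.0, 3.0])
y_pred = b0 + x_new @ coefs
print("prediction:", round(float(y_pred), 3))
```

Because y was generated without noise, the estimates match the generating values, and the prediction for x = (1, 2, 3) is 5 + 2·1 − 1·2 + 0.5·3 = 6.5, up to floating-point error. With real data the fit is only approximate, and libraries such as scikit-learn wrap the same least-squares idea.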

A Multiple Linear Regression model relies on several assumptions:

Linearity
Homoscedasticity
Multivariate normality
Independence of errors
Lack of multicollinearity

For now, we will not focus on these assumptions. To build a multiple linear regression model, we first need a dataset that helps us understand the concept.

We will use a dataset of 50 startup companies that records how much each company spent in different areas, such as R&D and marketing. Our dependent variable is ‘Profit’, the actual outcome of these investments.

Before building the model, let’s first discuss the concept of dummy variables.

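Our ‘State’ column is categorical: it holds state names rather than numbers. Since the regression equation can only multiply numbers by coefficients, we convert this column into dummy variables, one column per state, containing 1 if the company is in that state and 0 otherwise.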
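The conversion into dummy variables can be sketched in plain Python as below. The list of states is a hypothetical, illustrative sample limited to ‘New York’ and ‘California’, not rows from the real dataset.

```python
# Hypothetical State values for a few companies (illustrative only).
states = ["New York", "California", "California", "New York", "California"]

# One 0/1 dummy column per distinct category, in a fixed (sorted) order.
categories = sorted(set(states))  # ['California', 'New York']
dummies = [[1 if s == cat else 0 for cat in categories] for s in states]

for s, row in zip(states, dummies):
    print(f"{s:<12} -> {dict(zip(categories, row))}")
```

Every row contains exactly one 1. In practice, pandas offers `pandas.get_dummies`, whose `drop_first=True` option also drops one of the resulting columns.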

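However, we need to remove one dummy variable. With only two states, a ‘0’ in the ‘New York’ column already tells us the company is in ‘California’, so the second column adds no information. Worse, the two dummy columns always sum to 1, which duplicates the constant term and makes the variables perfectly collinear. This is the Dummy Variable Trap, and we avoid it by always including one fewer dummy variable than there are categories.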
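The trap can be checked numerically. The sketch below, again using hypothetical state values, builds the model's input columns (a column of ones for the constant plus the dummies) and compares the number of columns with the matrix rank using NumPy.

```python
import numpy as np

# Hypothetical State values (illustrative only).
states = ["New York", "California", "California", "New York", "California"]
new_york = np.array([1.0 if s == "New York" else 0.0 for s in states])
california = 1.0 - new_york  # California is exactly "not New York"
ones = np.ones(len(states))  # column that estimates the constant b0

# Keeping both dummies: ones == new_york + california, so the columns are dependent.
trap = np.column_stack([ones, new_york, california])
# Dropping one dummy: California becomes the baseline category.
ok = np.column_stack([ones, new_york])

print("with both dummies:", trap.shape[1], "columns, rank", np.linalg.matrix_rank(trap))
print("with one dummy:   ", ok.shape[1], "columns, rank", np.linalg.matrix_rank(ok))
```

With both dummies the rank is lower than the number of columns, so least squares has no unique set of coefficients. With one dummy dropped the columns are independent, and the ‘New York’ coefficient measures the difference from the California baseline.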