Multiple Linear Regression
Simple linear regression | Multiple linear regression |
---|---|
There is only one slope parameter for which we can perform hypothesis tests. | There are multiple slope parameters, so there are three different types of hypothesis tests we can perform: a test of all of the slopes at once, a test of a single slope, and a test of a subset of the slopes. |
Testing a multiple linear regression model
Let's say we have a multiple linear regression model. We may ask the following types of questions,
Do some or all of the variables have statistically significant roles?
Conduct an F-test of whether certain population slope parameters are equal to zero.
Does including a certain variable make a statistically significant improvement to the model?
To answer any of these, we are testing whether one or more slopes might be equal to zero. In other words, we are testing a full model versus a reduced model.
Full model versus reduced model
Consider the following example of a full model with three predictor variables,

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$

We want to test a reduced model with some variables missing, or in other words a reduced model where some slopes are equal to zero. Consider a reduced model with $\beta_2 = \beta_3 = 0$,

$y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i$

Or consider a reduced model with $\beta_3 = 0$,

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$
Understanding full and reduced models is important,
Model | Description |
---|---|
Full model | A model with all possible predictor variables in the regression formula. |
Reduced model | A model with one or more predictor variables removed from the regression formula, used either because those predictors were insignificant or to test whether it is appropriate to remove them (that is, whether their slopes are zero). |
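As a concrete sketch of fitting a full versus a reduced model (hypothetical data; `numpy.linalg.lstsq` used for the least-squares fits), we can fit a full model with three predictors and a reduced model that drops two of them, then compare their error sums of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical data: y actually depends on x1 only; x2 and x3 are noise predictors.
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.5 * x1 + rng.normal(scale=0.5, size=n)

# Full model: intercept + x1 + x2 + x3
X_full = np.column_stack([np.ones(n), x1, x2, x3])
# Reduced model: intercept + x1 (i.e., beta2 = beta3 = 0)
X_red = np.column_stack([np.ones(n), x1])

def sse(X, y):
    """Residual error sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

sse_full = sse(X_full, y)
sse_red = sse(X_red, y)

# The full model always fits at least as well: SSE(F) <= SSE(R).
print(sse_full, sse_red)
```

Because the reduced model's columns are a subset of the full model's, SSE(F) can never exceed SSE(R); the hypothesis tests below ask whether the reduction in error is more than we would expect by chance.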
Useful tests
To test a full model against a reduced model, we can use the following methods,

Method | Description |
---|---|
General linear F-statistic | We can use an F-statistic to test whether one or more variables have slopes of zero, using the error sums of squares for the full and the reduced regression models, $F^* = \dfrac{(SSE(R) - SSE(F)) / (DF(R) - DF(F))}{SSE(F) / DF(F)}$. Or alternatively, $F^* = \dfrac{SSE(R) - SSE(F)}{DF(R) - DF(F)} \div MSE(F)$, where $MSE(F) = SSE(F) / DF(F)$. Note that the degrees of freedom are $DF(F) = n - p$ for a full model with $p$ parameters and $DF(R) = n - q$ for a reduced model with $q$ parameters, where SSE(F) and DF(F) are the residual error sum of squares for the full model and its corresponding degrees of freedom, and SSE(R) and DF(R) are the corresponding values for the reduced model. And we are testing $H_0$: the omitted slopes are all equal to zero, versus $H_A$: at least one of the omitted slopes is not zero. And to test this F-statistic, our degrees of freedom when calculating the P-value are $DF(R) - DF(F)$ in the numerator and $DF(F)$ in the denominator. Note that when we are using a simple linear regression model to test the one slope parameter, $H_0 : \beta_1 = 0$, that is essentially a simplified version of the equation above, where for a simple linear regression model the reduced model is $y_i = \beta_0 + \epsilon_i$, so $DF(R) = n - 1$, $DF(F) = n - 2$, and $F^* = MSR / MSE$. |
t-statistic | We can use a t-statistic to test whether one of the slopes is equal to zero, $H_0 : \beta_k = 0$. As usual, this is the estimated slope divided by the standard error of the slope, $t^* = b_k / se(b_k)$. From there, we can calculate the P-value for the t-statistic based on a t-distribution with $n - p$ degrees of freedom, where $p$ is the number of parameters in the full model. The problem with doing multiple individual t-tests instead of one F-test is that we are more likely to make an incorrect conclusion overall, because in each of the multiple tests we are conducting we may be making an incorrect conclusion. In each of the individual t-tests, we may make a Type I error (rejecting a true null hypothesis) or a Type II error (failing to reject a false null hypothesis), and these per-test error probabilities compound across the tests. |
Partial F-test | A partial F-test is the general linear F-test applied to a subset of the slopes: we compare the full model to a reduced model that omits only the predictors under test, so we are testing whether those particular slopes are all equal to zero. |
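A minimal sketch of the general linear F-test above, under the assumption of a full model with three predictors and a reduced model where two slopes are set to zero (synthetic data; `scipy.stats.f` supplies the F distribution for the P-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60

# Synthetic data: only x1 has a real effect on y.
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.8, size=n)

def fit_sse(X, y):
    """Residual error sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_full = np.column_stack([np.ones(n), x1, x2, x3])  # p = 4, DF(F) = n - 4
X_red = np.column_stack([np.ones(n), x1])           # q = 2, DF(R) = n - 2

sse_f, df_f = fit_sse(X_full, y), n - 4
sse_r, df_r = fit_sse(X_red, y), n - 2

# General linear F-statistic for H0: beta2 = beta3 = 0
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# P-value from the upper tail of F with (DF(R)-DF(F), DF(F)) degrees of freedom
p_value = stats.f.sf(f_star, df_r - df_f, df_f)

print(f"F* = {f_star:.3f}, p = {p_value:.3f}")
```

Since x2 and x3 carry no real signal here, the test will typically fail to reject the null hypothesis, supporting the reduced model.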