Multiple Linear Regression
Simple linear regression | Multiple linear regression |
---|---|
There is only one slope parameter for which we can perform hypothesis tests. | There are multiple slope parameters, so there are three different types of hypothesis tests we can perform: a test of all of the slopes at once, a test of a single slope, and a test of a subset of the slopes. |
Testing a multiple linear regression model
Let's say we have a multiple linear regression model. We may ask the following types of questions,
Do some or all of the variables have statistically significant roles?
Conduct an F-test of whether certain population slope parameters are equal to zero.
Does including a certain variable make a statistically significant improvement to the model?
To answer any of these, we are testing whether one or more slopes might be equal to zero. In other words, we are testing a full model versus a reduced model.
Full model versus reduced model
Consider the following example of a full model with three predictor variables,

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$

We want to test a reduced model with some variables missing, or in other words a reduced model where some slopes are equal to zero. Consider a reduced model with $\beta_2 = \beta_3 = 0$,

$y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i$

Or consider a reduced model with $\beta_3 = 0$,

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i$
Understanding full and reduced models is important,
Model | Description |
---|---|
Full model | A model with all possible predictor variables in the regression formula. |
Reduced model | A model with one or more predictor variables removed from the regression formula, used either because those predictors were insignificant or to test whether it is appropriate to remove them (that is, whether their slopes are zero). |
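As a concrete sketch of fitting a full versus a reduced model (hypothetical data; `numpy.linalg.lstsq` used for the least-squares fits), we can fit a full model with three predictors and a reduced model that drops two of them, then compare their error sums of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical data: y actually depends on x1 only; x2 and x3 are noise predictors.
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 1.5 * x1 + rng.normal(scale=0.5, size=n)

# Full model: intercept + x1 + x2 + x3
X_full = np.column_stack([np.ones(n), x1, x2, x3])
# Reduced model: intercept + x1 (i.e., beta2 = beta3 = 0)
X_red = np.column_stack([np.ones(n), x1])

def sse(X, y):
    """Residual error sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

sse_full = sse(X_full, y)
sse_red = sse(X_red, y)

# The full model always fits at least as well: SSE(F) <= SSE(R).
print(sse_full, sse_red)
```

Because the reduced model's columns are a subset of the full model's, SSE(F) can never exceed SSE(R); the hypothesis tests below ask whether the reduction in error is more than we would expect by chance.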
Useful tests
To test a full model against a reduced model, we can use the following methods,

Method | Description |
---|---|
General linear F-statistic | We can use an F-statistic to test whether one or more variables have slopes of zero, using the error sums of squares for the full and the reduced regression models, $F^* = \dfrac{(SSE(R) - SSE(F)) / (DF(R) - DF(F))}{SSE(F) / DF(F)}$. Or alternatively, $F^* = \dfrac{SSE(R) - SSE(F)}{DF(R) - DF(F)} \div MSE(F)$, where $MSE(F) = SSE(F) / DF(F)$. Note that the degrees of freedom are $DF(F) = n - p$ for a full model with $p$ parameters and $DF(R) = n - q$ for a reduced model with $q$ parameters, where SSE(F) and DF(F) are the residual error sum of squares for the full model and its corresponding degrees of freedom, and SSE(R) and DF(R) are the corresponding values for the reduced model. And we are testing $H_0$: the omitted slopes are all equal to zero, versus $H_A$: at least one of the omitted slopes is not zero. And to test this F-statistic, our degrees of freedom when calculating the P-value are $DF(R) - DF(F)$ in the numerator and $DF(F)$ in the denominator. Note that when we are using a simple linear regression model to test the one slope parameter, $H_0 : \beta_1 = 0$, that is essentially a simplified version of the equation above, where for a simple linear regression model the reduced model is $y_i = \beta_0 + \epsilon_i$, so $DF(R) = n - 1$, $DF(F) = n - 2$, and $F^* = MSR / MSE$. |
t-statistic | We can use a t-statistic to test whether one of the slopes is equal to zero, $H_0 : \beta_k = 0$. As usual, this is the estimated slope divided by the standard error of the slope, $t^* = b_k / se(b_k)$. From there, we can calculate the P-value for the t-statistic based on a t-distribution with $n - p$ degrees of freedom, where $p$ is the number of parameters in the full model. The problem with doing multiple individual t-tests instead of one F-test is that we are more likely to make an incorrect conclusion overall, because in each of the multiple tests we are conducting we may be making an incorrect conclusion. In each of the individual t-tests, we may make a Type I error (rejecting a true null hypothesis) or a Type II error (failing to reject a false null hypothesis), and these per-test error probabilities compound across the tests. |
Partial F-test | A partial F-test is the general linear F-test applied to a subset of the slopes: we compare the full model to a reduced model that omits only the predictors under test, so we are testing whether those particular slopes are all equal to zero. |
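A minimal sketch of the general linear F-test above, under the assumption of a full model with three predictors and a reduced model where two slopes are set to zero (synthetic data; `scipy.stats.f` supplies the F distribution for the P-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60

# Synthetic data: only x1 has a real effect on y.
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.8, size=n)

def fit_sse(X, y):
    """Residual error sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_full = np.column_stack([np.ones(n), x1, x2, x3])  # p = 4, DF(F) = n - 4
X_red = np.column_stack([np.ones(n), x1])           # q = 2, DF(R) = n - 2

sse_f, df_f = fit_sse(X_full, y), n - 4
sse_r, df_r = fit_sse(X_red, y), n - 2

# General linear F-statistic for H0: beta2 = beta3 = 0
f_star = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# P-value from the upper tail of F with (DF(R)-DF(F), DF(F)) degrees of freedom
p_value = stats.f.sf(f_star, df_r - df_f, df_f)

print(f"F* = {f_star:.3f}, p = {p_value:.3f}")
```

Since x2 and x3 carry no real signal here, the test will typically fail to reject the null hypothesis, supporting the reduced model.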