Analysis of Variance (ANOVA) table and F-test
Term | Short form | Description |
---|---|---|
Analysis of Variance | ANOVA | A method that analyzes the variance of the response variable y by partitioning it into components. Since a variance is an averaged sum of squares, this amounts to an analysis of mean squares. |
Analysis of Variance table | ANOVA table | A table that provides a range of numbers relating to variance and dependence. |
Degrees of freedom | DF | The number of values that are free to vary in a computation; typically the number of observations minus the number of quantities estimated from them. Each sum of squares has its own degrees of freedom. |
Sum of squares | SS | The sum of the squares of various differences: sometimes the difference between an observed value and a fitted value, other times between a fitted value and the mean of the observed values. Altogether, these measure different types of variance. |
Mean squares | MS | The sum of squares divided by its degrees of freedom, or in other words, an estimate of a variance: $MS = SS / DF$. |
Regression sum of squares | SSR | A measure of the variance (sum of squares) in y that is due to changes in the predictor variable x. If this is a large proportion of SSTO, that indicates there is indeed a linear association between the predictor and the response variables. |
Error sum of squares | SSE | A measure of the variance (sum of squares) in y that is due to random error. If this is a large proportion of SSTO, that indicates there is not a linear association between the predictor and the response variables. |
Total sum of squares | SSTO | The sum of SSR and SSE. |
F-Value | F | The test statistic of the F-test: the ratio of two mean squares, e.g. $F = MSR / MSE$. If the null hypothesis is true, this ratio follows an F-distribution, so large values are evidence against the null hypothesis. |
P-Value | P | "What is the probability that we’d get an F statistic as large as we did, if the null hypothesis is true?" |
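To make these quantities concrete, here is a minimal sketch in Python (the toy data is illustrative) that computes SSR, SSE, and SSTO for a simple linear fit and checks the identity $SSTO = SSR + SSE$,

```python
import numpy as np

# Toy data: a roughly linear relationship with some noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a simple linear regression y = b0 + b1 * x by least squares.
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regression
sse = np.sum((y - y_hat) ** 2)         # variation due to random error
ssto = np.sum((y - y.mean()) ** 2)     # total variation in y

print(ssr, sse, ssto)
print(np.isclose(ssr + sse, ssto))  # SSTO is the sum of SSR and SSE
```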
Here are the conclusions we can draw from the F-test, depending on how the P-value compares to our chosen significance level α,
Conclusion if P ≥ α | Conclusion if P < α | Hypothesis |
---|---|---|
We accept the null hypothesis that... | We reject the null hypothesis that... | The null hypothesis is that the predictor variable has zero impact on the response variable, i.e. there is no linear relationship between these two variables. |
We reject the hypothesis that... | We accept the hypothesis that... | The hypothesis is that the predictor variable has an impact on the response variable, i.e. these two variables have a linear relationship. |
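As a minimal sketch of this decision rule (the F-value, degrees of freedom, and α below are illustrative), we can turn an F-value into a P-value with scipy,

```python
from scipy.stats import f

# Illustrative values: an F-value with 1 and 18 degrees of freedom.
F_value = 12.5
df_regression, df_error = 1, 18
alpha = 0.05

# P-value: the probability of an F-value at least this large
# if the null hypothesis is true.
p_value = f.sf(F_value, df_regression, df_error)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: accept the null hypothesis")
```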
Using software, we generally wind up with tables where all the values are calculated automatically. However, it is still important to understand what these values are composed of, and how they are calculated. Below is an "analysis of variance" (ANOVA) table,
Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Squares (MS) | F-Value | P-Value |
---|---|---|---|---|---|
Regression (R) | $DF_R$ | $SSR$ | $MSR = \frac{SSR}{DF_R}$ | $F = \frac{MSR}{MSE}$ | $P$ |
Residual Error (E) | $DF_E$ | $SSE$ | $MSE = \frac{SSE}{DF_E}$ | | |
Lack of Fit (LF) | $DF_{LF}$ | $SSLF$ | $MSLF = \frac{SSLF}{DF_{LF}}$ | $F = \frac{MSLF}{MSPE}$ | $P$ |
Pure Error (PE) | $DF_{PE}$ | $SSPE$ | $MSPE = \frac{SSPE}{DF_{PE}}$ | | |
Total (TO) | $DF_{TO}$ | $SSTO$ | | | |
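For instance, here is a minimal sketch using statsmodels (the toy data is illustrative). Note that the basic `anova_lm` table gives the regression and residual error rows; the lack of fit decomposition is computed by hand further below.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy data: a roughly linear relationship with some noise.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8, 12.3],
})

# Fit y = b0 + b1 * x by ordinary least squares.
model = smf.ols("y ~ x", data=df).fit()

# ANOVA table: DF, SS, MS, F-value, and P-value for the regression.
print(sm.stats.anova_lm(model))
```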
Below are formulas related to the degrees of freedom and the sums of squares,
Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Squares (MS) | F-Value | P-Value |
---|---|---|---|---|---|
R | $1$ | $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ | | | |
E | $n - 2$ | $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ | | | |
LF | $c - 2$ | $SSLF = \sum_{i=1}^{c} n_i (\bar{y}_i - \hat{y}_i)^2 = SSE - SSPE$ | | | |
PE | $n - c$ | $SSPE = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$ | | | |
TO | $n - 1$ | $SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$ | | | |
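As a minimal sketch of these formulas (the toy data is illustrative, and includes repeated x values so that the pure error terms are nonzero; SSLF is obtained as SSE minus SSPE, per the table above),

```python
import numpy as np

# Toy data with repeated x values, so the pure error terms are nonzero.
x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])
y = np.array([1.8, 2.2, 4.1, 5.9, 6.3, 8.2])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares
ssto = np.sum((y - y.mean()) ** 2)     # total sum of squares

# Pure error: squared deviations of each y from the mean y at its x value.
sspe = sum(np.sum((y[x == level] - y[x == level].mean()) ** 2)
           for level in np.unique(x))
sslf = sse - sspe  # lack of fit sum of squares

print(ssr, sse, sslf, sspe, ssto)
```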
The values used above are defined as follows,
Variable | Description |
---|---|
$n$ | Number of observations. For example, for observations $x = \{1, 2, 2, 3\}$, $n = 4$. |
$c$ | The number of unique observed values. For example, for observations $x = \{1, 2, 2, 3\}$, $c = 3$. |
$n_i$ | The number of observations at the $i$-th unique value of $x$. |
$y_{ij}$ | The $j$-th observed response at the $i$-th unique value of $x$. |
$\bar{y}_i$ | The mean of the observed responses at the $i$-th unique value of $x$. |
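For instance, a minimal sketch computing $n$ and $c$ for a small illustrative sample,

```python
import numpy as np

x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])

n = len(x)             # number of observations: 6
c = len(np.unique(x))  # number of unique observed values: 4

print(n, c)
```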
Below are more formulas, showing how the mean squares, F-values, and P-values are derived from the sums of squares,
Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Squares (MS) | F-Value | P-Value |
---|---|---|---|---|---|
R | $1$ | $SSR$ | $MSR = \frac{SSR}{1}$ | $F^* = \frac{MSR}{MSE}$ | $P(F(1, n - 2) > F^*)$ |
E | $n - 2$ | $SSE$ | $MSE = \frac{SSE}{n - 2}$ | | |
LF | $c - 2$ | $SSLF$ | $MSLF = \frac{SSLF}{c - 2}$ | $F^* = \frac{MSLF}{MSPE}$ | $P(F(c - 2, n - c) > F^*)$ |
PE | $n - c$ | $SSPE$ | $MSPE = \frac{SSPE}{n - c}$ | | |
TO | $n - 1$ | $SSTO$ | | | |
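Continuing with illustrative numbers for the sums of squares, here is a minimal sketch of the mean square, F-value, and P-value calculations,

```python
from scipy.stats import f

# Illustrative sums of squares, as if carried over from the sketch above.
ssr, sse, sspe = 30.5, 0.35, 0.10
sslf = sse - sspe
n, c = 6, 4  # observations and unique observed values

msr = ssr / 1          # mean square for regression
mse = sse / (n - 2)    # mean square for error
mslf = sslf / (c - 2)  # mean square for lack of fit
mspe = sspe / (n - c)  # mean square for pure error

F_regression = msr / mse
F_lack_of_fit = mslf / mspe

# P-values from the F-distributions with the matching degrees of freedom.
print(F_regression, f.sf(F_regression, 1, n - 2))
print(F_lack_of_fit, f.sf(F_lack_of_fit, c - 2, n - c))
```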
Calculating R-squared (R²)
Generally, we can calculate R² as follows,

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
There is also an adjusted R², which penalizes additional model parameters; with $p$ fitted parameters (for simple linear regression, $p = 2$),

$$R^2_{adj} = 1 - \frac{SSE / (n - p)}{SSTO / (n - 1)}$$

Equivalently,

$$R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - p}$$
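A minimal sketch of both calculations (the sums of squares, $n$, and $p$ are illustrative),

```python
# Illustrative sums of squares from a simple linear regression.
ssr, sse = 30.5, 0.35
ssto = ssr + sse
n = 6  # number of observations
p = 2  # number of fitted parameters (intercept and slope)

r_squared = ssr / ssto  # equivalently: 1 - sse / ssto
adjusted_r_squared = 1 - (sse / (n - p)) / (ssto / (n - 1))

print(r_squared, adjusted_r_squared)
```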
Lack of fit test
To develop the F-statistic for lack of fit, we divide the mean square for lack of fit by the mean square for pure error,

$$F = \frac{MSLF}{MSPE} = \frac{SSLF / (c - 2)}{SSPE / (n - c)}$$
We use an F-test based on the null hypothesis that there is no lack of fit, and the alternative hypothesis that there is lack of fit. If we get a p-value less than our significance level α, we reject the null hypothesis and conclude that the straight-line model does not adequately fit the data.
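Putting the pieces together, here is a minimal sketch of the full lack of fit test (toy data with repeated x values, and an illustrative α of 0.05),

```python
import numpy as np
from scipy.stats import f

# Toy data with repeated x values, so pure error can be estimated.
x = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 4.0])
y = np.array([1.8, 2.2, 4.1, 5.9, 6.3, 8.2])
alpha = 0.05

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

n, c = len(x), len(np.unique(x))
sse = np.sum((y - y_hat) ** 2)
sspe = sum(np.sum((y[x == level] - y[x == level].mean()) ** 2)
           for level in np.unique(x))
sslf = sse - sspe

# F-statistic: mean square for lack of fit over mean square for pure error.
F_lf = (sslf / (c - 2)) / (sspe / (n - c))
p_value = f.sf(F_lf, c - 2, n - c)

if p_value < alpha:
    print(f"p = {p_value:.4f}: reject the null hypothesis; there is lack of fit")
else:
    print(f"p = {p_value:.4f}: no evidence of lack of fit")
```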