Hypothesis testing on linear regressions

Published: 2021 February 07. Modified: 2021 February 07, 02:55:32.

The goal of a simple linear regression is to describe the relationship in the population as a whole, not just in the sample given. The tests below are common ways of evaluating how well a simple linear regression model is likely to resemble the overall population, and whether there is a relationship between two variables.

With each of the t-tests below, we calculate a t-statistic and compare it against a critical value from a t-distribution table. A t-distribution table is organized by degrees of freedom (rows) and levels of certainty (columns). If the absolute value of the t-statistic is greater than the critical value in the table, we reject the null hypothesis in favor of the alternative hypothesis. Otherwise, we fail to reject the null hypothesis.

Outcome	Description
$|t| > t_{\text{critical}}$	We will reject the null hypothesis. In other words, we will conclude: "There is sufficient evidence at this level to conclude that there is a linear relationship in the population between the predictor variable and the response variable."
$|t| \le t_{\text{critical}}$	We will fail to reject the null hypothesis. In other words, we will conclude: "There is not enough evidence at this level to conclude that there is a linear relationship in the population between the predictor variable and the response variable."
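The decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the dictionary `T_CRITICAL_95` are hypothetical, and the critical values are the two-tailed 95% entries from a standard t-distribution table for 1 to 10 degrees of freedom.

```python
# Two-tailed critical values at the 95% level, copied from a standard
# t-distribution table. Keys are degrees of freedom (the table's rows).
# Illustrative subset only; a full table covers many more rows and levels.
T_CRITICAL_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def decide(t_statistic, df, table=T_CRITICAL_95):
    """Compare |t| with the table value for the given degrees of freedom."""
    critical = table[df]
    if abs(t_statistic) > critical:
        return "reject"
    return "fail to reject"


if __name__ == "__main__":
    # The sign of the t-statistic does not matter for a two-tailed test.
    print(decide(2.9, 8))
    print(decide(-1.2, 8))
```

Looking the value up in a dictionary mirrors reading a printed table. A statistics library would normally compute the critical value or a p-value directly.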

t-test for the population correlation coefficient

The correlation coefficient for the sample is $r$, and the correlation coefficient for the population is $\rho$. The coefficient of determination for the sample is $r^2$. Observing a relationship in the sample is not enough on its own: we have to test whether we can be confident to a certain degree (e.g. 95%) that the relationship exists in the population.

Hypothesis	Statement	Description
$H_0$	$\rho = 0$	There is no relationship between the two variables.
$H_A$	$\rho \neq 0$ or $\rho > 0$ or $\rho < 0$	There is a relationship between the two variables.
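As a rough sketch of how this test can be carried out, the snippet below computes the sample correlation coefficient and the usual test statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The function name `correlation_t_statistic` and the example data are made up for illustration.

```python
import math


def correlation_t_statistic(x, y):
    """Return (r, t, df) for the t-test on the population correlation.

    r  : sample (Pearson) correlation coefficient
    t  : r * sqrt(n - 2) / sqrt(1 - r^2)
    df : n - 2 degrees of freedom
    Assumes n > 2 and |r| < 1 (a perfect correlation would divide by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t, n - 2


if __name__ == "__main__":
    # Hypothetical sample, chosen only to keep the arithmetic small.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    r, t, df = correlation_t_statistic(x, y)
    print(f"r = {r:.4f}, t = {t:.4f}, df = {df}")
```

The resulting t-statistic is then compared with the t-distribution table entry for `df` degrees of freedom at the chosen level.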

t-test to determine linear association

This t-test is also known as the "slope" test. It tests whether a relationship exists, making it similar to the previous test. We fundamentally have either an alternative hypothesis $H_A$ (i.e. there is a relationship) or a null hypothesis $H_0$ (i.e. there is no relationship). The alternative hypothesis is that the population regression line slopes either upward or downward, meaning that $\beta_1 \neq 0$. If instead there is no relationship, then the population regression line is perfectly horizontal, or in other words $\beta_1 = 0$.

The alternative hypothesis is: "There is a relationship between the two variables. In other words, $\beta_1 \neq 0$." The null hypothesis is: "There is no relationship between the two variables. In other words, $\beta_1 = 0$."

Hypothesis	Statement	Description
$H_0$	$\beta_1 = 0$	There is no relationship between the two variables.
$H_A$	$\beta_1 \neq 0$	There is a relationship between the two variables.

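Testing whether the population slope $\beta_1$ is some hypothesized value (in this case, $0$) is closely related to calculating the confidence interval around the slope. Both calculations involve the standard error of the sample slope $b_1$:

$$t^* = \frac{b_1 - \beta_1}{\text{se}(b_1)}$$

In full notation,

$$t^* = \frac{b_1 - \beta_1}{\sqrt{\text{MSE} / \sum_{i=1}^{n} (x_i - \bar{x})^2}}, \qquad \text{MSE} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}$$

Generally, when we test if a relationship exists, we will use $\beta_1 = 0$. The degrees of freedom are $n - 2$.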
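The slope test can be sketched as follows, assuming an ordinary least-squares fit with the standard error of the slope taken as sqrt(MSE / sum((x - mean_x)^2)) and MSE as the residual sum of squares divided by n - 2. The function name `slope_t_statistic`, the parameter `beta_null`, and the data are hypothetical.

```python
import math


def slope_t_statistic(x, y, beta_null=0.0):
    """Return (b1, se_b1, t, df) for the t-test on the regression slope.

    b1        : least-squares estimate of the slope
    se_b1     : standard error of the slope, sqrt(MSE / Sxx)
    t         : (b1 - beta_null) / se_b1
    df        : n - 2 degrees of freedom
    beta_null : slope value under the null hypothesis (0 means no relationship)
    Assumes n > 2 and a non-perfect fit (MSE > 0).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept

    # Mean squared error of the residuals, with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    se_b1 = math.sqrt(mse / sxx)
    t = (b1 - beta_null) / se_b1
    return b1, se_b1, t, n - 2


if __name__ == "__main__":
    # Hypothetical sample, chosen only to keep the arithmetic small.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    b1, se_b1, t, df = slope_t_statistic(x, y)
    print(f"b1 = {b1:.4f}, se = {se_b1:.4f}, t = {t:.4f}, df = {df}")
```

In simple linear regression, this t-statistic equals the one from the correlation test on the same data, which is one way to sanity-check an implementation.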

F-test to determine if a line or a curve is the best fit

The F-test is also called the "analysis of variance" (ANOVA) test.

Hypothesis	F-test