Regression

Published: 2021 January 30
Modified: 2021 January 30, 22:05:40

Regression measures the relationship between a predictor variable (e.g., age) and a response variable (e.g., number of annual hospital visits).

There are different types of regression, based on the type of line that best portrays the scatter plot of the two variables:

Type of shape	Description
Linear regression	A straight line best portrays the overall trend when plotting the predictor and the response.
Curvilinear regression	A curved line best portrays the overall trend when plotting the predictor and the response.
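
One rough way to compare the two shapes is to fit both a straight line and a curve to the same data and see how much error each leaves behind. The sketch below assumes made-up age and hospital-visit figures and uses a quadratic as the curve; the data, the `sum_squared_error` helper, and the choice of a quadratic are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical data: age (predictor) and annual hospital visits (response).
age = np.array([20, 30, 40, 50, 60, 70, 80], dtype=float)
visits = np.array([1.0, 1.2, 1.8, 2.9, 4.5, 6.8, 9.6])

def sum_squared_error(degree):
    """Fit a polynomial of the given degree and return the squared error it leaves."""
    coeffs = np.polyfit(age, visits, degree)
    predicted = np.polyval(coeffs, age)
    return float(np.sum((visits - predicted) ** 2))

linear_error = sum_squared_error(1)  # straight line
curved_error = sum_squared_error(2)  # quadratic curve (one possible curved shape)

print(f"linear: {linear_error:.3f}, curved: {curved_error:.3f}")
```

A curve never fits the same data worse than a straight line, so it is a large drop in error, not just any drop, that suggests a curvilinear relationship.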

There are also different types of regression, based on the number of predictors:

Number of predictors	Description
Simple regression	There is one predictor and one response.
Multiple regression	There are multiple predictors and one response.
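
The difference between the two can be sketched with an ordinary least-squares fit. The example assumes made-up data and a hypothetical second predictor (number of chronic conditions); the `fit` helper is only for illustration.

```python
import numpy as np

# Hypothetical data; "conditions" is an assumed second predictor for illustration.
age = np.array([25, 35, 45, 55, 65, 75], dtype=float)
conditions = np.array([0, 1, 0, 2, 1, 3], dtype=float)
visits = np.array([1.0, 2.1, 1.9, 4.2, 3.8, 6.1])

def fit(*predictors):
    """Least-squares fit of visits on the given predictors plus an intercept."""
    design = np.column_stack([np.ones_like(visits), *predictors])
    coeffs, *_ = np.linalg.lstsq(design, visits, rcond=None)
    return coeffs

simple_coeffs = fit(age)                # simple regression: intercept + 1 slope
multiple_coeffs = fit(age, conditions)  # multiple regression: intercept + 2 slopes

print(len(simple_coeffs), len(multiple_coeffs))
```

In both cases there is a single response (`visits`); only the number of predictor columns in the design matrix changes.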

Several parameters are used to analyze the regression (relationship) between variables:

Technique	Shorthand	Description
Degrees of freedom	DF	This can best be described as the number of possible states for a system. When taking the mean of a population, the population size is the number of degrees of freedom. In other cases, the DF may be the number of unique values in the population. For some parameters, the DF equals n − 2, where n is the number of observations: we are examining a line, and the two values needed to establish the line leave two fewer possible states. For other parameters, the DF takes a different value because a prediction is not an observed, "possible" state but is instead determined mathematically from the observations.
Sum of squares	SS	Two variables may be related very strongly and hug a trend line very tightly, but there is often some amount of random error. For example, as patients get older, they may generally require more annual hospitalizations, but some older patients may almost never visit the hospital. We measure this type of variance by finding the difference between the observed outcome and the predicted outcome, squaring that difference, and summing the squares over all observations. The difference may be positive or negative, but squaring it gives a positive result that quantifies the magnitude of the difference, not its sign. Other types of SS calculations measure the difference between the observed outcome and the overall mean, or between the predicted outcome and the overall mean.
Mean squares	MS	By dividing the sum of squares by the degrees of freedom, we calculate the MS parameter. This quantifies how "spread-apart" the values are for whatever we are measuring, such as a sample, population, or the errors between predicted and observed values.
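
The three parameters above can be sketched together for a simple linear regression. The data are made up, and the variable names are illustrative. The DF values used (n − 2 for the error, 1 for the regression, n − 1 for the total) are the usual conventions for a single predictor and are stated here as assumptions, not taken from the table.

```python
import numpy as np

# Hypothetical data: age (predictor) and annual hospital visits (response).
age = np.array([20, 30, 40, 50, 60, 70, 80], dtype=float)
visits = np.array([1.0, 1.5, 1.4, 2.6, 2.4, 3.5, 3.3])

n = len(age)
slope, intercept = np.polyfit(age, visits, 1)  # straight-line fit
predicted = slope * age + intercept
overall_mean = visits.mean()

# Sum of squares: square each difference, then add them up.
ss_error = np.sum((visits - predicted) ** 2)             # observed vs. predicted
ss_regression = np.sum((predicted - overall_mean) ** 2)  # predicted vs. overall mean
ss_total = np.sum((visits - overall_mean) ** 2)          # observed vs. overall mean

# Degrees of freedom (usual conventions for one predictor; assumed here).
df_error = n - 2
df_regression = 1
df_total = n - 1

# Mean squares: sum of squares divided by degrees of freedom.
ms_error = ss_error / df_error
ms_regression = ss_regression / df_regression

print(f"SS error={ss_error:.3f}, SS regression={ss_regression:.3f}, SS total={ss_total:.3f}")
print(f"MS error={ms_error:.3f}, MS regression={ms_regression:.3f}")
```

For a least-squares line with an intercept, the error and regression sums of squares add up to the total, and the same holds for their degrees of freedom.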