Regression
Regression is a method for measuring and modeling the relationship between a predictor variable (e.g. age) and a response variable (e.g. number of annual hospital visits).
Regressions can be classified by the shape of the line that best portrays the scatter plot of the two variables:
Type of regression | Description |
---|---|
Linear regression | A straight line best portrays the overall trend when plotting the predictor and the response. |
Curvilinear regression | A curved line best portrays the overall trend when plotting the predictor and the response. |
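A straight-line fit can be sketched in a few lines of Python. This is a minimal sketch assuming the line is chosen by ordinary least squares (the usual criterion, although the text above does not name one); the function `fit_line` and the age and visit data are hypothetical, made up for illustration.

```python
# Simple linear regression fitted by ordinary least squares (an assumed criterion),
# using only the standard library. The data are hypothetical.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel, so raw sums suffice.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all predictor values are identical; the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical patients: age (predictor) and annual hospital visits (response).
ages = [30, 40, 50, 60, 70]
visits = [1, 2, 2, 3, 4]

b0, b1 = fit_line(ages, visits)
print(f"visits ≈ {b0:.2f} + {b1:.3f} * age")
```

On Python 3.10 and later, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept, so the hand-written version is mainly useful for seeing the formula.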
Regressions can also be classified by the number of predictors:
Type of regression | Description |
---|---|
Simple regression | There is one predictor and one response. |
Multiple regression | There are multiple predictors and one response. |
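With more than one predictor, the same least-squares idea leads to a small system of linear equations (the normal equations), with one unknown per coefficient. The sketch below assumes ordinary least squares and uses only the standard library, solving that system by Gaussian elimination; the helpers `solve` and `fit_multiple`, the second predictor (number of chronic conditions), and the data are all hypothetical.

```python
# Multiple linear regression by ordinary least squares (assumed), standard library only.
# Solves the normal equations (X^T X) b = X^T y with Gaussian elimination.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix, so each row operation updates a and b together.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution, from the last unknown to the first.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Return [intercept, b1, ..., bk] for y = intercept + b1*x1 + ... + bk*xk."""
    if len(rows) != len(ys):
        raise ValueError("need one response value per row of predictors")
    # Prepend a column of 1s so the first coefficient is the intercept.
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical patients: (age, chronic conditions) -> average annual hospital visits.
# The visits were constructed to equal -2 + 0.1*age + 1.5*conditions exactly,
# so the fit should recover those coefficients up to round-off.
predictors = [(30, 0), (40, 1), (50, 0), (60, 2), (70, 1)]
visits = [1, 3.5, 3, 7, 6.5]

b0, b_age, b_cond = fit_multiple(predictors, visits)
print(f"visits ≈ {b0:.2f} + {b_age:.2f} * age + {b_cond:.2f} * conditions")
```

In practice a numerical library would normally replace the hand-rolled elimination. The same routine can also fit a curvilinear trend such as y = b0 + b1·x + b2·x² by passing x and x² as two predictors, because the model is still linear in its coefficients.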
Several quantities are used to analyze the regression (relationship) between variables:
Quantity | Shorthand | Description |
---|---|---|
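Degrees of freedom | DF | The number of values that are free to vary once some quantities have been estimated from the data. For example, once the mean of a sample of n values is fixed, only n − 1 of the values can vary freely, so the deviations from that mean have n − 1 degrees of freedom. |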
Sum of Squares | SS | Two variables may be strongly related and hug a trend line tightly, but there is often some amount of random error. For example, as patients get older they may generally require more annual hospitalizations, yet some older patients almost never visit the hospital. We measure this variation by taking the difference between each observed outcome and its predicted outcome, squaring that difference, and summing the squares over all observations (the error, or residual, SS). A difference may be positive or negative, but its square is never negative, so it quantifies the magnitude of the difference rather than its sign. Other SS calculations measure the difference between each observed outcome and the overall mean (total SS), or between each predicted outcome and the overall mean (regression SS). |
Mean squares | MS | Dividing a sum of squares by its degrees of freedom gives the corresponding mean square. This quantifies how "spread apart" the values are for whatever we are measuring, such as a sample, a population, or the errors between predicted and observed values. |
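The three quantities in the table can be computed together for a simple linear regression. This is a minimal sketch assuming an ordinary least-squares line with an intercept; under that assumption the regression has 1 degree of freedom, the errors have n − 2 (two coefficients are estimated), and the total has n − 1. The function name `regression_anova` and the data are hypothetical.

```python
# DF, SS and MS for a simple linear regression fitted by least squares (assumed),
# standard library only. The data are hypothetical.

def regression_anova(xs, ys):
    """Return (df, ss, ms) dictionaries for the regression, error and total sources."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all predictor values are identical; the slope is undefined")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]

    # Each SS squares a difference, so only its size counts, not its sign.
    ss = {
        "regression": sum((p - mean_y) ** 2 for p in predicted),     # predicted vs mean
        "error": sum((y - p) ** 2 for y, p in zip(ys, predicted)),   # observed vs predicted
        "total": sum((y - mean_y) ** 2 for y in ys),                 # observed vs mean
    }
    # With an intercept, both SS and DF add up: total = regression + error.
    df = {"regression": 1, "error": n - 2, "total": n - 1}
    # Mean square = sum of squares / degrees of freedom.
    ms = {
        "regression": ss["regression"] / df["regression"],
        "error": ss["error"] / df["error"],
    }
    return df, ss, ms


ages = [30, 40, 50, 60, 70]
visits = [1, 2, 2, 3, 4]
df, ss, ms = regression_anova(ages, visits)
for source in ("regression", "error", "total"):
    ms_text = f"{ms[source]:.3f}" if source in ms else "-"
    print(f"{source:<10} DF={df[source]}  SS={ss[source]:.3f}  MS={ms_text}")
```

The error mean square is the usual estimate of the variance of the random error around the fitted line, which is why it is reported alongside the sums of squares.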