# Influential points

Some quick definitions:

Term | Description |
---|---|
Outlier | A data point is an outlier if its response does not follow the general trend of the rest of the data. |
High Leverage | A data point has high leverage if its predictor values are unusual. It may have one or more predictor values that are extraordinarily high or low (e.g. a predictor value of 1 or 15 when most of the predictor values are between 5 and 10). It may also have a combination of predictor values that do not normally go together; for example, in a data set of K-12 students, being eight years old is not unusual, nor is being in the twelfth grade, but being eight years old and in the twelfth grade is. |
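
The K-12 case can be checked numerically. The sketch below uses made-up data (four students per grade, with an assumed age of grade + 5 or grade + 6) plus one eight-year-old twelfth grader, and computes leverages as the diagonal of the hat matrix. The variable names and the data are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical K-12 data: four students per grade (0 = kindergarten, ..., 12),
# aged grade + 5 or grade + 6. All values are made up for illustration.
grade = np.repeat(np.arange(13), 4).astype(float)
age = grade + 5 + np.tile([0.0, 0.0, 1.0, 1.0], 13)

# Append one odd student: eight years old, in the twelfth grade.
# Age 8 and grade 12 each already occur in the data; only the pairing is new.
grade = np.append(grade, 12.0)
age = np.append(age, 8.0)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(age), age, grade])

# Leverages are the diagonal of the hat matrix H = X (X'X)^-1 X'.
XtX_inv = np.linalg.inv(X.T @ X)
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

n, p = X.shape
print("odd student's leverage:", round(leverage[-1], 3))
print("largest leverage among the rest:", round(leverage[:-1].max(), 3))
print("three times the mean leverage:", round(3 * p / n, 3))
```

Neither predictor value of the appended student is extreme on its own, so the large leverage comes entirely from the unusual combination.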

## Calculations
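
For reference, the quantities in this section are commonly written as follows in textbook treatments of least-squares regression. The notation is an assumption and does not come from the table itself: $n$ observations, $p$ regression parameters including the intercept, $\mathrm{MSE}$ the mean squared error of the full fit, $\mathrm{SSTO}$ the total sum of squares, and a subscript $(i)$ for a quantity computed with observation $i$ removed. Conventions vary between sources and software packages.

$$
\begin{aligned}
e_i &= y_i - \hat{y}_i \\
d_i &= y_i - \hat{y}_{(i)} = \frac{e_i}{1 - h_{ii}} \\
\mathrm{PRESS} &= \sum_{i=1}^{n} d_i^2, \qquad R^2_{\mathrm{pred}} = 1 - \frac{\mathrm{PRESS}}{\mathrm{SSTO}} \\
h_{ii} &= x_i^{\mathsf{T}} (X^{\mathsf{T}} X)^{-1} x_i, \qquad \sum_{i=1}^{n} h_{ii} = p \\
r_i &= \frac{e_i}{\sqrt{\mathrm{MSE}\,(1 - h_{ii})}} \\
t_i &= \frac{e_i}{\sqrt{\mathrm{MSE}_{(i)}\,(1 - h_{ii})}} \\
\mathrm{DFFITS}_i &= \frac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{\mathrm{MSE}_{(i)}\, h_{ii}}} \\
D_i &= \frac{e_i^2}{p\,\mathrm{MSE}} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
\end{aligned}
$$

Here $e_i$ is the residual, $d_i$ the deleted residual, $h_{ii}$ the leverage, $r_i$ the studentized residual, $t_i$ the studentized deleted residual, and $D_i$ Cook's distance.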

Term | Formula | Description |
---|---|---|
Residual | | As usual, this is the difference between the observed response and the predicted response for an observation. |
Deleted residual | | The residual for an observation, computed from a model fitted with that observation's row removed. These are also called PRESS prediction errors or unstandardized deleted residuals. They will usually be larger in magnitude than ordinary residuals, because an influential observation pulls the fitted model toward itself when it is included in the fit. |
Predicted R-squared | | Generally more intuitive to interpret than the raw PRESS statistic. It is also a helpful way to evaluate a model without having to split the data into training and validation sets. |
Leverage | | All the leverages add up to p, the number of regression parameters (including the intercept). |
Leverage threshold | | An observation is flagged as high leverage if its leverage is greater than three times the mean leverage, i.e. greater than 3(p/n). |
Studentized residual | | In other words, this is the residual divided by an estimate of its standard deviation. |
Studentized deleted residual | | Minitab refers to these as deleted residuals. |
Difference in fits (DFFITS) | | |
DFFITS threshold | | |
Cook's distance | | A large Cook's distance value indicates that an observation is influential. |
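
The quantities in this table can all be computed from a single least-squares fit, without refitting the model once per row. The following is a minimal NumPy sketch, assuming the standard textbook formulas and a design matrix that already contains an intercept column. The function name `influence_measures`, the dictionary keys, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

def influence_measures(X, y):
    """Case-deletion diagnostics for an ordinary least-squares fit.

    X is an (n, p) design matrix that includes the intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                                  # ordinary residuals
    # Leverage: diagonal of the hat matrix X (X'X)^-1 X'.
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    mse = resid @ resid / (n - p)
    deleted = resid / (1 - leverage)                      # deleted (PRESS) residuals
    press = np.sum(deleted ** 2)
    ssto = np.sum((y - y.mean()) ** 2)
    # MSE of the fit with observation i removed, via the usual shortcut.
    mse_del = ((n - p) * mse - resid ** 2 / (1 - leverage)) / (n - p - 1)
    student = resid / np.sqrt(mse * (1 - leverage))
    student_del = resid / np.sqrt(mse_del * (1 - leverage))
    return {
        "residual": resid,
        "leverage": leverage,
        "deleted_residual": deleted,
        "press": press,
        "predicted_r2": 1 - press / ssto,
        "studentized_residual": student,
        "studentized_deleted_residual": student_del,
        "dffits": student_del * np.sqrt(leverage / (1 - leverage)),
        "cooks_distance": student ** 2 / p * leverage / (1 - leverage),
    }

# Toy data: ten points near the line y = 2 + 3x, plus one point that is
# both far out in x and far off the line.
x = np.arange(1.0, 11.0)
y = 2 + 3 * x + np.tile([0.5, -0.5], 5)
x = np.append(x, 20.0)
y = np.append(y, 30.0)
X = np.column_stack([np.ones_like(x), x])

out = influence_measures(X, y)
print("leverages:", np.round(out["leverage"], 3))
print("Cook's distances:", np.round(out["cooks_distance"], 3))
print("predicted R-squared:", round(out["predicted_r2"], 3))
```

The shortcut `resid / (1 - leverage)` gives the same deleted residual as actually dropping the row and refitting, which is why PRESS and predicted R-squared are cheap to obtain.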