# Influential points

Some quick definitions:

| Term | Description |
| --- | --- |
| Outlier | A data point is an outlier if its response does not follow the general trend of the data. |
| High leverage | A data point has high leverage if its predictor values are unusual compared with the rest of the data. This may mean that one or more predictor values are extraordinarily high or low (e.g. a predictor value of 1 or 15 when most of the predictor values are between 5 and 10). It may also mean that its predictor values do not normally occur together. For example, in a data set of K-12 students, being eight years old is not unusual, nor is being in the twelfth grade, but a student who is eight years old and in the twelfth grade would be a data point with high leverage. |
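
The K-12 example can be made concrete with a small sketch. The data are made up for illustration, and the code assumes a model with an intercept and two predictors (age and grade), so $p = 3$. Leverage is computed from the centred predictors as $h_{ii} = 1/n + c_i^\top S^{-1} c_i$, which is equivalent to the diagonal of the hat matrix for such a model. The helper name `leverages_two_predictors` and the cutoff of three times the mean leverage $p/n$ (a common rule of thumb) are assumptions of this sketch.

```python
# Made-up (age, grade) pairs for illustration only; the last row is the
# eight-year-old twelfth grader.
students = [(6, 1), (7, 2), (8, 3), (8, 2), (9, 4), (10, 5), (11, 6), (12, 7),
            (13, 8), (14, 9), (15, 10), (16, 11), (17, 12), (18, 12), (8, 12)]


def leverages_two_predictors(rows):
    """Leverage of each row for a model with an intercept and two predictors.

    Uses h_ii = 1/n + c_i^T S^{-1} c_i, where c_i is the row centred on the
    predictor means and S is the 2x2 matrix of centred sums of squares.
    """
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    centred = [(r[0] - m1, r[1] - m2) for r in rows]
    s11 = sum(a * a for a, _ in centred)
    s22 = sum(b * b for _, b in centred)
    s12 = sum(a * b for a, b in centred)
    det = s11 * s22 - s12 * s12  # determinant of S, used for the 2x2 inverse
    return [1 / n + (s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det
            for a, b in centred]


h = leverages_two_predictors(students)
n, p = len(students), 3          # p counts the intercept and both predictors
cutoff = 3 * p / n               # assumed rule of thumb: 3 x mean leverage
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

print("leverages:", [round(hi, 3) for hi in h])
print("flagged rows:", [students[i] for i in flagged])
```

Ages of 8 and grades of 12 both occur elsewhere in this data set, yet only the row that combines them exceeds the cutoff, which is the sense in which leverage measures unusual combinations of predictor values and not just extreme individual values.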

## Calculations

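The formulas use the standard least-squares notation: $n$ is the number of observations, $p$ is the number of model parameters (including the intercept), $\text{MSE}$ is the mean squared error, $\text{SSTO} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares, $\hat{y}_{(i)}$ is the prediction for observation $i$ from the model fitted without that observation, and $\text{MSE}_{(i)}$ is the mean squared error of that model.

| Term | Formula | Description |
| --- | --- | --- |
| Residual | $e_i = y_i - \hat{y}_i$ | As usual, this is the difference between the observed and the predicted response for an observation. |
| Deleted residual | $d_i = y_i - \hat{y}_{(i)} = \dfrac{e_i}{1 - h_{ii}}$ | A deleted residual is the residual for an observation, based on a model fitted with that row removed. These are also called PRESS prediction errors, or unstandardized deleted residuals. In absolute value they are at least as large as the ordinary residuals, because an observation that is included in the fit pulls the predicted value towards itself. |
| Predicted R-squared | $R^2_{\text{pred}} = 1 - \dfrac{\text{PRESS}}{\text{SSTO}}$, with $\text{PRESS} = \sum_i d_i^2$ | This is generally easier to interpret than the raw PRESS statistic. It is also a helpful way to evaluate a model without having to split the data into training and validation sets. |
| Leverage | $h_{ii}$, the $i$-th diagonal element of $H = X (X^\top X)^{-1} X^\top$ | All the leverages add up to $p$. |
| Leverage threshold | $h_{ii} > 3\,\dfrac{p}{n}$ | If a leverage is greater than three times the mean leverage $p/n$, the observation can be called high leverage. |
| Studentized residual | $r_i = \dfrac{e_i}{\sqrt{\text{MSE}\,(1 - h_{ii})}}$ | In other words, this is the residual divided by an estimate of its standard deviation. |
| Studentized deleted residual | $t_i = \dfrac{e_i}{\sqrt{\text{MSE}_{(i)}\,(1 - h_{ii})}}$ | Minitab refers to these as deleted residuals. |
| Difference in fits (DFFITS) | $\text{DFFITS}_i = \dfrac{\hat{y}_i - \hat{y}_{(i)}}{\sqrt{\text{MSE}_{(i)}\,h_{ii}}}$ | The change in the fitted value for observation $i$ when that observation is left out of the fit, in units of its estimated standard error. |
| DFFITS threshold | $\lvert \text{DFFITS}_i \rvert > 2\sqrt{p/n}$ | A commonly used rule of thumb; some texts use a slightly different cutoff. |
| Cook's distance | $D_i = \dfrac{e_i^2}{p\,\text{MSE}} \cdot \dfrac{h_{ii}}{(1 - h_{ii})^2}$ | A large Cook's distance value indicates an observation is influential. |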
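
To show how these quantities relate to each other, here is a minimal sketch that computes them for a simple linear regression (one predictor plus an intercept, so $p = 2$) using only the Python standard library. The data are made up, the function names `fit_line` and `influence_diagnostics` are hypothetical, and the cutoffs used for flagging (three times the mean leverage $p/n$, and $2\sqrt{p/n}$ for DFFITS) are common rules of thumb assumed for this sketch. The leave-one-out quantities are obtained from the leverages with the standard shortcut identities, so no model is actually refitted.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def influence_diagnostics(x, y):
    """Per-observation diagnostics plus PRESS and predicted R-squared."""
    n, p = len(x), 2                       # p = 2: intercept and slope
    b0, b1 = fit_line(x, y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # residuals
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]    # leverages
    sse = sum(ei ** 2 for ei in e)
    mse = sse / (n - p)
    ssto = sum((yi - y_bar) ** 2 for yi in y)

    rows = []
    for ei, hi in zip(e, h):
        deleted = ei / (1 - hi)                          # y_i - yhat_(i)
        # MSE of the fit without observation i, via the shortcut identity.
        mse_del = (sse - ei ** 2 / (1 - hi)) / (n - p - 1)
        r = ei / math.sqrt(mse * (1 - hi))               # studentized
        t = ei / math.sqrt(mse_del * (1 - hi))           # studentized deleted
        rows.append({
            "residual": ei,
            "leverage": hi,
            "deleted": deleted,
            "studentized": r,
            "studentized_deleted": t,
            "dffits": t * math.sqrt(hi / (1 - hi)),
            "cooks_d": r ** 2 * hi / (p * (1 - hi)),
        })

    press = sum(row["deleted"] ** 2 for row in rows)
    summary = {"n": n, "p": p, "press": press, "pred_r2": 1 - press / ssto}
    return rows, summary


# Made-up data: nine points close to y = 1 + 2x, plus one point with an
# extreme x value whose response falls below that trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 19.0, 33.0]

rows, summary = influence_diagnostics(x, y)
n, p = summary["n"], summary["p"]
for i, row in enumerate(rows):
    flags = []
    if row["leverage"] > 3 * p / n:
        flags.append("high leverage")
    if abs(row["dffits"]) > 2 * math.sqrt(p / n):
        flags.append("large DFFITS")
    print(i, {k: round(v, 3) for k, v in row.items()}, flags)
print("PRESS:", round(summary["press"], 3),
      "predicted R^2:", round(summary["pred_r2"], 3))
```

On this data the last observation exceeds the leverage cutoff and has the largest Cook's distance. For models with several predictors the same formulas apply, with the leverages taken from the diagonal of the hat matrix instead of the one-predictor closed form used here.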