Step 1
Let the following data be the initial dataset (without any outliers):
x: 1, 2, 3, 4, 5
y: 1, 3, 5, 7, 9
Since all points lie on the line $y = 2x - 1$, the relationship is perfectly linear and the correlation coefficient is $r = 1$.
Let's add the point (6,2) to our initial dataset, so that the data is:
x: 1, 2, 3, 4, 5, 6
y: 1, 3, 5, 7, 9, 2
The outlier we have just added clearly does not belong to the line $y = 2x - 1$, since $2 \cdot 6 - 1 = 11 \neq 2$.
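Therefore, the perfect linear relationship between the two variables is lost and the correlation coefficient decreases. With $\bar{x} = 3.5$ and $\bar{y} = 4.5$, the new correlation coefficient is
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{12.5}{\sqrt{17.5 \cdot 47.5}} = \frac{5}{\sqrt{133}} \approx 0.434.$$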
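Both correlation coefficients can be checked numerically. The following is a minimal Python sketch that uses only the standard library. The helper name `pearson_r` is our own and does not come from the source. It recomputes $r$ for the initial data, where every point satisfies $y = 2x - 1$, and again after the point (6, 2) is added.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the product of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Initial dataset: every point satisfies y = 2x - 1
x_initial = [1, 2, 3, 4, 5]
y_initial = [1, 3, 5, 7, 9]

# Same data plus the outlier (6, 2); the line would predict 2*6 - 1 = 11 there
x_outlier = x_initial + [6]
y_outlier = y_initial + [2]

r_initial = pearson_r(x_initial, y_initial)
r_outlier = pearson_r(x_outlier, y_outlier)

print(round(r_initial, 3))  # 1.0
print(round(r_outlier, 3))  # 0.434
```

The deviation form mirrors the definition of $r$, so each intermediate sum can also be checked by hand.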
Here is the scatterplot of the given data (outlier is coloured in blue):
Unusual points
Each of the four scatterplots that follow shows a cluster of points and one “stray” point. For each, answer these questions:
1) In what way is the point unusual? Does it have high leverage, a large residual, or both?
2) Do you think that point is an influential point?
3) If that point were removed, would the correlation become stronger or weaker? Explain.
4) If that point were removed, would the slope of the regression line increase or decrease? Explain.
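The four questions can be made concrete with the point (6, 2) from the worked example above. The textbook's four scatterplots are not used, because their data are not given here. The following Python sketch is hedged and illustrative only: the helper names `summarize` and `leverage` are our own. It uses the standard simple-regression leverage formula $h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_j - \bar{x})^2$ and refits the line without the stray point to judge its influence.

```python
import math


def summarize(xs, ys):
    """Return (slope, intercept, r) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r


def leverage(xs, i):
    """Leverage of point i in simple linear regression: 1/n + (x_i - mean)^2 / S_xx."""
    n = len(xs)
    mean_x = sum(xs) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return 1 / n + (xs[i] - mean_x) ** 2 / s_xx


x = [1, 2, 3, 4, 5, 6]
y = [1, 3, 5, 7, 9, 2]
i = 5  # index of the stray point (6, 2)

# Question 1: residual and leverage of the stray point in the full fit
slope_all, intercept_all, r_all = summarize(x, y)
residual = y[i] - (intercept_all + slope_all * x[i])
h = leverage(x, i)

# Questions 2-4: refit without the stray point and compare r and slope
slope_wo, _, r_wo = summarize(x[:i] + x[i + 1:], y[:i] + y[i + 1:])

print("leverage:", round(h, 3), "vs average", round(2 / len(x), 3))  # 0.524 vs 0.333
print("residual:", round(residual, 3))                              # -4.286
print("r:", round(r_all, 3), "->", round(r_wo, 3))                  # 0.434 -> 1.0
print("slope:", round(slope_all, 3), "->", round(slope_wo, 3))      # 0.714 -> 2.0
```

Read this way, (6, 2) has a large residual. Its leverage is the largest in the set, tied with the point at x = 1, and above the average leverage $2/n$. Removing it makes the correlation stronger and the slope larger, so in this small example it is an influential point.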
Scatterplots
Which of the scatterplots below show
a) little or no association?
b) a negative association?
c) a linear association?
d) a moderately strong association?
e) a very strong association?