2020-11-05

Sketch a scatterplot in which the presence of an outlier decreases the observed correlation between the response and explanatory variables. Indicate on your plot which point is the outlier.

Step 1

Let the following data be the initial dataset (without any outliers):

x | 1 | 2 | 3 | 4 | 5
y | 1 | 3 | 5 | 7 | 9

Since all points lie on the line $y = 2x - 1$, the relationship is perfectly linear and the correlation coefficient is $r = 1$.
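As a quick check, here is a minimal Python sketch. It confirms that each point satisfies $y = 2x - 1$ and computes Pearson's correlation coefficient straight from its definition. The helper name `pearson_r` is a hypothetical one written for illustration, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Initial dataset, without any outliers
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9]

# Check that every point lies exactly on y = 2x - 1
on_line = all(y == 2 * x - 1 for x, y in zip(xs, ys))
r = pearson_r(xs, ys)
print(on_line, r)  # True 1.0
```

Here the numerator (20) equals the denominator ($\sqrt{10 \cdot 40} = 20$), which is exactly what a perfect positive linear relationship produces.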

Let's add the point (6,2) to our initial dataset, so that the data is:

x | 1 | 2 | 3 | 4 | 5 | 6
y | 1 | 3 | 5 | 7 | 9 | 2

The outlier that we've just added clearly doesn't belong to the line $y = 2x - 1$, since $2 \cdot 6 - 1 = 11 \neq 2$.

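Therefore, we lost the perfect linear relationship between the two variables, so the correlation coefficient decreased. With $\bar{x} = 3.5$ and $\bar{y} = 4.5$, the new correlation coefficient is

$r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \dfrac{12.5}{\sqrt{17.5 \cdot 47.5}} \approx 0.434$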
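The drop can be reproduced numerically by applying the same definition of Pearson's r before and after adding the point (6, 2). This is a self-contained sketch; `pearson_r` is again a hypothetical helper written for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Dataset including the outlier (6, 2) as the last point
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 2]

r_without = pearson_r(xs[:-1], ys[:-1])  # first five points only
r_with = pearson_r(xs, ys)               # all six points
print(round(r_without, 3), round(r_with, 3))  # 1.0 0.434
```

A single point far below the trend is enough to cut the correlation from 1 to roughly 0.43.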

Here is the scatterplot of the given data (outlier is coloured in blue):
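As a sketch, the same scatterplot can also be drawn as text, with the outlier marked `O` instead of blue and the other points marked `*`. The function `ascii_scatter` is hypothetical, and it assumes small positive integer coordinates starting at 1, which holds for this dataset.

```python
# Data from Step 1, with the added point (6, 2) treated as the outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 2]
outlier = (6, 2)


def ascii_scatter(xs, ys, outlier):
    """Return a text scatterplot: '*' marks regular points, 'O' marks the outlier.

    Assumes positive integer coordinates starting at 1 (true for this dataset).
    """
    points = set(zip(xs, ys))
    rows = []
    for y in range(max(ys), 0, -1):  # print the largest y at the top
        cells = []
        for x in range(1, max(xs) + 1):
            if (x, y) == outlier:
                cells.append("O")
            elif (x, y) in points:
                cells.append("*")
            else:
                cells.append(".")
        rows.append(f"{y:>2} | " + " ".join(cells))
    # x-axis and its tick labels, aligned under the columns
    rows.append("     " + "-" * (2 * max(xs) - 1))
    rows.append("     " + " ".join(str(x) for x in range(1, max(xs) + 1)))
    return "\n".join(rows)


print(ascii_scatter(xs, ys, outlier))
```

The five `*` points climb diagonally along $y = 2x - 1$, while the `O` at (6, 2) sits in the bottom-right corner, far below where the trend predicts ($y = 11$).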