Given: \(\displaystyle n=\text{Sample size}=90\)

a) First round is on the horizontal axis and second round is on the vertical axis.

b) When the scatterplot shows no strong curvature, it is reasonable to assume a linear relationship between the variables and therefore to find a regression line. The scatterplot of part (a) does not contain strong curvature, so it is reasonable to find the regression line.

c) We determine all necessary sums: \(\displaystyle\sum\ {x}_{{{i}}}={6745}\)

\(\displaystyle\sum\ {y}_{{{i}}}={6657}\)

\(\displaystyle\sum\ {x}_{{{i}}}\ {y}_{{{i}}}={499217}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={506871}\)

\(\displaystyle\sum\ {{y}_{{{i}}}^{{{2}}}}={493291}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\ {\quad\text{and}\quad}\ {S}_{{{x}{y}}}\)

\(\displaystyle S_{xx}=\sum x_{i}^{2}-\frac{\left(\sum x_{i}\right)^{2}}{n}=506871-\frac{6745^{2}}{90}=1370.7222\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}\ {y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}\ {\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={499217}\ -\ {\frac{{{6745}\ \cdot\ {6657}}}{{{90}}}}={311.8333}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle S_{xy}\) and \(\displaystyle S_{xx}\):

\(\displaystyle b=\frac{S_{xy}}{S_{xx}}=\frac{311.8333}{1370.7222}=0.2275\)

The mean is the sum of all values divided by the number of values: \(\displaystyle\overline{x}=\frac{\sum x_{i}}{n}=\frac{6745}{90}=74.9444\)

\(\displaystyle\overline{y}=\frac{\sum y_{i}}{n}=\frac{6657}{90}=73.9667\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x:

\(\displaystyle a=\overline{y}-b\,\overline{x}=73.9667-0.2275\cdot 74.9444=56.9171\)

Here b is carried at full precision (0.227496); the rounded value 0.2275 would give 56.9168.

General least-squares equation: \(\displaystyle\hat{y}=\alpha+\beta x\). Replace \(\displaystyle\alpha\) by \(\displaystyle a=56.9171\) and \(\displaystyle\beta\) by \(\displaystyle b=0.2275\) in the general least-squares equation:

\(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={56.9171}\ +\ {0.2275}{x}\)
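The arithmetic of part (c) can be double-checked from the five sums alone. The following is a minimal Python sketch, not part of the original solution; the function name `least_squares_from_sums` is hypothetical, and only n and the sums are taken from the text.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the line y_hat = a + b*x, given summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n        # S_xx
    s_xy = sum_xy - sum_x * sum_y / n     # S_xy
    b = s_xy / s_xx                       # slope estimate
    a = sum_y / n - b * (sum_x / n)       # intercept: y_bar - b * x_bar
    return a, b


# Values from part (c): n = 90 and the sums of x, y, xy, x^2.
a, b = least_squares_from_sums(90, 6745, 6657, 499217, 506871)
print(f"b = {b:.4f}, a = {a:.4f}")
```

Because the slope is kept at full floating-point precision before the intercept is computed, the result should agree with the four-decimal values reported above.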

d) Let us evaluate the regression line in part (c) at \(\displaystyle{x}={72}:\)

\(\displaystyle\hat{{{y}}}={56.9171}\ +\ {0.2275}{\left({72}\right)}={73.2968}\) Thus the predicted second-round score of a golfer who got a 72 on the first round is 73.2968.
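The prediction in part (d) is sensitive to rounding in the last decimal places: the value 73.2968 corresponds to unrounded coefficients, whereas plugging in the rounded a = 56.9171 and b = 0.2275 gives about 73.2971. A small Python sketch (the variable names are illustrative, not from the original solution) shows both:

```python
n, sum_x, sum_y, sum_xy, sum_x2 = 90, 6745, 6657, 499217, 506871

# Unrounded least-squares coefficients from the summary sums.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

x_new = 72  # first-round score to predict from
y_hat_full = a + b * x_new               # full-precision coefficients
y_hat_rounded = 56.9171 + 0.2275 * x_new  # four-decimal coefficients

print(f"full precision: {y_hat_full:.4f}")
print(f"rounded coefficients: {y_hat_rounded:.4f}")
```

Either way the predicted second-round score is about 73.3; the difference only matters if four decimals are reported.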

e) We determine all necessary sums: \(\displaystyle\sum\ {x}_{{{i}}}={6745}\)

\(\displaystyle\sum\ {y}_{{{i}}}={6657}\)

\(\displaystyle\sum\ {x}_{{{i}}}\ {y}_{{{i}}}={499217}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={506871}\)

\(\displaystyle\sum\ {{y}_{{{i}}}^{{{2}}}}={493291}\)

Determine the correlation coefficient: \(\displaystyle{r}=\ {\frac{{\sum\ {x}_{{{i}}}\ {y}_{{{i}}}\ -\ {\left(\sum\ {x}_{{{i}}}\right)}\ \frac{{\sum\ {y}_{{{i}}}}}{{n}}}}{{\sqrt{{{\left[\sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ \frac{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}{{n}}\right]}\ {\left[\sum\ {{y}_{{{i}}}^{{{2}}}}\ -\ \frac{{\left(\sum\ {y}_{{{i}}}\right)}^{{{2}}}}{{n}}\right]}}}}}}\)

\(\displaystyle=\ {\frac{{{499217}\ -\ {\left({6745}\right)}\ \frac{{{6657}}}{{90}}}}{{\sqrt{{{\left[{506871}\ -\ \frac{{6745}^{{{2}}}}{{90}}\right]}\ {\left[{493291}\ -\ \frac{{6657}^{{{2}}}}{{90}}\right]}}}}}}\)

\(\displaystyle\approx\ {0.2816}\)

If r is positive, then there is a positive linear relationship. If r is negative, then there is a negative linear relationship. If \(\displaystyle 0<\left|r\right|<0.5\), then the linear relationship is weak. If \(\displaystyle 0.5<\left|r\right|<0.8\), then the linear relationship is moderate. If \(\displaystyle 0.8<\left|r\right|<1\), then the linear relationship is strong.

We note that the linear correlation coefficient r is smaller than 0.5 in absolute value and positive, thus there is a weak positive linear relationship between the variables.
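The computation of r and the rule of thumb for interpreting it can be sketched in Python as follows. The function names `correlation_from_sums` and `describe` are hypothetical. The text's strict inequalities leave |r| = 0.5 and |r| = 0.8 unassigned; placing those boundary values in the stronger category is an assumption made here.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient r computed from summary sums."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


def describe(r):
    """Classify r using the 0.5 / 0.8 cutoffs quoted in the text."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:      # assumption: exactly 0.5 counts as moderate
        strength = "moderate"
    else:                 # assumption: exactly 0.8 counts as strong
        strength = "strong"
    return f"{strength} {direction}"


r = correlation_from_sums(90, 6745, 6657, 499217, 506871, 493291)
print(f"r = {r:.4f}: {describe(r)} linear relationship")
```

For the sums of part (e) this reproduces r of about 0.2816 and the "weak positive" classification.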

f) There is one data point at the far right of the scatterplot that lies considerably farther out than all the other data points, and thus this point appears to be an outlier. There do not appear to be any influential observations, because no single data point near the regression line lies far from the other data points in the scatterplot.