Step 1: Female \(\displaystyle{n}=\ \text{Sample size}\ ={30}\)

a) Health GDP is on the horizontal axis and Life expectancy is on the vertical axis.

b) It is reasonable to find a regression line for the data if there is no strong curvature present in the scatterplot. The scatterplot of part (a) shows no strong curvature, so it is reasonable to find a regression line for the data.

c) Let us first determine the necessary sums:

\(\displaystyle\sum\ {x}_{{{i}}}={269.1}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={2514.89}\)

\(\displaystyle\sum\ {y}_{{{i}}}={2441.7}\)

\(\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={21942.55}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\) and \(\displaystyle{S}_{{{x}{y}}}\)

\(\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={2514.89}\ -\ {\frac{{{269.1}^{{{2}}}}}{{{30}}}}={101.063}\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={21942.55}\ -\ {\frac{{{269.1}\ \cdot\ {2441.7}}}{{{30}}}}={40.501}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle{S}_{{{x}{y}}}\) and \(\displaystyle{S}_{{xx}}\): \(\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{40.501}}}{{{101.063}}}}={0.4008}\)

The mean is the sum of all values divided by the number of values: \(\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{269.1}}}{{{30}}}}={8.97}\)

\(\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2441.7}}}{{{30}}}}={81.39}\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x. \(\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={81.39}\ -\ {0.4008}\ \cdot\ {8.97}={77.7953}\)

General least-squares equation: \(\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}\). Replace \(\displaystyle\alpha\) by \(\displaystyle{a}={77.7953}\) and \(\displaystyle\beta\) by \(\displaystyle{b}={0.4008}\) in the general least-squares equation: \(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={77.7953}\ +\ {0.4008}{x}\)
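The arithmetic in part (c) can be checked with a short script. This is only a minimal sketch: the helper name `fit_from_sums` is hypothetical, and it works from the four sums and the sample size quoted above rather than from the raw data, which are not listed here.

```python
def fit_from_sums(n, sum_x, sum_x2, sum_y, sum_xy):
    """Return (S_xx, S_xy, a, b) for the least-squares line y-hat = a + b*x."""
    s_xx = sum_x2 - sum_x ** 2 / n        # S_xx = sum(x^2) - (sum x)^2 / n
    s_xy = sum_xy - sum_x * sum_y / n     # S_xy = sum(xy) - (sum x)(sum y) / n
    b = s_xy / s_xx                       # slope estimate
    a = sum_y / n - b * sum_x / n         # intercept estimate: y-bar - b * x-bar
    return s_xx, s_xy, a, b


# Female data, all 30 observations (sums taken from part (c)).
s_xx, s_xy, a, b = fit_from_sums(30, 269.1, 2514.89, 2441.7, 21942.55)
print(f"S_xx = {s_xx:.3f}, S_xy = {s_xy:.3f}")
print(f"y-hat = {a:.4f} + {b:.4f} x")
```

Because the script carries the unrounded slope into the intercept, its last digits can differ slightly from a hand calculation that rounds b first.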

d) There appears to be an outlier at the far right of the graph, because the point lies much farther to the right than all other points in the scatterplot. The outlier also appears to be an influential observation, because the regression line does not follow the general pattern of the remaining points.

e) Let us first determine the necessary sums: \(\displaystyle\sum\ {x}_{{{i}}}={253.8}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={2280.8}\)

\(\displaystyle\sum\ {y}_{{{i}}}={2361.3}\)

\(\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={20712.43}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\) and \(\displaystyle{S}_{{{x}{y}}}\), using \(\displaystyle{n}={29}\) because the outlier has been removed:

\(\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={2280.8}\ -\ {\frac{{{253.8}^{{{2}}}}}{{{29}}}}={59.6124}\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={20712.43}\ -\ {\frac{{{253.8}\ \cdot\ {2361.3}}}{{{29}}}}={46.9838}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle{S}_{{{x}{y}}}\) and \(\displaystyle{S}_{{xx}}\): \(\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{46.9838}}}{{{59.6124}}}}={0.7882}\)

The mean is the sum of all values divided by the number of values: \(\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{253.8}}}{{{29}}}}={8.7517}\)

\(\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2361.3}}}{{{29}}}}={81.4241}\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x. \(\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={81.4241}\ -\ {0.7882}\ \cdot\ {8.7517}={74.5264}\)

General least-squares equation: \(\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}\). Replace \(\displaystyle\alpha\) by \(\displaystyle{a}={74.5264}\) and \(\displaystyle\beta\) by \(\displaystyle{b}={0.7882}\) in the general least-squares equation: \(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={74.5264}\ +\ {0.7882}{x}\)

We note that the slope nearly doubled, from 0.4008 to 0.7882, when the outlier was removed from the data set.
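The effect of dropping the outlier can also be checked from the two sets of sums. The sketch below is illustrative only: the coordinates of the removed point are not stated in this solution, so they are inferred here as the difference between the full-data and reduced-data sums (an assumption), and the dictionary keys and the `slope` helper are hypothetical names.

```python
# Sums quoted for the female data in parts (c) and (e).
full = {"n": 30, "x": 269.1, "x2": 2514.89, "y": 2441.7, "xy": 21942.55}
reduced = {"n": 29, "x": 253.8, "x2": 2280.8, "y": 2361.3, "xy": 20712.43}

# Inferred coordinates of the removed point (assumption: exactly one point was dropped).
x_out = full["x"] - reduced["x"]
y_out = full["y"] - reduced["y"]

# Consistency check: the changes in sum(x^2) and sum(xy) should equal
# x_out^2 and x_out * y_out if a single point accounts for the difference.
assert abs((full["x2"] - reduced["x2"]) - x_out ** 2) < 1e-6
assert abs((full["xy"] - reduced["xy"]) - x_out * y_out) < 1e-6


def slope(s):
    """Least-squares slope S_xy / S_xx from a dictionary of sums."""
    s_xx = s["x2"] - s["x"] ** 2 / s["n"]
    s_xy = s["xy"] - s["x"] * s["y"] / s["n"]
    return s_xy / s_xx


b_full = slope(full)
b_reduced = slope(reduced)
print(f"removed point: x = {x_out:.1f}, y = {y_out:.1f}")
print(f"slope with outlier:    {b_full:.4f}")
print(f"slope without outlier: {b_reduced:.4f}")
```

The consistency check passing suggests the two sets of sums differ by a single observation, which is why the reduced fit uses 29 observations.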

f) Let us first determine the necessary sums:

\(\displaystyle\sum\ {x}_{{{i}}}={253.8}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={2280.8}\)

\(\displaystyle\sum\ {y}_{{{i}}}={2361.3}\)

\(\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={20712.43}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\) and \(\displaystyle{S}_{{{x}{y}}}\), using \(\displaystyle{n}={29}\) because the outlier has been removed:

\(\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={2280.8}\ -\ {\frac{{{253.8}^{{{2}}}}}{{{29}}}}={59.6124}\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={20712.43}\ -\ {\frac{{{253.8}\ \cdot\ {2361.3}}}{{{29}}}}={46.9838}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle{S}_{{{x}{y}}}\) and \(\displaystyle{S}_{{xx}}\): \(\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{46.9838}}}{{{59.6124}}}}={0.7882}\)

The mean is the sum of all values divided by the number of values: \(\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{253.8}}}{{{29}}}}={8.7517}\)

\(\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2361.3}}}{{{29}}}}={81.4241}\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x. \(\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={81.4241}\ -\ {0.7882}\ \cdot\ {8.7517}={74.5264}\)

General least-squares equation: \(\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}\). Replace \(\displaystyle\alpha\) by \(\displaystyle{a}={74.5264}\) and \(\displaystyle\beta\) by \(\displaystyle{b}={0.7882}\) in the general least-squares equation: \(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={74.5264}\ +\ {0.7882}{x}\)

We note that the slope nearly doubled, from 0.4008 to 0.7882, when the outlier was removed from the data set.

Step 2: Male \(\displaystyle{n}=\ \text{Sample size}\ ={30}\)

a) Health GDP is on the horizontal axis and Life expectancy is on the vertical axis.

b) It is reasonable to find a regression line for the data if there is no strong curvature present in the scatterplot. The scatterplot of part (a) shows no strong curvature, so it is reasonable to find a regression line for the data.

c) Let us first determine the necessary sums:

\(\displaystyle\sum\ {x}_{{{i}}}={269.1}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={2514.89}\)

\(\displaystyle\sum\ {y}_{{{i}}}={2271.8}\)

\(\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={20439.69}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\) and \(\displaystyle{S}_{{{x}{y}}}\)

\(\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={2514.89}\ -\ {\frac{{{269.1}^{{{2}}}}}{{{30}}}}={101.063}\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={20439.69}\ -\ {\frac{{{269.1}\ \cdot\ {2271.8}}}{{{30}}}}={61.644}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle{S}_{{{x}{y}}}\) and \(\displaystyle{S}_{{xx}}\): \(\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{61.644}}}{{{101.063}}}}={0.61}\)

The mean is the sum of all values divided by the number of values:

\(\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{269.1}}}{{{30}}}}={8.97}\)

\(\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2271.8}}}{{{30}}}}={75.7267}\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x. \(\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={75.7267}\ -\ {0.61}\ \cdot\ {8.97}={70.2554}\)

General least-squares equation: \(\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}\). Replace \(\displaystyle\alpha\) by \(\displaystyle{a}={70.2554}\) and \(\displaystyle\beta\) by \(\displaystyle{b}={0.61}\) in the general least-squares equation: \(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={70.2554}\ +\ {0.61}{x}\)

d) There appears to be an outlier at the far right of the graph, because the point lies much farther to the right than all other points in the scatterplot. The outlier also appears to be an influential observation, because the regression line does not follow the general pattern of the remaining points.

e) Let us first determine the necessary sums:

\(\displaystyle\sum\ {x}_{{{i}}}={253.8}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={2280.8}\)

\(\displaystyle\sum\ {y}_{{{i}}}={2196.6}\)

\(\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={19289.13}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\) and \(\displaystyle{S}_{{{x}{y}}}\), using \(\displaystyle{n}={29}\) because the outlier has been removed:

\(\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={2280.8}\ -\ {\frac{{{253.8}^{{{2}}}}}{{{29}}}}={59.6124}\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={19289.13}\ -\ {\frac{{{253.8}\ \cdot\ {2196.6}}}{{{29}}}}={65.0928}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle{S}_{{{x}{y}}}\) and \(\displaystyle{S}_{{xx}}\): \(\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{65.0928}}}{{{59.6124}}}}={1.0919}\)

The mean is the sum of all values divided by the number of values:

\(\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{253.8}}}{{{29}}}}={8.7517}\)

\(\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2196.6}}}{{{29}}}}={75.7448}\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x. \(\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={75.7448}\ -\ {1.0919}\ \cdot\ {8.7517}={66.1885}\)

General least-squares equation: \(\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}\). Replace \(\displaystyle\alpha\) by \(\displaystyle{a}={66.1885}\) and \(\displaystyle\beta\) by \(\displaystyle{b}={1.0919}\) in the general least-squares equation: \(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={66.1885}\ +\ {1.0919}{x}\)

We note that the slope increased strongly, from 0.61 to 1.0919, when the outlier was removed from the data set.
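The two male fits can be compared side by side in the same way. This is a rough sketch under stated assumptions: the helper `line_from_sums` is a hypothetical name, the full-data value \(\sum y_i = 2271.8\) is back-computed from \(\overline{y} = 75.7267\) with \(n = 30\), and the full-data \(\sum x_i y_i = 20439.69\) is the value that reproduces \(S_{xy} = 61.644\).

```python
def line_from_sums(n, sum_x, sum_x2, sum_y, sum_xy):
    """Return (a, b) of the least-squares line y-hat = a + b*x from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    return a, b


# Male data with the outlier (n = 30); sum_y and sum_xy are the assumed values noted above.
a_full, b_full = line_from_sums(30, 269.1, 2514.89, 2271.8, 20439.69)

# Male data without the outlier (n = 29), sums as quoted in part (e).
a_red, b_red = line_from_sums(29, 253.8, 2280.8, 2196.6, 19289.13)

print(f"with outlier:    y-hat = {a_full:.4f} + {b_full:.4f} x")
print(f"without outlier: y-hat = {a_red:.4f} + {b_red:.4f} x")
print(f"slope ratio (without / with): {b_red / b_full:.2f}")
```

The printed coefficients should agree with the hand-computed lines up to rounding in the last digit.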

f) Let us first determine the necessary sums:

\(\displaystyle\sum\ {x}_{{{i}}}={253.8}\)

\(\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={2280.8}\)

\(\displaystyle\sum\ {y}_{{{i}}}={2196.6}\)

\(\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={19289.13}\)

Next, we can determine \(\displaystyle{S}_{{xx}}\) and \(\displaystyle{S}_{{{x}{y}}}\), using \(\displaystyle{n}={29}\) because the outlier has been removed:

\(\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={2280.8}\ -\ {\frac{{{253.8}^{{{2}}}}}{{{29}}}}={59.6124}\)

\(\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={19289.13}\ -\ {\frac{{{253.8}\ \cdot\ {2196.6}}}{{{29}}}}={65.0928}\)

The estimate b of the slope \(\displaystyle\beta\) is the ratio of \(\displaystyle{S}_{{{x}{y}}}\) and \(\displaystyle{S}_{{xx}}\): \(\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{65.0928}}}{{{59.6124}}}}={1.0919}\)

The mean is the sum of all values divided by the number of values:

\(\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{253.8}}}{{{29}}}}={8.7517}\)

\(\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2196.6}}}{{{29}}}}={75.7448}\)

The estimate a of the intercept \(\displaystyle\alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x. \(\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={75.7448}\ -\ {1.0919}\ \cdot\ {8.7517}={66.1885}\)

General least-squares equation: \(\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}\). Replace \(\displaystyle\alpha\) by \(\displaystyle{a}={66.1885}\) and \(\displaystyle\beta\) by \(\displaystyle{b}={1.0919}\) in the general least-squares equation: \(\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={66.1885}\ +\ {1.0919}{x}\)

We note that the slope increased strongly, from 0.61 to 1.0919, when the outlier was removed from the data set.