Question

Scatterplots (asked 2020-10-21)

An issue of BARRON’S presented information on top wealth managers in the United States, based on individual clients with accounts of $1 million or more. Data were given for various variables, two of which were number of private client managers and private client assets.
a) Obtain a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)–(f).
c) Determine and interpret the regression equation for the data.
d) Identify potential outliers and influential observations.
e) In case a potential outlier is present, remove it and discuss the effect.
f) In case a potential influential observation is present, remove it and discuss the effect.

2020-10-22

Given: $$\displaystyle{n}=\ \text{Sample size}\ ={38}$$

a) In the scatterplot, Managers (number of private client managers) is on the horizontal axis and Assets (private client assets) is on the vertical axis.

b) It is reasonable to find a regression line for the data if there is no strong curvature present in the scatterplot. The scatterplot of part (a) shows no strong curvature, so it is reasonable to find a regression line for the data.

c) Let us first determine the necessary sums:
$$\displaystyle\sum\ {x}_{{{i}}}={31386}$$
$$\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={174931090}$$
$$\displaystyle\sum\ {y}_{{{i}}}={3103.9}$$
$$\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={8095211}$$
Next, we can determine $$\displaystyle{S}_{{xx}}$$ and $$\displaystyle{S}_{{{x}{y}}}$$:
$$\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={174931090}\ -\ {\frac{{{31386}^{{{2}}}}}{{{38}}}}={149007905.8947}$$
$$\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={8095211}\ -\ {\frac{{{31386}\ \cdot\ {3103.9}}}{{{38}}}}={5531552.9632}$$
The estimate b of the slope $$\displaystyle\beta$$ is the ratio of $$\displaystyle{S}_{{{x}{y}}}$$ and $$\displaystyle{S}_{{xx}}$$:
$$\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{5531552.9632}}}{{{149007905.8947}}}}={0.0371}$$
The mean is the sum of all values divided by the number of values:
$$\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{31386}}}{{{38}}}}={825.9474}$$
$$\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{3103.9}}}{{{38}}}}={81.6816}$$
The estimate a of the intercept $$\displaystyle\alpha$$ is the mean of y minus the product of the slope estimate and the mean of x. The value below is computed with the unrounded slope (b ≈ 0.037123) rather than the displayed 0.0371; parts (e) and (f) do the same.
$$\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={81.6816}\ -\ {0.0371}\ \cdot\ {825.9474}={51.0203}$$
General least-squares equation: $$\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}$$
Replacing $$\displaystyle\alpha$$ by a = 51.0203 and $$\displaystyle\beta$$ by b = 0.0371 in the general least-squares equation gives:
$$\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={51.0203}\ +\ {0.0371}{x}$$
Interpretation: each additional private client manager is associated with an estimated increase of 0.0371 in private client assets, measured in the units in which the assets are reported.

d) The right-most point appears to be a potential influential observation: it lies far from the other points in the horizontal direction, yet extremely close to the regression line. The point at the top of the scatterplot appears to be a potential outlier, because it lies far above the rest of the data. There appear to be a few other potential outliers as well, but we will only use these two points in parts (e) and (f).

e) Remove the potential outlier (the point at the top of the scatterplot), so that n = 37. Comparing the sums below with those of part (c) shows that the removed point is (x, y) = (1888, 630). Let us first determine the necessary sums:
$$\displaystyle\sum\ {x}_{{{i}}}={29498}$$
$$\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={171366546}$$
$$\displaystyle\sum\ {y}_{{{i}}}={2473.9}$$
$$\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={6905771}$$
Next, we can determine $$\displaystyle{S}_{{xx}}$$ and $$\displaystyle{S}_{{{x}{y}}}$$:
$$\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={171366546}\ -\ {\frac{{{29498}^{{{2}}}}}{{{37}}}}={147849464.8108}$$
$$\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={6905771}\ -\ {\frac{{{29498}\ \cdot\ {2473.9}}}{{{37}}}}={4933470.9405}$$
The estimate b of the slope $$\displaystyle\beta$$ is the ratio of $$\displaystyle{S}_{{{x}{y}}}$$ and $$\displaystyle{S}_{{xx}}$$:
$$\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{4933470.9405}}}{{{147849464.8108}}}}={0.0334}$$
The mean is the sum of all values divided by the number of values:
$$\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{29498}}}{{{37}}}}={797.2432}$$
$$\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2473.9}}}{{{37}}}}={66.8622}$$
The estimate a of the intercept $$\displaystyle\alpha$$ is the mean of y minus the product of the slope estimate and the mean of x:
$$\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={66.8622}\ -\ {0.0334}\ \cdot\ {797.2432}={40.2596}$$
General least-squares equation: $$\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}$$
Replacing $$\displaystyle\alpha$$ by a = 40.2596 and $$\displaystyle\beta$$ by b = 0.0334 in the general least-squares equation gives:
$$\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={40.2596}\ +\ {0.0334}{x}$$
The slope drops from 0.0371 to 0.0334 and the intercept from 51.0203 to 40.2596, so the regression line is a bit less steep than the one in part (c). The potential outlier therefore makes the regression line a bit steeper.

f) Remove the potential influential observation (the right-most point) from the full data set, so that again n = 37. Comparing the sums below with those of part (c) shows that the removed point is (x, y) = (10158, 432). Let us first determine the necessary sums:
$$\displaystyle\sum\ {x}_{{{i}}}={21228}$$
$$\displaystyle\sum\ {{x}_{{{i}}}^{{{2}}}}={71746126}$$
$$\displaystyle\sum\ {y}_{{{i}}}={2671.9}$$
$$\displaystyle\sum\ {x}_{{{i}}}{y}_{{{i}}}={3706955}$$
Next, we can determine $$\displaystyle{S}_{{xx}}$$ and $$\displaystyle{S}_{{{x}{y}}}$$:
$$\displaystyle{S}_{{xx}}=\ \sum\ {{x}_{{{i}}}^{{{2}}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}^{{{2}}}}}{{{n}}}}={71746126}\ -\ {\frac{{{21228}^{{{2}}}}}{{{37}}}}={59566991.2973}$$
$$\displaystyle{S}_{{{x}{y}}}=\ \sum\ {x}_{{{i}}}{y}_{{{i}}}\ -\ {\frac{{{\left(\sum\ {x}_{{{i}}}\right)}{\left(\sum\ {y}_{{{i}}}\right)}}}{{{n}}}}={3706955}\ -\ {\frac{{{21228}\ \cdot\ {2671.9}}}{{{37}}}}={2174006.5351}$$
The estimate b of the slope $$\displaystyle\beta$$ is the ratio of $$\displaystyle{S}_{{{x}{y}}}$$ and $$\displaystyle{S}_{{xx}}$$:
$$\displaystyle{b}=\ {\frac{{{S}_{{{x}{y}}}}}{{{S}_{{xx}}}}}=\ {\frac{{{2174006.5351}}}{{{59566991.2973}}}}={0.0365}$$
The mean is the sum of all values divided by the number of values:
$$\displaystyle\overline{{{x}}}=\ {\frac{{\sum\ {x}_{{{i}}}}}{{{n}}}}=\ {\frac{{{21228}}}{{{37}}}}={573.7297}$$
$$\displaystyle\overline{{{y}}}=\ {\frac{{\sum\ {y}_{{{i}}}}}{{{n}}}}=\ {\frac{{{2671.9}}}{{{37}}}}={72.2135}$$
The estimate a of the intercept $$\displaystyle\alpha$$ is the mean of y minus the product of the slope estimate and the mean of x:
$$\displaystyle{a}=\ \overline{{{y}}}\ -\ {b}\ \overline{{{x}}}={72.2135}\ -\ {0.0365}\ \cdot\ {573.7297}={51.2742}$$
General least-squares equation: $$\displaystyle\hat{{{y}}}=\ \alpha\ +\ \beta\ {x}$$
Replacing $$\displaystyle\alpha$$ by a = 51.2742 and $$\displaystyle\beta$$ by b = 0.0365 in the general least-squares equation gives:
$$\displaystyle\hat{{{y}}}={a}\ +\ {b}{x}={51.2742}\ +\ {0.0365}{x}$$
The slope (0.0365 versus 0.0371) and the intercept (51.2742 versus 51.0203) are nearly identical to those of part (c), so the potential influential observation does not influence the regression line much.
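
Parts (c), (e) and (f) all apply the same recipe to a different set of sums, so the arithmetic can be cross-checked with a short script. The following is a minimal Python sketch, not part of the original answer. The names `Fit`, `fit_from_sums` and `removed_point` are hypothetical helpers introduced here for illustration, and the only inputs are the sums quoted in the answer (the raw BARRON’S data are not reproduced on this page).

```python
from typing import NamedTuple


class Fit(NamedTuple):
    s_xx: float
    s_xy: float
    slope: float      # b, the estimate of beta
    intercept: float  # a, the estimate of alpha


def fit_from_sums(n, sum_x, sum_x2, sum_y, sum_xy):
    """Least-squares line y-hat = a + b*x computed from summary sums only."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    slope = s_xy / s_xx
    # The intercept uses the unrounded slope, as the worked answer does.
    intercept = sum_y / n - slope * (sum_x / n)
    return Fit(s_xx, s_xy, slope, intercept)


def removed_point(before, after):
    """Recover the single deleted observation from the change in the sums."""
    x = before["sum_x"] - after["sum_x"]
    y = round(before["sum_y"] - after["sum_y"], 1)  # data carry one decimal
    return x, y


# Sums as quoted in parts (c), (e) and (f) of the answer.
full = dict(n=38, sum_x=31386, sum_x2=174931090, sum_y=3103.9, sum_xy=8095211)
no_outlier = dict(n=37, sum_x=29498, sum_x2=171366546, sum_y=2473.9, sum_xy=6905771)
no_influential = dict(n=37, sum_x=21228, sum_x2=71746126, sum_y=2671.9, sum_xy=3706955)

cases = {"(c) all points": full,
         "(e) outlier removed": no_outlier,
         "(f) influential point removed": no_influential}

fits = {label: fit_from_sums(**sums) for label, sums in cases.items()}

for label, fit in fits.items():
    print(f"{label}: y-hat = {fit.intercept:.4f} + {fit.slope:.4f} x")

print("point removed in (e):", removed_point(full, no_outlier))
print("point removed in (f):", removed_point(full, no_influential))
```

Working from the sums rather than the raw data keeps the check tied to the numbers actually shown in the answer. With the raw 38 observations available, any standard least-squares routine should give the same lines.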