Use the technology of your choice to do the following tasks. From the International Data Base, published by the U.S. Census Bureau, we obtained data on infant mortality rate (IMR) and life expectancy (LE), in years, for a sample of 60 countries. a) Construct and interpret a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation. d) Make the indicated predictions. e) Compute and interpret the correlation coefficient. f) Identify potential outliers and influential observations.

Question
Scatterplots
asked 2021-02-16
Use the technology of your choice to do the following tasks. From the International Data Base, published by the U.S. Census Bureau, we obtained data on infant mortality rate (IMR) and life expectancy (LE), in years, for a sample of 60 countries. a) Construct and interpret a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation. d) Make the indicated predictions. e) Compute and interpret the correlation coefficient. f) Identify potential outliers and influential observations.

Answers (1)

2021-02-17

Given: \(\displaystyle n = \text{sample size} = 60\)

a) IMR is on the horizontal axis and LE is on the vertical axis.

[Image: scatterplot of LE against IMR]

b) When a scatterplot shows no strong curvature, it is reasonable to assume a linear relationship between the variables, and therefore reasonable to find a regression line. The scatterplot of part (a) does not show strong curvature, so finding the regression line is reasonable.

c) We determine all necessary sums, with x denoting IMR and y denoting LE:
\(\displaystyle \sum x_i = 1743.1\)
\(\displaystyle \sum y_i = 4147.6\)
\(\displaystyle \sum x_i y_i = 106485.62\)
\(\displaystyle \sum x_i^2 = 90242.13\)
\(\displaystyle \sum y_i^2 = 293216.68\)

Next, we determine \(\displaystyle S_{xx}\) and \(\displaystyle S_{xy}\):
\(\displaystyle S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 90242.13 - \frac{1743.1^2}{60} = 39602.1698\)
\(\displaystyle S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 106485.62 - \frac{1743.1 \cdot 4147.6}{60} = -14009.0727\)

The estimate b of the slope \(\displaystyle \beta\) is the ratio of \(\displaystyle S_{xy}\) to \(\displaystyle S_{xx}\):
\(\displaystyle b = \frac{S_{xy}}{S_{xx}} = \frac{-14009.0727}{39602.1698} = -0.3537\)

The mean is the sum of all values divided by the number of values:
\(\displaystyle \overline{x} = \frac{\sum x_i}{n} = \frac{1743.1}{60} = 29.0517\)
\(\displaystyle \overline{y} = \frac{\sum y_i}{n} = \frac{4147.6}{60} = 69.1267\)

The estimate a of the intercept \(\displaystyle \alpha\) is the average of y decreased by the product of the estimate of the slope and the average of x:
\(\displaystyle a = \overline{y} - b\,\overline{x} = 69.1267 - (-0.3537) \cdot 29.0517 = 79.4036\)
(The value 79.4036 uses the unrounded slope; the rounded values shown give 79.4023.)

General least-squares equation: \(\displaystyle \hat{y} = \alpha + \beta x\). Replacing \(\displaystyle \alpha\) by \(\displaystyle a = 79.4036\) and \(\displaystyle \beta\) by \(\displaystyle b = -0.3537\) in the general least-squares equation gives:
\(\displaystyle \hat{y} = a + bx = 79.4036 - 0.3537x\)

Interpretation: the predicted life expectancy decreases by about 0.35 years for each one-unit increase in IMR.
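The hand calculation in part (c) can be checked with a few lines of code. The sketch below uses Python, which is our choice since the exercise allows any technology. The variable names are illustrative, and only the summary sums quoted in the answer are used, because the raw data for the 60 countries are not reproduced here.

```python
# Summary sums quoted in part (c); x = IMR, y = LE.
n = 60
sum_x = 1743.1
sum_y = 4147.6
sum_xy = 106485.62
sum_x2 = 90242.13

# Corrected sum of squares for x and corrected sum of cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates of the slope and the intercept.
b = s_xy / s_xx
a = sum_y / n - b * (sum_x / n)

print(f"S_xx = {s_xx:.4f}, S_xy = {s_xy:.4f}")
print(f"y-hat = {a:.4f} + ({b:.4f}) x")
```

Because the script carries the slope at full precision, its intercept agrees with the 79.4036 reported in the answer rather than with the value obtained from the rounded slope.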
d) Let us evaluate the regression line of part (c) at \(\displaystyle x = 30\):
\(\displaystyle \hat{y} = 79.4036 - 0.3537(30) = 68.7926\)
Thus the predicted life expectancy of a country with an IMR of 30 is 68.7926 years.

e) We determine all necessary sums:
\(\displaystyle \sum x_i = 1743.1\)
\(\displaystyle \sum y_i = 4147.6\)
\(\displaystyle \sum x_i y_i = 106485.62\)
\(\displaystyle \sum x_i^2 = 90242.13\)
\(\displaystyle \sum y_i^2 = 293216.68\)

Determine the correlation coefficient:
\(\displaystyle r = \frac{\sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sqrt{\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]\left[\sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}\right]}}\)
\(\displaystyle = \frac{106485.62 - \frac{(1743.1)(4147.6)}{60}}{\sqrt{\left[90242.13 - \frac{1743.1^2}{60}\right]\left[293216.68 - \frac{4147.6^2}{60}\right]}}\)
\(\displaystyle \approx -0.8727\)

If r is positive, then there is a positive linear relationship. If r is negative, then there is a negative linear relationship. If \(\displaystyle 0 < |r| < 0.5\), then the linear relationship is weak. If \(\displaystyle 0.5 < |r| < 0.8\), then the linear relationship is moderate. If \(\displaystyle 0.8 < |r| < 1\), then the linear relationship is strong.

The linear correlation coefficient r is negative and larger than 0.8 in absolute value, so there is a strong negative linear relationship between the variables: countries with a higher IMR tend to have a lower LE.

f) (45.6, 49.9) is a potential outlier, because this data point lies farther from the regression line than any other point in the scatterplot. There do not appear to be any influential observations, because the scatterplot shows no data point that lies near the regression line but far from the other data points.
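The correlation coefficient in part (e) and the prediction in part (d) can be reproduced from the same summary sums. The following is a minimal Python sketch under that assumption. The helper name `strength` is ours, and its cut-offs simply encode the 0.5 and 0.8 thresholds quoted in the answer.

```python
import math

# Summary sums quoted in the answer; x = IMR, y = LE.
n = 60
sum_x, sum_y = 1743.1, 4147.6
sum_xy, sum_x2, sum_y2 = 106485.62, 90242.13, 293216.68

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Linear correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)


def strength(r):
    """Classify |r| with the thresholds quoted in part (e)."""
    size = abs(r)
    if size > 0.8:
        return "strong"
    if size > 0.5:
        return "moderate"
    return "weak"


direction = "negative" if r < 0 else "positive"
print(f"r = {r:.4f}: {strength(r)} {direction} linear relationship")

# Prediction at IMR = 30 using the unrounded slope and intercept.
b = s_xy / s_xx
a = sum_y / n - b * (sum_x / n)
y_hat_30 = a + b * 30
print(f"Predicted LE at IMR = 30: {y_hat_30:.2f} years")
```

With unrounded coefficients the prediction comes out at about 68.79 years, which differs slightly in the later decimals from the 68.7926 obtained by hand with the rounded slope and intercept.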


Relevant Questions

asked 2021-02-22
Use the technology of your choice to do the following tasks. In the article “Statistical Fallacies in Sports” (Chance, Vol. 19, No. 4, pp. 50-56), S. Berry discussed, among other things, the relation between scores for the first and second rounds of the 2006 Masters golf tournament. You will find those scores on the WeissStats CD. For part (d), predict the second-round score of a golfer who got a 72 on the first round. a) Construct and interpret a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)–(f). c) Determine and interpret the regression equation. d) Make the indicated predictions. e) Compute and interpret the correlation coefficient. f) Identify potential outliers and influential observations.
asked 2020-12-30
Use the technology of your choice to do the following tasks. The National Oceanic and Atmospheric Administration publishes temperature and precipitation information for cities around the world in Climates of the World. Data on average high temperature (in degrees Fahrenheit) in July and average precipitation (in inches) in July for 48 cities are on the WeissStats CD. For part (d), predict the average July precipitation of a city with an average July temperature of \(\displaystyle{83}^{{\circ}}{F}\) a) Construct and interpret a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation. d) Make the indicated predictions. e) Compute and interpret the correlation coefficient. f) Identify potential outliers and influential observations.
asked 2021-02-25
Researchers have asked whether there is a relationship between nutrition and cancer, and many studies have shown that there is. In fact, one of the conclusions of a study by B. Reddy et al., “Nutrition and Its Relationship to Cancer” (Advances in Cancer Research, Vol. 32, pp. 237-345), was that “...none of the risk factors for cancer is probably more significant than diet and nutrition.” One dietary factor that has been studied for its relationship with prostate cancer is fat consumption. On the WeissStats CD, you will find data on per capita fat consumption (in grams per day) and prostate cancer death rate (per 100,000 males) for nations of the world. The data were obtained from a graph, adapted from information in the article mentioned, in J. Robbins’s classic book Diet for a New America (Walpole, NH: Stillpoint, 1987, p. 271). For part (d), predict the prostate cancer death rate for a nation with a per capita fat consumption of 92 grams per day. a) Construct and interpret a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation. d) Make the indicated predictions. e) Compute and interpret the correlation coefficient. f) Identify potential outliers and influential observations.
asked 2020-11-03
Does a higher state per capita income equate to a higher per capita beer consumption? From the document Survey of Current Business, published by the U.S. Bureau of Economic Analysis, and from the Brewer’s Almanac, published by the Beer Institute, we obtained data on personal income per capita, in thousands of dollars, and per capita beer consumption, in gallons, for the 50 states and Washington, D.C. a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
asked 2020-11-09
Box Office Mojo collects and posts data on movie grosses. For a random sample of 50 movies, we obtained both the domestic (U.S.) and overseas grosses, in millions of dollars. a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
asked 2021-02-21
How important are birdies (a score of one under par on a given golf hole) in determining the final total score of a woman golfer? From the U.S. Women’s Open Web site, we obtained data on number of birdies during a tournament and final score for 63 women golfers. The data are presented on the WeissStats CD. a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
asked 2021-02-24
The document Arizona Residential Property Valuation System, published by the Arizona Department of Revenue, describes how county assessors use computerized systems to value single-family residential properties for property tax purposes. a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
asked 2021-02-09
Polychlorinated biphenyls (PCBs), industrial pollutants, are known to be carcinogens and a great danger to natural ecosystems. As a result of several studies, PCB production was banned in the United States in 1979 and by the Stockholm Convention on Persistent Organic Pollutants in 2001. One study, published in 1972 by R. Risebrough, is titled “Effects of Environmental Pollutants Upon Animals Other Than Man”. In that study, 50 Anacapa pelican eggs were collected and measured for their shell thickness, in millimetres (mm), and concentration of PCBs, in parts per million (ppm). a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
asked 2021-01-30
Polychlorinated biphenyls (PCBs), industrial pollutants, are known to be a great danger to natural ecosystems. In a study by R. W. Risebrough titled “Effects of Environmental Pollutants Upon Animals Other Than Man” (Proceedings of the 6th Berkeley Symposium on Mathematics and Statistics, VI, University of California Press, pp. 443-463), 60 Anacapa pelican eggs were collected and measured for their shell thickness, in millimeters (mm), and concentration of PCBs, in parts per million (ppm). a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)–(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
asked 2020-12-25
The magazine Consumer Reports publishes information on automobile gas mileage and variables that affect gas mileage. In one issue, data on gas mileage (in miles per gallon) and engine displacement (in liters) were published for 121 vehicles. a) Obtain a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation for the data. d) Identify potential outliers and influential observations. e) In case a potential outlier is present, remove it and discuss the effect. f) In case a potential influential observation is present, remove it and discuss the effect.
...