# In 2014, the Pew Research Center's American Trends Panel sought to better understand what Americans know about science. Among a random sample of 3278 adults, 2065 could correctly interpret a scatterplot. Is this good evidence that more than 60% of Americans are able to correctly interpret scatterplots?

2020-12-03
Step 1
Solution:
Let X be the number of adults in the sample who correctly interpreted a scatterplot, and let n be the number of adults sampled.
From the given information, X=2065 and n=3278.
The given claim is that more than 60% of Americans are able to correctly interpret scatterplots.
Step 2
State the hypotheses.
Null hypothesis:
$$H_0: p \le 0.60$$
That is, the proportion of Americans who are able to correctly interpret scatterplots is not more than 0.60.
Alternative hypothesis:
$$H_a: p>0.60$$
That is, the proportion of Americans who are able to correctly interpret scatterplots is more than 0.60.
Step 3
The sample proportion is
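$$\hat{p} = \frac{X}{n}$$
$$= \frac{2065}{3278}$$
$$\approx 0.6300$$
Then the test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}, \quad \text{where } p_0 = 0.60$$
$$= \frac{0.6300 - 0.60}{\sqrt{0.60(1 - 0.60)/3278}}$$
$$= \frac{\sqrt{3278}\,(0.03)}{\sqrt{0.24}}$$
$$\approx 3.51$$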
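As a cross-check, the arithmetic of this step can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the original solution. Carrying full precision gives z ≈ 3.50; the 3.51 above comes from rounding p̂ to 0.6300 before substituting, and the small difference does not change the conclusion.

```python
import math

# Survey counts from the problem statement
x = 2065     # adults who correctly interpreted the scatterplot
n = 3278     # adults in the random sample
p0 = 0.60    # hypothesized proportion under H0

p_hat = x / n                          # sample proportion
se = math.sqrt(p0 * (1 - p0) / n)      # standard error computed under H0
z = (p_hat - p0) / se                  # one-proportion z statistic

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}")
```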
Step 4
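The P-value is obtained using Excel:
$$\text{P-value} = P(Z > 3.51)$$
$$= 1 - P(Z \le 3.51)$$
$$= 1 - 0.999776 \quad \text{(using an Excel normal-distribution function)}$$
$$\approx 0.0002$$
Thus, the P-value is approximately 0.0002.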
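The Excel lookup can be mirrored with Python's standard library. This sketch assumes the standard normal upper tail is computed as P(Z > z) = ½·erfc(z/√2) with `math.erfc`; the helper name `upper_tail_normal` is hypothetical.

```python
import math


def upper_tail_normal(z: float) -> float:
    """Return P(Z > z) for a standard normal Z using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 3.51                        # test statistic from Step 3
p_value = upper_tail_normal(z)
print(f"P-value = {p_value:.4f}")
```

With z = 3.51 this returns about 0.000224, which rounds to the 0.0002 reported above.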
Step 5
Rejection rule:
If the P-value is less than or equal to the significance level α = 0.05, reject the null hypothesis.
Conclusion:
Here, the P-value is 0.0002.
This is less than 0.05.
By the rejection rule, reject the null hypothesis.
Thus, there is good evidence that more than 60% of Americans are able to correctly interpret scatterplots.

### Relevant Questions

A new thermostat has been engineered for the frozen food cases in large supermarkets. Both the old and new thermostats hold temperatures at an average of $$25^{\circ}F$$. However, it is hoped that the new thermostat might be more dependable in the sense that it will hold temperatures closer to $$25^{\circ}F$$. One frozen food case was equipped with the new thermostat, and a random sample of 21 temperature readings gave a sample variance of 5.1. Another similar frozen food case was equipped with the old thermostat, and a random sample of 19 temperature readings gave a sample variance of 12.8. Test the claim that the population variance of the old thermostat temperature readings is larger than that for the new thermostat. Use a $$5\%$$ level of significance. How could your test conclusion relate to the question regarding the dependability of the temperature readings? (Let population 1 refer to data from the old thermostat.)
(a) What is the level of significance?
State the null and alternate hypotheses.
$$H_0: \sigma_1^2 = \sigma_2^2,\ H_1: \sigma_1^2 > \sigma_2^2$$
$$H_0: \sigma_1^2 = \sigma_2^2,\ H_1: \sigma_1^2 \neq \sigma_2^2$$
$$H_0: \sigma_1^2 = \sigma_2^2,\ H_1: \sigma_1^2 < \sigma_2^2$$
$$H_0: \sigma_1^2 > \sigma_2^2,\ H_1: \sigma_1^2 = \sigma_2^2$$
(b) Find the value of the sample F statistic. (Round your answer to two decimal places.)
What are the degrees of freedom?
$$df_{N} = ?$$
$$df_{D} = ?$$
What assumptions are you making about the original distribution?
The populations follow independent normal distributions. We have random samples from each population.
The populations follow dependent normal distributions. We have random samples from each population.
The populations follow independent normal distributions.
The populations follow independent chi-square distributions. We have random samples from each population.
(c) Find or estimate the P-value of the sample test statistic. (Round your answer to four decimal places.)
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis?
At the α = 0.05 level, we fail to reject the null hypothesis and conclude the data are not statistically significant.
At the α = 0.05 level, we fail to reject the null hypothesis and conclude the data are statistically significant.
At the α = 0.05 level, we reject the null hypothesis and conclude the data are not statistically significant.
At the α = 0.05 level, we reject the null hypothesis and conclude the data are statistically significant.
(e) Interpret your conclusion in the context of the application.
Reject the null hypothesis; there is sufficient evidence that the population variance is larger in the old thermostat temperature readings.
Fail to reject the null hypothesis; there is sufficient evidence that the population variance is larger in the old thermostat temperature readings.
Fail to reject the null hypothesis; there is insufficient evidence that the population variance is larger in the old thermostat temperature readings.
Reject the null hypothesis; there is insufficient evidence that the population variance is larger in the old thermostat temperature readings.
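Part (b) of the thermostat problem reduces to a ratio of sample variances. The sketch below assumes the usual convention of putting the variance claimed to be larger (population 1, the old thermostat) in the numerator. The P-value in part (c) needs the F distribution, which Python's standard library does not provide, so it is not computed here.

```python
# Population 1 = old thermostat, population 2 = new thermostat (as the problem specifies)
s1_sq, n1 = 12.8, 19   # old thermostat: sample variance and number of readings
s2_sq, n2 = 5.1, 21    # new thermostat: sample variance and number of readings

F = s1_sq / s2_sq      # sample F statistic, larger claimed variance on top
df_num = n1 - 1        # numerator degrees of freedom
df_den = n2 - 1        # denominator degrees of freedom

print(f"F = {F:.2f}, df_N = {df_num}, df_D = {df_den}")
```

If SciPy is installed, `scipy.stats.f.sf(F, df_num, df_den)` returns the right-tail area needed for part (c).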
Researchers have asked whether there is a relationship between nutrition and cancer, and many studies have shown that there is. In fact, one of the conclusions of a study by B. Reddy et al., “Nutrition and Its Relationship to Cancer” (Advances in Cancer Research, Vol. 32, pp. 237-345), was that “...none of the risk factors for cancer is probably more significant than diet and nutrition.” One dietary factor that has been studied for its relationship with prostate cancer is fat consumption. On the WeissStats CD, you will find data on per capita fat consumption (in grams per day) and prostate cancer death rate (per 100,000 males) for nations of the world. The data were obtained from a graph (adapted from information in the article mentioned) in J. Robbins’s classic book Diet for a New America (Walpole, NH: Stillpoint, 1987, p. 271). For part (d), predict the prostate cancer death rate for a nation with a per capita fat consumption of 92 grams per day.
a) Construct and interpret a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f).
c) Determine and interpret the regression equation.
d) Make the indicated predictions.
e) Compute and interpret the correlation coefficient.
f) Identify potential outliers and influential observations.
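Parts (c) through (e) of the fat-consumption problem rest on the least-squares formulas b1 = Sxy/Sxx, b0 = ȳ - b1·x̄, and r = Sxy/√(Sxx·Syy). The sketch below implements them in plain Python. The WeissStats data are not reproduced here, so the lists are made-up, exactly linear toy values (y = 1 + 0.2x) used only to check the arithmetic; the real fat-consumption and death-rate columns would have to be substituted to answer the question.

```python
import math


def least_squares(x, y):
    """Return (b0, b1, r) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope
    b0 = y_bar - b1 * x_bar         # intercept
    r = sxy / math.sqrt(sxx * syy)  # linear correlation coefficient
    return b0, b1, r


# Made-up, exactly linear toy values (y = 1 + 0.2x), NOT the WeissStats data
fat = [40, 60, 80, 100, 120]
death_rate = [9.0, 13.0, 17.0, 21.0, 25.0]

b0, b1, r = least_squares(fat, death_rate)
predicted = b0 + b1 * 92            # part (d): prediction at 92 grams per day
print(f"y-hat = {b0:.2f} + {b1:.3f}x, r = {r:.3f}, prediction at 92 g/day = {predicted:.2f}")
```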
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What are the values of the R² and adjusted-R² statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What are the values of the R² and adjusted-R² statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What are the values of the R² and adjusted-R² statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
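Questions 7, 8, and 10 involve two mechanical steps: building the interaction columns X4 = X1 × X2, X5 = X1 × X3, X6 = X2 × X3, and squaring the model's square-root estimates before comparing them with the actual prices. The sketch below shows only those steps in plain Python. The helper names are hypothetical, the numbers in the usage lines are made-up placeholders rather than values from Diamonds.xls, and reading a positive residual (actual minus estimate) as "possibly overpriced" is one reasonable convention, not a rule stated in the case.

```python
def add_interactions(x1, x2, x3):
    """Return the interaction terms (X4, X5, X6) = (X1*X2, X1*X3, X2*X3) for one earring set."""
    return x1 * x2, x1 * x3, x2 * x3


def price_residuals(actual_prices, predicted_sqrt_prices):
    """Back-transform sqrt-price estimates and compare them with actual prices.

    Returns a list of (estimated_price, actual_minus_estimate) pairs. A positive
    difference means the set costs more than the model predicts (a possible
    overpriced set); a negative one suggests a possible bargain.
    """
    results = []
    for actual, pred_sqrt in zip(actual_prices, predicted_sqrt_prices):
        estimate = pred_sqrt ** 2          # undo the square-root transformation
        results.append((estimate, actual - estimate))
    return results


# Placeholder inputs only (not from Diamonds.xls)
print(add_interactions(2, 3, 0.5))
print(price_residuals([2500, 1600], [45.0, 42.0]))
```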
One factor in determining the usefulness of an examination as a measure of demonstrated ability is the amount of spread that occurs in the grades. If the spread or variation of examination scores is very small, it usually means that the examination was either too hard or too easy. However, if the variance of scores is moderately large, then there is a definite difference in scores between "better," "average," and "poorer" students. A group of attorneys in a Midwest state has been given the task of making up this year's bar examination for the state. The examination has 500 total possible points, and from the history of past examinations, it is known that a standard deviation of around 60 points is desirable. Of course, too large or too small a standard deviation is not good. The attorneys want to test their examination to see how good it is. A preliminary version of the examination (with slight modifications to protect the integrity of the real examination) is given to a random sample of 20 newly graduated law students. Their scores give a sample standard deviation of 70 points. Using a 0.01 level of significance, test the claim that the population standard deviation for the new examination is 60 against the claim that the population standard deviation is different from 60.
(a) What is the level of significance?
State the null and alternate hypotheses.
$$H_0: \sigma = 60,\ H_1: \sigma < 60$$
$$H_0: \sigma > 60,\ H_1: \sigma = 60$$
$$H_0: \sigma = 60,\ H_1: \sigma > 60$$
$$H_0: \sigma = 60,\ H_1: \sigma \neq 60$$
(b) Find the value of the chi-square statistic for the sample. (Round your answer to two decimal places.)
What are the degrees of freedom?
What assumptions are you making about the original distribution?
We assume a binomial population distribution.
We assume an exponential population distribution.
We assume a normal population distribution.
We assume a uniform population distribution.
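Part (b) of the bar-examination problem uses the chi-square statistic for a single standard deviation, χ² = (n - 1)s²/σ², with n - 1 degrees of freedom. A minimal Python check with the numbers given in the problem:

```python
n = 20          # newly graduated law students in the sample
s = 70.0        # sample standard deviation (points)
sigma0 = 60.0   # population standard deviation claimed under H0

chi_sq = (n - 1) * s ** 2 / sigma0 ** 2   # chi-square statistic for one standard deviation
df = n - 1

print(f"chi-square = {chi_sq:.2f}, df = {df}")
```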
The accompanying two-way table was constructed using data in the article “Television Viewing and Physical Fitness in Adults” (Research Quarterly for Exercise and Sport, 1990: 315–320). The author hoped to determine whether time spent watching television is associated with cardiovascular fitness. Subjects were asked about their television-viewing habits and were classified as physically fit if they scored in the excellent or very good category on a step test. We include MINITAB output from a chi-squared analysis. The four TV groups corresponded to different amounts of time per day spent watching TV (0, 1–2, 3–4, or 5 or more hours). The 168 individuals represented in the first column were those judged physically fit. Expected counts appear below observed counts, and MINITAB displays the contribution to $$\chi^2$$ from each cell.
State and test the appropriate hypotheses using $$\alpha = 0.05$$.
$$\begin{array}{c|cc|c} & 1 & 2 & \text{Total} \\ \hline 1 & 35 & 147 & 182 \\ & 25.48 & 156.52 & \\ \hline 2 & 101 & 629 & 730 \\ & 102.20 & 627.80 & \\ \hline 3 & 28 & 222 & 250 \\ & 35.00 & 215.00 & \\ \hline 4 & 4 & 34 & 38 \\ & 5.32 & 32.68 & \\ \hline \text{Total} & 168 & 1032 & 1200 \end{array}$$
$$\text{Chi-Sq} = 3.557 + 0.579 + 0.014 + 0.002 + 1.400 + 0.228 + 0.328 + 0.053 = 6.161$$
$$df = 3$$
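The expected counts and the Chi-Sq total in the MINITAB output can be reproduced from the observed counts alone, since each expected count is (row total × column total)/grand total. The sketch below does this in plain Python. For the P-value it uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, P(X > x) = erfc(√(x/2)) + √(2x/π)·e^(-x/2), which applies only when df = 3; other tables would need a statistics library.

```python
import math

# Observed counts: rows = TV groups 1-4, columns = (physically fit, not fit)
observed = [
    [35, 147],
    [101, 629],
    [28, 222],
    [4, 34],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total   # e.g. 182 * 168 / 1200 = 25.48
        chi_sq += (obs - expected) ** 2 / expected               # per-cell contribution

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form chi-square upper tail, valid ONLY for df = 3
p_value = (math.erfc(math.sqrt(chi_sq / 2))
           + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2))

print(f"chi-square = {chi_sq:.3f}, df = {df}, P-value = {p_value:.3f}")
```

With these counts the statistic is about 6.161 and the P-value comes out near 0.10, which is above α = 0.05.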
Using the health records of every student at a high school, the school nurse created a scatterplot relating $$y = \text{height (in centimeters)}$$ to $$x = \text{age (in years)}$$.
After verifying that the conditions for the regression model were met, the nurse calculated the equation of the population regression line to be $$\mu_y = 105 + 4.2x$$ with $$\sigma = 7\ \text{cm}$$. About what percent of 15-year-old students at this school are taller than 180 cm?
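Because the regression-model conditions hold, heights at a fixed age are treated as Normal with mean μ_y = 105 + 4.2x and standard deviation σ = 7 cm, so the question reduces to a normal tail area. A minimal sketch using only `math.erfc`:

```python
import math

intercept, slope = 105.0, 4.2   # population regression line: mu_y = 105 + 4.2x
sigma = 7.0                     # standard deviation of heights about the line (cm)
age = 15
cutoff = 180.0                  # height threshold (cm)

mean_height = intercept + slope * age            # mean height of 15-year-olds
z = (cutoff - mean_height) / sigma               # standardized cutoff
prop_taller = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal Z

print(f"mean = {mean_height:.1f} cm, z = {z:.2f}, proportion taller = {prop_taller:.4f}")
```

This gives a mean of 168 cm for 15-year-olds, z ≈ 1.71, and a tail area of roughly 0.043, or about 4.3%.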
a. In multiple linear regression, we can determine whether we are extrapolating in predicting the value of the response variable for a given set of predictor variable values by determining whether each predictor variable value falls in the range of observed values of that predictor.
b. Irregularly shaped regions of the values of predictor variables are easy to detect with two-dimensional scatterplots of pairs of predictor variables, and thus it is easy to determine whether we are extrapolating when predicting the response variable.
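Statement (a) can be probed with a small experiment. The sketch below uses made-up, strongly correlated predictor values and shows a new point that lies within the observed range of each predictor separately yet falls outside the convex hull of the observed (x1, x2) pairs. Treating the convex hull as "the region of observed data" is an assumption of this sketch (the true region may be non-convex); the hull is built with Andrew's monotone-chain algorithm, and all function names are illustrative.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, q):
    """True if q lies inside or on a counter-clockwise convex polygon."""
    k = len(hull)
    return all(cross(hull[i], hull[(i + 1) % k], q) >= 0 for i in range(k))


def within_marginal_ranges(points, q):
    """True if each coordinate of q lies within the observed range of that predictor."""
    return all(
        min(p[j] for p in points) <= q[j] <= max(p[j] for p in points)
        for j in range(len(q))
    )


# Made-up, strongly correlated predictor pairs (x1, x2)
data = [(1, 1), (2, 2.2), (3, 2.9), (4, 4.1), (5, 5)]
new_point = (5, 1)   # large x1 combined with small x2

hull = convex_hull(data)
print("within each predictor's range:", within_marginal_ranges(data, new_point))
print("inside convex hull of the data:", inside_hull(hull, new_point))
```

Here the per-predictor check passes while the joint check fails, which is the situation statement (a) overlooks; with three or more predictors such a region is also much harder to see in pairwise scatterplots, which bears on statement (b).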
A survey of 4826 randomly selected young adults (aged 19 to 25) asked, "What do you think are the chances you will have much more than a middle-class income at age 30?" The two-way table summarizes the responses.
$$\begin{array}{c|cc|c} &\text { Female } & \text { Male } & \text { Total } \\ \hline \text { Almost no chance } & 96 & 98 & 194 \\ \hline \text { Some chance but } \ \text { probably not } & 426 & 286 & 712 \\\hline \text { A 50-50 chance } & 696 & 720 & 1416 \\ \hline \text { A good chance } & 663 & 758 & 1421 \\ \hline \text { Almost certain } & 486 & 597 & 1083 \\ \hline \text { Total } & 2367 & 2459 & 4826 \end{array}$$
Choose a survey respondent at random. Define events G: a good chance, M: male, and N: almost no chance. Find P(G | M). Interpret this value in context.
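P(G | M) is a conditional probability read from the Male column: the number of males who answered "A good chance" divided by the total number of males. A minimal Python check using the Male column of the table:

```python
# Male column of the two-way table (counts of male respondents by opinion)
male_counts = {
    "Almost no chance": 98,
    "Some chance but probably not": 286,
    "A 50-50 chance": 720,
    "A good chance": 758,
    "Almost certain": 597,
}

total_male = sum(male_counts.values())                   # equals the table's male total, 2459
p_g_given_m = male_counts["A good chance"] / total_male  # P(G | M)

print(f"P(G | M) = {male_counts['A good chance']}/{total_male} = {p_g_given_m:.4f}")
```

This gives 758/2459 ≈ 0.308: about 30.8% of the male respondents in this survey thought they had a good chance of earning much more than a middle-class income at age 30.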