# Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of alpha = 0.05. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.) Lemons and Car Crashes Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and M

Question
Modeling data distributions
Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of $$\alpha = 0.05$$. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.) Lemons and Car Crashes Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1]. Is there sufficient evidence to conclude that there is a linear correlation between weights of lemon imports from Mexico and U.S. car fatality rates? Do the results suggest that imported lemons cause car fatalities? $$\begin{matrix} \text{Lemon Imports} & 230 & 265 & 358 & 480 & 530\\ \text{Crashe Fatality Rate} & 15.9 & 15.7 & 15.4 & 15.3 & 14.9\\ \end{matrix}$$

2020-11-09
Step 1 Note: we are using MINTAB software to perform the calculations. The data shows that the weights of lemon imports from Mexico and U.S. car fatality rates. The level of significance is $$\alpha = 0.05$$. Procedure to obtain scatterplot using the MINITAB software: Choose Graph > Scatterplot. Choose Simple and then click OK. Under Y variables, enter a column of CRASH FERTILITY RATES. Under X variables, enter a column of LEMON IMPORTS. Click OK. Step 2 The hypotheses are given below: Null hypothesis: $$H0:\rho = 0$$ That is, there is no linear correlation between the weights of lemon imports from Mexico and U.S. car fatality rates. Alternative hypothesis: $$H1:\rho cancel= 0$$ That is, there is a linear correlation between the weights of lemon imports from Mexico and U.S. car fatality rates. Correlation coefficient r: Software Procedure: Step-by-step procedure to obtain the ‘correlation coefficient’ using the MINITAB software: Select Stat >Basic Statistics > Correlation. In Variables, select LEMON IMPORTS and CRASH FERTILITY RATES from the box on the left. Click OK. Output using the MINITAB software is given below: Correlations: LEMON IMPORTS, CRASH FATALITY RATE Pearson correlation of LEMON IMPORTS and CRASH FATALITY RATE $$= -0.959$$
$$P-value = 0.010$$ Step 3 Thus, the Pearson correlation of weights of lemon imports from Mexico and U.S. car fatality rates is –0.959 and the P-value is 0.010. Critical value: From the TABLE “Critical Values of the Pearson Correlation Coefficient r”, the critical value for 5 degrees of freedom for $$\alpha = 0.05$$ level of significance is $$\pm 0.878$$. The horizontal axis represents weights of lemon imports from Mexico and vertical axis represents U.S. car fatality rates. From the plot, it is observed that there is a linear association between the weights of lemon imports from Mexico and U.S. car fatality rates because the data point show a distinct pattern. The P-value is 0.010 and the level of significance is 0.05. Here, the P-value is less than the level of significance. Hence, the null hypothesis is rejected. That is, there is a linear correlation between the weights of lemon imports from Mexico and U.S. car fatality rates. The critical value is $$\pm 0.878$$. Here, the correlation value –0.959 lies beyond the lower critical value. Thus, there is sufficient evidence to support the claim that there is a linear correlation between the weights of lemon imports from Mexico and U.S. car fatality rates. There is a linear correlation between the weights of lemon imports from Mexico and U.S. car fatality rates but it does not appear the imported lemons cause car fatalities because, it do not suggest any cause-effect relationship.

### Relevant Questions

A random sample of $$n_1 = 14$$ winter days in Denver gave a sample mean pollution index $$x_1 = 43$$.
Previous studies show that $$\sigma_1 = 19$$.
For Englewood (a suburb of Denver), a random sample of $$n_2 = 12$$ winter days gave a sample mean pollution index of $$x_2 = 37$$.
Previous studies show that $$\sigma_2 = 13$$.
Assume the pollution index is normally distributed in both Englewood and Denver.
(a) State the null and alternate hypotheses.
$$H_0:\mu_1=\mu_2.\mu_1>\mu_2$$
$$H_0:\mu_1<\mu_2.\mu_1=\mu_2$$
$$H_0:\mu_1=\mu_2.\mu_1<\mu_2$$
$$H_0:\mu_1=\mu_2.\mu_1\neq\mu_2$$
(b) What sampling distribution will you use? What assumptions are you making? NKS The Student's t. We assume that both population distributions are approximately normal with known standard deviations.
The standard normal. We assume that both population distributions are approximately normal with unknown standard deviations.
The standard normal. We assume that both population distributions are approximately normal with known standard deviations.
The Student's t. We assume that both population distributions are approximately normal with unknown standard deviations.
(c) What is the value of the sample test statistic? Compute the corresponding z or t value as appropriate.
(Test the difference $$\mu_1 - \mu_2$$. Round your answer to two decimal places.) NKS (d) Find (or estimate) the P-value. (Round your answer to four decimal places.)
(e) Based on your answers in parts (i)−(iii), will you reject or fail to reject the null hypothesis? Are the data statistically significant at level \alpha?
At the $$\alpha = 0.01$$ level, we fail to reject the null hypothesis and conclude the data are not statistically significant.
At the $$\alpha = 0.01$$ level, we reject the null hypothesis and conclude the data are statistically significant.
At the $$\alpha = 0.01$$ level, we fail to reject the null hypothesis and conclude the data are statistically significant.
At the $$\alpha = 0.01$$ level, we reject the null hypothesis and conclude the data are not statistically significant.
(f) Interpret your conclusion in the context of the application.
Reject the null hypothesis, there is insufficient evidence that there is a difference in mean pollution index for Englewood and Denver.
Reject the null hypothesis, there is sufficient evidence that there is a difference in mean pollution index for Englewood and Denver.
Fail to reject the null hypothesis, there is insufficient evidence that there is a difference in mean pollution index for Englewood and Denver.
Fail to reject the null hypothesis, there is sufficient evidence that there is a difference in mean pollution index for Englewood and Denver. (g) Find a 99% confidence interval for
$$\mu_1 - \mu_2$$.
lower limit
upper limit
(h) Explain the meaning of the confidence interval in the context of the problem.
Because the interval contains only positive numbers, this indicates that at the 99% confidence level, the mean population pollution index for Englewood is greater than that of Denver.
Because the interval contains both positive and negative numbers, this indicates that at the 99% confidence level, we can not say that the mean population pollution index for Englewood is different than that of Denver.
Because the interval contains both positive and negative numbers, this indicates that at the 99% confidence level, the mean population pollution index for Englewood is greater than that of Denver.
Because the interval contains only negative numbers, this indicates that at the 99% confidence level, the mean population pollution index for Englewood is less than that of Denver.
1. Find each of the requested values for a population with a mean of $$? = 40$$, and a standard deviation of $$? = 8$$ A. What is the z-score corresponding to $$X = 52?$$ B. What is the X value corresponding to $$z = - 0.50?$$ C. If all of the scores in the population are transformed into z-scores, what will be the values for the mean and standard deviation for the complete set of z-scores? D. What is the z-score corresponding to a sample mean of $$M=42$$ for a sample of $$n = 4$$ scores? E. What is the z-scores corresponding to a sample mean of $$M= 42$$ for a sample of $$n = 6$$ scores? 2. True or false: a. All normal distributions are symmetrical b. All normal distributions have a mean of 1.0 c. All normal distributions have a standard deviation of 1.0 d. The total area under the curve of all normal distributions is equal to 1 3. Interpret the location, direction, and distance (near or far) of the following zscores: $$a. -2.00 b. 1.25 c. 3.50 d. -0.34$$ 4. You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with $$\mu = 78$$ and $$\sigma = 12$$. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: $$82, 74, 62, 68, 79, 94, 90, 81, 80$$. 5. You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $$12 (\mu = 42, \sigma = 12)$$. You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is$44.50 from tips. Test for a difference between this value and the population mean at the $$\alpha = 0.05$$ level of significance.
We will now add support for register-memory ALU operations to the classic five-stage RISC pipeline. To offset this increase in complexity, all memory addressing will be restricted to register indirect (i.e., all addresses are simply a value held in a register; no offset or displacement may be added to the register value). For example, the register-memory instruction add x4, x5, (x1) means add the contents of register x5 to the contents of the memory location with address equal to the value in register x1 and put the sum in register x4. Register-register ALU operations are unchanged. The following items apply to the integer RISC pipeline:
a. List a rearranged order of the five traditional stages of the RISC pipeline that will support register-memory operations implemented exclusively by register indirect addressing.
b. Describe what new forwarding paths are needed for the rearranged pipeline by stating the source, destination, and information transferred on each needed new path.
c. For the reordered stages of the RISC pipeline, what new data hazards are created by this addressing mode? Give an instruction sequence illustrating each new hazard.
d. List all of the ways that the RISC pipeline with register-memory ALU operations can have a different instruction count for a given program than the original RISC pipeline. Give a pair of specific instruction sequences, one for the original pipeline and one for the rearranged pipeline, to illustrate each way.
Hint for (d): Give a pair of instruction sequences where the RISC pipeline has “more” instructions than the reg-mem architecture. Also give a pair of instruction sequences where the RISC pipeline has “fewer” instructions than the reg-mem architecture.

Lemons and Car Crashes Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1]. Is there sufficient evidence to conclude that there is a linear correlation between weights of lemon imports from Mexico and U.S. car fatality rates? Do the results suggest that imported lemons cause car fatalities? $$\begin{array}{|c|c|}Lemon\ imports &230&265&368&480&630\\ Crash\ Fatality\ Rate&159&157&15.3&15.4&14.9\end{array}$$

A new thermostat has been engineered for the frozen food cases in large supermarkets. Both the old and new thermostats hold temperatures at an average of $$25^{\circ}F$$. However, it is hoped that the new thermostat might be more dependable in the sense that it will hold temperatures closer to $$25^{\circ}F$$. One frozen food case was equipped with the new thermostat, and a random sample of 21 temperature readings gave a sample variance of 5.1. Another similar frozen food case was equipped with the old thermostat, and a random sample of 19 temperature readings gave a sample variance of 12.8. Test the claim that the population variance of the old thermostat temperature readings is larger than that for the new thermostat. Use a $$5\%$$ level of significance. How could your test conclusion relate to the question regarding the dependability of the temperature readings? (Let population 1 refer to data from the old thermostat.)
(a) What is the level of significance?
State the null and alternate hypotheses.
$$H0:?_{1}^{2}=?_{2}^{2},H1:?_{1}^{2}>?_{2}^{2}H0:?_{1}^{2}=?_{2}^{2},H1:?_{1}^{2}\neq?_{2}^{2}H0:?_{1}^{2}=?_{2}^{2},H1:?_{1}^{2}?_{2}^{2},H1:?_{1}^{2}=?_{2}^{2}$$
(b) Find the value of the sample F statistic. (Round your answer to two decimal places.)
What are the degrees of freedom?
$$df_{N} = ?$$
$$df_{D} = ?$$
What assumptions are you making about the original distribution?
The populations follow independent normal distributions. We have random samples from each population.The populations follow dependent normal distributions. We have random samples from each population.The populations follow independent normal distributions.The populations follow independent chi-square distributions. We have random samples from each population.
(c) Find or estimate the P-value of the sample test statistic. (Round your answer to four decimal places.)
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis?
At the ? = 0.05 level, we fail to reject the null hypothesis and conclude the data are not statistically significant.At the ? = 0.05 level, we fail to reject the null hypothesis and conclude the data are statistically significant. At the ? = 0.05 level, we reject the null hypothesis and conclude the data are not statistically significant.At the ? = 0.05 level, we reject the null hypothesis and conclude the data are statistically significant.
(e) Interpret your conclusion in the context of the application.
Reject the null hypothesis, there is sufficient evidence that the population variance is larger in the old thermostat temperature readings.Fail to reject the null hypothesis, there is sufficient evidence that the population variance is larger in the old thermostat temperature readings. Fail to reject the null hypothesis, there is insufficient evidence that the population variance is larger in the old thermostat temperature readings.Reject the null hypothesis, there is insufficient evidence that the population variance is larger in the old thermostat temperature readings.
Use either the critical-value approach or the P-value approach to perform the required hypothesis test. For several years, evidence had been mounting that folic acid reduces major birth defects. A. Czeizel and I. Dudas of the National Institute of Hygiene in Budapest directed a study that provided the strongest evidence to date. Their results were published in the paper “Prevention of the First Occurrence of Neural-Tube Defects by Periconceptional Vitamin Supplementation” (New England Journal of Medicine, Vol. 327(26), p. 1832). For the study, the doctors enrolled women prior to conception and divided them randomly into two groups. One group, consisting of 2701 women, took daily multivitamins containing 0.8 mg of folic acid, the other group, consisting of 2052 women, received only trace elements. Major birth defects occurred in 35 cases when the women took folic acid and in 47 cases when the women did not. a. At the 1% significance level, do the data provide sufficient evidence to conclude that women who take folic acid are at lesser risk of having children with major birth defects? b. Is this study a designed experiment or an observational study? Explain your answer. c. In view of your answers to parts (a) and (b), could you reasonably conclude that taking folic acid causes a reduction in major birth defects? Explain your answer.
Researchers have asked whether there is a relationship between nutrition and cancer, and many studies have shown that there is. In fact, one of the conclusions of a study by B. Reddy et al., “Nutrition and Its Relationship to Cancer” (Advances in Cancer Research, Vol. 32, pp. 237-345), was that “...none of the risk factors for cancer is probably more significant than diet and nutrition.” One dietary factor that has been studied for its relationship with prostate cancer is fat consumption. On the WeissStats CD, you will find data on per capita fat consumption (in grams per day) and prostate cancer death rate (per 100,000 males) for nations of the world. The data were obtained from a graph-adapted from information in the article mentioned-in J. Robbins’s classic book Diet for a New America (Walpole, NH: Stillpoint, 1987, p. 271). For part (d), predict the prostate cancer death rate for a nation with a per capita fat consumption of 92 grams per day. a) Construct and interpret a scatterplot for the data. b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f). c) Determine and interpret the regression equation. d) Make the indicated predictions. e) Compute and interpret the correlation coefficient. f) Identify potential outliers and influential observations.
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root on the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
1
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earring appears to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
Are yields for organic farming different from conventional farming yields? Independent random samples from method A (organic farming) and method B (conventional farming) gave the following information about yield of sweet corn (in tons/acre). $$\text{Method} A: 6.51, 7.02, 6.81, 7.27, 6.73, 6.11, 6.17, 5.88, 6.69, 7.12, 5.74, 6.90.$$
$$\text{Method} B: 7.32, 7.01, 6.66, 6.85, 5.78, 6.48, 5.95, 6.31, 6.50, 5.93, 6.68.$$ Use a 5% level of significance to test the claim that there is no difference between the yield distributions. (a) What is the level of significance? (b) Compute the sample test statistic. (Round your answer to two decimal places.) (c) Find the P-value of the sample test statistic. (Round your answer to four decimal places.)