# The accompanying data on y = normalized energy (J/m^2) and x = intraocular pressure (mmHg) appeared in a scatterplot in the article “Evaluating the Risk of Eye Injuries: Intraocular Pressure During High Speed Projectile Impacts” (Current Eye Research, 2012: 43–49); an estimated regression function was superimposed on the plot. x 2761 19764 25713 3980 12782 19008 y 1553 14999 32813 1667 8741 16526 x 19028 14397 9606 3905 25731 y 26770 16526 9868 6640 1220 30730 Here is Minitab output from fitting the simple linear regression model. Does the model appear to specify a useful relationship between the two variables? Predictor Coef SE Coef T P Constant -5090 2257 -2.26 0.048 Pressure 1.2912 0.1347 9.59 0.000 (S = 3679.36, R-Sq = 90.2%, R-Sq(adj) = 89.2%).

Question
Scatterplots
The accompanying data on y = normalized energy $$\displaystyle\left(\mathrm{J}/\mathrm{m}^{2}\right)$$ and x = intraocular pressure (mmHg) appeared in a scatterplot in the article “Evaluating the Risk of Eye Injuries: Intraocular Pressure During High Speed Projectile Impacts” (Current Eye Research, 2012: 43–49); an estimated regression function was superimposed on the plot.
x 2761 19764 25713 3980 12782 19008 y 1553 14999 32813 1667 8741 16526 x 19028 14397 9606 3905 25731 y 26770 16526 9868 6640 1220 30730
Here is Minitab output from fitting the simple linear regression model. Does the model appear to specify a useful relationship between the two variables?
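$$\begin{array}{|l|r|r|r|r|} \hline \text{Predictor} & \text{Coef} & \text{SE Coef} & T & P \\ \hline \text{Constant} & -5090 & 2257 & -2.26 & 0.048 \\ \hline \text{Pressure} & 1.2912 & 0.1347 & 9.59 & 0.000 \\ \hline \end{array}$$
S = 3679.36, R-Sq = 90.2%, R-Sq(adj) = 89.2%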

2021-01-14
Step 1
Given:
$$\displaystyle{n}={12}$$
Let us assume:
$$\displaystyle\alpha={0.05}$$
Given in the output:
$$\displaystyle{b}_{1}={1.2912}$$
$$\displaystyle{SE}_{{b}_{1}}={0.1347}$$
Determine the hypotheses:
$$\displaystyle{H}_{0}:\beta_{1}={0}$$
$$\displaystyle{H}_{a}:\beta_{1}\ne{0}$$
Compute the value of the test statistic:
$$\displaystyle{t}=\frac{{b}_{1}-\beta_{1}}{{SE}_{{b}_{1}}}=\frac{{1.2912}-{0}}{0.1347}\approx{9.59}$$
The P-value is the probability of obtaining the value of the test statistic, or a value more extreme, when the null hypothesis is true. The P-value is the number (or interval) in the column title of Table B containing the t-value in the row $$\displaystyle{df}={n}-{2}={12}-{2}={10}$$:
$$\displaystyle{P}<{2}\times{0.0005}={0.001}$$
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected:
$$\displaystyle{P}<{0.05}\Rightarrow\text{Reject }{H}_{0}$$
There is sufficient evidence to support the claim that the slope of the population regression line is not zero, which means that the model appears to specify a useful relationship between the two variables.
Result:
Yes.
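The test above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `two_sided_p` are illustrative, and the P-value is approximated by Simpson's rule rather than read from Table B.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided P-value: integrate the density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 2 * (0.5 - area))

# Values taken from the Minitab output in the question
b1, se_b1, n = 1.2912, 0.1347, 12

t_stat = (b1 - 0) / se_b1   # test statistic under H0: beta_1 = 0
df = n - 2                  # degrees of freedom for simple linear regression
p_value = two_sided_p(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}")
print("P < 0.001:", p_value < 0.001)
print("Reject H0 at alpha = 0.05:", p_value <= 0.05)
```

The computed statistic matches the T column of the output, and the approximate P-value falls below the table bound of 0.001 used in the solution.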

### Relevant Questions

Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
According to the article “Modeling and Predicting the Effects of Submerged Arc Weldment Process Parameters on Weldment Characteristics and Shape Profiles” (J. of Engr. Manuf., 2012: 1230–1240), the submerged arc welding (SAW) process is commonly used for joining thick plates and pipes. The heat affected zone (HAZ), a band created within the base metal during welding, was of particular interest to the investigators. Here are observations on depth (mm) of the HAZ both when the current setting was high and when it was lower.
$$\begin{matrix} \text{Non-high} & 1.04 & 1.15 & 1.23 & 1.69 & 1.92 & 1.98 & 2.36 & 2.49 & 2.72 \\ & 1.37 & 1.43 & 1.57 & 1.71 & 1.94 & 2.06 & 2.55 & 2.64 & 2.82 \\ \text{High} & 1.55 & 2.02 & 2.02 & 2.05 & 2.35 & 2.57 & 2.93 & 2.94 & 2.97 \\ \end{matrix}$$
c. Does it appear that true average HAZ depth is larger for the higher current condition than for the lower condition? Carry out a test of appropriate hypotheses using a significance level of .01.
Researchers have asked whether there is a relationship between nutrition and cancer, and many studies have shown that there is. In fact, one of the conclusions of a study by B. Reddy et al., “Nutrition and Its Relationship to Cancer” (Advances in Cancer Research, Vol. 32, pp. 237-345), was that “...none of the risk factors for cancer is probably more significant than diet and nutrition.” One dietary factor that has been studied for its relationship with prostate cancer is fat consumption. On the WeissStats CD, you will find data on per capita fat consumption (in grams per day) and prostate cancer death rate (per 100,000 males) for nations of the world. The data were obtained from a graph, adapted from information in the article mentioned, in J. Robbins’s classic book Diet for a New America (Walpole, NH: Stillpoint, 1987, p. 271). For part (d), predict the prostate cancer death rate for a nation with a per capita fat consumption of 92 grams per day.
a) Construct and interpret a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f).
c) Determine and interpret the regression equation.
d) Make the indicated predictions.
e) Compute and interpret the correlation coefficient.
f) Identify potential outliers and influential observations.
An automobile tire manufacturer collected the data in the table relating tire pressure x (in pounds per square inch) and mileage (in thousands of miles). A mathematical model for the data is given by
$$\displaystyle f{\left({x}\right)}=-{0.554}{x}^{2}+{35.5}{x}-{514}.$$
$$\begin{array}{|c|c|} \hline x & \text{Mileage} \\ \hline 28 & 45 \\ \hline 30 & 51\\ \hline 32 & 56\\ \hline 34 & 50\\ \hline 36 & 46\\ \hline \end{array}$$
(A) Complete the table below.
$$\begin{array}{|c|c|c|} \hline x & \text{Mileage} & f(x) \\ \hline 28 & 45 & \\ \hline 30 & 51 & \\ \hline 32 & 56 & \\ \hline 34 & 50 & \\ \hline 36 & 46 & \\ \hline \end{array}$$
(Round to one decimal place as needed.)
A. A coordinate system has a horizontal x-axis labeled from 20 to 60 in increments of 2 and a vertical y-axis labeled from 20 to 60 in increments of 2. Data points are plotted at (28,45), (30,51), (32,56), (34,50), and (36,46). A parabola opens downward and passes through the points (28,45.7), (30,52.4), (32,54.7), (34,52.6), and (36,46.0). All points are approximate.
B. A coordinate system has a horizontal x-axis labeled from 20 to 60 in increments of 2 and a vertical y-axis labeled from 20 to 60 in increments of 2. Data points are plotted at (43,30), (45,36), (47,41), (49,35), and (51,31). A parabola opens downward and passes through the points (43,30.7), (45,37.4), (47,39.7), (49,37.6), and (51,31). All points are approximate.
C. A coordinate system has a horizontal x-axis labeled from 20 to 60 in increments of 2 and a vertical y-axis labeled from 20 to 60 in increments of 2. Data points are plotted at (43,45), (45,51), (47,56), (49,50), and (51,46). A parabola opens downward and passes through the points (43,45.7), (45,52.4), (47,54.7), (49,52.6), and (51,46.0). All points are approximate.
D. A coordinate system has a horizontal x-axis labeled from 20 to 60 in increments of 2 and a vertical y-axis labeled from 20 to 60 in increments of 2. Data points are plotted at (28,30), (30,36), (32,41), (34,35), and (36,31). A parabola opens downward and passes through the points (28,30.7), (30,37.4), (32,39.7), (34,37.6), and (36,31). All points are approximate.
(C) Use the modeling function f(x) to estimate the mileage for a tire pressure of 29 lbs/sq in. and for 35 lbs/sq in.
The mileage for the tire pressure 29 lbs/sq in. is ____
The mileage for the tire pressure 35 lbs/sq in. is ____
(Round to two decimal places as needed.)
(D) Write a brief description of the relationship between tire pressure and mileage.
A. As tire pressure increases, mileage decreases to a minimum at a certain tire pressure, then begins to increase.
B. As tire pressure increases, mileage decreases.
C. As tire pressure increases, mileage increases to a maximum at a certain tire pressure, then begins to decrease.
D. As tire pressure increases, mileage increases.
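As an illustration of how the tire model f(x) = -0.554x^2 + 35.5x - 514 can be evaluated, here is a short Python sketch. The function name `mileage_model` and the variable names are illustrative and not part of the exercise.

```python
def mileage_model(x):
    """Quadratic model from the exercise: mileage (thousands of miles) at pressure x."""
    return -0.554 * x**2 + 35.5 * x - 514

pressures = [28, 30, 32, 34, 36]   # tire pressure, lbs/sq in.
observed = [45, 51, 56, 50, 46]    # observed mileage, thousands of miles

# Part (A): model values rounded to one decimal place
table = [(x, y, round(mileage_model(x), 1)) for x, y in zip(pressures, observed)]
for x, y, fx in table:
    print(f"x = {x}  mileage = {y}  f(x) = {fx}")

# Part (C): estimates at 29 and 35 lbs/sq in., rounded to two decimal places
estimates = {x: round(mileage_model(x), 2) for x in (29, 35)}
print(estimates)

# The leading coefficient is negative, so the parabola opens downward
# and its vertex x = -b / (2a) is the pressure with the largest modeled mileage.
vertex_pressure = -35.5 / (2 * -0.554)
print(f"modeled mileage peaks near x = {vertex_pressure:.1f}")
```

The rounded model values agree with the parabola points listed in the first graph description, and the vertex falls inside the observed pressure range.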
The accompanying two-way table was constructed using data in the article “Television Viewing and Physical Fitness in Adults” (Research Quarterly for Exercise and Sport, 1990: 315–320). The author hoped to determine whether time spent watching television is associated with cardiovascular fitness. Subjects were asked about their television-viewing habits and were classified as physically fit if they scored in the excellent or very good category on a step test. We include MINITAB output from a chi-squared analysis. The four TV groups corresponded to different amounts of time per day spent watching TV (0, 1–2, 3–4, or 5 or more hours). The 168 individuals represented in the first column were those judged physically fit. Expected counts appear below observed counts, and MINITAB displays the contribution to $$\displaystyle\chi^{2}$$ from each cell.
State and test the appropriate hypotheses using $$\displaystyle\alpha={0.05}$$.
$$\begin{array}{|c|c|c|c|} \hline & 1 & 2 & \text{Total} \\ \hline 1 & 35 & 147 & 182 \\ & 25.48 & 156.52 & \\ \hline 2 & 101 & 629 & 730 \\ & 102.20 & 627.80 & \\ \hline 3 & 28 & 222 & 250 \\ & 35.00 & 215.00 & \\ \hline 4 & 4 & 34 & 38 \\ & 5.32 & 32.68 & \\ \hline \text{Total} & 168 & 1032 & 1200 \\ \hline \end{array}$$
$$\displaystyle\text{ChiSq}={3.557}+{0.579}+{0.014}+{0.002}+{1.400}+{0.228}+{0.328}+{0.053}={6.161}$$
$$\displaystyle{df}={3}$$
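The expected counts and the chi-squared total in the MINITAB output can be reproduced from the observed counts alone. The sketch below is one way to do so in plain Python; the critical value 7.815 is the standard upper 5% point of the chi-squared distribution with 3 degrees of freedom and is supplied here as an assumption, since the exercise does not list it.

```python
# Observed counts: rows are the four TV groups, columns are (fit, not fit)
observed = [
    [35, 147],
    [101, 629],
    [28, 222],
    [4, 34],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence of TV group and fitness
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # assumed table value: chi-squared, df = 3, alpha = 0.05

print(f"chi-squared = {chi_sq:.3f}, df = {df}")
print("Reject H0 at alpha = 0.05:", chi_sq > critical_value)
```

Under these assumptions the statistic stays below the critical value, so the data would not give convincing evidence of an association at the 0.05 level.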
The following data on x = soil depth (in centimeters) and y = percentage of montmorillonite in the soil were taken from a scatterplot in the paper "Ancient Maya Drained Field Agriculture: Its Possible Application Today in the New River Floodplain, Belize, C.A." (Agricultural Ecosystems and Environment [1984]: 67-84):
$$\begin{array}{|c|c|c|c|c|c|c|c|} \hline x & 40 & 50 & 60 & 70 & 80 & 90 & 100 \\ \hline y & 58 & 34 & 32 & 30 & 28 & 27 & 22 \\ \hline \end{array}$$
a. Draw a scatterplot of y versus x.
b. The equation of the least-squares line is $$\displaystyle\hat{y}={64.50}-{0.45}{x}$$. Draw this line on your scatterplot. Do there appear to be any large residuals?
c. Compute the residuals, and construct a residual plot. Are there any unusual features in the plot?
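The residuals asked for in part (c) can be computed directly. This sketch assumes the fitted line is ŷ = 64.50 - 0.45x and, as a check on that reading, re-derives the slope and intercept from the seven data points; the variable names are illustrative.

```python
depth = [40, 50, 60, 70, 80, 90, 100]           # x: soil depth (cm)
montmorillonite = [58, 34, 32, 30, 28, 27, 22]  # y: percentage of montmorillonite

n = len(depth)
mean_x = sum(depth) / n
mean_y = sum(montmorillonite) / n

# Least-squares slope and intercept from the usual sums of squares
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depth, montmorillonite))
sxx = sum((x - mean_x) ** 2 for x in depth)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus fitted value
residuals = [y - (intercept + slope * x) for x, y in zip(depth, montmorillonite)]

print(f"fitted line: y-hat = {intercept:.2f} + ({slope:.2f})x")
for x, e in zip(depth, residuals):
    print(f"x = {x:3d}  residual = {e:6.2f}")
```

The residual at x = 40 is far larger in magnitude than the others, which is the kind of feature part (b) asks about.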
Two scatterplots are shown below.
Scatterplot 1
A scatterplot has 14 points.
The horizontal axis is labeled "x" and has values from 30 to 110.
The vertical axis is labeled "y" and has values from 30 to 110.
The points are plotted from approximately (55, 60) up and right to approximately (95, 85).
The points are somewhat scattered.
Scatterplot 2
A scatterplot has 10 points.
The horizontal axis is labeled "x" and has values from 30 to 110.
The vertical axis is labeled "y" and has values from 30 to 110.
The points are plotted from approximately (55, 55) steeply up and right to approximately (70, 90), and then steeply down and right to approximately (85, 60).
The points are somewhat scattered.
Explain why it makes sense to use the least-squares line to summarize the relationship between x and y for one of these data sets but not the other.
Scatterplot 1 seems to show a ____ relationship between x and y, while Scatterplot 2 shows a ____ relationship between the two variables. So it makes sense to use the least squares line to summarize the relationship between x and y for the data set in ____, but not for the data set in ____.
Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of $$\alpha = 0.05$$. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.)
Lemons and Car Crashes. Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1]. Is there sufficient evidence to conclude that there is a linear correlation between weights of lemon imports from Mexico and U.S. car fatality rates? Do the results suggest that imported lemons cause car fatalities?
$$\begin{matrix} \text{Lemon Imports} & 230 & 265 & 358 & 480 & 530\\ \text{Crash Fatality Rate} & 15.9 & 15.7 & 15.4 & 15.3 & 14.9\\ \end{matrix}$$
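For the lemon data, the linear correlation coefficient can be computed from its definition. The sketch below uses plain Python; the critical value 0.878 for n = 5 and a significance level of 0.05 is the usual table value and is stated here as an assumption, because Table A-6 is not reproduced in the exercise.

```python
import math

lemon_imports = [230, 265, 358, 480, 530]    # metric tons
fatality_rate = [15.9, 15.7, 15.4, 15.3, 14.9]  # per 100,000 population

def pearson_r(xs, ys):
    """Linear correlation coefficient from centered sums of squares and products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(lemon_imports, fatality_rate)
critical_r = 0.878  # assumed table value for n = 5, alpha = 0.05

print(f"r = {r:.3f}")
print("Evidence of linear correlation:", abs(r) > critical_r)
```

A strong negative r only describes how the two series move together over these years; by itself it says nothing about whether lemon imports affect crash fatalities.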
The article “Anodic Fenton Treatment of Treflan MTF” describes a two-factor experiment designed to study the sorption of the herbicide trifluralin. The factors are the initial trifluralin concentration and the $$\displaystyle\mathrm{Fe}^{2+}:\mathrm{H}_{2}\mathrm{O}_{2}$$ delivery ratio. There were three replications for each treatment. The results presented in the following table are consistent with the means and standard deviations reported in the article.
$$\begin{array}{|c|c|ccc|} \hline \text{Initial Concentration (M)} & \text{Delivery Ratio} & \text{Sorption (\%)} & & \\ \hline 15 & 1:0 & 10.90 & 8.47 & 12.43 \\ 15 & 1:1 & 3.33 & 2.40 & 2.67 \\ 15 & 1:5 & 0.79 & 0.76 & 0.84 \\ 15 & 1:10 & 0.54 & 0.69 & 0.57 \\ 40 & 1:0 & 6.84 & 7.68 & 6.79 \\ 40 & 1:1 & 1.72 & 1.55 & 1.82 \\ 40 & 1:5 & 0.68 & 0.83 & 0.89 \\ 40 & 1:10 & 0.58 & 1.13 & 1.28 \\ 100 & 1:0 & 6.61 & 6.66 & 7.43 \\ 100 & 1:1 & 1.25 & 1.46 & 1.49 \\ 100 & 1:5 & 1.17 & 1.27 & 1.16 \\ 100 & 1:10 & 0.93 & 0.67 & 0.80 \\ \hline \end{array}$$
a) Estimate all main effects and interactions.
b) Construct an ANOVA table. You may give ranges for the P-values.
c) Is the additive model plausible? Provide the value of the test statistic, its null distribution, and the P-value.
(a) (Exponential Model) This model is described by $$\displaystyle{\frac{{{d}{N}}}{{{\left.{d}{t}\right.}}}}={r}_{{{e}}}{N}\ {\left({8.86}\right)}.$$ Solve (8.86) with the initial condition N(0) at time 0, and show that $$\displaystyle{r}_{{{e}}}$$ can be estimated from $$\displaystyle{r}_{{{e}}}={\frac{{{1}}}{{{t}}}}\ {\ln{\ }}{\left[{\frac{{{N}{\left({t}\right)}}}{{{N}{\left({0}\right)}}}}\right]}\ {\left({8.87}\right)}$$
(b) (Logistic Growth) This model is described by $$\displaystyle{\frac{{{d}{N}}}{{{\left.{d}{t}\right.}}}}={r}_{{{l}}}{N}\ {\left({1}\ -\ {\frac{{{N}}}{{{K}}}}\right)}\ {\left({8.88}\right)}$$ where K is the equilibrium value. Solve (8.88) with the initial condition N(0) at time 0, and show that $$\displaystyle{r}_{{{l}}}$$ can be estimated from $$\displaystyle{r}_{{{l}}}={\frac{{{1}}}{{{t}}}}\ {\ln{\ }}{\left[{\frac{{{K}\ -\ {N}{\left({0}\right)}}}{{{N}{\left({0}\right)}}}}\right]}\ +\ {\frac{{{1}}}{{{t}}}}\ {\ln{\ }}{\left[{\frac{{{N}{\left({t}\right)}}}{{{K}\ -\ {N}{\left({t}\right)}}}}\right]}\ {\left({8.89}\right)}$$ for $$\displaystyle{N}{\left({t}\right)}\ {<}\ {K}.$$
(c) Assume that $$\displaystyle{N}{\left({0}\right)}={1}$$ and $$\displaystyle{N}{\left({10}\right)}={1000}.$$ Estimate $$\displaystyle{r}_{{{e}}}$$ and $$\displaystyle{r}_{{{l}}}$$ for both $$\displaystyle{K}={1001}$$ and $$\displaystyle{K}={10000}.$$
(d) Use your answer in (c) to explain the following quote from Stanley (1979): There must be a general tendency for calculated values of $$\displaystyle{r}$$ to represent underestimates of exponential rates, because some radiation will have followed distinctly sigmoid paths during the interval evaluated.
(e) Explain why the exponential model is a good approximation to the logistic model when $$\displaystyle\frac{{N}}{{K}}$$ is small compared with 1.
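The estimates requested in part (c) follow directly from formulas (8.87) and (8.89). Here is a minimal Python sketch; the function names `r_exponential` and `r_logistic` are illustrative and not taken from the exercise.

```python
import math

def r_exponential(n0, nt, t):
    """Growth rate from (8.87): r_e = (1/t) * ln(N(t) / N(0))."""
    return math.log(nt / n0) / t

def r_logistic(n0, nt, t, k):
    """Growth rate from (8.89); requires N(0) < K and N(t) < K."""
    return (math.log((k - n0) / n0) + math.log(nt / (k - nt))) / t

n0, nt, t = 1, 1000, 10   # N(0) = 1, N(10) = 1000

r_e = r_exponential(n0, nt, t)
r_l_small_k = r_logistic(n0, nt, t, k=1001)
r_l_large_k = r_logistic(n0, nt, t, k=10000)

print(f"r_e           = {r_e:.4f}")
print(f"r_l (K=1001)  = {r_l_small_k:.4f}")
print(f"r_l (K=10000) = {r_l_large_k:.4f}")
```

With K = 1001 the population ends close to its equilibrium and the logistic rate comes out at twice the exponential estimate, while with K = 10000 the ratio N/K stays at or below 0.1 and the two estimates nearly agree. In both cases the exponential estimate is the smaller one, which is the pattern parts (d) and (e) ask about.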