# Tuddenham and Snyder obtained the following results for 66 California boys at ages 6 and 18 (the scatter diagram is football-shaped): average height at 6 ≈ 3 feet 10 inches, SD ≈ 1.7 inches; average height at 18 ≈ 5 feet 10 inches, SD ≈ 2.5 inches; r ≈ 0.80. a) Find the r.m.s. error for the regression prediction of height at 18 from height at 6. b) Find the r.m.s. error for the regression prediction of height at 6 from height at 18.

Question
Tuddenham and Snyder obtained the following results for 66 California boys at ages 6 and 18 (the scatter diagram is football-shaped):
average height at 6 ≈ 3 feet 10 inches, SD ≈ 1.7 inches
average height at 18 ≈ 5 feet 10 inches, SD ≈ 2.5 inches, r≈0.80
a) Find the r.m.s. error for the regression prediction of height at 18 from height at 6. b) Find the r.m.s. error for the regression prediction of height at 6 from height at 18.

2021-02-25
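The r.m.s. error for the regression prediction of $$y$$ from $$x$$ is
$$\sqrt{1-r^{2}} \times SD_{y}$$
where $$SD_{y}$$ is the SD of the variable being predicted.
a) The r.m.s. error for the regression prediction of height at 18 from height at 6 uses the SD of height at 18:
$$\sqrt{1-0.8^2} \times 2.5 = \sqrt{1-0.64} \times 2.5 = \sqrt{0.36} \times 2.5 = 0.6 \times 2.5 = 1.5 \text{ inches}$$
b) The r.m.s. error for the regression prediction of height at 6 from height at 18 uses the SD of height at 6:
$$\sqrt{1-0.8^2} \times 1.7 = \sqrt{1-0.64} \times 1.7 = \sqrt{0.36} \times 1.7 = 0.6 \times 1.7 = 1.02 \text{ inches}$$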
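The same arithmetic can be checked with a short Python sketch. The function name `rms_error` and the variable names are illustrative choices, not part of the original answer.

```python
import math

def rms_error(r: float, sd_y: float) -> float:
    """r.m.s. error of the regression prediction of y from x."""
    return math.sqrt(1 - r**2) * sd_y

r = 0.80
sd_height_6 = 1.7    # inches
sd_height_18 = 2.5   # inches

# a) predicting height at 18 from height at 6: use the SD of height at 18
err_18_from_6 = rms_error(r, sd_height_18)
# b) predicting height at 6 from height at 18: use the SD of height at 6
err_6_from_18 = rms_error(r, sd_height_6)

print(round(err_18_from_6, 2), round(err_6_from_18, 2))
```

The correlation is the same in both directions; only the SD of the predicted variable changes.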

### Relevant Questions

The article “Anodic Fenton Treatment of Treflan MTF” describes a two-factor experiment designed to study the sorption of the herbicide trifluralin. The factors are the initial trifluralin concentration and the $$Fe^{2} : H_{2}O_{2}$$ delivery ratio. There were three replications for each treatment. The results presented in the following table are consistent with the means and standard deviations reported in the article.
$$\begin{matrix}
\text{Initial Concentration (M)} & \text{Delivery Ratio} & \text{Sorption (\%)}\\
15 & 1:0 & 10.90 \quad 8.47 \quad 12.43\\
15 & 1:1 & 3.33 \quad 2.40 \quad 2.67\\
15 & 1:5 & 0.79 \quad 0.76 \quad 0.84\\
15 & 1:10 & 0.54 \quad 0.69 \quad 0.57\\
40 & 1:0 & 6.84 \quad 7.68 \quad 6.79\\
40 & 1:1 & 1.72 \quad 1.55 \quad 1.82\\
40 & 1:5 & 0.68 \quad 0.83 \quad 0.89\\
40 & 1:10 & 0.58 \quad 1.13 \quad 1.28\\
100 & 1:0 & 6.61 \quad 6.66 \quad 7.43\\
100 & 1:1 & 1.25 \quad 1.46 \quad 1.49\\
100 & 1:5 & 1.17 \quad 1.27 \quad 1.16\\
100 & 1:10 & 0.93 \quad 0.67 \quad 0.80\\
\end{matrix}$$
a) Estimate all main effects and interactions. b) Construct an ANOVA table. You may give ranges for the P-values. c) Is the additive model plausible? Provide the value of the test statistic, its null distribution, and the P-value.
True or False
1. The goal of descriptive statistics is to simplify, summarize, and organize data.
2. A summary value, usually numerical, that describes a sample is called a parameter.
3. A researcher records the average age for a group of 25 preschool children selected to participate in a research study. The average age is an example of a statistic.
4. The median is the most commonly used measure of central tendency.
5. The mode is the best way to measure central tendency for data from a nominal scale of measurement.
6. A distribution of scores has a mean of 55 and a standard deviation of 4. The variance for this distribution is 16.
7. In a distribution with a mean of M = 36 and a standard deviation of SD = 8, a score of 40 would be considered an extreme value.
8. In a distribution with a mean of M = 76 and a standard deviation of SD = 7, a score of 91 would be considered an extreme value.
9. A negative correlation means that as the X values decrease, the Y values also tend to decrease.
10. The goal of a hypothesis test is to demonstrate that the patterns observed in the sample data represent real patterns in the population and are not simply due to chance or sampling error.
For the following exercises, use a graphing utility to create a scatter diagram of the data given in the table. Observe the shape of the scatter diagram to determine whether the data is best described by an exponential, logarithmic, or logistic model. Then use the appropriate regression feature to find an equation that models the data. When necessary, round values to five decimal places.
$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|} \hline x & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline f(x) & 409.4 & 260.7 & 170.4 & 110.6 & 74 & 44.7 & 32.4 & 19.5 & 12.7 & 8.1 \\ \hline \end{array}$$
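In this table each value of f(x) is roughly 0.6 to 0.7 times the previous one, which points toward an exponential model. As a rough sketch of what a regression feature might do, the Python below fits $$f(x) = a \cdot b^{x}$$ by ordinary least squares on $$(x, \ln y)$$. That fitting method and the helper name `fit_exponential` are assumptions; a particular graphing utility may use a different procedure and give slightly different coefficients.

```python
import math

# Data from the table above.
xs = list(range(1, 11))
ys = [409.4, 260.7, 170.4, 110.6, 74, 44.7, 32.4, 19.5, 12.7, 8.1]

def fit_exponential(xs, ys):
    """Fit y = a * b**x via a least-squares line through (x, ln y)."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    slope = sxy / sxx                       # equals ln(b)
    intercept = mean_log_y - slope * mean_x  # equals ln(a)
    return math.exp(intercept), math.exp(slope)

a, b = fit_exponential(xs, ys)
print(f"f(x) = {a:.5f} * {b:.5f}^x")
```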
1. Find each of the requested values for a population with a mean of $$\mu = 40$$ and a standard deviation of $$\sigma = 8$$.
A. What is the z-score corresponding to $$X = 52$$?
B. What is the X value corresponding to $$z = -0.50$$?
C. If all of the scores in the population are transformed into z-scores, what will be the values for the mean and standard deviation for the complete set of z-scores?
D. What is the z-score corresponding to a sample mean of $$M = 42$$ for a sample of $$n = 4$$ scores?
E. What is the z-score corresponding to a sample mean of $$M = 42$$ for a sample of $$n = 6$$ scores?
2. True or false:
a. All normal distributions are symmetrical
b. All normal distributions have a mean of 1.0
c. All normal distributions have a standard deviation of 1.0
d. The total area under the curve of all normal distributions is equal to 1
3. Interpret the location, direction, and distance (near or far) of the following z-scores:
a. $$-2.00$$
b. $$1.25$$
c. $$3.50$$
d. $$-0.34$$
4. You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with $$\mu = 78$$ and $$\sigma = 12$$. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: $$82, 74, 62, 68, 79, 94, 90, 81, 80$$.
5. You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 ($$\mu = 42, \sigma = 12$$). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the $$\alpha = 0.05$$ level of significance.
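The z-score conversions in item 1 follow from $$z = (X - \mu)/\sigma$$ for a single score and $$z = (M - \mu)/(\sigma/\sqrt{n})$$ for a sample mean. A minimal Python sketch with illustrative function names, assuming the garbled symbols in item 1 stand for $$\mu = 40$$ and $$\sigma = 8$$:

```python
import math

mu, sigma = 40, 8  # population mean and SD assumed for item 1

def z_score(x, mu, sigma):
    """z-score of a single score x."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Raw score corresponding to a given z-score."""
    return mu + z * sigma

def z_for_sample_mean(m, mu, sigma, n):
    """z-score of a sample mean m, using the standard error sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return (m - mu) / standard_error

z_a = z_score(52, mu, sigma)                # part A
x_b = x_from_z(-0.50, mu, sigma)            # part B
z_d = z_for_sample_mean(42, mu, sigma, 4)   # part D
z_e = z_for_sample_mean(42, mu, sigma, 6)   # part E
print(z_a, x_b, z_d, round(z_e, 3))
```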
Find the mean, median, mode, and range for each data set given.
a. 7, 12, 1, 7, 6, 5, 11
b. 85, 105, 95, 90, 115
c. 10, 14, 16, 16, 8, 9, 11, 12, 3
d. 10, 8, 7, 5, 9, 10, 7
e. 45, 50, 40, 35, 75
f. 15, 11, 11, 16, 16, 9
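These summaries can be checked with Python's standard `statistics` module. The sketch below uses an illustrative helper, `summarize`. Note that `multimode` returns every value tied for the highest count, so a data set with no repeated value (such as b or e) comes back with all of its values listed, which is usually reported as "no mode".

```python
from statistics import mean, median, multimode

data_sets = {
    "a": [7, 12, 1, 7, 6, 5, 11],
    "b": [85, 105, 95, 90, 115],
    "c": [10, 14, 16, 16, 8, 9, 11, 12, 3],
    "d": [10, 8, 7, 5, 9, 10, 7],
    "e": [45, 50, 40, 35, 75],
    "f": [15, 11, 11, 16, 16, 9],
}

def summarize(values):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),
        "range": max(values) - min(values),
    }

for label, values in data_sets.items():
    print(label, summarize(values))
```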
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of $$\alpha = 0.05$$. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.) Lemons and Car Crashes: Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1]. Is there sufficient evidence to conclude that there is a linear correlation between weights of lemon imports from Mexico and U.S. car fatality rates? Do the results suggest that imported lemons cause car fatalities? $$\begin{matrix} \text{Lemon Imports} & 230 & 265 & 358 & 480 & 530\\ \text{Crash Fatality Rate} & 15.9 & 15.7 & 15.4 & 15.3 & 14.9\\ \end{matrix}$$
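The linear correlation coefficient for the lemon data can be computed directly from its definition. The Python sketch below uses an illustrative helper, `pearson_r`. The resulting r still has to be compared with the Table A-6 critical values for n = 5, and even a strong correlation does not by itself show that lemons cause or prevent crashes.

```python
import math

lemon_imports = [230, 265, 358, 480, 530]       # metric tons
fatality_rate = [15.9, 15.7, 15.4, 15.3, 14.9]  # per 100,000 population

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(lemon_imports, fatality_rate)
print(round(r, 3))
```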
What is the optimal time for a scuba diver to be on the bottom of the ocean? That depends on the depth of the dive. The U.S. Navy has done a lot of research on this topic. The Navy defines the "optimal time" to be the time at each depth for the best balance between length of work period and decompression time after surfacing. Let $$\displaystyle{x}=$$ depth of dive in meters, and let $$\displaystyle{y}=$$ optimal time in hours. A random sample of divers gave the following data.
$$\begin{array}{|c|c|c|c|c|c|c|c|} \hline x & 13.1 & 23.3 & 31.2 & 38.3 & 51.3 & 20.5 & 22.7 \\ \hline y & 2.78 & 2.18 & 1.48 & 1.03 & 0.75 & 2.38 & 2.20 \\ \hline \end{array}$$
(a)
Find $$\sum x, \sum y, \sum x^{2}, \sum y^{2}, \sum xy, \text{ and } r$$. (Round r to three decimal places.)
$$\sum x =$$
$$\sum y =$$
$$\sum x^{2} =$$
$$\sum y^{2} =$$
$$\sum xy =$$
$$r =$$
(b)
Use a $$1\%$$ level of significance to test the claim that $$\displaystyle\rho<{0}$$. (Round your answers to two decimal places.)
$$\displaystyle{t}=$$
critical $$\displaystyle{t}=$$
Conclusion
Reject the null hypothesis. There is sufficient evidence that $$\displaystyle\rho<{0}$$.
Reject the null hypothesis. There is insufficient evidence that $$\displaystyle\rho<{0}$$.
Fail to reject the null hypothesis. There is sufficient evidence that $$\displaystyle\rho<{0}$$.
Fail to reject the null hypothesis. There is insufficient evidence that $$\displaystyle\rho<{0}$$.
(c)
Find $$\displaystyle{S}_{{e}},{a},{\quad\text{and}\quad}{b}$$. (Round your answers to four decimal places.)
$$\displaystyle{S}_{{e}}=$$
$$\displaystyle{a}=$$
$$\displaystyle{b}=$$
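The quantities requested in parts (a) to (c) can be computed from the raw data as sketched below in Python. The sketch assumes the convention $$\hat{y} = a + bx$$ (a is the intercept, b is the slope) and the test statistic $$t = r\sqrt{n-2}/\sqrt{1-r^{2}}$$ with $$n - 2$$ degrees of freedom. If the course text defines these differently, adjust accordingly.

```python
import math

x = [13.1, 23.3, 31.2, 38.3, 51.3, 20.5, 22.7]  # depth of dive (m)
y = [2.78, 2.18, 1.48, 1.03, 0.75, 2.38, 2.20]  # optimal time (h)
n = len(x)

# (a) basic sums
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# centered sums of squares and cross products
sxx = sum_x2 - sum_x**2 / n
syy = sum_y2 - sum_y**2 / n
sxy = sum_xy - sum_x * sum_y / n

r = sxy / math.sqrt(sxx * syy)

# (b) test statistic for H0: rho = 0 against H1: rho < 0, d.f. = n - 2
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# (c) least-squares line y-hat = a + b*x and standard error of estimate
b = sxy / sxx
a = sum_y / n - b * sum_x / n
s_e = math.sqrt((syy - b * sxy) / (n - 2))

print(round(sum_x, 1), round(sum_y, 2), round(r, 3), round(t, 2))
print(round(s_e, 4), round(a, 4), round(b, 4))
```

The computed t is then compared with the one-tailed critical value from a t table at the 1% level with 5 degrees of freedom.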
At what age do babies learn to crawl? Does it take longer to learn in the winter when babies are often bundled in clothes that restrict their movement? Data were collected from parents who brought their babies into the University of Denver Infant Study Center to participate in one of a number of experiments between 1988 and 1991. Parents reported the birth month and the age at which their child was first able to creep or crawl a distance of 4 feet within 1 minute. The resulting data were grouped by month of birth: January, May, and September:
$$\begin{array}{|c|c|c|c|} \hline & \text{Crawling age} & & \\ \hline \text{Birth month} & \text{Mean} & \text{St. dev.} & n \\ \hline \text{January} & 29.84 & 7.08 & 32 \\ \hline \text{May} & 28.58 & 8.07 & 27 \\ \hline \text{September} & 33.83 & 6.93 & 38 \\ \hline \end{array}$$
Crawling age is given in weeks. Assume the data represent three independent simple random samples, one from each of the three populations consisting of babies born in that particular month, and that the populations of crawling ages have Normal distributions. A partial ANOVA table is given below.
$$\begin{array}{|c|c|c|c|c|} \hline \text{Source} & \text{Sum of squares} & \text{DF} & \text{Mean square} & F \\ \hline \text{Groups} & 505.26 & & & \\ \hline \text{Error} & & & 53.45 & \\ \hline \text{Total} & & & & \\ \hline \end{array}$$
What are the degrees of freedom for the groups term?
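For a one-way ANOVA with k groups and N observations in total, the groups term has k − 1 degrees of freedom and the error term has N − k. The Python sketch below fills in the rest of the partial table from the two given entries; the variable names are illustrative.

```python
# Sample sizes from the crawling-age table.
group_sizes = {"January": 32, "May": 27, "September": 38}

ss_groups = 505.26   # sum of squares for groups (given)
ms_error = 53.45     # mean square for error (given)

k = len(group_sizes)                  # number of groups
n_total = sum(group_sizes.values())   # total number of babies

df_groups = k - 1
df_error = n_total - k
df_total = n_total - 1

ms_groups = ss_groups / df_groups
ss_error = ms_error * df_error
f_stat = ms_groups / ms_error

print(df_groups, df_error, df_total)
print(round(ms_groups, 2), round(ss_error, 2), round(f_stat, 2))
```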