a) Population regression line
b) Equal standard deviation
c) Normal populations
d) Normal populations

Question

asked 2020-12-25

With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.

After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.

1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?

2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?

3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?

4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?

5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?

6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?

7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?

8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?

9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?

10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
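
The square-root workflow in questions 6, 7, and 10 can be sketched in code. The following is a minimal illustration only: the data rows are hypothetical (they are not taken from Diamonds.xls), and the helper names `solve`, `fit_ols`, `predict`, and `r2_and_adjusted` are assumptions introduced for this sketch. It fits the square root of price on clarity (X2) and carats (X3), squares the fitted values to get price estimates, and compares them with the listed prices.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Fitted value for one row of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def r2_and_adjusted(y, fitted, n_predictors):
    """R-squared and adjusted R-squared on the scale the model was fitted on."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj

# HYPOTHETICAL rows, not from Diamonds.xls: (clarity score X2, carats X3, price)
sample = [
    (1, 0.50, 900), (2, 0.75, 1500), (3, 0.50, 1000), (4, 1.00, 2400),
    (5, 0.75, 1700), (6, 1.25, 3300), (2, 1.00, 2100), (4, 0.50, 1100),
]
predictors = [(x2, x3) for x2, x3, _ in sample]
prices = [p for _, _, p in sample]
sqrt_prices = [math.sqrt(p) for p in prices]  # transformed dependent variable

coefs = fit_ols(predictors, sqrt_prices)
fitted_sqrt = [predict(coefs, row) for row in predictors]
r2, adj_r2 = r2_and_adjusted(sqrt_prices, fitted_sqrt, n_predictors=2)

# Square the fitted values to return to the price scale.
estimated_prices = [f ** 2 for f in fitted_sqrt]
gaps = [p - e for p, e in zip(prices, estimated_prices)]

print("coefficients (intercept, X2, X3):", [round(c, 3) for c in coefs])
print("R2 =", round(r2, 4), "adjusted R2 =", round(adj_r2, 4))
for (x2, x3, p), e, g in zip(sample, estimated_prices, gaps):
    label = "above model estimate" if g > 0 else "below model estimate"
    print(f"clarity={x2} carats={x3} price={p} estimate={e:.0f} ({label})")
```

A set priced above its estimate would be a candidate for "overpriced" and one priced below a candidate "bargain". An interaction column such as X6 = X2 × X3 could be appended to each predictor tuple before calling `fit_ols`, with `n_predictors` adjusted to match.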

asked 2021-02-27

The manager of the store in the preceding exercise calculated the residual for each point in the scatterplot and made a dotplot of the residuals.

The distribution of residuals is roughly Normal with a mean of $0 and standard deviation of $22.92.

The middle 95% of residuals should be between which two values? Use this information to give an interval of plausible values for the weekly sales revenue if 5 linear feet are allocated to the store's brand of men's grooming products.
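
The arithmetic behind this question can be sketched as follows, assuming the 68-95-99.7 rule (about 95% of a Normal distribution lies within 2 standard deviations of its mean). The predicted revenue for 5 linear feet comes from the preceding exercise, which is not shown here, so `hypothetical_prediction` is a placeholder value and not the real prediction.

```python
RESIDUAL_SD = 22.92  # dollars, from the problem statement

def middle_95(center, sd, multiplier=2.0):
    """Interval center +/- multiplier*sd; multiplier 2 follows the 68-95-99.7 rule."""
    return center - multiplier * sd, center + multiplier * sd

# Residuals are centered at $0.
residual_low, residual_high = middle_95(0.0, RESIDUAL_SD)
print(f"Middle 95% of residuals: {residual_low:.2f} to {residual_high:.2f}")

# HYPOTHETICAL placeholder for the regression prediction at 5 linear feet;
# substitute the value from the preceding exercise's regression line.
hypothetical_prediction = 400.0
low, high = middle_95(hypothetical_prediction, RESIDUAL_SD)
print(f"Plausible weekly revenue: {low:.2f} to {high:.2f}")
```

The plausible-revenue interval is simply the prediction shifted by the two residual bounds.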

asked 2020-10-27

Answer true or false to the following statements and explain your answers.

a. In multiple linear regression, we can determine whether we are extrapolating in predicting the value of the response variable for a given set of predictor variable values by determining whether each predictor variable value falls in the range of observed values of that predictor.

b. Irregularly shaped regions of the values of predictor variables are easy to detect with two-dimensional scatterplots of pairs of predictor variables, and thus it is easy to determine whether we are extrapolating when predicting the response variable.

asked 2020-10-18

Which of the following statements best describes correlation analysis in a simple linear regression?

a. Correlation analysis measures the strength of relationship between two categorical variables.

b. Correlation analysis measures the direction of relationship between two numerical variables.

c. Correlation analysis measures the strength and direction of relationship between two numerical variables.

d. Correlation analysis measures the strength of relationship between two numerical variables.
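
As a reminder of what a correlation coefficient reports, here is a small sketch that computes Pearson's r by hand on two hypothetical numerical data sets (the values are made up for illustration). The sign of r and its magnitude can be read off separately.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numerical samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL data: the second response is the first with its sign flipped.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 5, 4, 5]
ys_falling = [-y for y in ys_rising]

r_up = pearson_r(xs, ys_rising)
r_down = pearson_r(xs, ys_falling)
print("rising: ", round(r_up, 4))
print("falling:", round(r_down, 4))
```

Both data sets give the same |r|, while the sign of r differs between them.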

asked 2020-10-23

1. Find each of the requested values for a population with a mean of \(\mu = 40\) and a standard deviation of \(\sigma = 8\).
A. What is the z-score corresponding to \(X = 52\)?
B. What is the X value corresponding to \(z = -0.50\)?
C. If all of the scores in the population are transformed into z-scores, what will be the values for the mean and standard deviation for the complete set of z-scores?
D. What is the z-score corresponding to a sample mean of \(M = 42\) for a sample of \(n = 4\) scores?
E. What is the z-score corresponding to a sample mean of \(M = 42\) for a sample of \(n = 6\) scores?
2. True or false:
a. All normal distributions are symmetrical
b. All normal distributions have a mean of 1.0
c. All normal distributions have a standard deviation of 1.0
d. The total area under the curve of all normal distributions is equal to 1
3. Interpret the location, direction, and distance (near or far) of the following z-scores: a. \(-2.00\) b. \(1.25\) c. \(3.50\) d. \(-0.34\)
4. You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with \(\mu = 78\) and \(\sigma = 12\). Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: \(82, 74, 62, 68, 79, 94, 90, 81, 80\).
5. You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 (\(\mu = 42\), \(\sigma = 12\)). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the \(\alpha = 0.05\) level of significance.
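
The z-score formulas these problems rely on can be sketched as below. The function names are assumptions introduced for this illustration. The trivia example uses every score listed in problem 4 and a one-tailed test (the question asks whether scores improved), while the tips example is two-tailed; both treat the population standard deviation as known.

```python
import math
from statistics import NormalDist, mean

def z_score(x, mu, sigma):
    """z for a single score: distance from the mean in standard deviations."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Raw score that corresponds to a given z."""
    return mu + z * sigma

def z_for_sample_mean(m, mu, sigma, n):
    """z for a sample mean, using the standard error sigma / sqrt(n)."""
    return (m - mu) / (sigma / math.sqrt(n))

def upper_tail_p(z):
    """One-tailed p-value, P(Z >= z)."""
    return 1.0 - NormalDist().cdf(z)

def two_tailed_p(z):
    """Two-tailed p-value, P(|Z| >= |z|)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Problem 1 (mu = 40, sigma = 8)
print("1A z =", z_score(52, 40, 8))
print("1B X =", x_from_z(-0.50, 40, 8))
print("1D z =", z_for_sample_mean(42, 40, 8, 4))
print("1E z =", round(z_for_sample_mean(42, 40, 8, 6), 3))

# Problem 4 (mu = 78, sigma = 12), using the scores exactly as listed
scores = [82, 74, 62, 68, 79, 94, 90, 81, 80]
z_trivia = z_for_sample_mean(mean(scores), 78, 12, len(scores))
print("4 z =", round(z_trivia, 3), "one-tailed p =", round(upper_tail_p(z_trivia), 4))

# Problem 5 (mu = 42, sigma = 12, M = 44.50, n = 16)
z_tips = z_for_sample_mean(44.50, 42, 12, 16)
print("5 z =", round(z_tips, 3), "two-tailed p =", round(two_tailed_p(z_tips), 4))
```

In each test, the p-value is compared with the chosen significance level to decide whether to reject the null hypothesis.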

asked 2020-10-21

An issue of BARRON’S presented information on top wealth managers in the United States, based on individual clients with accounts of $1 million or more. Data were given for various variables, two of which were number of private client managers and private client assets.
a) Obtain a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)–(f).
c) Determine and interpret the regression equation for the data.
d) Identify potential outliers and influential observations.
e) In case a potential outlier is present, remove it and discuss the effect.
f) In case a potential influential observation is present, remove it and discuss the effect.
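
Parts (c) to (f) amount to fitting a least-squares line, then refitting with a suspect point removed and comparing the two. A minimal sketch, using hypothetical (x, y) pairs because the BARRON’S data are not reproduced here:

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# HYPOTHETICAL data: the last point sits far to the right of the others,
# so it is a candidate influential observation.
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.0]

slope_all, intercept_all = fit_line(xs, ys)
slope_trim, intercept_trim = fit_line(xs[:-1], ys[:-1])

print(f"all points:        y = {intercept_all:.3f} + {slope_all:.3f} x")
print(f"without last point: y = {intercept_trim:.3f} + {slope_trim:.3f} x")
```

A large change in slope or intercept after removal suggests the point is influential; a small change suggests it is not.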

asked 2020-11-03

Does a higher state per capita income equate to a higher per capita beer consumption? From the document Survey of Current Business, published by the U.S. Bureau of Economic Analysis, and from the Brewer’s Almanac, published by the Beer Institute, we obtained data on personal income per capita, in thousands of dollars, and per capita beer consumption, in gallons, for the 50 states and Washington, D.C.
a) Obtain a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f).
c) Determine and interpret the regression equation for the data.
d) Identify potential outliers and influential observations.
e) In case a potential outlier is present, remove it and discuss the effect.
f) In case a potential influential observation is present, remove it and discuss the effect.

asked 2021-02-09

Polychlorinated biphenyls (PCBs), industrial pollutants, are known to be carcinogens and a great danger to natural ecosystems. As a result of several studies, PCB production was banned in the United States in 1979 and by the Stockholm Convention on Persistent Organic Pollutants in 2001. One study, published in 1972 by R. Risebrough, is titled “Effects of Environmental Pollutants Upon Animals Other Than Man”. In that study, 50 Anacapa pelican eggs were collected and measured for their shell thickness, in millimetres (mm), and concentration of PCBs, in parts per million (ppm).
a) Obtain a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)-(f).
c) Determine and interpret the regression equation for the data.
d) Identify potential outliers and influential observations.
e) In case a potential outlier is present, remove it and discuss the effect.
f) In case a potential influential observation is present, remove it and discuss the effect.

asked 2020-11-23

Geographical Analysis (Oct. 2006) published a study of a new method for analyzing remote-sensing data from satellite pixels in order to identify urban land cover. The method uses a numerical measure of the distribution of gaps, or the sizes of holes, in the pixel, called lacunarity. Summary statistics for the lacunarity measurements in a sample of 100 grassland pixels are \(\bar{x} = 225\) and \(s = 20\). It is known that the mean lacunarity measurement for all grassland pixels is 220. The method will be effective in identifying land cover if the standard deviation of the measurements is 10% (or less) of the true mean (i.e., if the standard deviation is less than 22).
a. Give the null and alternative hypotheses for a test to determine whether, in fact, the standard deviation of all grassland pixels is less than 22.
b. A MINITAB analysis of the data is provided below. Locate and interpret the p-value of the test. Use \(\alpha = .10\).

Test for One Standard Deviation
Method
Null hypothesis: Sigma = 22
Alternative hypothesis: Sigma < 22
The standard method is only for the normal distribution.
Statistics
N    StDev  Variance
100  20.0   400
Tests
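
The test statistic behind this output can be recomputed from the summary statistics. The sketch below uses the chi-square statistic (n − 1)s²/σ₀² and, because the Python standard library has no chi-square CDF, approximates the lower-tail p-value with the Wilson-Hilferty normal approximation. That approximation is a substitute chosen for this illustration, not MINITAB's method, so its p-value is a rough check only.

```python
import math
from statistics import NormalDist

n = 100        # sample size
s = 20.0       # sample standard deviation
sigma0 = 22.0  # hypothesized standard deviation under the null

df = n - 1
chi2 = df * s ** 2 / sigma0 ** 2  # test statistic for one standard deviation

def chi2_lower_tail_approx(x, df):
    """Wilson-Hilferty normal approximation to P(chi-square with df <= x)."""
    spread = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    return NormalDist().cdf(z)

# Lower-tailed test, because the alternative is Sigma < 22.
p_approx = chi2_lower_tail_approx(chi2, df)
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
print(f"approximate lower-tail p-value = {p_approx:.4f}")
```

The p-value is then compared with \(\alpha = .10\); since the approximation is close to that threshold, the exact value from the MINITAB output should be used for the final decision.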

asked 2021-01-17

A new thermostat has been engineered for the frozen food cases in large supermarkets. Both the old and new thermostats hold temperatures at an average of \(25^{\circ}F\). However, it is hoped that the new thermostat might be more dependable in the sense that it will hold temperatures closer to \(25^{\circ}F\). One frozen food case was equipped with the new thermostat, and a random sample of 21 temperature readings gave a sample variance of 5.1. Another similar frozen food case was equipped with the old thermostat, and a random sample of 19 temperature readings gave a sample variance of 12.8. Test the claim that the population variance of the old thermostat temperature readings is larger than that for the new thermostat. Use a \(5\%\) level of significance. How could your test conclusion relate to the question regarding the dependability of the temperature readings? (Let population 1 refer to data from the old thermostat.)

(a) What is the level of significance?

State the null and alternate hypotheses.

\(H_0: \sigma_1^2 = \sigma_2^2;\ H_1: \sigma_1^2 > \sigma_2^2\)

\(H_0: \sigma_1^2 = \sigma_2^2;\ H_1: \sigma_1^2 \neq \sigma_2^2\)

\(H_0: \sigma_1^2 = \sigma_2^2;\ H_1: \sigma_1^2 < \sigma_2^2\)

\(H_0: \sigma_1^2 > \sigma_2^2;\ H_1: \sigma_1^2 = \sigma_2^2\)

(b) Find the value of the sample F statistic. (Round your answer to two decimal places.)

What are the degrees of freedom?

\(df_{N} = ?\)

\(df_{D} = ?\)

What assumptions are you making about the original distribution?

The populations follow independent normal distributions. We have random samples from each population.

The populations follow dependent normal distributions. We have random samples from each population.

The populations follow independent normal distributions.

The populations follow independent chi-square distributions. We have random samples from each population.

(c) Find or estimate the P-value of the sample test statistic. (Round your answer to four decimal places.)

(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis?

At the \(\alpha = 0.05\) level, we fail to reject the null hypothesis and conclude the data are not statistically significant.

At the \(\alpha = 0.05\) level, we fail to reject the null hypothesis and conclude the data are statistically significant.

At the \(\alpha = 0.05\) level, we reject the null hypothesis and conclude the data are not statistically significant.

At the \(\alpha = 0.05\) level, we reject the null hypothesis and conclude the data are statistically significant.

(e) Interpret your conclusion in the context of the application.

Reject the null hypothesis, there is sufficient evidence that the population variance is larger in the old thermostat temperature readings.

Fail to reject the null hypothesis, there is sufficient evidence that the population variance is larger in the old thermostat temperature readings.

Fail to reject the null hypothesis, there is insufficient evidence that the population variance is larger in the old thermostat temperature readings.

Reject the null hypothesis, there is insufficient evidence that the population variance is larger in the old thermostat temperature readings.
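
The computations in parts (b) and (c) can be sketched as follows. The F statistic is the ratio of the sample variances with the old thermostat (population 1) in the numerator, and each degrees-of-freedom value is the corresponding sample size minus one. The p-value here is obtained by numerically integrating the F density with Simpson's rule, an approach chosen for this sketch because the standard library has no F distribution; a statistics table or package would normally be used instead.

```python
import math

var_old, n_old = 12.8, 19  # population 1: old thermostat
var_new, n_new = 5.1, 21   # population 2: new thermostat

f_stat = var_old / var_new
df_num = n_old - 1
df_den = n_new - 1

def f_pdf(x, d1, d2):
    """Density of the F distribution; returning 0 at x <= 0 is valid for d1 > 2."""
    if x <= 0:
        return 0.0
    log_density = (
        math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
        + (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
    )
    return math.exp(log_density)

def f_upper_tail(f_value, d1, d2, steps=2000):
    """P(F >= f_value) as 1 minus a Simpson's-rule integral of the density (steps must be even)."""
    h = f_value / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3

p_value = f_upper_tail(f_stat, df_num, df_den)
print(f"F = {f_stat:.2f}, df_N = {df_num}, df_D = {df_den}")
print(f"right-tail p-value = {p_value:.4f}")
```

The decision in part (d) follows from comparing this p-value with the 0.05 significance level.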
