Identify which assumption is needed to use the linear regression model to obtain a meaningful fit that represents the true relationship well.

Question
Sampling distributions

2020-11-28
The assumption is that the population mean of the response variable y has a straight-line relationship with the explanatory variable x.
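A minimal sketch of what this assumption means in practice, using synthetic data that is not part of the question. A least-squares line is fitted to one data set whose mean is truly linear in x and to one whose mean is curved. The helper names, the noise level, and the curvature check (correlating the residuals with the squared distance of x from its mean) are illustrative assumptions, not a standard diagnostic.

```python
import random


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def correlation(us, vs):
    """Pearson correlation between two equally long lists."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return suv / (suu * svv) ** 0.5


def curvature_signal(xs, ys):
    """Correlate the line-fit residuals with (x - mean(x))^2.

    A value near 0 is consistent with a straight-line mean; a value
    near +1 or -1 suggests the line misses a systematic bend.
    """
    intercept, slope = fit_line(xs, ys)
    x_bar = sum(xs) / len(xs)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    bend = [(x - x_bar) ** 2 for x in xs]
    return correlation(residuals, bend)


rng = random.Random(1)  # fixed seed so the run is repeatable
xs = [i / 10 for i in range(101)]

# Synthetic responses: one with a linear mean, one with a quadratic mean.
y_linear = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
y_curved = [x ** 2 + rng.gauss(0, 1) for x in xs]

intercept, slope = fit_line(xs, y_linear)
signal_linear = curvature_signal(xs, y_linear)
signal_curved = curvature_signal(xs, y_curved)

print(f"linear data: slope = {slope:.2f}, curvature signal = {signal_linear:.2f}")
print(f"curved data: curvature signal = {signal_curved:.2f}")
```

When the mean really is a straight line, the fitted slope lands close to the true value and the residuals show no pattern. When the mean is curved, the line can still be computed, but the residuals bend systematically, so the fit does not represent the true relationship.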

Relevant Questions

Identify which assumption is needed to use the linear regression model to make inferences about the relationship.
Identify which assumption is the least critical.
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What are the values of the $$R^2$$ and adjusted-$$R^2$$ statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What are the values of the $$R^2$$ and adjusted-$$R^2$$ statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color × clarity) and X5 (color × carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What are the values of the $$R^2$$ and adjusted-$$R^2$$ statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
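The square-root workflow in the case can be sketched in code as well as in a spreadsheet. The example below is only an illustration under stated assumptions: the Diamonds.xls data is not reproduced here, so the sample is synthetic, the "true" coefficients (10, 2, 30) are made up, and the helpers `fit_ols` and `predict` are hypothetical names. It fits the square root of price on clarity (X2) and carats (X3), squares the fitted values to get price estimates, and compares them with the listed prices.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Fitted value for one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic stand-in for the sample: (clarity score, carats, price).
rng = random.Random(0)
sample = []
for _ in range(40):
    clarity = rng.randint(1, 8)
    carats = rng.uniform(0.3, 1.5)
    root_price = 10 + 2 * clarity + 30 * carats + rng.gauss(0, 1)
    sample.append((clarity, carats, root_price ** 2))

# An interaction column such as clarity * carats could be appended to each row.
features = [(clarity, carats) for clarity, carats, _ in sample]
prices = [price for _, _, price in sample]

# Fit on the square-root scale, then square the fitted values to get prices.
coefs = fit_ols(features, [math.sqrt(p) for p in prices])
estimated_prices = [predict(coefs, row) ** 2 for row in features]

# Positive gap: listed above the model's estimate; negative gap: a bargain.
gaps = [actual - est for actual, est in zip(prices, estimated_prices)]
best_bargain = min(range(len(gaps)), key=gaps.__getitem__)

print("coefficients (intercept, clarity, carats):", [round(c, 2) for c in coefs])
print("largest bargain: set", best_bargain, "gap", round(gaps[best_bargain], 2))
```

With the real data, the same comparison of listed price against squared fitted value is what identifies overpriced sets and bargains.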
To identify: An important assumption for using the bootstrap method.
The distribution of height for a certain population of women is approximately normal with mean 65 inches and standard deviation 3.5 inches. Consider two different random samples taken from the population, one of size 5 and one of size 85.
Which of the following is true about the sampling distributions of the sample mean for the two sample sizes?
A. Both distributions are approximately normal with mean 65 and standard deviation 3.5.
B. Both distributions are approximately normal. The mean and standard deviation for size 5 are both less than the mean and standard deviation for size 85.
C. Both distributions are approximately normal with the same mean. The standard deviation for size 5 is greater than that for size 85.
D. Only the distribution for size 85 is approximately normal. Both distributions have mean 65 and standard deviation 3.5.
E. Only the distribution for size 85 is approximately normal. The mean and standard deviation for size 5 are both less than the mean and standard deviation for size 85.
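The options can be checked with the standard result that the sample mean has mean equal to the population mean and standard deviation σ/√n, and that it is normally distributed for any sample size when the population itself is normal. A short sketch of the arithmetic follows; the function name `standard_error` is just an illustrative label.

```python
import math

POPULATION_MEAN = 65.0  # inches, from the question
POPULATION_SD = 3.5     # inches, from the question


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


se_5 = standard_error(POPULATION_SD, 5)
se_85 = standard_error(POPULATION_SD, 85)

print(f"n = 5:  mean = {POPULATION_MEAN}, standard deviation = {se_5:.3f}")
print(f"n = 85: mean = {POPULATION_MEAN}, standard deviation = {se_85:.3f}")
```

Both sampling distributions are centered at 65, and the one for n = 5 is the more spread out of the two (about 1.565 inches against about 0.380 inches).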
The table shows the temperatures T (in degrees Fahrenheit) at which water boils at selected pressures p (in pounds per square inch). A model that approximates the data is: $$T = 87.97 + 34.96 \ln p + 7.91 \sqrt{p}$$ a) Use a graphing utility to plot the data and graph the model in the same viewing window. How well does the model fit the data? b) Use the graph to estimate the pressure at which the boiling point of water is $$300^{\circ}\ \text{F}$$. c) Calculate T when the pressure is 74 pounds per square inch. Verify your answer graphically.
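Parts b) and c) can also be checked numerically. The sketch below takes the model to be T = 87.97 + 34.96 ln p + 7.91 √p, with the logarithm read as the natural log. The bisection helper and its search bracket of 1 to 200 pounds per square inch are assumptions made for illustration; the table itself is not reproduced here.

```python
import math


def boiling_temp(p):
    """Model temperature (degrees Fahrenheit) at pressure p (psi)."""
    return 87.97 + 34.96 * math.log(p) + 7.91 * math.sqrt(p)


def pressure_for_temp(target, lo=1.0, hi=200.0, tol=1e-6):
    """Find p with boiling_temp(p) == target by bisection.

    Relies on the model increasing in p for p > 0, and on the target
    lying between boiling_temp(lo) and boiling_temp(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if boiling_temp(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_at_74 = boiling_temp(74)
p_at_300 = pressure_for_temp(300)

print(f"c) T at 74 psi is about {t_at_74:.2f} degrees F")
print(f"b) T = 300 degrees F at about {p_at_300:.2f} psi")
```

Under this reading of the model, T at 74 pounds per square inch is about 306.5 °F, and the boiling point reaches 300 °F at a pressure between 67 and 68 pounds per square inch.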
Which of the following is true about the sampling distribution of means?
A. Shape of the sampling distribution of means is always the same shape as the population distribution, no matter what the sample size is.
B. Sampling distributions of means are always nearly normal.
C. Sampling distributions of means get closer to normality as the sample size increases.
D. Sampling distribution of the mean is always right skewed since means cannot be smaller than 0.
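A small simulation can make the effect of sample size visible. This is a sketch under assumptions: the population is taken to be exponential with rate 1 (a right-skewed choice made only for illustration), and skewness is used as a rough gauge of how far the distribution of sample means is from a symmetric, normal-like shape.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: average cubed standardized deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (2, 10, 50)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the sample means = {skew:.2f}")
```

The skewness of the simulated sample means shrinks toward 0 as n grows, which is the behavior described by the central limit theorem. For small n the sampling distribution still inherits much of the population's skew.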
Which of the following is true about sampling distributions?
-Shape of the sampling distribution is always the same shape as the population distribution, no matter what the sample size is.
-Sampling distributions are always nearly normal.
-Sampling distribution of the mean is always right skewed since means cannot be smaller than 0.
-Sampling distributions get closer to normality as the sample size increases.
Explain how to use the sampling distributions of A and B to decide which is the best estimator of $$\alpha$$.
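One way to compare two estimators is to simulate their sampling distributions and look at the center (bias) and the spread of each. The question does not say what A and B are, so the sketch below uses stand-ins: A is the sample mean and B is the sample median, both estimating the center of a normal population. The true value of 10, the sample size, and the function name `simulate_estimator` are hypothetical choices.

```python
import random
import statistics


def simulate_estimator(estimator, true_value, n, reps, rng):
    """Summarize the simulated sampling distribution of an estimator."""
    estimates = [
        estimator([rng.gauss(true_value, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return {
        "bias": statistics.fmean(estimates) - true_value,
        "spread": statistics.pstdev(estimates),
        "mse": statistics.fmean(squared_errors),
    }


rng = random.Random(7)  # fixed seed so the run is repeatable
alpha = 10.0            # hypothetical true parameter value

result_a = simulate_estimator(statistics.fmean, alpha, n=25, reps=4000, rng=rng)
result_b = simulate_estimator(statistics.median, alpha, n=25, reps=4000, rng=rng)

for name, result in (("A (mean)", result_a), ("B (median)", result_b)):
    print(
        f"{name}: bias = {result['bias']:+.3f}, "
        f"spread = {result['spread']:.3f}, mse = {result['mse']:.4f}"
    )
```

An estimator whose sampling distribution is centered on the parameter (little or no bias) and has the smaller spread is generally preferred. Mean squared error combines both properties into one number.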
Identify the incorrect statement among the following options about the sampling distribution of the sample mean:
(a) the standard deviation of the sampling distribution will decrease as the sample size increases,
(b) the standard deviation of the sampling distribution is a measure of the variability of the sample mean among repeated samples,
(c) the sample mean is an unbiased estimator of the true population mean,
(d) the sampling distribution shows how the sample mean will vary in repeated samples,
(e) the sampling distribution shows how the sample was distributed around the sample mean.
1. The accuracy of the approximation it provides improves when the trial success proportion p is closer to $$50\%$$.
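The statement does not name the approximation. Assuming it refers to the normal approximation to the binomial distribution, the claim can be checked with a short calculation. The sketch compares the exact binomial CDF with the normal CDF (using a continuity correction) and reports the largest gap for several values of p; the choice of 30 trials is arbitrary.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def max_approx_error(n, p):
    """Largest gap between the exact CDF and the continuity-corrected normal CDF."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_cdf(k, n, p) - normal_cdf((k + 0.5 - mu) / sigma))
        for k in range(n + 1)
    )


n_trials = 30  # arbitrary number of trials for the comparison
errors = {p: max_approx_error(n_trials, p) for p in (0.05, 0.2, 0.5)}

for p, err in errors.items():
    print(f"p = {p:.2f}: largest CDF gap = {err:.4f}")
```

For a fixed number of trials, the gap is smallest at p = 0.5, where the binomial distribution is symmetric, and grows as p moves toward 0.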