Scatterplots
What kind of plot is useful for deciding whether it is reasonable to find a regression plane for a set of data points involving several predictor variables?

2020-11-13
Step 1
Scatterplot matrix:
A scatterplot matrix is an array of scatterplots that shows every pairwise plot in the data set: the response variable plotted against each predictor, and every pair of predictors plotted against each other.
Step 2
A careful inspection of the scatterplot matrix reveals the nature of the relationships among all the variables in the data set. It is reasonable to fit a regression plane to the data only if the response variable has a linear, or at least roughly linear, relationship with each of the predictor variables.
Thus, the plot that is useful for deciding whether it is reasonable to find a regression plane for a set of data points involving several predictor variables is a scatterplot matrix.
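As a rough illustration of what a scatterplot matrix contains, here is a minimal Python sketch. The helper names (`panel_pairs`, `draw_scatterplot_matrix`) and the toy numbers are hypothetical and not taken from the question; matplotlib is assumed to be installed if the drawing function is called.

```python
from itertools import combinations


def panel_pairs(response, predictors):
    """List each pair of variables that a scatterplot matrix displays."""
    # Response against every predictor: these panels are checked for linearity.
    response_panels = [(p, response) for p in predictors]
    # Every pair of predictors: these panels reveal relationships among predictors.
    predictor_panels = list(combinations(predictors, 2))
    return response_panels + predictor_panels


def draw_scatterplot_matrix(data, response, predictors):
    """Draw the full grid; `data` maps a variable name to a list of values."""
    import matplotlib.pyplot as plt  # lazy import: only needed for drawing

    names = [response] + list(predictors)
    k = len(names)
    fig, axes = plt.subplots(k, k, figsize=(2.5 * k, 2.5 * k), squeeze=False)
    for i, row_var in enumerate(names):
        for j, col_var in enumerate(names):
            ax = axes[i][j]
            if i == j:
                # Diagonal cells only carry the variable name.
                ax.text(0.5, 0.5, row_var, ha="center", va="center",
                        transform=ax.transAxes)
                ax.set_xticks([])
                ax.set_yticks([])
            else:
                ax.scatter(data[col_var], data[row_var], s=12)
    fig.tight_layout()
    return fig


# Hypothetical toy data, for illustration only.
toy = {
    "y": [3.1, 4.0, 5.2, 6.1, 6.8, 8.2],
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.0, 1.5, 3.5, 3.0, 4.5, 5.0],
}
pairs = panel_pairs("y", ["x1", "x2"])
print(pairs)
```

Calling `draw_scatterplot_matrix(toy, "y", ["x1", "x2"])` would produce the 3-by-3 grid; the panels in the row for `y` are the ones to inspect for a roughly linear pattern before fitting a regression plane.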

Make a scatterplot for each set of data. Tell whether the data show a linear association or a nonlinear association.
(1,2),(7,9.5),(4,7),(2,4.2),(6,8.25),(3,5.8),(5,8),(8,10),(0,0)
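One way to back up a visual judgment about these points is to look at the slopes between neighbouring points: for a linear association they should stay roughly constant. The sketch below is only an illustration; the function names are hypothetical, and matplotlib is assumed to be installed if `plot_points` is called.

```python
points = [(1, 2), (7, 9.5), (4, 7), (2, 4.2), (6, 8.25),
          (3, 5.8), (5, 8), (8, 10), (0, 0)]


def consecutive_slopes(pts):
    """Slopes between neighbouring points after sorting by x."""
    ordered = sorted(pts)
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])]


def plot_points(pts):
    """Draw the scatterplot itself."""
    import matplotlib.pyplot as plt  # lazy import: only needed for drawing

    xs, ys = zip(*pts)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig


slopes = consecutive_slopes(points)
print([round(s, 2) for s in slopes])
# → [2.0, 2.2, 1.6, 1.2, 1.0, 0.25, 1.25, 0.5]
```

The slopes drop from about 2 at the left to about 0.5 at the right, so the points flatten out as x increases rather than following a single straight line.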
Make a scatterplot for the data in each table. Use the scatterplot to identify any clustering or outliers in the data.
Value of Home Over Time
Number of Years Owned: 0, 3, 6, 9, 12, 15, 18, 21
Value (1,000s of $): 80, 84, 86, 88, 89, 117, 119, 86

asked 2020-12-25
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style of earring. After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What are the values of the R² and adjusted R² statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What are the values of the R² and adjusted R² statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices, so you must square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model, where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3), and the interaction terms X4 (color × clarity) and X5 (color × carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What are the values of the R² and adjusted R² statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices, so you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?

asked 2020-10-27
Answer true or false to the following statements and explain your answers.
a. In multiple linear regression, we can determine whether we are extrapolating in predicting the value of the response variable for a given set of predictor variable values by determining whether each predictor variable value falls in the range of observed values of that predictor.
b. Irregularly shaped regions of the values of predictor variables are easy to detect with two-dimensional scatterplots of pairs of predictor variables, and thus it is easy to determine whether we are extrapolating when predicting the response variable.

asked 2021-05-04
Make a scatterplot for each set of data.
Hits: 7 8 4 11 8 2 5 9 1 4
Runs: 3 2 2 7 4 2 1 3 0 1

asked 2021-02-09
Polychlorinated biphenyls (PCBs), industrial pollutants, are known to be carcinogens and a great danger to natural ecosystems. As a result of several studies, PCB production was banned in the United States in 1979 and by the Stockholm Convention on Persistent Organic Pollutants in 2001. One study, published in 1972 by R. Risebrough, is titled “Effects of Environmental Pollutants Upon Animals Other Than Man”. In that study, 50 Anacapa pelican eggs were collected and measured for their shell thickness, in millimetres (mm), and concentration of PCBs, in parts per million (ppm).
a) Obtain a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)–(f).
c) Determine and interpret the regression equation for the data.
d) Identify potential outliers and influential observations.
e) In case a potential outlier is present, remove it and discuss the effect.
f) In case a potential influential observation is present, remove it and discuss the effect.

asked 2020-12-22
Regarding a scatterplot matrix:
a. Identify two of its uses.
b. What property should the scatterplots of the response versus each predictor variable have in order to proceed to obtain the regression plane for the data?

asked 2020-10-21
An issue of BARRON’S presented information on top wealth managers in the United States, based on individual clients with accounts of $1 million or more. Data were given for various variables, two of which were number of private client managers and private client assets.
a) Obtain a scatterplot for the data.
b) Decide whether finding a regression line for the data is reasonable. If so, then also do parts (c)–(f).
c) Determine and interpret the regression equation for the data.
d) Identify potential outliers and influential observations.
e) In case a potential outlier is present, remove it and discuss the effect.
f) In case a potential influential observation is present, remove it and discuss the effect.

Two scatterplots are shown below.
Scatterplot 1
A scatterplot has 14 points.
The horizontal axis is labeled "x" and has values from 30 to 110.
The vertical axis is labeled "y" and has values from 30 to 110.
The points are plotted from approximately (55, 60) up and right to approximately (95, 85).
The points are somewhat scattered.
Scatterplot 2
A scatterplot has 10 points.
The horizontal axis is labeled "x" and has values from 30 to 110.
The vertical axis is labeled "y" and has values from 30 to 110.
The points are plotted from approximately (55, 55) steeply up and right to approximately (70, 90), and then steeply down and right to approximately (85, 60).
The points are somewhat scattered.
Explain why it makes sense to use the least-squares line to summarize the relationship between x and y for one of these data sets but not the other.
Scatterplot 1 seems to show a linear relationship between x and y, while Scatterplot 2 shows a curved relationship between the two variables. So it makes sense to use the least-squares line to summarize the relationship between x and y for the data set in Scatterplot 1, but not for the data set in Scatterplot 2.
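The difference between a steadily rising pattern and a rise-then-fall pattern can also be seen numerically. The sketch below computes R² for a least-squares line on two small data sets. Both data sets are hypothetical, invented only to mimic the shapes described for the two scatterplots; they are not the actual plotted points.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line of y on x (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical points rising steadily from about (55, 60) to (95, 85).
x_linear = [55, 60, 65, 70, 75, 80, 85, 90, 95]
y_linear = [60, 64, 65, 70, 72, 76, 78, 83, 85]

# Hypothetical points rising to a peak near (70, 90), then falling to (85, 60).
x_curved = [55, 60, 65, 70, 75, 80, 85]
y_curved = [55, 68, 80, 90, 80, 70, 60]

r2_linear = r_squared(x_linear, y_linear)
r2_curved = r_squared(x_curved, y_curved)
print(round(r2_linear, 3), round(r2_curved, 3))
```

With these made-up numbers, the line accounts for nearly all of the variation in the first set and almost none in the second, even though y clearly depends on x in both. A low R² here signals that a straight line is the wrong summary, not that there is no relationship.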