Make a scatterplot for the data in each table. Use the scatterplot to identify any clustering or outliers in the data. Value of Home Over Time Number of Years Owned: 0, 3, 6, 9, 12, 15, 18, 21 Value (1,000s of $): 80, 84, 86, 88, 89, 117, 119, 86

Question
Scatterplots
asked 2021-02-08
Make a scatterplot for the data in each table. Use the scatterplot to identify any clustering or outliers in the data.
Value of Home Over Time
Number of Years Owned: 0, 3, 6, 9, 12, 15, 18, 21
Value (1,000s of $): 80, 84, 86, 88, 89, 117, 119, 86

Answers (1)

2021-02-09
Step 1
Scatterplot
Number of years owned is on the horizontal axis and Value (1,000s of $) is on the vertical axis.
The number of years owned ranges from 0 to 21, so an appropriate scale for the horizontal axis is from -3 to 22.
The value (1,000s of $) ranges from 80 to 119, so an appropriate scale for the vertical axis is from 75 to 125.
[Image: scatterplot of Value (1,000s of $) against Number of Years Owned]
Step 2
There appear to be two outliers, at 15 and 18 years (values 117 and 119), because the corresponding points lie far above the general pattern of the other points.
Result:
Two outliers.
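
The visual judgment above can be cross-checked numerically. The sketch below is one possible check, not the method used in the answer: it flags values whose modified z-score (based on the median and the median absolute deviation) exceeds 3.5. The 3.5 cutoff and the 0.6745 scaling constant are a common rule of thumb and are assumptions here, and the check ignores the time ordering of the points.

```python
from statistics import median

years = [0, 3, 6, 9, 12, 15, 18, 21]
values = [80, 84, 86, 88, 89, 117, 119, 86]  # home value in 1,000s of $

# Robust center and spread: median and median absolute deviation (MAD).
center = median(values)
mad = median([abs(v - center) for v in values])

CUTOFF = 3.5  # assumed rule-of-thumb threshold for a modified z-score

# Modified z-score: 0.6745 * (value - median) / MAD.
scores = [0.6745 * (v - center) / mad for v in values]
outliers = [(yr, v) for yr, v, s in zip(years, values, scores) if abs(s) > CUTOFF]

print(outliers)  # (year, value) pairs flagged as unusually far from the median
```

With this cutoff, only the points at 15 and 18 years are flagged, which agrees with the reading of the scatterplot.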

Relevant Questions

asked 2020-12-25
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
asked 2020-10-23
A random sample of \(\displaystyle{n}_{{1}}={16}\) communities in western Kansas gave the following information for people under 25 years of age.
\(\displaystyle{X}_{{1}}:\) Rate of hay fever per 1000 population for people under 25
\(\begin{array}{|c|c|c|c|c|c|c|c|} \hline 97 & 91 & 121 & 129 & 94 & 123 & 112 & 93\\ \hline 125 & 95 & 125 & 117 & 97 & 122 & 127 & 88 \\ \hline \end{array}\)
A random sample of \(\displaystyle{n}_{{2}}={14}\) regions in western Kansas gave the following information for people over 50 years old.
\(\displaystyle{X}_{{2}}:\) Rate of hay fever per 1000 population for people over 50
\(\begin{array}{|c|c|c|c|c|c|c|} \hline 94 & 109 & 99 & 95 & 113 & 88 & 110\\ \hline 79 & 115 & 100 & 89 & 114 & 85 & 96\\ \hline \end{array}\)
(i) Use a calculator to calculate \(\displaystyle\overline{{x}}_{{1}},{s}_{{1}},\overline{{x}}_{{2}},{\quad\text{and}\quad}{s}_{{2}}.\) (Round your answers to two decimal places.)
(ii) Assume that the hay fever rate in each age group has an approximately normal distribution. Do the data indicate that the age group over 50 has a lower rate of hay fever? Use \(\displaystyle\alpha={0.05}.\)
(a) What is the level of significance?
State the null and alternate hypotheses.
\(\displaystyle{H}_{{0}}:\mu_{{1}}=\mu_{{2}},{H}_{{1}}:\mu_{{1}}<\mu_{{2}}\)
\(\displaystyle{H}_{{0}}:\mu_{{1}}=\mu_{{2}},{H}_{{1}}:\mu_{{1}}>\mu_{{2}}\)
\(\displaystyle{H}_{{0}}:\mu_{{1}}=\mu_{{2}},{H}_{{1}}:\mu_{{1}}\ne\mu_{{2}}\)
\(\displaystyle{H}_{{0}}:\mu_{{1}}>\mu_{{2}},{H}_{{1}}:\mu_{{1}}=\mu_{{2}}\)
(b) What sampling distribution will you use? What assumptions are you making?
The standard normal. We assume that both population distributions are approximately normal with known standard deviations.
The Student's t. We assume that both population distributions are approximately normal with unknown standard deviations.
The standard normal. We assume that both population distributions are approximately normal with unknown standard deviations.
The Student's t. We assume that both population distributions are approximately normal with known standard deviations.
What is the value of the sample test statistic? (Test the difference \(\displaystyle\mu_{{1}}-\mu_{{2}}\). Round your answer to three decimal places.)
(c) Find (or estimate) the P-value.
P-value \(\displaystyle>{0.250}\)
\(\displaystyle{0.125}<{P}-\text{value}<{0.250}\)
\(\displaystyle{0.050}<{P}-\text{value}<{0.125}\)
\(\displaystyle{0.025}<{P}-\text{value}<{0.050}\)
\(\displaystyle{0.005}<{P}-\text{value}<{0.025}\)
P-value \(\displaystyle<{0.005}\)
Sketch the sampling distribution and show the area corresponding to the P-value.
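
As a rough cross-check for parts (i) and (ii), the summary statistics and an unpooled two-sample t statistic can be computed from the two samples. This is a sketch only: the variable names are illustrative, and using the unpooled standard error \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) is an assumption about the intended method.

```python
from math import sqrt
from statistics import mean, stdev

# Hay fever rates per 1000 population, copied from the two tables above.
under_25 = [97, 91, 121, 129, 94, 123, 112, 93,
            125, 95, 125, 117, 97, 122, 127, 88]
over_50 = [94, 109, 99, 95, 113, 88, 110,
           79, 115, 100, 89, 114, 85, 96]

n1, n2 = len(under_25), len(over_50)
xbar1, xbar2 = mean(under_25), mean(over_50)
s1, s2 = stdev(under_25), stdev(over_50)  # sample standard deviations

# Unpooled standard error of (xbar1 - xbar2); an assumed choice of method.
std_err = sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (xbar1 - xbar2) / std_err

print(round(xbar1, 2), round(s1, 2))
print(round(xbar2, 2), round(s2, 2))
print(round(t_stat, 3))
```

The P-value then comes from a Student's t table, using the degrees-of-freedom convention of the course.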
asked 2021-02-25
a. Make a scatterplot for the data in the table below.
Height and Weight of Football Players
Height (in.): 77 75 76 70 70 73 74 74 73
Weight (lb): 230 220 212 190 201 245 218 260 196
b. Which display - the table or the scatter plot - do you think is a more appropriate display of the data? Explain your reasoning.
asked 2021-01-15
The article “Anodic Fenton Treatment of Treflan MTF” describes a two-factor experiment designed to study the sorption of the herbicide trifluralin. The factors are the initial trifluralin concentration and the \(\mathrm{Fe}^{2+} : \mathrm{H}_{2}\mathrm{O}_{2}\) delivery ratio. There were three replications for each treatment. The results presented in the following table are consistent with the means and standard deviations reported in the article.
\(\begin{array}{|c|c|c|} \hline
\text{Initial Concentration (M)} & \text{Delivery Ratio} & \text{Sorption (\%)} \\ \hline
15 & 1:0 & 10.90 \quad 8.47 \quad 12.43 \\ \hline
15 & 1:1 & 3.33 \quad 2.40 \quad 2.67 \\ \hline
15 & 1:5 & 0.79 \quad 0.76 \quad 0.84 \\ \hline
15 & 1:10 & 0.54 \quad 0.69 \quad 0.57 \\ \hline
40 & 1:0 & 6.84 \quad 7.68 \quad 6.79 \\ \hline
40 & 1:1 & 1.72 \quad 1.55 \quad 1.82 \\ \hline
40 & 1:5 & 0.68 \quad 0.83 \quad 0.89 \\ \hline
40 & 1:10 & 0.58 \quad 1.13 \quad 1.28 \\ \hline
100 & 1:0 & 6.61 \quad 6.66 \quad 7.43 \\ \hline
100 & 1:1 & 1.25 \quad 1.46 \quad 1.49 \\ \hline
100 & 1:5 & 1.17 \quad 1.27 \quad 1.16 \\ \hline
100 & 1:10 & 0.93 \quad 0.67 \quad 0.80 \\ \hline
\end{array}\)
a) Estimate all main effects and interactions.
b) Construct an ANOVA table. You may give ranges for the P-values.
c) Is the additive model plausible? Provide the value of the test statistic, its null distribution, and the P-value.
asked 2021-02-11
Several models have been proposed to explain the diversification of life during geological periods. According to Benton (1997), “The diversification of marine families in the past 600 million years (Myr) appears to have followed two or three logistic curves, with equilibrium levels that lasted for up to 200 Myr. In contrast, continental organisms clearly show an exponential pattern of diversification, and although it is not clear whether the empirical diversification patterns are real or are artifacts of a poor fossil record, the latter explanation seems unlikely.” In this problem, we will investigate three models for diversification. They are analogous to models for population growth; however, the quantities involved have a different interpretation. We denote by N(t) the diversification function, which counts the number of taxa as a function of time, and by r the intrinsic rate of diversification.
(a) (Exponential Model) This model is described by \(\displaystyle{\frac{{{d}{N}}}{{{\left.{d}{t}\right.}}}}={r}_{{{e}}}{N}\ {\left({8.86}\right)}.\) Solve (8.86) with the initial condition N(0) at time 0, and show that \(\displaystyle{r}_{{{e}}}\) can be estimated from \(\displaystyle{r}_{{{e}}}={\frac{{{1}}}{{{t}}}}\ {\ln{\ }}{\left[{\frac{{{N}{\left({t}\right)}}}{{{N}{\left({0}\right)}}}}\right]}\ {\left({8.87}\right)}\)
(b) (Logistic Growth) This model is described by \(\displaystyle{\frac{{{d}{N}}}{{{\left.{d}{t}\right.}}}}={r}_{{{l}}}{N}\ {\left({1}\ -\ {\frac{{{N}}}{{{K}}}}\right)}\ {\left({8.88}\right)}\) where K is the equilibrium value. Solve (8.88) with the initial condition N(0) at time 0, and show that \(\displaystyle{r}_{{{l}}}\) can be estimated from \(\displaystyle{r}_{{{l}}}={\frac{{{1}}}{{{t}}}}\ {\ln{\ }}{\left[{\frac{{{K}\ -\ {N}{\left({0}\right)}}}{{{N}{\left({0}\right)}}}}\right]}\ +\ {\frac{{{1}}}{{{t}}}}\ {\ln{\ }}{\left[{\frac{{{N}{\left({t}\right)}}}{{{K}\ -\ {N}{\left({t}\right)}}}}\right]}\ {\left({8.89}\right)}\) for \(\displaystyle{N}{\left({t}\right)}\ {<}\ {K}.\)
(c) Assume that \(\displaystyle{N}{\left({0}\right)}={1}\) and \(\displaystyle{N}{\left({10}\right)}={1000}.\) Estimate \(\displaystyle{r}_{{{e}}}\) and \(\displaystyle{r}_{{{l}}}\) for both \(\displaystyle{K}={1001}\) and \(\displaystyle{K}={10000}.\)
(d) Use your answer in (c) to explain the following quote from Stanley (1979): “There must be a general tendency for calculated values of \(\displaystyle{\left[{r}\right]}\) to represent underestimates of exponential rates, because some radiation will have followed distinctly sigmoid paths during the interval evaluated.”
(e) Explain why the exponential model is a good approximation to the logistic model when \(\displaystyle\frac{{N}}{{K}}\) is small compared with 1.
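
The estimates asked for in part (c) follow directly from formulas (8.87) and (8.89). A minimal sketch of that arithmetic is given below; the function names `rate_exponential` and `rate_logistic` are illustrative and not part of the problem.

```python
from math import log

def rate_exponential(n0, nt, t):
    """Formula (8.87): r_e = (1/t) * ln(N(t) / N(0))."""
    return log(nt / n0) / t

def rate_logistic(n0, nt, t, k):
    """Formula (8.89): valid only for N(t) < K."""
    if not nt < k:
        raise ValueError("formula (8.89) requires N(t) < K")
    return (log((k - n0) / n0) + log(nt / (k - nt))) / t

# Values given in part (c): N(0) = 1, N(10) = 1000.
n0, nt, t = 1, 1000, 10

r_e = rate_exponential(n0, nt, t)
r_l_1001 = rate_logistic(n0, nt, t, k=1001)
r_l_10000 = rate_logistic(n0, nt, t, k=10000)

print(r_e, r_l_1001, r_l_10000)
```

For K = 1001 both logarithms in (8.89) equal ln 1000, so the logistic estimate is exactly twice the exponential one. For K = 10000 the two estimates are close, which is relevant to parts (d) and (e).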
asked 2021-02-09
A two-sample inference deals with dependent and independent inferences. In a two-sample hypothesis testing problem, underlying parameters of two different populations are compared. In a longitudinal (or follow-up) study, the same group of people is followed over time. Two samples are said to be paired when each data point in the first sample is matched and related to a unique data point in the second sample.
This problem demonstrates inference from two dependent (follow-up) samples using the data from the hypothetical study of new cases of tuberculosis (TB) before and after the vaccination was done in several geographical areas in a country in sub-Saharan Africa. Conclusion about the null hypothesis is to note the difference between samples.
The problem that demonstrates inference from two dependent samples uses hypothetical data from the TB vaccinations and the number of new cases before and after vaccination.
\(\begin{array}{|c|c|c|} \hline \text{Geographical regions} & \text{Before vaccination} & \text{After vaccination}\\ \hline 1 & 85 & 11\\ \hline 2 & 77 & 5\\ \hline 3 & 110 & 14\\ \hline 4 & 65 & 12\\ \hline 5 & 81 & 10\\ \hline 6 & 70 & 7\\ \hline 7 & 74 & 8\\ \hline 8 & 84 & 11\\ \hline 9 & 90 & 9\\ \hline 10 & 95 & 8\\ \hline \end{array}\)
Using the Minitab statistical analysis program to enter the data and perform the analysis, complete the following: Construct a one-sided \(\displaystyle{95}\%\) confidence interval for the true difference in population means. Test the null hypothesis that the population means are identical at the 0.05 level of significance.
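
The question asks for Minitab, but the paired analysis can be sanity-checked by hand. The sketch below works with the before-minus-after differences. The critical value 1.833 for \(t_{0.05}\) with 9 degrees of freedom is taken from a standard t table and is hard-coded as an assumption, as is the choice of a lower confidence bound for the one-sided interval.

```python
from math import sqrt
from statistics import mean, stdev

# New TB cases per region, copied from the table above.
before = [85, 77, 110, 65, 81, 70, 74, 84, 90, 95]
after = [11, 5, 14, 12, 10, 7, 8, 11, 9, 8]

# Paired design: analyze the per-region differences.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
std_err = s_d / sqrt(n)

# Test statistic for H0: mean difference = 0.
t_stat = d_bar / std_err

# One-sided 95% lower confidence bound; 1.833 is an assumed table value (df = 9).
T_CRIT = 1.833
lower_bound = d_bar - T_CRIT * std_err

print(d_bar, round(s_d, 3), round(t_stat, 2), round(lower_bound, 2))
```

A test statistic far beyond the critical value would lead to rejecting the null hypothesis of identical means at the 0.05 level.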
asked 2020-11-08
Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of \(\alpha = 0.05\). Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.)
Lemons and Car Crashes. Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population [based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1]. Is there sufficient evidence to conclude that there is a linear correlation between weights of lemon imports from Mexico and U.S. car fatality rates? Do the results suggest that imported lemons cause car fatalities?
\(\begin{matrix} \text{Lemon Imports} & 230 & 265 & 358 & 480 & 530\\ \text{Crash Fatality Rate} & 15.9 & 15.7 & 15.4 & 15.3 & 14.9\\ \end{matrix}\)
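
The linear correlation coefficient for the lemon data can be computed from its definition, \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). The sketch below does that arithmetic. The critical value 0.878 for n = 5 at \(\alpha = 0.05\) is a standard table value and is hard-coded here as an assumption, since Table A-6 is not reproduced in this text.

```python
from math import sqrt

lemons = [230, 265, 358, 480, 530]        # metric tons imported
fatality = [15.9, 15.7, 15.4, 15.3, 14.9]  # crash deaths per 100,000

n = len(lemons)
x_bar = sum(lemons) / n
y_bar = sum(fatality) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lemons, fatality))
s_xx = sum((x - x_bar) ** 2 for x in lemons)
s_yy = sum((y - y_bar) ** 2 for y in fatality)

r = s_xy / sqrt(s_xx * s_yy)

R_CRIT = 0.878  # assumed table value for n = 5, alpha = 0.05
significant = abs(r) > R_CRIT

print(round(r, 3), significant)
```

A significant correlation in this data set says nothing about causation; that is the point of the last question in the exercise.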
asked 2021-02-03
The following data on x = soil depth (in centimeters) and y = percentage of montmorillonite in the soil were taken from a scatterplot in the paper "Ancient Maya Drained Field Agriculture: Its Possible Application Today in the New River Floodplain, Belize, C.A." (Agricultural Ecosystems and Environment [1984]: 67-84):
x 40 50 60 70 80 90 100
y 58 34 32 30 28 27 22
a. Draw a scatterplot of y versus x.
b. The equation of the least-squares line is \(\hat{y} = 64.50 - 0.45x\). Draw this line on your scatterplot. Do there appear to be any large residuals?
c. Compute the residuals, and construct a residual plot. Are there any unusual features in the plot?
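
For parts b and c, fitting the seven data points by least squares gives an intercept of 64.50 and a slope of -0.45, and the residuals follow as observed minus fitted values. The sketch below is one way to carry out that calculation; the variable names are illustrative.

```python
depth = [40, 50, 60, 70, 80, 90, 100]  # x: soil depth (cm)
mont = [58, 34, 32, 30, 28, 27, 22]    # y: percentage of montmorillonite

n = len(depth)
x_bar = sum(depth) / n
y_bar = sum(mont) / n

# Least-squares slope and intercept from the usual formulas.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(depth, mont))
s_xx = sum((x - x_bar) ** 2 for x in depth)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = observed y minus fitted y.
residuals = [y - (intercept + slope * x) for x, y in zip(depth, mont)]

for x, e in zip(depth, residuals):
    print(x, round(e, 2))
```

The residual at x = 40 is much larger in magnitude than the others, which is the feature to look for in the residual plot.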
asked 2021-01-13
The accompanying data on y = normalized energy \(\left(\mathrm{J/m^{2}}\right)\) and x = intraocular pressure (mmHg) appeared in a scatterplot in the article “Evaluating the Risk of Eye Injuries: Intraocular Pressure During High Speed Projectile Impacts” (Current Eye Research, 2012: 43–49); an estimated regression function was superimposed on the plot.
x 2761 19764 25713 3980 12782 19008
y 1553 14999 32813 1667 8741 16526
x 19028 14397 9606 3905 25731
y 26770 16526 9868 6640 1220 30730
Here is Minitab output from fitting the simple linear regression model. Does the model appear to specify a useful relationship between the two variables?
Predictor   Coef     SE Coef   T       P
Constant    -5090    2257      -2.26   0.048
Pressure    1.2912   0.1347    9.59    0.000
S = 3679.36, R-Sq = 90.2%, R-Sq(adj) = 89.2%
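
The Minitab figures are internally consistent, which can be checked with a few lines of arithmetic: each T value is the coefficient divided by its standard error, and the adjusted R-squared follows from R-squared and the sample size. The sample size n = 12 is an assumption here, taken from the number of y values listed.

```python
# Values copied from the Minitab output above.
coef_const, se_const = -5090, 2257
coef_slope, se_slope = 1.2912, 0.1347
r_sq = 0.902

n = 12  # assumed sample size (count of y values listed)
p = 1   # number of predictors in simple linear regression

# t ratio = coefficient / standard error.
t_const = coef_const / se_const
t_slope = coef_slope / se_slope

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1).
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

print(round(t_const, 2), round(t_slope, 2), round(r_sq_adj, 3))
```

The slope's t ratio and its reported P-value are the quantities that bear on whether the model specifies a useful relationship.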
asked 2021-02-19
Make a scatterplot for each set of data. Tell whether the data show a linear association or a nonlinear association.
(1,2),(7,9.5),(4,7),(2,4.2),(6,8.25),(3,5.8),(5,8),(8,10),(0,0)
...