# Make a scatterplot for each set of data. Hits: 7 8 4 11 8 2 5 9 1 4 Runs: 3 2 2 7 4 2 1 3 0 1

Question
Scatterplots
Make a scatterplot for each set of data.
Hits: 7 8 4 11 8 2 5 9 1 4
Runs: 3 2 2 7 4 2 1 3 0 1

2021-02-01
Step 1
Scatterplot
Hits is on the horizontal axis and Runs is on the vertical axis.
The number of hits ranges from 1 to 11, so an appropriate scale for the horizontal axis is from 0 to 12.
The number of runs ranges from 0 to 7, so an appropriate scale for the vertical axis is from -1 to 8.

Result:
Hits is on the horizontal axis and Runs is on the vertical axis.
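
The axis choices above can be reproduced with a short Python sketch. The helper name `axis_limits`, the one-unit padding, and the use of matplotlib for the drawing step are assumptions made for illustration; the original answer does not name a tool.

```python
hits = [7, 8, 4, 11, 8, 2, 5, 9, 1, 4]
runs = [3, 2, 2, 7, 4, 2, 1, 3, 0, 1]


def axis_limits(values, pad=1):
    """Return (low, high): the data range widened by `pad` on each side."""
    return min(values) - pad, max(values) + pad


# Each game is one point: hits on the horizontal axis, runs on the vertical axis.
points = list(zip(hits, runs))
x_limits = axis_limits(hits)  # horizontal axis scale
y_limits = axis_limits(runs)  # vertical axis scale


def draw_scatterplot():
    """Draw the plot. Assumes matplotlib is installed; imported lazily."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(hits, runs)
    ax.set_xlim(*x_limits)
    ax.set_ylim(*y_limits)
    ax.set_xlabel("Hits")
    ax.set_ylabel("Runs")
    ax.set_title("Runs vs. Hits")
    plt.show()
```

Calling `draw_scatterplot()` displays the figure. The limits themselves are computed without any plotting library, so they can be checked by hand against the scales chosen in Step 1.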

### Relevant Questions

Make a scatterplot for each set of data. Tell whether the data show a linear association or a nonlinear association.
(1,2),(7,9.5),(4,7),(2,4.2),(6,8.25),(3,5.8),(5,8),(8,10),(0,0)
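
As a rough numeric companion to the scatterplot for these points, one can sort them by x and look at the slope between neighbouring points. This is only a sketch of one possible check, not a method stated in the question: the function name `consecutive_slopes` is hypothetical, and it assumes all x values are distinct. Slopes that stay roughly constant suggest a linear pattern, while slopes that shrink or grow steadily suggest a curve.

```python
points = [(1, 2), (7, 9.5), (4, 7), (2, 4.2), (6, 8.25),
          (3, 5.8), (5, 8), (8, 10), (0, 0)]


def consecutive_slopes(pts):
    """Slopes between neighbouring points after sorting by x (x values must be distinct)."""
    ordered = sorted(pts)
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])]


slopes = consecutive_slopes(points)

# Compare the average slope over the left half of the plot with the right half.
half = len(slopes) // 2
mean_first = sum(slopes[:half]) / half
mean_second = sum(slopes[half:]) / (len(slopes) - half)

print("slopes:", slopes)
print("mean slope, left half:", mean_first)
print("mean slope, right half:", mean_second)
```

A clear drop from the left-half average to the right-half average indicates that the points rise quickly and then flatten, which the plot itself should confirm.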
For each set of data below, draw a scatterplot and decide whether or not the data exhibits approximately periodic behaviour.
a) $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\ \hline y & 0 & 1 & 1.4 & 1 & 0 & -1 & -1.4 & -1 & 0 & 1 & 1.4 & 1 & 0\\ \hline \end{array}$$
b) $$\begin{array}{|c|c|c|c|c|c|} \hline x & 0 & 1 & 2 & 3 & 4\\ \hline y & 4 & 1 & 0 & 1 & 4\\ \hline \end{array}$$
c) $$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline x & 0 & 0.5 & 1.0 & 1.5 & 2.0 & 2.5 & 3.0 & 3.5\\ \hline y & 0 & 1.9 & 3.5 & 4.5 & 4.7 & 4.3 & 3.4 & 2.4\\ \hline \end{array}$$
d) $$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} \hline x & 0 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 12\\ \hline y & 0 & 4.7 & 3.4 & 1.7 & 2.1 & 5.2 & 8.9 & 10.9 & 10.2 & 8.4 & 10.4\\ \hline \end{array}$$
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must actually square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
Make a scatterplot for the data in each table. Use the scatterplot to identify any clustering or outliers in the data.
Value of Home Over Time
Number of Years Owned: 0, 3, 6, 9, 12, 15, 18, 21
Value (1,000s of \$): 80, 84, 86, 88, 89, 117, 119, 86
Make a scatterplot of the data and graph the function $$\displaystyle f(x) = -8x^{2} + 95x + 745.$$ Make a residual plot and describe how well the function fits the data. $$\begin{array}{|c|c|c|c|c|c|} \hline \text{Price Increase} & 0 & 1 & 2 & 3 & 4\\ \hline \text{Sales} & 730 & 850 & 930 & 951 & 1010\\ \hline \end{array}$$
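
The residuals for this question are the observed sales minus the values predicted by f(x) = -8x² + 95x + 745. A minimal sketch of that arithmetic follows; the variable names are chosen here for illustration.

```python
price_increase = [0, 1, 2, 3, 4]
sales = [730, 850, 930, 951, 1010]


def f(x):
    """The quadratic model given in the question."""
    return -8 * x**2 + 95 * x + 745


# Residual = observed value - predicted value, one per data point.
predicted = [f(x) for x in price_increase]
residuals = [obs - pred for obs, pred in zip(sales, predicted)]

for x, obs, pred, res in zip(price_increase, sales, predicted, residuals):
    print(f"x={x}: observed={obs}, predicted={pred}, residual={res}")
```

Plotting `residuals` against `price_increase` gives the residual plot. Residuals that are small relative to the sales values and that show no clear pattern indicate a reasonable fit.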
You can use a scatterplot to estimate a value between two known values. Estimate the world production of oil when the United States produced 12% of the world's oil.
Draw a scatterplot.
Find 12% on the vertical axis. Move horizontally to the line of points. Estimate where the new point would fit in the pattern. Then move down to the horizontal axis.
Oil Production 1960-2000 (billion barrels)
World Oil Production: 45.9, 52.8, 59.9, 68.3, 72.5
U.S. Percent of World Oil Production: 21, 16, 13, 9, 7
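
Reading the estimate off the plot by eye can be cross-checked numerically. The sketch below assumes straight-line (linear) interpolation between the two neighbouring data points, which is one reasonable way to formalise "where the new point would fit in the pattern" but is not prescribed by the question. The function name `estimate_world_production` is hypothetical.

```python
world_production = [45.9, 52.8, 59.9, 68.3, 72.5]  # billion barrels
us_percent = [21, 16, 13, 9, 7]                    # U.S. share of world production


def estimate_world_production(target_percent):
    """Linearly interpolate world production at a given U.S. percentage."""
    pairs = list(zip(us_percent, world_production))
    # The percentages decrease along the list, so each neighbouring pair
    # brackets the interval [p_low, p_high].
    for (p_high, w_start), (p_low, w_end) in zip(pairs, pairs[1:]):
        if p_low <= target_percent <= p_high:
            fraction = (p_high - target_percent) / (p_high - p_low)
            return w_start + fraction * (w_end - w_start)
    raise ValueError("target percentage lies outside the observed range")


estimate = estimate_world_production(12)
print(f"Estimated world production at 12%: {estimate:.1f} billion barrels")
```

Since 12% lies between the observed 13% and 9%, the estimate falls between 59.9 and 68.3 billion barrels, closer to the 13% point.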
A two-sample inference deals with dependent and independent inferences. In a two-sample hypothesis testing problem, underlying parameters of two different populations are compared. In a longitudinal (or follow-up) study, the same group of people is followed over time. Two samples are said to be paired when each data point in the first sample is matched and related to a unique data point in the second sample.
This problem demonstrates inference from two dependent (follow-up) samples using data from a hypothetical study of new cases of tuberculosis (TB) before and after vaccination in several geographical areas in a country in sub-Saharan Africa. The conclusion about the null hypothesis should note the difference between the samples.
The problem that demonstrates inference from two dependent samples uses hypothetical data from the TB vaccinations and the number of new cases before and after vaccination. $$\begin{array}{|c|c|c|} \hline \text{Geographical region} & \text{Before vaccination} & \text{After vaccination}\\ \hline 1 & 85 & 11\\ \hline 2 & 77 & 5\\ \hline 3 & 110 & 14\\ \hline 4 & 65 & 12\\ \hline 5 & 81 & 10\\ \hline 6 & 70 & 7\\ \hline 7 & 74 & 8\\ \hline 8 & 84 & 11\\ \hline 9 & 90 & 9\\ \hline 10 & 95 & 8\\ \hline \end{array}$$
Using the Minitab statistical analysis program to enter the data and perform the analysis, complete the following: Construct a one-sided $$\displaystyle{95}\%$$ confidence interval for the true difference in population means. Test the null hypothesis that the population means are identical at the 0.05 level of significance.
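
The question asks for Minitab, but the same paired analysis can be sketched in Python as a cross-check. This sketch rests on several assumptions: the differences are taken as before minus after, the one-sided alternative is that the mean difference is greater than zero, and the critical value 1.833 is the standard t-table entry for 9 degrees of freedom with an upper-tail area of 0.05.

```python
import math
import statistics

before = [85, 77, 110, 65, 81, 70, 74, 84, 90, 95]
after = [11, 5, 14, 12, 10, 7, 8, 11, 9, 8]

# Paired samples: work with the per-region differences (before - after).
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)   # sample standard deviation
std_error = sd_diff / math.sqrt(n)
t_statistic = mean_diff / std_error       # tests H0: mean difference = 0

# Assumed table value: t with n - 1 = 9 degrees of freedom, upper tail 0.05.
T_CRITICAL = 1.833

# One-sided 95% lower confidence bound for the true mean difference.
lower_bound = mean_diff - T_CRITICAL * std_error
reject_null = t_statistic > T_CRITICAL

print(f"mean difference = {mean_diff:.1f}")
print(f"t = {t_statistic:.2f}, reject H0: {reject_null}")
print(f"95% one-sided lower bound = {lower_bound:.2f}")
```

A paired analysis is used rather than a two-sample test because each region contributes both a before and an after count, so the two columns are not independent.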
Make a scatterplot for the data.
Length (mi) and Water Flow $$\displaystyle{\left(1{,}000\ \text{ft}^{3}/\text{s}\right)}$$ of Rivers
Length: 2540, 1980, 1460, 1420, 1290, 1040, 886, 774, 724, 659
Flow: 76, 225, 41, 58, 56, 57, 68, 67, 67, 41
Two scatterplots are shown below.
Scatterplot 1
A scatterplot has 14 points.
The horizontal axis is labeled "x" and has values from 30 to 110.
The vertical axis is labeled "y" and has values from 30 to 110.
The points are plotted from approximately (55, 60) up and right to approximately (95, 85).
The points are somewhat scattered.
Scatterplot 2
A scatterplot has 10 points.
The horizontal axis is labeled "x" and has values from 30 to 110.
The vertical axis is labeled "y" and has values from 30 to 110.
The points are plotted from approximately (55, 55) steeply up and right to approximately (70, 90), and then steeply down and right to approximately (85, 60).
The points are somewhat scattered.
Explain why it makes sense to use the least-squares line to summarize the relationship between x and y for one of these data sets but not the other.
Scatterplot 1 seems to show a linear relationship between x and y, while Scatterplot 2 shows a curved relationship between the two variables. So it makes sense to use the least-squares line to summarize the relationship between x and y for the data set in Scatterplot 1, but not for the data set in Scatterplot 2.
The accompanying data on y = normalized energy $$\displaystyle{\left(\frac{J}{m^{2}}\right)}$$ and x = intraocular pressure (mmHg) appeared in a scatterplot in the article “Evaluating the Risk of Eye Injuries: Intraocular Pressure During High Speed Projectile Impacts” (Current Eye Research, 2012: 43–49). An estimated regression function was superimposed on the plot.