
Question
Confidence intervals
The presidential election is coming. Five survey companies (A, B, C, D, and E) are conducting surveys to forecast whether or not the Republican candidate will win the election. Each company randomly selects a sample of between 1000 and 1500 people. All five companies interview people over the phone on Tuesday and Wednesday. Each interviewee is asked whether he or she is 18 years old or above and a U.S. citizen who is registered to vote. If yes, the interviewee is further asked: will you vote for the Republican candidate? On Thursday morning, the five companies announce their survey samples and results at the same time in the newspapers. The results show that a% (from A), b% (from B), c% (from C), d% (from D), and e% (from E) will support the Republican candidate. The margin of error is plus/minus 3% for all results. Suppose that $$c > a > d > e > b$$. When you see these results in the newspapers, can you exactly identify which result(s) is (are) not reliable and not accurate? That is, can you identify which interval estimate(s) does (do) not include the true population proportion? If you can, explain why; if not, explain why you cannot and what information you would need. Discuss and explain your reasons. You must provide your statistical analysis and reasons.

2020-12-30
Step 1 Introduction: The formula for the confidence interval for a population proportion $$\pi$$ is shown below. It assumes that the sample proportion observed from a sample of size n is p and that the level of confidence is $$100(1 - \alpha)\%$$, so that the critical value used is $$z_{\alpha/2}$$, the upper $$\alpha/2$$-point of the standard normal distribution.

When $$\pi$$ is known or assumed: $$\left(p - z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}\right).$$

When $$\pi$$ is unknown and not assumed: $$\left(p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right).$$

The confidence interval gives an interval estimate of the parameter of interest. Calculating the confidence interval for a population proportion requires the following quantities:

* Point estimate, that is, the sample proportion observed.
* Size of the sample collected.
* Level of confidence desired.

Its width depends mainly on the following characteristics of the analysis:

* Level of confidence: the higher the desired level of confidence, the wider the confidence interval for a given sample size and variability.
* Sample size: the larger the sample size, the narrower the confidence interval at a given level of confidence.
* Variability: the more variability in the data, the wider the confidence interval for a given sample size and level of confidence.

Step 2 Discussion: In this case, the margin of error is given as 3%. In this context, it means that each of the 5 companies added and subtracted 3% from its point estimate (a%, b%, c%, d%, and e%, respectively).
Observe that the sample sizes used by the companies are not the same: they vary between 1,000 and 1,500. There are three possibilities if each company reports 3% as the margin of error:

* All the companies use the same level of confidence and assume the same value of π. In this case, their chosen sample sizes should affect the width of the interval, and hence the margin of error, so reporting a common 3% means the companies ignore their level of confidence, sample size, and assumed value of π when choosing 3% as their margin of error.
* All the companies manipulate one or more of the level of confidence, sample size, and assumed value of π, so that each can achieve 3% as the margin of error.
* The companies use p instead of π to calculate the confidence interval, all with the same confidence level and different sample sizes, and the margin of error turns out to be 3% for all.

The problem in the first two cases is that the confidence intervals are not comparable, and even if they are compared, the accuracy of such a comparison is questionable.

Assume that the third case is true here. Then, at a glance, it would appear that the companies with the highest and lowest point estimates, that is, Companies C and B, are the most likely to be unreliable, because they are the ones producing extreme estimates. However, one should not jump to a conclusion with this information alone. It is necessary to subtract and add 3% to each of the 5 point estimates to obtain the 5 intervals. The intervals (in percentages) would be as follows:

* Company A: $$(a - 3,\ a + 3)$$
* Company B: $$(b - 3,\ b + 3)$$
* Company C: $$(c - 3,\ c + 3)$$
* Company D: $$(d - 3,\ d + 3)$$
* Company E: $$(e - 3,\ e + 3)$$

If the estimates of all the companies are to be reliable, then each of the above intervals must greatly overlap with all the others.
In that case, it would not be possible to say which estimate is unreliable and which is not. If one or both of the extreme companies (C or B) have intervals that are completely detached from, or only slightly overlapping with, the other intervals, then that company or those companies can be considered the most likely to be unreliable.

Another possibility is that the 2 (or 3) companies with the highest point estimates (C and A, or C, A, and D) have highly overlapping intervals, while the remaining 3 (or 2) companies with the lowest point estimates (D, E, and B, or E and B) have highly overlapping intervals among themselves, but the two groups overlap very slightly or not at all. In that case, again, it would be difficult to identify which estimates are reliable and which are not.

Note that even if some idea can be formed about the most unreliable or inaccurate estimates, it can only be stated in terms of which company (or companies) is most likely to be unreliable; it is not possible to exactly identify such companies.
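The role of sample size and interval overlap in this argument can be checked numerically. The following is a minimal sketch, assuming a 95% confidence level and the conservative value π = 0.5 (neither is stated in the question). The five percentages are made-up illustrative values that satisfy c > a > d > e > b, not survey results, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, pi=0.5, confidence=0.95):
    """Half-width z_{alpha/2} * sqrt(pi * (1 - pi) / n) of the interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(pi * (1 - pi) / n)


def intervals_overlap(p1, p2, moe=3.0):
    """True if (p1 - moe, p1 + moe) and (p2 - moe, p2 + moe) intersect."""
    return abs(p1 - p2) < 2 * moe


# Assumed 95% confidence and pi = 0.5: the margin depends on n.
moe_small = 100 * margin_of_error(1000)
moe_large = 100 * margin_of_error(1500)
print(f"n = 1000: +/- {moe_small:.2f}%")
print(f"n = 1500: +/- {moe_large:.2f}%")

# Hypothetical point estimates (percent) with c > a > d > e > b.
estimates = {"C": 52.0, "A": 50.0, "D": 48.0, "E": 46.0, "B": 44.0}
names = list(estimates)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        ok = intervals_overlap(estimates[first], estimates[second])
        print(first, second, "overlap" if ok else "disjoint")
```

Under these assumptions the margin is about 3.1% at n = 1000 and about 2.5% at n = 1500, so a common 3% cannot come from one shared formula applied to different sample sizes. Overlap or disjointness only shows which estimates disagree with each other; it does not reveal which interval misses the true proportion.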

### Relevant Questions

Give a full and correct answer. Why is it important that a sample be random and representative when conducting hypothesis testing?

Representative Sample vs. Random Sample: An Overview

Economists and researchers seek to reduce sampling bias to near negligible levels when employing statistical analysis. Three basic characteristics in a sample reduce the chances of sampling bias and allow economists to make more confident inferences about a general population from the results obtained from the sample analysis or study:

* Such samples must be representative of the chosen population studied.
* They must be randomly chosen, meaning that each member of the larger population has an equal chance of being chosen.
* They must be large enough so as not to skew the results. The optimal size of the sample group depends on the precise degree of confidence required for making an inference.

Representative sampling and random sampling are two techniques used to help ensure data is free of bias. These sampling techniques are not mutually exclusive and, in fact, they are often used in tandem to reduce the degree of sampling error in an analysis and allow for greater confidence in making statistical inferences from the sample in regard to the larger group.

Representative Sample

A representative sample is a group or set chosen from a larger statistical population or group of factors or instances that adequately replicates the larger group according to whatever characteristic or quality is under study. A representative sample parallels key variables and characteristics of the larger society under examination. Some examples include sex, age, education level, socioeconomic status (SES), or marital status. A larger sample size reduces sampling error and increases the likelihood that the sample accurately reflects the target population.

Random Sample

A random sample is a group or set chosen from a larger population or group of factors or instances in a random manner that allows for each member of the larger group to have an equal chance of being chosen. A random sample is meant to be an unbiased representation of the larger population. It is considered a fair way to select a sample from a larger population since every member of the population has an equal chance of getting selected.

Special Considerations

People collecting samples need to ensure that bias is minimized. Representative sampling is one of the key methods of achieving this because such samples replicate as closely as possible elements of the larger population under study. This alone, however, is not enough to make the sampling bias negligible. Combining the random sampling technique with the representative sampling method reduces bias further because no specific member of the representative population has a greater chance of selection into the sample than any other.

Summarize this article in 250 words.
1. A researcher is interested in finding a 98% confidence interval for the mean number of times per day that college students text. The study included 144 students who averaged 44.7 texts per day. The standard deviation was 16.5 texts.
a. To compute the confidence interval, use a (z or t) distribution.
b. With 98% confidence the population mean number of texts per day is between ____ and ____ texts.
c. If many groups of 144 randomly selected members are studied, then a different confidence interval would be produced from each group. About ____ percent of these confidence intervals will contain the true population mean number of texts per day and about ____ percent will not contain the true population mean number of texts per day.

2. You want to obtain a sample to estimate how much parents spend on their kids' birthday parties. Based on a previous study, you believe the population standard deviation is approximately $$\sigma = 40.4$$ dollars. You would like to be 90% confident that your estimate is within 1.5 dollar(s) of average spending on the birthday parties. How many parents do you have to sample? n = ____

3. You want to obtain a sample to estimate a population mean. Based on previous evidence, you believe the population standard deviation is approximately $$\sigma = 57.5$$. You would like to be 95% confident that your estimate is within 0.1 of the true population mean. How large of a sample size is required?
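Questions 2 and 3 are usually approached with the standard sample-size formula for estimating a mean, $$n = \left(z_{\alpha/2}\,\sigma / E\right)^2$$, rounded up to the next whole number. The following is a minimal sketch, assuming that formula applies (known σ, normal critical value); the function and variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)


# Inputs taken from questions 2 and 3.
n_parties = sample_size_for_mean(sigma=40.4, error=1.5, confidence=0.90)
n_mean = sample_size_for_mean(sigma=57.5, error=0.1, confidence=0.95)
print("Question 2: n =", n_parties)
print("Question 3: n =", n_mean)
```

A textbook that rounds the critical value (for example 1.645 or 1.96) before squaring may report a slightly different n, especially when the required sample is very large.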
Several models have been proposed to explain the diversification of life during geological periods. According to Benton (1997), The diversification of marine families in the past 600 million years (Myr) appears to have followed two or three logistic curves, with equilibrium levels that lasted for up to 200 Myr. In contrast, continental organisms clearly show an exponential pattern of diversification, and although it is not clear whether the empirical diversification patterns are real or are artifacts of a poor fossil record, the latter explanation seems unlikely. In this problem, we will investigate three models for diversification. They are analogous to models for population growth; however, the quantities involved have a different interpretation. We denote by N(t) the diversification function, which counts the number of taxa as a function of time, and by r the intrinsic rate of diversification.
(a) (Exponential Model) This model is described by $$\frac{dN}{dt} = r_e N \quad (8.86).$$ Solve (8.86) with the initial condition N(0) at time 0, and show that $$r_e$$ can be estimated from $$r_e = \frac{1}{t} \ln\left[\frac{N(t)}{N(0)}\right] \quad (8.87)$$
(b) (Logistic Growth) This model is described by $$\frac{dN}{dt} = r_l N \left(1 - \frac{N}{K}\right) \quad (8.88)$$ where K is the equilibrium value. Solve (8.88) with the initial condition N(0) at time 0, and show that $$r_l$$ can be estimated from $$r_l = \frac{1}{t} \ln\left[\frac{K - N(0)}{N(0)}\right] + \frac{1}{t} \ln\left[\frac{N(t)}{K - N(t)}\right] \quad (8.89)$$ for $$N(t) < K.$$
(c) Assume that $$N(0) = 1$$ and $$N(10) = 1000.$$ Estimate $$r_e$$ and $$r_l$$ for both $$K = 1001$$ and $$K = 10000.$$
(d) Use your answer in (c) to explain the following quote from Stanley (1979): There must be a general tendency for calculated values of $$[r]$$ to represent underestimates of exponential rates, because some radiation will have followed distinctly sigmoid paths during the interval evaluated.
(e) Explain why the exponential model is a good approximation to the logistic model when $$\frac{N}{K}$$ is small compared with 1.
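The estimates requested in part (c) follow directly from formulas (8.87) and (8.89). The following is a minimal sketch that evaluates them for the stated values N(0) = 1, N(10) = 1000, and the two choices of K; the function names are illustrative and not taken from the source.

```python
from math import log


def exponential_rate(n0, nt, t):
    """r_e from (8.87): (1/t) * ln(N(t) / N(0))."""
    return log(nt / n0) / t


def logistic_rate(n0, nt, t, k):
    """r_l from (8.89); requires N(0) < K and N(t) < K."""
    return (log((k - n0) / n0) + log(nt / (k - nt))) / t


r_e = exponential_rate(1, 1000, 10)
r_l_near = logistic_rate(1, 1000, 10, k=1001)    # N(10) close to K
r_l_far = logistic_rate(1, 1000, 10, k=10000)    # N(10) well below K

print(f"r_e            = {r_e:.4f}")
print(f"r_l (K=1001)   = {r_l_near:.4f}")
print(f"r_l (K=10000)  = {r_l_far:.4f}")
```

In both cases the exponential estimate comes out below the logistic one, and the gap shrinks as K grows relative to N(t), which is the pattern parts (d) and (e) ask about.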
The table below shows the number of people for three different race groups who were shot by police that were either armed or unarmed. These values are very close to the exact numbers. They have been changed slightly for each student to get a unique problem.
| | Black | White | Hispanic | Total |
|---|---|---|---|---|
| Suspect was armed | 543 | 1176 | 378 | 2097 |
| Suspect was unarmed | 60 | 67 | 38 | 165 |
| Total | 603 | 1243 | 416 | 2262 |
Give your answer as a decimal to at least three decimal places.
a) What percent are Black?
b) What percent are Unarmed?
c) In order for two variables to be independent of each other, $$P(A \text{ and } B) = P(A) \cdot P(B).$$
This just means that the percentage of times that both things happen equals the individual percentages multiplied together (Only if they are Independent of each other).
Therefore, if a person's race is independent of whether they were killed being unarmed then the percentage of black people that are killed while being unarmed should equal the percentage of blacks times the percentage of Unarmed. Let's check this. Multiply your answer to part a (percentage of blacks) by your answer to part b (percentage of unarmed).
Remember, the previous answer is only correct if the variables are Independent.
d) Now use the table to get the actual percent that are Black and Unarmed.
If answer c is "significantly different" than answer d, then that means that there could be a different percentage of unarmed people being shot based on race. We will check this out later in the course.
Let's compare the percentage of unarmed shot for each race.
e) What percent are White and Unarmed?
f) What percent are Hispanic and Unarmed?
Comparing answers d, e, and f shows that the group with the highest percentage of unarmed people shot is most likely White.
Why is that?
This is because there are more white people in the United States than any other race and therefore there are likely to be more white people in the table. Since there are more white people in the table, there most likely would be more white and unarmed people shot by police than any other race. This pulls the percentage of white and unarmed up. In addition, there most likely would be more white and armed shot by police. All the percentages for white people would be higher, because there are more white people. For example, the table contains very few Hispanic people, and the percentage of people in the table that were Hispanic and unarmed is the lowest percentage.
Think of it this way. If you went to a college that was 90% female and 10% male, then females would most likely have the highest percentage of A grades. They would also most likely have the highest percentage of B, C, D, and F grades.
The correct way to compare is "conditional probability". Conditional probability is getting the probability of something happening, given we are dealing with just the people in a particular group.
g) What percent of blacks shot and killed by police were unarmed?
h) What percent of whites shot and killed by police were unarmed?
i) What percent of Hispanics shot and killed by police were unarmed?
You can see by the answers to part g and h, that the percentage of blacks that were unarmed and killed by police is approximately twice that of whites that were unarmed and killed by police.
j) Why do you believe this is happening?
Do a search on the internet for reasons why blacks are more likely to be killed by police. Read a few articles on the topic. Write your response using the articles as references. Give the websites used in your response. Your answer should be several sentences long with at least one website listed. This part of this problem will be graded after the due date.
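The difference between the joint percentages in parts d to f and the conditional percentages in parts g to i can be sketched in a few lines. This is a minimal illustration using the counts from the table as printed here (each student's version differs slightly); the variable names are illustrative.

```python
# Counts copied from the table: (armed, unarmed) per group.
counts = {
    "Black": (543, 60),
    "White": (1176, 67),
    "Hispanic": (378, 38),
}
grand_total = sum(armed + unarmed for armed, unarmed in counts.values())
unarmed_total = sum(unarmed for _, unarmed in counts.values())
p_unarmed = unarmed_total / grand_total

rates = {}
for group, (armed, unarmed) in counts.items():
    group_total = armed + unarmed
    joint = unarmed / grand_total                          # P(group and unarmed)
    if_independent = (group_total / grand_total) * p_unarmed  # P(group) * P(unarmed)
    conditional = unarmed / group_total                    # P(unarmed | group)
    rates[group] = (joint, if_independent, conditional)
    print(f"{group}: joint={joint:.4f} "
          f"if_independent={if_independent:.4f} "
          f"conditional={conditional:.4f}")
```

The joint values divide by the grand total, so they are pulled up for any group that is large in the table. The conditional values divide by each group's own total, which is why they are the fair basis for comparing groups.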
Case: Dr. Jung’s Diamonds Selection
With Christmas coming, Dr. Jung became interested in buying diamonds for his wife. After perusing the Web, he learned about the “4Cs” of diamonds: cut, color, clarity, and carat. He knew his wife wanted round-cut earrings mounted in white gold settings, so he immediately narrowed his focus to evaluating color, clarity, and carat for that style earring.
After a bit of searching, Dr. Jung located a number of earring sets that he would consider purchasing. But he knew the pricing of diamonds varied considerably. To assist in his decision making, Dr. Jung decided to use regression analysis to develop a model to predict the retail price of different sets of round-cut earrings based on their color, clarity, and carat scores. He assembled the data in the file Diamonds.xls for this purpose. Use this data to answer the following questions for Dr. Jung.
1) Prepare scatter plots showing the relationship between the earring prices (Y) and each of the potential independent variables. What sort of relationship does each plot suggest?
2) Let X1, X2, and X3 represent diamond color, clarity, and carats, respectively. If Dr. Jung wanted to build a linear regression model to estimate earring prices using these variables, which variables would you recommend that he use? Why?
3) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
4) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
5) Dr. Jung now remembers that it sometimes helps to perform a square root transformation on the dependent variable in a regression problem. Modify your spreadsheet to include a new dependent variable that is the square root of the earring prices (use Excel’s SQRT( ) function). If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
6) Suppose Dr. Jung decides to use clarity (X2) and carats (X3) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
7) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
8) Dr. Jung now also remembers that it sometimes helps to include interaction terms in a regression model—where you create a new independent variable as the product of two of the original variables. Modify your spreadsheet to include three new independent variables, X4, X5, and X6, representing interaction terms where: X4 = X1 × X2, X5 = X1 × X3, and X6 = X2 × X3. There are now six potential independent variables. If Dr. Jung wanted to build a linear regression model to estimate the square root of earring prices using the same independent variables as before, which variables would you recommend that he use? Why?
9) Suppose Dr. Jung decides to use color (X1), carats (X3) and the interaction terms X4 (color * clarity) and X5 (color * carats) as independent variables in a regression model to predict the square root of the earring prices. What is the estimated regression equation? What is the value of the R2 and adjusted-R2 statistics?
10) Use the regression equation identified in the previous question to create estimated prices for each of the earring sets in Dr. Jung’s sample. (Remember, your model estimates the square root of the earring prices. So you must square the model’s estimates to convert them to actual price estimates.) Which sets of earrings appear to be overpriced and which appear to be bargains? Based on this analysis, which set of earrings would you suggest that Dr. Jung purchase?
A psychologist is interested in constructing a $$99\%$$ confidence interval for the proportion of people who accept the theory that a person's spirit is no more than the complicated network of neurons in the brain. 68 of the 702 randomly selected people who were surveyed agreed with this theory. Round answers to 4 decimal places where possible.
a)
With $$99\%$$ confidence the proportion of all people who accept the theory that a person's spirit is no more than the complicated network of neurons in the brain is between ____ and ____.
b)
If many groups of 702 randomly selected people are surveyed, then a different confidence interval would be produced from each group. About ____ percent of these confidence intervals will contain the true population proportion of all people who accept the theory that a person’s spirit is no more than the complicated network of neurons in the brain and about ____ percent will not contain the true population proportion.
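Part (a) is typically computed with the normal-approximation interval $$p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$$. The following is a minimal sketch, assuming that interval is the intended method (the question does not name one); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation interval built from the sample proportion p."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# 68 of 702 respondents agreed; 99% confidence as stated in the question.
lower, upper = proportion_ci(68, 702, 0.99)
print(f"{lower:.4f} to {upper:.4f}")
```

Part (b) needs no computation: it asks for the long-run interpretation of the confidence level itself.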
Is the gift you purchased for that special someone really appreciated? This was the question investigated in the Journal of Experimental Social Psychology (Vol. 45, 2009). The researchers examined the link between engagement ring price (dollars) and level of appreciation of the recipient (measured on a 7-point scale where 1 = "not at all" and 7 = "to a great extent"). Participants for the study were those who used a popular Web site for engaged couples. The Web site's directory was searched for those with "average" American names (e.g., "John Smith," "Sara Jones"). These individuals were then invited to participate in an online survey in exchange for a \$10 gift certificate. Of the respondents, those who paid really high or really low prices for the ring were excluded, leaving a sample size of 33 respondents.
a) Identify the experimental units for this study.
b) What are the variables of interest? Are they quantitative or qualitative in nature?
c) Describe the population of interest.
d) Do you believe the sample of 33 respondents is representative of the population? Explain.
e) In a second, designed study, the researchers investigated whether the link between gift price and level of appreciation was stronger for birthday gift givers than for birthday gift receivers. The participants were randomly assigned to play the role of gift-giver or gift-receiver. Assume that the sample consists of 50 individuals. Use a random number generator to randomly assign 25 individuals to play the gift-receiver role and 25 to play the gift-giver role.
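The random assignment requested in part (e) can be done by shuffling the participant labels and splitting the shuffled list in half. The following is a minimal sketch; labeling the participants 1 to 50, the seed value, and the function name are all assumptions made for illustration.

```python
import random


def assign_roles(participant_ids, seed=None):
    """Randomly split participants into two equal-sized role groups."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Participants are labeled 1..50; a fixed seed makes the split repeatable.
givers, receivers = assign_roles(range(1, 51), seed=2009)
print("Gift-givers:   ", givers)
print("Gift-receivers:", receivers)
```

Shuffling the whole list gives every participant the same chance of landing in either role while guaranteeing exactly 25 per group, which independent coin flips would not.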
A catalog sales company promises to deliver orders placed on the Internet within 3 days. Follow-up calls to a few randomly selected customers show that a 95% confidence interval for the proportion of all orders that arrive on time is 88
a) What does this mean? Are these conclusions correct? Explain.
b) 95% of all random samples of customers will show that 88% of orders arrive on time.
c) 95% of all random samples of customers will show that 82% to 94% of orders arrive on time.
d) We are 95% sure that between 82% and 94% of the orders placed by the sampled customers arrived on time.
e) On 95% of the days, between 82% and 94% of the orders will arrive on time.