# Explain why the standard deviation would likely not be a reliable measure of variability for a distribution of data that includes at least one extreme outlier.

Question
Data distributions
Explain why the standard deviation would likely not be a reliable measure of variability for a distribution of data that includes at least one extreme outlier.

2021-01-03
The standard deviation is not a reliable measure of variability when a data distribution includes at least one extreme outlier, for the following reasons.
An outlier is an observation that lies at an extreme distance from the rest of the data set. Such a value can often dominate a statistical analysis.
Both the mean and the standard deviation are sensitive to outliers, so their values differ noticeably depending on whether the outlier is present. The standard deviation is calculated by the formula
$$\sigma=\sqrt{\frac{\sum_{i=1}^{n}\left(X_i-\overline{X}\right)^2}{n}}$$
Here, $$\sigma$$ is the standard deviation, $$X_i$$ are the individual observations, $$\overline{X}$$ is their mean, and $$n$$ is the number of observations. The formula depends on the deviation of each value from the mean, and the mean itself is pulled toward an outlier, so the standard deviation is affected by the outlier's presence. Because the deviations are squared, a single extreme value has a disproportionate effect on the sum. The standard deviation is therefore not a reliable measure of variability for a data distribution that includes even one extreme outlier.
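The effect described above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values and the helper `iqr` are invented for illustration and do not come from the original answer). It computes the population standard deviation, which matches the formula with $$n$$ in the denominator, with and without one extreme value, and compares it with the interquartile range, a measure that is usually less sensitive to outliers.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [10, 12, 11, 13, 12, 11, 10, 12]
with_outlier = data + [100]  # the same data plus one extreme value


def iqr(values):
    """Interquartile range (Q3 - Q1) using the statistics module's default method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# pstdev divides by n, as in the formula above.
sd_clean = statistics.pstdev(data)
sd_outlier = statistics.pstdev(with_outlier)

print(f"SD without outlier:  {sd_clean:.2f}")
print(f"SD with outlier:     {sd_outlier:.2f}")
print(f"IQR without outlier: {iqr(data):.2f}")
print(f"IQR with outlier:    {iqr(with_outlier):.2f}")
```

In this sketch the single added value inflates the standard deviation many times over, while the interquartile range changes only slightly, which is why a rank-based measure is often preferred when outliers are present.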

### Relevant Questions

True or False
1. The goal of descriptive statistics is to simplify, summarize, and organize data.
2. A summary value, usually numerical, that describes a sample is called a parameter.
3. A researcher records the average age for a group of 25 preschool children selected to participate in a research study. The average age is an example of a statistic.
4. The median is the most commonly used measure of central tendency.
5. The mode is the best way to measure central tendency for data from a nominal scale of measurement.
6. A distribution of scores has a mean of 55 and a standard deviation of 4. The variance for this distribution is 16.
7. In a distribution with a mean of M = 36 and a standard deviation of SD = 8, a score of 40 would be considered an extreme value.
8. In a distribution with a mean of M = 76 and a standard deviation of SD = 7, a score of 91 would be considered an extreme value.
9. A negative correlation means that as the X values decrease, the Y values also tend to decrease.
10. The goal of a hypothesis test is to demonstrate that the patterns observed in the sample data represent real patterns in the population and are not simply due to chance or sampling error.
Anthony is working for an engineering company that is building a Ferris wheel to be used at county fairs. He wants to create an algebraic model that describes the height of a rider on the wheel in terms of time. He knows that the diameter of the wheel will be 90 feet and that the axle will be built to stand 55 feet off the ground. He also knows they plan to set the wheel to make one rotation every 60 seconds. Write at least two equations that model the height of a rider in terms of t, seconds on the ride, assuming that when t = 0, the rider is at his or her lowest possible height. Explain why both equations are accurate.
Part 2: One of Anthony's co-workers says, "Sine and cosine are basically the same thing." Anthony is not so sure, and can see things either way. Provide one piece of evidence that would confirm the co-worker's point of view. Provide one piece of evidence that would refute it. Hint: It may be helpful to consider the domain and range of different functions, as well as the relationship of each of these functions to triangles in the unit circle.
The table below shows the number of people for three different race groups who were shot by police that were either armed or unarmed. These values are very close to the exact numbers. They have been changed slightly for each student to get a unique problem.

| Race | Suspect was armed | Suspect was unarmed | Total |
| --- | --- | --- | --- |
| Black | 543 | 60 | 603 |
| White | 1176 | 67 | 1243 |
| Hispanic | 378 | 38 | 416 |
| Total | 2097 | 165 | 2262 |

Give your answer as a decimal to at least three decimal places.
a) What percent are Black?
b) What percent are Unarmed?
c) In order for two variables to be independent of each other, $$P(A \text{ and } B) = P(A) \cdot P(B).$$
This just means that the percentage of times that both things happen equals the individual percentages multiplied together (Only if they are Independent of each other).
Therefore, if a person's race is independent of whether they were killed being unarmed then the percentage of black people that are killed while being unarmed should equal the percentage of blacks times the percentage of Unarmed. Let's check this. Multiply your answer to part a (percentage of blacks) by your answer to part b (percentage of unarmed).
Remember, the previous answer is only correct if the variables are Independent.
d) Now let's get the real percent that are Black and Unarmed by using the table.
If answer c is "significantly different" than answer d, then that means that there could be a different percentage of unarmed people being shot based on race. We will check this out later in the course.
Let's compare the percentage of unarmed shot for each race.
e) What percent are White and Unarmed?
f) What percent are Hispanic and Unarmed?
If you compare answers d, e and f it shows the highest percentage of unarmed people being shot is most likely white.
Why is that?
This is because there are more white people in the United States than any other race and therefore there are likely to be more white people in the table. Since there are more white people in the table, there most likely would be more white and unarmed people shot by police than any other race. This pulls the percentage of white and unarmed up. In addition, there most likely would be more white and armed shot by police. All the percentages for white people would be higher, because there are more white people. For example, the table contains very few Hispanic people, and the percentage of people in the table that were Hispanic and unarmed is the lowest percentage.
Think of it this way. If you went to a college that was 90% female and 10% male, then females would most likely have the highest percentage of A grades. They would also most likely have the highest percentage of B, C, D and F grades
The correct way to compare is "conditional probability". Conditional probability is getting the probability of something happening, given we are dealing with just the people in a particular group.
g) What percent of blacks shot and killed by police were unarmed?
h) What percent of whites shot and killed by police were unarmed?
i) What percent of Hispanics shot and killed by police were unarmed?
You can see by the answers to part g and h, that the percentage of blacks that were unarmed and killed by police is approximately twice that of whites that were unarmed and killed by police.
j) Why do you believe this is happening?
Do a search on the internet for reasons why blacks are more likely to be killed by police. Read a few articles on the topic. Write your response using the articles as references. Give the websites used in your response. Your answer should be several sentences long with at least one website listed. This part of this problem will be graded after the due date.
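The distinction the problem draws between a joint percentage ("Black and Unarmed" out of everyone in the table) and a conditional percentage ("Unarmed" among one group only) can be sketched in a few lines. This is a minimal illustration, assuming the counts exactly as printed in the table above; the names `counts`, `joint`, and `conditional` are hypothetical helpers introduced here, not part of the original problem.

```python
# Counts copied from the table in the problem statement.
counts = {
    "Black": {"armed": 543, "unarmed": 60},
    "White": {"armed": 1176, "unarmed": 67},
    "Hispanic": {"armed": 378, "unarmed": 38},
}

# Grand total of everyone in the table.
total = sum(sum(group.values()) for group in counts.values())


def joint(race):
    """P(race and unarmed): share of the whole table."""
    return counts[race]["unarmed"] / total


def conditional(race):
    """P(unarmed | race): share within that race group only."""
    return counts[race]["unarmed"] / sum(counts[race].values())


# Independence check from part c: P(Black) * P(Unarmed).
p_black = sum(counts["Black"].values()) / total
p_unarmed = sum(group["unarmed"] for group in counts.values()) / total
expected_if_independent = p_black * p_unarmed

print(f"P(Black) * P(Unarmed)  = {expected_if_independent:.3f}")
for race in counts:
    print(f"{race}: joint = {joint(race):.3f}, conditional = {conditional(race):.3f}")
```

The joint figures are driven largely by how many people of each group appear in the table, whereas the conditional figures divide by each group's own total, which is the comparison parts g through i ask for.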
What is a frequency distribution of qualitative data and why is it useful?
The presidential election is coming. Five survey companies (A, B, C, D, and E) are conducting surveys to forecast whether or not the Republican candidate will win the election. Each company randomly selects a sample of between 1000 and 1500 people. All five companies interview people over the phone on Tuesday and Wednesday. Each interviewee is asked whether he or she is 18 years old or above and a U.S. citizen who is registered to vote. If yes, the interviewee is further asked: will you vote for the Republican candidate? On Thursday morning, the five companies announce their survey samples and results at the same time in the newspapers. The results show that a% (from A), b% (from B), c% (from C), d% (from D), and e% (from E) will support the Republican candidate. The margin of error is plus/minus 3% for all results. Suppose that $$c > a > d > e > b$$. When you see these results in the newspapers, can you identify exactly which result(s) is (are) not reliable and not accurate? That is, can you identify which estimation interval(s) does (do) not include the true population proportion? If you can, explain why you can; if not, explain why you cannot and what information you would need. Discuss and explain your reasons. You must provide your statistical analysis and reasons.
Describe a possible application of hypothesis testing to a business setting. The application you describe need not be a real one (one that has actually occurred), it can be an application that you invent.
You should describe
a) what the business setting is
b) what data you would use
c) what null hypothesis you would test and what the alternative hypothesis is.
d) what type of test you would use
e) You should explain what level of significance you would use and why.
f) You should explain how to interpret the outcome of the test: what does it tell you if you could reject or could not reject the null hypothesis?
g) Finally, you should explain the possible advantages of using hypothesis testing in this application and its possible downsides / dangers (when / how using a hypothesis test could lead to mistaken inferences or erroneous decisions).
Geographical Analysis (Oct. 2006) published a study of a new method for analyzing remote-sensing data from satellite pixels in order to identify urban land cover. The method uses a numerical measure of the distribution of gaps, or the sizes of holes, in the pixel, called lacunarity. Summary statistics for the lacunarity measurements in a sample of 100 grassland pixels are $$\overline{x} = 225$$ and $$s = 20$$. It is known that the mean lacunarity measurement for all grassland pixels is 220. The method will be effective in identifying land cover if the standard deviation of the measurements is 10% (or less) of the true mean (i.e., if the standard deviation is less than 22). a. Give the null and alternative hypotheses for a test to determine whether, in fact, the standard deviation of all grassland pixels is less than 22. b. A MINITAB analysis of the data is provided below. Locate and interpret the p-value of the test. Use $$\alpha = .10$$.
Test for One Standard Deviation
Method
Null hypothesis: Sigma = 22
Alternative hypothesis: Sigma < 22
The standard method is only for the normal distribution.
Statistics
N = 100, StDev = 20.0, Variance = 400
Tests
Studies indicate that drinking water supplied by some old lead-lined city piping systems may contain harmful levels of lead. Based on data presented by Karalekas and colleagues, it appears that the distribution of lead content readings for individual water specimens has mean $$.033\ \text{mg/L}$$ and standard deviation $$.10\ \text{mg/L}$$. Explain why it is obvious that the lead content readings are not normally distributed.
M. F. Driscoll and N. A. Weiss discussed the modeling and solution of problems concerning motel reservation networks in “An Application of Queuing Theory to Reservation Networks” (TIMS, Vol. 22, No. 5, pp. 540–546). They defined a Type 1 call to be a call from a motel’s computer terminal to the national reservation center. For a certain motel, the number, X, of Type 1 calls per hour has a Poisson distribution with parameter $$\lambda = 1.7$$.