Why does the standard deviation change from confidence intervals to hypothesis tests? When considering two-sample data that involves a difference of proportions, both a confidence interval and a hypothesis test can be done.

Hugh Soto 2022-09-13 Answered
The standard deviation used for a difference of proportions in creating a confidence interval is $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.
However, the standard deviation used for hypothesis tests is $\sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}}$, where $p = \frac{x_1 + x_2}{n_1 + n_2}$, $x_1 = p_1 n_1$, and $x_2 = p_2 n_2$.
What I don't understand is why these are different. They're both the standard deviation of the same proportion, so why should they differ?

Answers (2)

Clarence Mills
Answered 2022-09-14 Author has 18 answers
Step 1
In hypothesis testing you are making an assumption (the null hypothesis) that $p_1 = p_2$. If they are truly equal, then we can call the common parameter $p$.
You need to use something as a value for $p$ when it shows up in the standard deviation expression. If you fully follow the assumption that $p_1$ and $p_2$ are equal, then neither sample proportion alone is your best guess for the value of $p$. Rather, the pooled proportion is better because it uses more individuals: if $p_1$ really does equal $p_2$, using all $n_1 + n_2$ individuals gives a better estimate of $p$, aka $p_1$, aka $p_2$ (by assumption). Since we are now using a sample statistic in place of a population parameter, we are working with a standard error rather than a standard deviation.
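As a rough illustration of the pooling described above, here is a minimal Python sketch. The function names and the counts (45 of 100 and 30 of 100) are hypothetical, chosen only for demonstration.

```python
import math

def pooled_proportion(x1, n1, x2, n2):
    """Estimate the common p under the null hypothesis p1 == p2,
    using all n1 + n2 individuals at once."""
    return (x1 + x2) / (n1 + n2)

def pooled_standard_error(x1, n1, x2, n2):
    """Standard error of the difference in sample proportions,
    with the pooled estimate plugged in for both groups."""
    p = pooled_proportion(x1, n1, x2, n2)
    return math.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)

# Hypothetical counts of successes and sample sizes (not from the question).
x1, n1 = 45, 100
x2, n2 = 30, 100

print("pooled proportion:", pooled_proportion(x1, n1, x2, n2))
print("pooled standard error:", pooled_standard_error(x1, n1, x2, n2))
```

The same pooled value is used in both terms under the square root because, under the null hypothesis, both groups share one proportion.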
Step 2
On the other hand, with a confidence interval for $p_1 - p_2$, we make no assumption that $p_1 = p_2$; if we did, we'd be done! $p_1 - p_2 = 0$ and that's that. So we use the best guess we have for each $p_i$ separately.
I'm assuming that you already understand standard deviation, variance, and how variances add, which is why the big messy square root arises in the first place.
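For readers who want the additivity step spelled out, here is a short derivation. It assumes the two samples are independent and writes $\hat{p}_i$ for the sample proportion of group $i$ (notation introduced here for illustration):

$$\operatorname{Var}(\hat{p}_1 - \hat{p}_2) = \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$

Taking the square root gives the standard deviation of the difference. Setting $p_1 = p_2 = p$ in the same expression gives the pooled form, $\sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}}$.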
moidu13x8
Answered 2022-09-15 Author has 2 answers
Step 1
Below, I will use $\hat{p}_i$ to indicate sample proportions and $p_i$ to indicate true values (population parameters). I will use 95% intervals for demonstration.
The test of differences in proportions starts with the null hypothesis $p_1 = p_2 = p$. Under this assumption, $\hat{p}_1 - \hat{p}_2$ is approximately normal with variance $\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}$ and mean 0. When this is true, the 95% probability interval (the interval in which, if the null hypothesis is true, the value $\hat{p}_1 - \hat{p}_2$ will fall 95% of the time) is approximately
$$\hat{p}_1 - \hat{p}_2 \in \left[ -1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}},\ 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}} \right]$$
where $\hat{p}$ is the pooled sample proportion. The alternate formula (with $\hat{p}_1$, $\hat{p}_2$ rather than $\hat{p}$) is a less efficient estimator of the standard deviation of the sample proportion difference, since the entire data set is not used to estimate $p$ and instead $p_1$ and $p_2$ are estimated separately.
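The test described above can be sketched in Python as follows. This is only an illustration under the normal approximation; the function name `two_proportion_z_test` and the counts are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2, z_crit=1.96):
    """Two-sided test of H0: p1 == p2 using the pooled standard error.

    Returns the observed difference, the pooled standard error,
    the z statistic, and whether H0 is rejected at the level
    implied by z_crit (1.96 corresponds to roughly 5%).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_hat = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    diff = p1_hat - p2_hat
    z = diff / se_pooled
    # Reject when the observed difference falls outside +/- z_crit * SE.
    reject = abs(diff) > z_crit * se_pooled
    return diff, se_pooled, z, reject

# Hypothetical counts, for demonstration only.
diff, se_pooled, z, reject = two_proportion_z_test(45, 100, 30, 100)
print("difference:", diff)
print("pooled SE:", se_pooled)
print("z statistic:", z)
print("reject H0 at ~5%:", reject)
```

Note that the interval is centered at 0, the hypothesized difference, and the observed difference is checked against it.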
On the other hand, the 95% confidence interval is the set of all potential true values of $p_1 - p_2$ for which a sample value at least as extreme as $\hat{p}_1 - \hat{p}_2$ would be generated from the same sampling procedure at least 5% of the time. In constructing this interval, one cannot make the assumption $p_1 = p_2 = p$, since asking about the potential true values of $p_1 - p_2$ while making an assumption about that value is meaningless.
Step 2
Without the assumption of equality, our best estimate of the standard deviation of the difference is the alternate formula, and the approximate 95% confidence interval is given by:
$$p_1 - p_2 \in \left[ (\hat{p}_1 - \hat{p}_2) - 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},\ (\hat{p}_1 - \hat{p}_2) + 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \right]$$
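A minimal Python sketch of this interval, assuming the normal approximation is adequate; the function name `diff_proportion_ci` and the counts are hypothetical and serve only as an illustration.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Approximate confidence interval for p1 - p2.

    Uses the unpooled standard error: each proportion is estimated
    from its own sample, with no assumption that p1 == p2.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se_unpooled = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - z_crit * se_unpooled, diff + z_crit * se_unpooled

# Hypothetical counts, for demonstration only.
lower, upper = diff_proportion_ci(45, 100, 30, 100)
print("approximate 95% CI for p1 - p2:", (lower, upper))
```

Unlike the test interval, this one is centered at the observed difference rather than at 0.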