# Why does the standard deviation change from confidence intervals to hypothesis tests?

When considering two-sample data that involves a difference of proportions, both a confidence interval and a hypothesis test can be done.
The standard deviation used for a difference of proportions in creating a confidence interval is $\sqrt{\frac{{p}_{1}\left(1-{p}_{1}\right)}{{n}_{1}}+\frac{{p}_{2}\left(1-{p}_{2}\right)}{{n}_{2}}}$.
However, the standard deviation used for hypothesis tests is $\sqrt{\frac{p\left(1-p\right)}{{n}_{1}}+\frac{p\left(1-p\right)}{{n}_{2}}}$, where $p=\frac{{x}_{1}+{x}_{2}}{{n}_{1}+{n}_{2}}$, ${x}_{1}={p}_{1}{n}_{1}$, and ${x}_{2}={p}_{2}{n}_{2}$.
What I don't understand is why these are different. They're both the standard deviation of the same difference of proportions, so why should they differ?

Clarence Mills
Step 1
In hypothesis testing you are making an assumption (the null hypothesis) that ${p}_{1}={p}_{2}$. If they are truly equal, then we can call the common parameter $p$.
You need to use something as a value for $p$ when it shows up in the standard deviation expression. If you fully follow the assumption that ${p}_{1}$ and ${p}_{2}$ are equal, then neither sample proportion on its own is your best guess for $p$. The pooled proportion is better because it uses more individuals: if ${p}_{1}$ really does equal ${p}_{2}$, using all ${n}_{1}+{n}_{2}$ individuals gives a better estimate of $p$, aka ${p}_{1}$, aka ${p}_{2}$ (by assumption). Since we are now using a sample statistic in place of a population parameter, we are working with a standard error rather than a standard deviation.
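As a rough illustration of the pooling idea, here is a minimal Python sketch. The function name and the counts (45 successes out of 100, and 30 out of 100) are made up for demonstration and do not come from the question.

```python
from math import sqrt

def pooled_standard_error(x1, n1, x2, n2):
    """Standard error of the difference in sample proportions,
    computed under the null-hypothesis assumption p1 == p2."""
    # Pool both samples: every individual contributes to one estimate of p.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Same p in both variance terms, so it factors out.
    return p_pooled, sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Hypothetical data, for illustration only.
p_pooled, se_pooled = pooled_standard_error(45, 100, 30, 100)
print(p_pooled, se_pooled)
```

Note that the pooled proportion is just the sample-size-weighted average of the two sample proportions, so with equal sample sizes it sits halfway between them.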
Step 2
On the other hand, with a confidence interval for ${p}_{1}-{p}_{2}$, we make no assumption that ${p}_{1}={p}_{2}$; if we did, we'd be done! ${p}_{1}-{p}_{2}=0$ and that's that. So we use the best guess we have for each ${p}_{i}$ separately.
I'm assuming that you already understand standard deviation, variance, and how variance is additive, since that is why the big messy square root arises in the first place.
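For readers who want the variance step spelled out, here is a brief sketch. It assumes the two samples are independent and that each count ${x}_{i}$ is binomial with parameters ${n}_{i}$ and ${p}_{i}$ (the question implies this but does not state it); $\stackrel{^}{{p}_{i}}={x}_{i}/{n}_{i}$ denotes a sample proportion.

$\mathrm{Var}\left(\stackrel{^}{{p}_{i}}\right)=\mathrm{Var}\left(\frac{{x}_{i}}{{n}_{i}}\right)=\frac{{n}_{i}{p}_{i}\left(1-{p}_{i}\right)}{{n}_{i}^{2}}=\frac{{p}_{i}\left(1-{p}_{i}\right)}{{n}_{i}}$

$\mathrm{Var}\left(\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}\right)=\mathrm{Var}\left(\stackrel{^}{{p}_{1}}\right)+\mathrm{Var}\left(\stackrel{^}{{p}_{2}}\right)=\frac{{p}_{1}\left(1-{p}_{1}\right)}{{n}_{1}}+\frac{{p}_{2}\left(1-{p}_{2}\right)}{{n}_{2}}$

Taking the square root gives the confidence-interval expression; setting ${p}_{1}={p}_{2}=p$ in both terms gives the hypothesis-test version.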
moidu13x8
Step 1
Below, I will use $\stackrel{^}{{p}_{i}}$ to indicate sample proportions and ${p}_{i}$ to indicate true values (population parameters). I will use 95% intervals for demonstration.
The test of a difference in proportions starts with the null hypothesis ${p}_{1}={p}_{2}=p$. Under this assumption, $\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}$ is approximately normal with mean 0 and variance $\frac{p\left(1-p\right)}{{n}_{1}}+\frac{p\left(1-p\right)}{{n}_{2}}$. Since $p$ is unknown, it is estimated by the pooled sample proportion $\stackrel{^}{p}=\frac{{x}_{1}+{x}_{2}}{{n}_{1}+{n}_{2}}$. The 95% probability interval (the interval within which, if the null hypothesis is true, the value $\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}$ will fall 95% of the time) is then approximately
$\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}\in \left[-1.96\sqrt{\frac{\stackrel{^}{p}\left(1-\stackrel{^}{p}\right)}{{n}_{1}}+\frac{\stackrel{^}{p}\left(1-\stackrel{^}{p}\right)}{{n}_{2}}},\ 1.96\sqrt{\frac{\stackrel{^}{p}\left(1-\stackrel{^}{p}\right)}{{n}_{1}}+\frac{\stackrel{^}{p}\left(1-\stackrel{^}{p}\right)}{{n}_{2}}}\right]$
The alternate formula (with $\stackrel{^}{{p}_{1}},\stackrel{^}{{p}_{2}}$ rather than $\stackrel{^}{p}$) is, under the null hypothesis, a less efficient estimator of the standard deviation of the sample proportion difference, since the entire data set is not used to estimate $p$ and instead ${p}_{1}$ and ${p}_{2}$ are estimated separately.
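The test described here can be sketched in a few lines of Python. This is only an illustration under the normal approximation: the function name and the counts are hypothetical, and the two-sided p-value uses the identity that `erfc(|z| / sqrt(2))` equals twice the upper tail of the standard normal.

```python
from math import sqrt, erfc

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 there is a single p, estimated from all the data.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical data, for illustration only.
z, p_value = two_proportion_z_test(45, 100, 30, 100)
# The observed difference lies outside the 95% probability interval
# exactly when |z| exceeds 1.96.
reject_at_5_percent = abs(z) > 1.96
print(z, p_value, reject_at_5_percent)
```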
On the other hand, the 95% confidence interval is the set of all potential true values of ${p}_{1}-{p}_{2}$ for which the observed $\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}$ would not fall among the most extreme 5% of outcomes generated by the same sampling procedure. In constructing this interval, one cannot make the assumption ${p}_{1}={p}_{2}=p$, since asking about the potential true values of ${p}_{1}-{p}_{2}$ while already making an assumption about that value is meaningless.
Step 2
Without the assumption of equivalence, our best estimate of the standard deviation of the difference is the alternate formula and the approximate 95% confidence interval is given by:
${p}_{1}-{p}_{2}\in \left[\left(\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}\right)-1.96\sqrt{\frac{\stackrel{^}{{p}_{1}}\left(1-\stackrel{^}{{p}_{1}}\right)}{{n}_{1}}+\frac{\stackrel{^}{{p}_{2}}\left(1-\stackrel{^}{{p}_{2}}\right)}{{n}_{2}}},\ \left(\stackrel{^}{{p}_{1}}-\stackrel{^}{{p}_{2}}\right)+1.96\sqrt{\frac{\stackrel{^}{{p}_{1}}\left(1-\stackrel{^}{{p}_{1}}\right)}{{n}_{1}}+\frac{\stackrel{^}{{p}_{2}}\left(1-\stackrel{^}{{p}_{2}}\right)}{{n}_{2}}}\right]$
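A minimal Python sketch of this interval follows. The function name, the default critical value argument, and the counts are assumptions chosen for illustration; nothing here comes from the original question beyond the formula itself.

```python
from math import sqrt

def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Approximate confidence interval for p1 - p2 (95% by default),
    using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # No assumption that p1 == p2, so each variance term uses its own estimate.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical data, for illustration only.
lower, upper = diff_proportion_ci(45, 100, 30, 100)
print(lower, upper)
```

With these made-up counts the interval is centered on the observed difference of 0.15, whereas the probability interval used for the test is centered on 0. That difference in centering reflects the difference in assumptions.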