Step 1
Introduction:
The formula for the confidence interval for a population proportion, \(\displaystyle\pi\), is shown below. It assumes that the sample proportion observed in a sample of size n is p, and that the level of confidence is \(\displaystyle{100}{\left({1}-\alpha\right)}\%\), so that the critical value used is the upper \(\displaystyle{\frac{\alpha}{{2}}}\)-point of the standard normal distribution, \(\displaystyle{z}_{{\frac{\alpha}{{2}}}}\).
When \(\displaystyle\pi\) is known or assumed:
\(\displaystyle{\left({p}-{z}_{{\frac{\alpha}{{2}}}}\sqrt{{{\frac{{\pi{\left({1}-\pi\right)}}}{{{n}}}}}},{p}+{z}_{{\frac{\alpha}{{2}}}}\sqrt{{{\frac{{\pi{\left({1}-\pi\right)}}}{{{n}}}}}}\right)}.\)
When \(\displaystyle\pi\) is unknown and not assumed:
\(\displaystyle{\left({p}-{z}_{{\frac{\alpha}{{2}}}}\sqrt{{{\frac{{{p}{\left({1}-{p}\right)}}}{{{n}}}}}},{p}+{z}_{{\frac{\alpha}{{2}}}}\sqrt{{{\frac{{{p}{\left({1}-{p}\right)}}}{{{n}}}}}}\right)}.\)
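As a rough illustration, the two formulas above can be sketched in Python. The function name `proportion_ci` and its parameters are hypothetical, chosen only for this example, and the inputs in the usage line (p = 0.5, n = 1000, 95% confidence) are assumed values, not figures from the problem.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p, n, confidence=0.95, pi=None):
    """Return (lower, upper) bounds of the interval for a population proportion.

    p          : observed sample proportion, between 0 and 1
    n          : sample size
    confidence : level of confidence, 1 - alpha
    pi         : assumed population proportion; if None, p is used instead
    """
    alpha = 1.0 - confidence
    # Upper alpha/2 point of the standard normal distribution
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Use pi in the standard error when it is known or assumed, otherwise p
    q = p if pi is None else pi
    margin = z * sqrt(q * (1.0 - q) / n)
    return p - margin, p + margin


# Usage with assumed inputs
lower, upper = proportion_ci(p=0.5, n=1000, confidence=0.95)
print(f"({lower:.4f}, {upper:.4f})")
```

With these assumed inputs the half-width is roughly 3.1 percentage points.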
The confidence interval gives an interval estimate of the parameter of interest. Calculation of the confidence interval for a population proportion of a characteristic of interest includes the following quantities:
Point estimate, that is, sample proportion observed,
Size of the sample collected,
Level of confidence desired.
The width of the confidence interval depends mainly upon the following characteristics of the analysis:
Level of confidence: The higher the desired level of confidence, the wider the confidence interval for a given sample size and variability.
Sample size: The larger the sample size, the narrower the confidence interval at a given level of confidence and variability.
Variability: The greater the variability in the data, the wider the confidence interval for a given sample size and level of confidence.
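The three effects listed above can be checked numerically with a small sketch. The helper `margin_of_error` is a hypothetical name, and the confidence levels, sample sizes, and proportions in the loops are illustrative assumptions. The formula is the one with p in the standard error.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, confidence):
    """Half-width of the interval: z_{alpha/2} * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sqrt(p * (1.0 - p) / n)


# Effect of the level of confidence (n and p held fixed)
for confidence in (0.90, 0.95, 0.99):
    width = 2 * margin_of_error(0.5, 1000, confidence)
    print(f"confidence={confidence:.2f}: width={width:.4f}")

# Effect of the sample size (confidence and p held fixed)
for n in (1000, 1250, 1500):
    width = 2 * margin_of_error(0.5, n, 0.95)
    print(f"n={n}: width={width:.4f}")

# Effect of variability: p(1-p) is largest at p = 0.5
for p in (0.1, 0.3, 0.5):
    width = 2 * margin_of_error(p, 1000, 0.95)
    print(f"p={p}: width={width:.4f}")
```

The printed widths should grow down the first and third loops and shrink down the second.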
Step 2
Discussion:
In this case, the margin of error is given as 3%. In this context, it means that each of the 5 companies subtracted 3 percentage points from, and added 3 percentage points to, its point estimate (a%, b%, c%, d%, and e%, respectively) to obtain its interval.
Observe that the sample sizes used by the companies are not the same: they vary between 1,000 and 1,500. There are three possibilities if each company reports 3% as the margin of error:
All the companies use the same level of confidence and assume the same value of π. In that case, their differing sample sizes should produce different interval widths, and hence different margins of error. A common margin of error of 3% would then mean that the companies chose 3% without regard to their level of confidence, sample size, and assumed value of π.
All the companies manipulate one or more of the level of confidence, sample size, and assumed value of π, so that each can achieve 3% as the margin of error.
The companies use p instead of π to calculate the confidence interval, with the same confidence level but different sample sizes, and the margin of error nevertheless turns out to be 3% for all of them.
The problem in the first two cases is that the confidence intervals are not comparable, and even if they are compared, the accuracy of such a comparison is questionable.
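To see why identical margins of error are suspicious in the first case, the sketch below computes the margin of error at the two ends of the stated sample-size range. The 95% confidence level and the value π = 0.5 are assumptions made only for this illustration (the problem states neither), and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Assumed settings, not given in the problem
CONFIDENCE = 0.95
ASSUMED_PI = 0.5

Z = NormalDist().inv_cdf(1.0 - (1.0 - CONFIDENCE) / 2.0)


def margin_for(n, pi=ASSUMED_PI):
    """Margin of error for sample size n with an assumed proportion pi."""
    return Z * sqrt(pi * (1.0 - pi) / n)


def required_n(margin, pi=ASSUMED_PI):
    """Smallest sample size whose margin of error does not exceed `margin`."""
    return ceil(Z ** 2 * pi * (1.0 - pi) / margin ** 2)


for n in (1000, 1500):
    print(f"n={n}: margin of error = {100 * margin_for(n):.2f}%")

print(f"n needed for a 3% margin: {required_n(0.03)}")
```

Under these assumptions the margin is about 3.1% for n = 1,000 and about 2.5% for n = 1,500, so a uniform 3% would have to be a rounded or conventional figure rather than the exact margin for every company.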
Assume that the third case is true here. Then, at a glance, it would appear that the companies with the highest and lowest point estimate values, that is, Companies C and B, are the most likely to be unreliable, because they are the ones producing extreme estimates.
However, one should not jump to a conclusion just with this information.
It is necessary to subtract and add 3% to each of the 5 point estimates or percentages, to obtain the 5 intervals. The intervals (in percentages) would be as follows:
Company A: \(\displaystyle{\left({a}-{3},{a}+{3}\right)}\)
Company B: \(\displaystyle{\left({b}-{3},{b}+{3}\right)}\)
Company C: \(\displaystyle{\left({c}-{3},{c}+{3}\right)}\)
Company D: \(\displaystyle{\left({d}-{3},{d}+{3}\right)}\)
Company E: \(\displaystyle{\left({e}-{3},{e}+{3}\right)}\)
If the estimates of all the companies are to be reliable, then each of the above intervals must overlap substantially with all the others. In that case, it would not be possible to say which estimates are unreliable and which are not.
If the interval of one or both of the extreme companies (C or B) is completely detached from, or only slightly overlaps with, the other intervals, then that company (or those companies) can be considered the most likely to be unreliable.
Another possibility is that the 2 (or 3) companies with the highest point estimates (C and A, or C, A, and D) have highly overlapping intervals, and the remaining 3 (or 2) companies with the lowest point estimates (D, E, and B, or E and B) also have highly overlapping intervals among themselves, but the two groups overlap very slightly or not at all. In that case, again, it would be difficult to identify which estimates are reliable and which are not.
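The overlap comparison described above can be sketched as follows. The point estimates in `estimates` are hypothetical placeholders, since the actual values of a, b, c, d, and e are not given here. They are chosen only to respect the ordering C > A > D > E > B implied in the discussion.

```python
from itertools import combinations

MARGIN = 3.0  # margin of error in percentage points, as stated in the problem

# Hypothetical point estimates in percent (placeholders for a, b, c, d, e)
estimates = {"A": 48.0, "B": 40.0, "C": 50.0, "D": 45.0, "E": 42.0}


def interval(estimate, margin=MARGIN):
    """Interval (estimate - margin, estimate + margin)."""
    return estimate - margin, estimate + margin


def overlap(first, second):
    """Length of the overlap between two intervals; 0.0 if they are disjoint."""
    return max(0.0, min(first[1], second[1]) - max(first[0], second[0]))


intervals = {name: interval(value) for name, value in estimates.items()}

# Compare every pair of companies
for left, right in combinations(sorted(intervals), 2):
    shared = overlap(intervals[left], intervals[right])
    print(f"{left}-{right}: overlap = {shared:.1f} percentage points")
```

With these placeholder values the intervals of B and C do not overlap at all, which corresponds to the situation in which the extreme companies would be flagged as the most likely to be unreliable. Different estimates could just as easily produce the two-group pattern.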
Note that even if some idea can be formed about the most unreliable or inaccurate estimates, it can only indicate which company (or companies) is most likely to be unreliable; it is not possible to identify such companies with certainty.