I am testing out a binomially distributed dataset in Excel.
The dataset is literally "RANDBETWEEN(1;2)" over a range of 10,000 cells, so each cell simply randomizes between the number 1 and the number 2 with a 50% chance of each.
With n = 10,000 trials and a probability of success p = 0.5, the standard deviation is sqrt(n * p * (1 - p)) = sqrt(10,000 * 0.5 * 0.5) = 50.
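As a quick sanity check on that number, a minimal Python sketch (not part of the Excel workbook) computing the same formula:

```python
import math

n = 10_000   # number of cells
p = 0.5      # probability of drawing a 1 (success)

# Standard deviation of a Binomial(n, p) count of successes.
sd = math.sqrt(n * p * (1 - p))
print(sd)  # 50.0
```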
What I don't understand is why the spread of the dataset is much larger than 50; often it is even more than 200 (as opposed to the Std Dev of 50). By continually refreshing all 10,000 cells, I find the spread is surprisingly often above 150 (three standard deviations). Roughly one refresh in three goes above a spread of 150.
By my understanding, a spread larger than three standard deviations should occur very rarely, about 0.3% of the time, i.e. roughly once every 333 refreshes in Excel. Or am I wrong here?
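For reference, the same experiment can be sketched outside Excel in Python. Here I am assuming "spread" means the absolute difference between the count of 1s and the count of 2s in one refresh of the 10,000 cells; if you mean something else by "spread", the numbers below won't apply:

```python
import random

def spread_of_one_refresh(n=10_000):
    # Simulate n cells of RANDBETWEEN(1;2): each cell is 1 or 2, 50% each.
    cells = [random.randint(1, 2) for _ in range(n)]
    ones = cells.count(1)
    twos = n - ones
    # Assumed definition of "spread": |count of 1s - count of 2s|.
    return abs(ones - twos)

# Repeat the "refresh" many times and see how often the spread exceeds 150.
trials = 2_000
exceed = sum(spread_of_one_refresh() > 150 for _ in range(trials))
print(f"spread > 150 in {exceed / trials:.1%} of refreshes")
```

Note that under this definition the spread is |2X - n| where X is the count of 1s, so its standard deviation is twice that of X (100, not 50), which may be relevant to how often 150 is exceeded.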