Baardegem3Gw

Answered

2022-11-24

How do I know if a Binomial model is appropriate?

I have a question about the number of weeks out of 5 in which an event occurs. I have a frequency table for a sample of 40, with $x=0,1,2,3,4,5$ and frequencies 2, 7, 11, 12, 6, 2.

I have worked out the unbiased population mean and estimate, but I'm not sure whether binomial is what I need or not. I have to decide if a binomial model is appropriate.

I can see that the data is discrete, but it's not binary like "event happens" or "event does not happen". It seems relatively symmetrical, and almost normally distributed? I'm not really sure how to work this out. Is a binomial model right or not?

Answer & Explanation

Henry Arellano

Expert

2022-11-25 Added 12 answers

Step 1

If this is your first chi-squared test, the clues in the comments may be a bit too sparse. Without working the problem for you, I offer the following more complete outline: (Use it along with whatever examples your text or class notes may have to offer.)

It is appropriate to try a binomial model, and obviously $n=5.$ From the given data you can find the sample mean of the 40 observations, $\bar{x}=99/40=2.475.$ Because a binomial mean is $np,$ this gives the estimate $\hat{p}=2.475/5=0.495.$

By looking at the PMF of $\mathsf{Binom}(5,0.495)$ you can find the expected counts $E_i$ (multiply the probabilities by 40). Your observed counts are $F=(2,7,11,12,6,2).$
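A minimal R sketch of this step, assuming the frequency table from the question. The variable names (`x`, `obs`, `p_hat`, `expected`) are illustrative choices, not part of the original answer.

```r
# Observed data from the question: weeks with the event (0..5) and their frequencies
x       <- 0:5
obs     <- c(2, 7, 11, 12, 6, 2)
n_weeks <- 5

# Sample mean of the 40 observations, and the implied estimate of p (mean = n * p)
x_bar <- sum(x * obs) / sum(obs)
p_hat <- x_bar / n_weeks

# Binomial probabilities for 0..5 weeks, scaled up to expected counts out of 40
probs    <- dbinom(x, size = n_weeks, prob = p_hat)
expected <- sum(obs) * probs

round(expected, 2)
```

The expected counts should come out at roughly 1.3, 6.4, 12.6, 12.4, 6.1 and 1.2, so the two tail categories are well below 5.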

Step 2

Next, you can find the chi-squared statistic $Q=\sum_{i=0}^{5}\frac{(F_i-E_i)^2}{E_i},$ which is approximately distributed as $\mathsf{Chisq}(\nu=4).$ [Ordinarily, a chi-squared test with 6 categories would have $\nu=6-1=5,$ but you have used the data to estimate the parameter $p,$ so you 'lose' a degree of freedom for that and $\nu=4.$]
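The statistic can be computed directly from the observed and expected counts. The sketch below is one way to do it in R, assuming $\hat{p}=0.495$; the names `obs`, `expected`, `Q` and `df` are illustrative.

```r
# Observed counts and expected counts under Binom(5, 0.495), for x = 0..5
obs      <- c(2, 7, 11, 12, 6, 2)
expected <- sum(obs) * dbinom(0:5, size = 5, prob = 0.495)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
Q <- sum((obs - expected)^2 / expected)

# Degrees of freedom: 6 categories, minus 1, minus 1 for the estimated parameter p
df <- length(obs) - 1 - 1

Q
df
```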

I got $Q=1.1815.$ The critical value for a chi-squared test with $\nu=4$ at the 5% level is the 95th percentile $c=9.488$ of $\mathsf{Chisq}(\nu=4).$ You can find this number in printed tables of the chi-squared distribution or using software (as with R below).

qchisq(.95, 4)

[1] 9.487729

This means that you would reject the null hypothesis that the data are consistent with $\mathsf{Binom}(n=5,p=0.495)$ only if $Q\ge c=9.488.$ Since $Q=1.1815$ is well below $c,$ you do not reject.
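The same decision can also be expressed as a p-value. The sketch below plugs in the value of $Q$ reported in this answer and uses `pchisq` for the upper-tail probability; the variable names are illustrative.

```r
Q  <- 1.1815   # statistic reported in the answer
df <- 4        # 6 categories - 1 - 1 estimated parameter

crit    <- qchisq(0.95, df)                    # 5% critical value
p_value <- pchisq(Q, df, lower.tail = FALSE)   # P(Chisq(4) >= Q)

reject <- Q >= crit                            # TRUE would mean rejecting the binomial model
p_value
reject
```

The p-value is computed by hand here rather than with `chisq.test(obs, p = probs)`, because that call uses $\nu=5$ and does not account for the degree of freedom lost by estimating $p$ from the data.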

There is one remaining difficulty. The chi-squared test is usually deemed to be accurate only if all expected counts exceed 5. Your first and last expected counts (roughly 1.3 and 1.2) are too small. One cure for this is to combine 'categories' 0 and 1, and 'categories' 4 and 5. In each tail, combine categories by adding the two observed frequencies and adding the two expected frequencies.

You will now have four categories and $\nu=4-1-1=2$ degrees of freedom. Re-compute $Q$ and find the new $c$ (as below). [According to my computations, you will still not reject $H_0.$]

qchisq(.95, 2)

[1] 5.991465
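The pooling step can be sketched in R as follows, again assuming $\hat{p}=0.495.$ The grouping labels and variable names are illustrative, and `tapply` is just one convenient way to add up counts within each pooled category.

```r
# Observed counts and expected counts under Binom(5, 0.495), for x = 0..5
obs      <- c(2, 7, 11, 12, 6, 2)
expected <- sum(obs) * dbinom(0:5, size = 5, prob = 0.495)

# Pool x = 0 with x = 1, and x = 4 with x = 5
groups <- c("0-1", "0-1", "2", "3", "4-5", "4-5")

obs_pooled <- tapply(obs, groups, sum)
exp_pooled <- tapply(expected, groups, sum)

# Recompute the statistic on the four pooled categories
Q_pooled  <- sum((obs_pooled - exp_pooled)^2 / exp_pooled)
df_pooled <- length(obs_pooled) - 1 - 1   # 4 categories - 1 - 1 estimated parameter

Q_pooled >= qchisq(0.95, df_pooled)       # TRUE would mean rejecting the binomial model
```

After pooling, every expected count is above 5, so the usual rule of thumb for the chi-squared approximation is met.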
