powerinojSs

Answered

2022-11-23

If randomly picked from m symbols, each symbol being equally likely, how long on average would it take to pick m/2 different symbols?

Lets take the specific example of the English alphabet. In this case, I have 26 letters (symbols). Lets say these letters are plastic toys in a box. At random, I take one toy letter from the box, and write it down on a piece of paper. I then put that toy letter back into the box, and repeat the process. How many times, on average, would I have to repeat this process for my piece of paper to contain 13 of the 26 letters of the alphabet?

Is there a way to generalize the answer to m symbols instead of 26? What happens when m is odd?

Lets take the specific example of the English alphabet. In this case, I have 26 letters (symbols). Lets say these letters are plastic toys in a box. At random, I take one toy letter from the box, and write it down on a piece of paper. I then put that toy letter back into the box, and repeat the process. How many times, on average, would I have to repeat this process for my piece of paper to contain 13 of the 26 letters of the alphabet?

Is there a way to generalize the answer to m symbols instead of 26? What happens when m is odd?

Answer & Explanation

Shiloh Davenport

Expert

2022-11-24Added 9 answers

Step 1

The first pick from the box of 26 is guaranteed to produce a letter that you haven't drawn already. To get the second such letter requires on average $\frac{26}{25}$ picks — close to one more, but a little bit more than that because you have a small chance of drawing the same letter as you did the first time. To get the third distinct letter requires on average $\frac{26}{24}$ picks, and so on.

To pick at least k different elements from a box containing n different elements one expects to need a total of

1$1+\frac{n}{n-1}+\frac{n}{n-2}+\dots +\frac{n}{n-k+1}=n\sum _{i=1}^{k}\frac{1}{n-i+1}$ picks.

Step 2

For your example of $n=26,k=13$, this is approximately 17.5 picks.

You wanted to know what happens when $k=\frac{n}{2}$. When n is not too small the number of picks required to get about $\frac{n}{2}$ different items can be shown to be approximately

0.69n

where $0.69\approx \mathrm{log}2$. (To show this, write the formula above as

$n\sum _{i=1}^{k}\frac{1}{n-i+1}=n({H}_{n}-{H}_{n-k})$

where ${H}_{i}$ are the harmonic numbers, and use the approximation ${H}_{n}\approx \mathrm{log}n+\gamma $)

The first pick from the box of 26 is guaranteed to produce a letter that you haven't drawn already. To get the second such letter requires on average $\frac{26}{25}$ picks — close to one more, but a little bit more than that because you have a small chance of drawing the same letter as you did the first time. To get the third distinct letter requires on average $\frac{26}{24}$ picks, and so on.

To pick at least k different elements from a box containing n different elements one expects to need a total of

1$1+\frac{n}{n-1}+\frac{n}{n-2}+\dots +\frac{n}{n-k+1}=n\sum _{i=1}^{k}\frac{1}{n-i+1}$ picks.

Step 2

For your example of $n=26,k=13$, this is approximately 17.5 picks.

You wanted to know what happens when $k=\frac{n}{2}$. When n is not too small the number of picks required to get about $\frac{n}{2}$ different items can be shown to be approximately

0.69n

where $0.69\approx \mathrm{log}2$. (To show this, write the formula above as

$n\sum _{i=1}^{k}\frac{1}{n-i+1}=n({H}_{n}-{H}_{n-k})$

where ${H}_{i}$ are the harmonic numbers, and use the approximation ${H}_{n}\approx \mathrm{log}n+\gamma $)

Most Popular Questions