powerinojSs

2022-11-23

If randomly picked from m symbols, each symbol being equally likely, how long on average would it take to pick m/2 different symbols?
Lets take the specific example of the English alphabet. In this case, I have 26 letters (symbols). Lets say these letters are plastic toys in a box. At random, I take one toy letter from the box, and write it down on a piece of paper. I then put that toy letter back into the box, and repeat the process. How many times, on average, would I have to repeat this process for my piece of paper to contain 13 of the 26 letters of the alphabet?
Is there a way to generalize the answer to m symbols instead of 26? What happens when m is odd?

Shiloh Davenport

Expert

Step 1
The first pick from the box of 26 is guaranteed to produce a letter that you haven't drawn already. To get the second such letter requires on average $\frac{26}{25}$ picks — close to one more, but a little bit more than that because you have a small chance of drawing the same letter as you did the first time. To get the third distinct letter requires on average $\frac{26}{24}$ picks, and so on.
To pick at least k different elements from a box containing n different elements one expects to need a total of
1$1+\frac{n}{n-1}+\frac{n}{n-2}+\dots +\frac{n}{n-k+1}=n\sum _{i=1}^{k}\frac{1}{n-i+1}$ picks.
Step 2
For your example of $n=26,k=13$, this is approximately 17.5 picks.
You wanted to know what happens when $k=\frac{n}{2}$. When n is not too small the number of picks required to get about $\frac{n}{2}$ different items can be shown to be approximately
0.69n
where $0.69\approx \mathrm{log}2$. (To show this, write the formula above as
$n\sum _{i=1}^{k}\frac{1}{n-i+1}=n\left({H}_{n}-{H}_{n-k}\right)$
where ${H}_{i}$ are the harmonic numbers, and use the approximation ${H}_{n}\approx \mathrm{log}n+\gamma$)

Do you have a similar question?