Say we have $m$ buckets. We select a random bucket and put a ball in it, we repeat this (select-put) $n$ times. In the end what is the probability of having at least one bucket with $k$ balls?My initial attempts to solve it failed, and I posted the question on math.stackexchange. An approximation for large values of $m$ and $n$ was posted as an answer but I was interested in exact results. Once I realized I had no way of finding someone else's solution, I was able to solve it myself.

## Simulation

First, to check if my solution is correct, I have written a small program to simulate the select-put procedure million times and report the experimental probability for different values of $m$,$n$ and $k$. If you are interested, you can find the java code at the end of my post.## Solution

First we will start with writing the probability of 1 box having $k$ balls. Then using inclusion-exclusion principle we will generalize. Let's write the probability of putting first $k$ balls into the first box and remaining balls to the other boxes: $$ \left(\frac1{m}\right)^k \left(1-\frac1{m}\right)^{n-k} $$ We have $\binom{n}{k}$ ways of putting $k$ balls into a particular box (e.g. first box), and we have $m$ such boxes. So the probability becomes: $$ m \binom{n}{k} \left(\frac1{m}\right)^k \left(1-\frac1{m}\right)^{n-k} $$ But the expression above suffers from*double counting*, i.e. it counts configurations that contains multiple boxes with $k$ balls more than once. So we need to subtract the value for 2 boxes having $k$ balls, add the value for 3 boxes having $k$ balls, etc... Let's write the probability of putting first $k$ balls into the 1st box, next $k$ balls into the 2nd box, and remaining balls to the other boxes. $$ \left(\frac1{m}\right)^k \left(\frac1{m}\right)^k \left(1-\frac1{m}\right)^{n-2k} $$ We have $\binom{n}{k}\binom{n-k}{k}$ ways of putting $2k$ balls into two particular boxes (e.g. first and second box), and we have $\binom{m}{2}$ such boxes. So the probability becomes: $$ \binom{m}{2}\binom{n}{k}\binom{n-k}{k}\left(\frac1{m}\right)^k \left(\frac1{m}\right)^k \left(1-\frac1{m}\right)^{n-2k} $$ At this point you should be able to see the pattern that is emerging.

## Generalization

If we put together and generalize what we have done so far we get the following expression as probability of having a box with $k$ balls: $$ \sum^{\lfloor \frac{n}{k} \rfloor}_{i=1} (-1)^{i+1} \binom{m}{i} \left[ \prod^{i-1}_{j=0} \binom{n-jk}{k} \right] \left( \frac1{m} \right)^{ik} \left( \frac{m-i}{m} \right)^{n-ik} $$ The powers of $-1$ at the beginning enables adding values for odd number of balls and subtracting values for even number of balls, i.e. inclusion-exclusion principle.## The code

```
private static double calculateKcollisions(int m, int n, int k) {
double probability = 0;
int sign = 1;
for (int i = 1; n - i * k >= 0 && i <= m; i++) {
double p = binomial(m, i) * pow(1.0 / m, i * k);
p *= pow((double) (m - i) / m, n - i * k);
for (int j = 0; j < i; j++) {
p *= binomial(n - j * k, k);
}
probability += sign * p;
sign *= -1;
}
return probability;
}
```

## Simulation code

```
private static final Random RNG = new Random();
public static void main(String[] args) {
int sampleCount = 1000000;
for (int experimentNo = 0; experimentNo < 10; experimentNo++) {
int m = RNG.nextInt(7) + 3;
int n = RNG.nextInt(m) + RNG.nextInt(10) + 1;
int k = RNG.nextInt(n + 1);
System.out.println(format("m:%d n:%d k:%d", m, n, k));
int successCount = 0;
for (int sampleNo = 0; sampleNo < sampleCount; sampleNo++) {
int[] boxes = new int[m];
for (int i = 0; i < n; i++)
boxes[RNG.nextInt(m)]++;
for (int i = 0; i < m; i++) {
if (boxes[i] == k) {
successCount++;
break;
}
}
}
double experimental = (double) successCount / sampleCount;
double calculated = calculateKcollisions(m, n, k);
System.out.println(format("xperimental: %f calculated: %f", experimental, calculated));
System.out.println(format("diff: %.2f\n", abs(experimental - calculated)));
System.out.println();
}
}
```

## No comments:

## Post a Comment