Derivation of Hypergeometric Distribution's Expectation and Variance

从装有$N$个球，其中$K$个白球的袋子中不放回的抽取n个球，n个球中白球个数的分布即为超几何分布，wiki百科上有关于超几何分布的详尽介绍，但是没有两个重要数字特征，期望和方差的推导，本文将用两种方法对这两个数字特征进行数学推导。
Let $X$ denote the number of white balls selected when $n$ balls are chosen at random from an urn containing $N$ balls $K$ of which are white. The distribution of $X$ is Hypergeometric Distribution. You can find detail description at Wikipedia, but the derivation of Expectation and Variance is omitted. I’ll show the derivation here from two different aspects.

First, Just Do It!
Obviously,
$$\begin{align}
P\{X=i\} &= \frac{{K \choose i}{N-K \choose n-i}}{{N \choose n}} \\
N &\in \{0, 1, 2, \dots\} \\
K &\in \{0, 1, 2, \dots, N\} \\
n &\in \{0, 1, 2, \dots, N\} \\
i &\in \{max(0, n+K-N), \dots, min(n, K)\}\\ \\
E[X] &= \sum iP\{X=i\} =\frac{i{K \choose i}{N-K \choose n-i}}{{N \choose n}} \\
&= K\sum \frac{{K-1 \choose i-1}{N-K \choose n-i}}{{N \choose n}}
\end{align}$$
We find the latter part in sum is very similar to the sum of the probability of a variable obeying hypergeometric distribution(which will equal to 1). Thus we do some transformations and get,
$$\begin{align}
E[X] &= K\sum \frac{{K-1 \choose i-1}{(N-1)-(K-1) \choose (n-1)-(i-1)}}{\frac{N}{n}{N-1 \choose n-1}} \\
&= \frac{kn}{N} \sum \frac{{K-1 \choose i-1}{(N-1)-(K-1) \choose (n-1)-(i-1)}}{{N-1 \choose n-1}} \\
&= \frac{kn}{N} \\
\end{align}$$
We can do it recursively and get,
$$\begin{align}
E[X(X-1)] &= \frac{K(K-1)n(n-1)}{N(N-1)} \sum \frac{{K-2 \choose i-2}{(N-2)-(K-2) \choose (n-2)-(i-2)}}{{N-2 \choose n-2}} \\
&= \frac{K(K-1)n(n-1)}{N(N-1)} \\
Var(X) &= E[X^2] – E^2[X] = E[X(X-1)] + E[X] – E^2[X]\\
&= \frac{Kn(N-K)(N-n)}{N^2(N-1)}
\end{align}$$

It’s very time-consuming. Though hardcore, it works. Next I want to introduce a smarter way.

Second, Divide and Conquer!

Let $X_i$ to be the $ith$ ball in the N balls and $X_i = 1$ if it’s white, it’s easy to see that $P(X_i = 1) = \frac{K}{N}$ which is independent from the position $i$ and whether the balls was removed after chosen. So we get,
$$\begin{align}
E[X_i] &= \frac{K}{N}\\
Var(X_i) &= \frac{K(N-K)}{N^2}
\end{align}$$
And we can easily get the expectation,
$$ E[X] = \sum_{n}E[X_i] = \frac{Kn}{N} $$
Though all the $X_i$ obeys the same distribution, they are not independent. To calculate the variance, we have to compute their covariance first.
$$\begin{align}
Cov(X_i, X_j) &= E[X_i X_j] – E[X_i]E[X_j] \\
&= P\{X_i = 1, X_j = 1\} – \frac{K^2}{N^2}\\
&= P\{X_i = 1 | X_j = 1\}P\{X_j = 1\} – \frac{K^2}{N^2}\\
&= \frac{K-1}{N-1} \frac{K}{N} – \frac{K^2}{N^2}\\
&= -\frac{K(N-K)}{N^2(N-1)}\\
Var(X) &= \sum Var(X_i) + \sum_{i\neq j}Cov(X_i, X_j) \\
&= n\frac{K(N-K)}{N^2} – n(n-1)\frac{K(N-K)}{N^2(N-1)} \\
&= \frac{Kn(N-K)(N-n)}{N^2(N-1)}
\end{align}$$

A Little More 🙂

Since all $X_i$ obeys the same distribution, and is independent from the position $i$ and whether the balls was removed after chosen, we can also conclude that “Draw Lots” is a fair game. Regardless the order you draw the lot, you get the equal probability.

Leave a Reply Cancel reply