Polya urn simulation


Let us consider an urn containing \(a\) white balls and \(b\) red balls at time \(0\). At each time step \(n\), we draw a ball from the urn, observe its color and replace it in the urn with another ball of the same color.

Let \(X_i = 1\) if the \(i^{th}\) ball is white and \(0\) otherwise, the proportion of white balls at time \(n\) then writes

\[Y_n = \frac{a+ \sum_{i=1}^n X_i}{a+b+n}\]

We are interested in \(Y_n\) when \(n\) gets large. We can prove that this proportion converges to a random variable with a Beta distribution, i.e. with density function

\[f(x) = \left\{ \begin{array}{cc} C x^{a-1}(1-x)^{b-1} & x \in [0,1] \\ 0 & otherwise \end{array} \right.\]

where \(C = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\) is the constant of normalization which makes the integral of the density equal to 1, with \(\Gamma(x) = \int_0^{+\infty} t^{x-1}e^{-t} dt \).

For \(a=b=1\), this corresponds to an uniform distribution.


The seed used to initialize the pseudorandom number generator changes at each call of the algorithm if we set the variable \(seed = 0\). If we choose an integer different from zero, this seed will be taken to initialize the pseudorandom generator. This can be useful when we want to reproduce twice the same simulation.

We set a number of trials \(n\) and a number of samples \(\omega_1, \ldots,\omega_K\) and we plot the evolution of the variable \(Y_i\) for \(i=1,\ldots, n\) for each sample.

For each \(j=1,\ldots, K\) we can observe that \(\lim_{n \rightarrow \infty} Y_n(\omega_j) = Y(\omega_j)\).

In order to visualize the convergence to the Beta distribution, we make 1000 simulations of \(Y_n\) and we plot the histogram approximating the density function of \(Y_n\).