Posted on 25 Feb 2014

The Chinese Restaurant Process is a prior for the likelihoods of clustered data. Let be the cluster assignments of users. Let be the number of clusters and let be a vector of size , indicating the probability of being assigned to cluster . If we put a Dirichlet prior on , we have:

Assuming that is a fixed parameter, the joint probability of and is:

If we integrate out , we get the marginal probability of :

Renaming as , multiplying and dividing by , and rearranging the elements so that it is visually clear, we get:

where the value of the integral is 1, since the factor inside is a Dirichlet density function. Expanding the definition of the Beta function, we get:

Note that the presence of the counts makes it impossible to factorize into the product of the individual . However, we can derive the conditional probability of one given all the other assignments:

To compute the denominator we assume cluster assignments are exchangeable, that is, the joint distribution is the same regardless the order in which clusters are assigned. This allows us to assume that is the last assignment, therefore obtaining by considering how was the equation before was assigned to cluster .

Plugging and into , we get:

Assuming that all have the same value , and since the sum of all is :

and finally exploiting the fact that :

we can reparametrize as . Then we have the standard result:

where is the number of users in cluster before the assignment of .

The Chinese Restaurant Process is the consequence of considering . For clusters where , we have:

and the probability of assigning to any of the (infinite) empty clusters is:

which written in a compact equation is: