To Join Via Zoom: To join this seminar, please request Zoom connection details from headsec [at] stat.ubc.ca.
Abstract: A useful step in data analysis is clustering, in which observations are grouped together in a hopefully meaningful way. The mainstay model for Bayesian nonparametric clustering is the Dirichlet Process Mixture Model, whose key advantage is that it infers the number of clusters automatically. However, the Dirichlet Process Mixture Model is not perfect, and further research is needed into other Bayesian nonparametric models that address its weaknesses while retaining automatic inference of the number of clusters.
In this thesis, we introduce the Neutral-to-the-Left Mixture Model, a family of Bayesian nonparametric infinite mixture models that strictly generalizes the Dirichlet Process Mixture Model. This family has two key components: the distribution of arrival times of new clusters, and the parameters of the distribution in the model's stick-breaking representation; by customizing these, the analyst can inject prior beliefs about cluster structure into the model. We describe sampling algorithms to infer the posterior distribution over clusterings given data. We consider one particular parameterization of the Neutral-to-the-Left Mixture Model with characteristics distinct from those of the Dirichlet Process Mixture Model, evaluate its performance on simulated data, and compare the results to those from a Dirichlet Process Mixture Model. Finally, we apply one parameterization of the Neutral-to-the-Left Mixture Model to cluster Twitter datasets and reveal the temporal evolution of tweets.
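For readers unfamiliar with the stick-breaking representation mentioned above, the following is a minimal sketch of the classic Dirichlet Process version, where each cluster weight is a Beta(1, alpha) fraction broken off the remaining stick. (This illustrates the baseline construction that the Neutral-to-the-Left family generalizes; the function name and truncation level are illustrative, not from the thesis.)

```python
import random


def dp_stick_breaking_weights(alpha, n_sticks, seed=0):
    """Sample approximate Dirichlet Process mixture weights by
    truncated stick breaking: for each cluster in turn, break off
    a Beta(1, alpha) fraction of the stick that remains."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the unbroken stick
    for _ in range(n_sticks):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights


weights = dp_stick_breaking_weights(alpha=2.0, n_sticks=20)
# The truncated weights sum to just under 1; smaller alpha
# concentrates mass on the first few clusters.
print(sum(weights))
```

Generalizations such as the Neutral-to-the-Left family replace the fixed Beta(1, alpha) break distribution (and the implicit cluster-arrival process) with customizable ones, which is where prior beliefs about cluster structure enter.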