T-Space at The University of Toronto Libraries >
School of Graduate Studies - Theses >
Please use this identifier to cite or link to this item:
|Title: ||Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering|
|Authors: ||Mnih, Andriy|
|Advisor: ||Hinton, Geoffrey|
|Department: ||Computer Science|
|Keywords: ||statistical language modelling|
|Issue Date: ||31-Aug-2010|
|Abstract: ||With the increasing availability of large datasets machine learning techniques
are becoming an increasingly attractive alternative to expert-designed approaches to solving complex problems in domains where data is abundant.
In this thesis we introduce several models for large sparse discrete datasets. Our approach, which is based on probabilistic models that use distributed representations to alleviate the effects of data sparsity, is applied to statistical language modelling and collaborative filtering.
We introduce three probabilistic language models that represent words using learned
real-valued vectors. Two of the models are based on the Restricted Boltzmann Machine (RBM) architecture while the third one
is a simple deterministic model. We show that the deterministic model outperforms the widely used n-gram models and learns sensible word representations.
To reduce the time complexity of training and making predictions with the deterministic model,
we introduce a hierarchical version of the model, that can be exponentially faster.
The speedup is achieved by structuring the vocabulary as a tree over words and
taking advantage of this structure. We propose a simple feature-based
algorithm for automatic construction of trees over words from data and show that the
resulting models can outperform non-hierarchical neural models as well as the
best n-gram models.
We then turn our attention to collaborative filtering
and show how RBM models can be used to model the distribution of sparse
high-dimensional user rating vectors efficiently, presenting inference
and learning algorithms that scale linearly in the number of observed ratings.
We also introduce the Probabilistic Matrix Factorization model which is based
on the probabilistic formulation of the low-rank matrix approximation problem
for partially observed matrices. The two models are then extended to
allow conditioning on the identities of the rated items whether or not the
actual rating values are known. Our results on the Netflix Prize dataset show
that both RBM and PMF models outperform online SVD models.|
|Appears in Collections:||Doctoral|
Department of Computer Science - Doctoral theses
Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.