Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/19230

Title: Machine Learning in Computational Biology: Models of Alternative Splicing
Authors: Shai, Ofer
Advisor: Frey, Brendan J.
Department: Electrical and Computer Engineering
Keywords: Machine Learning; Graphical Models; Computational Biology; Alternative Splicing
Issue Date: 3-Mar-2010
Abstract: Alternative splicing, the process by which a single gene may code for similar but different proteins, is an important process in biology, linked to development, cellular differentiation, genetic diseases, and more. Genome-wide analysis of alternative splicing patterns and regulation has recently become possible thanks to new high-throughput techniques for monitoring gene expression and genomic sequencing. This thesis introduces two algorithms for alternative splicing analysis based on large microarray and genomic sequence data. The algorithms, based on generative probabilistic models that capture structure and patterns in the data, are used to study global properties of alternative splicing. In the first part of the thesis, a microarray platform for monitoring alternative splicing is introduced. A spatial noise removal algorithm that removes artifacts and improves data fidelity is presented. The GenASAP algorithm (generative model for alternative splicing array platform) models the non-linear process in which targeted molecules bind to a microarray’s probes and is used to predict patterns of alternative splicing. Two versions of GenASAP have been developed. The first uses a variational approximation to infer the relative amounts of the targeted molecules, while the second incorporates a more accurate noise and generative model and utilizes Markov chain Monte Carlo (MCMC) sampling. GenASAP, the first method to provide quantitative predictions of alternative splicing patterns on large-scale data sets, is shown to generate useful and precise predictions based on independent RT-PCR validation (a slow but more accurate approach to measuring cellular expression patterns). In the second part of the thesis, the results obtained by GenASAP are analyzed to reveal jointly regulated genes. The sequences of the genes are examined for potential regulatory factor binding sites using a new motif-finding algorithm designed for this purpose.
The motif-finding algorithm, called GenBITES (generative model for binding sites), uses a fully Bayesian generative model for sequences. The MCMC approach used for inference in the model includes moves that can efficiently create or delete motifs and extend or contract the width of existing motifs. GenBITES has been applied to several synthetic and real data sets, and is shown to be highly competitive at a task for which many algorithms already exist. Although developed to analyze alternative splicing data, GenBITES outperforms most reported results on a benchmark data set based on transcription data.
URI: http://hdl.handle.net/1807/19230
Appears in Collections: Doctoral; The Edward S. Rogers Sr. Department of Electrical & Computer Engineering - Doctoral theses

Files in This Item:

File: Shai_Ofer_200911_PhD_thesis.pdf
Size: 3.39 MB
Format: Adobe PDF

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.