test Browse by Author Names Browse by Titles of Works Browse by Subjects of Works Browse by Issue Dates of Works

Advanced Search
& Collections
Issue Date   
Sign on to:   
Receive email
My Account
authorized users
Edit Profile   
About T-Space   

T-Space at The University of Toronto Libraries >
School of Graduate Studies - Theses >
Doctoral >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/29820

Title: Computational Prediction of Gene Function From High-throughput Data Sources
Authors: Mostafavi, Sara
Advisor: Morris, Quaid
Department: Computer Science
Keywords: Computational Biology
Machine Learning
Predicting Gene Function
Biological Networks
Combining High-Throughput Data Sources
Issue Date: 31-Aug-2011
Abstract: A large number and variety of genome-wide genomics and proteomics datasets are now available for model organisms. Each dataset on its own presents a distinct but noisy view of cellular state. However, collectively, these datasets embody a more comprehensive view of cell function. This motivates the prediction of function for uncharacterized genes by combining multiple datasets, in order to exploit the associations between such genes and genes of known function--all in a query-specific fashion. Commonly, heterogeneous datasets are represented as networks in order to facilitate their combination. Here, I show that it is possible to accurately predict gene function in seconds by combining multiple large-scale networks. This facilitates function prediction on-demand, allowing users to take advantage of the persistent improvement and proliferation of genomics and proteomics datasets and continuously make up-to-date predictions for large genomes such as humans. Our algorithm, GeneMANIA, uses constrained linear regression to combine multiple association networks and uses label propagation to make predictions from the combined network. I introduce extensions that result in improved predictions when the number of labeled examples for training is limited, or when an ontological structure describing a hierarchy of gene function categorization scheme is available. Further, motivated by our empirical observations on predicting node labels for general networks, I propose a new label propagation algorithm that exploits common properties of real-world networks to increase both the speed and accuracy of our predictions.
URI: http://hdl.handle.net/1807/29820
Appears in Collections:Doctoral

Files in This Item:

File Description SizeFormat
Mostafavi_Sara_201107_PhD_thesis.pdf3.02 MBAdobe PDF

This item is licensed under a Creative Commons License
Creative Commons

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.