Home

Browse
Communities
& Collections

Issue Date
Author
Title
Subject

Sign on to:

My Account
authorized users

Edit Profile

Help
 Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/32794

 Title: Bayesian Hidden Markov Models for finding DNA Copy Number Changes from SNP Genotyping Arrays Authors: Kowgier, Matthew Advisor: Kustra, Rafal Department: Dalla Lana School of Public Health Keywords: DNA copy numberBayesian HMMSNP array Issue Date: 31-Aug-2012 Abstract: DNA copy number variations (CNVs), which involve the deletion or duplication of subchromosomal segments of the genome, have become a focus of genetics research. This dissertation develops Bayesian HMMs for finding CNVs from single nucleotide polymorphism (SNP) arrays. A Bayesian framework to reconstruct the DNA copy number sequence from the observed sequence of SNP array measurements is proposed. A Markov chain Monte Carlo (MCMC) algorithm, with a forward-backward stochastic algorithm for sampling DNA copy number sequences, is developed for estimating model parameters. Numerous versions of Bayesian HMMs are explored, including a discrete-time model and different models for the instantaneous transition rates of change among copy number states of a continuous-time HMM. The most general model proposed makes no restrictions and assumes the rate of transition depends on the current state, whereas the nested model fixes some of these rates by assuming that the rate of transition is independent of the current state. Each model is assessed using a subset of the HapMap data. More general parameterizations of the transition intensity matrix of the continuous-time Markov process produced more accurate inference with respect to the length of CNV regions. The observed SNP array measurements are assumed to be stochastic with distribution determined by the underlying DNA copy number. Copy-number-specific distributions, including a non-symmetric distribution for the 0-copy state (homozygous deletions) and mixture distributions for 2-copy state (normal), are developed and shown to be more appropriate than existing implementations which lead to biologically implausible results. Compared to existing HMMs for SNP array data, this approach is more flexible in that model parameters are estimated from the data rather than set to a priori values. Measures of uncertainty, computed as simulation-based probabilities, can be determined for putative CNVs detected by the HMM. Finally, the dissertation concludes with a discussion of future work, with special attention given to model extensions for multiple sample analysis and family trio data. URI: http://hdl.handle.net/1807/32794 Appears in Collections: Doctoral

Files in This Item:

File Description SizeFormat