T-Space at The University of Toronto Libraries > School of Graduate Studies - Theses > Doctoral

Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/29749

Title: Contribution to Statistical Techniques for Identifying Differentially Expressed Genes in Microarray Data
Authors: Hossain, Ahmed
Advisor: Beyene, Joseph
Willan, Andrew R.
Department: Dalla Lana School of Public Health
Keywords: Microarray gene expression
Generalized Logistic Distribution
Receiver operating characteristic curve
FDR
Issue Date: 30-Aug-2011
Abstract: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes (features or genomic biomarkers) simultaneously in a single experiment. Robust and accurate gene selection methods are required to identify differentially expressed genes across different samples for disease diagnosis or prognosis. The problem of identifying significantly differentially expressed genes can be stated as follows: given gene expression measurements from an experiment with two (or more) conditions, find the subset of genes whose expression levels differ significantly across these conditions. Analysis of genomic data is challenging because of the high dimensionality of the data and the small sample sizes. Several mathematical and statistical methods currently exist to identify significantly differentially expressed genes. These methods typically focus on gene-by-gene analysis within a parametric hypothesis testing framework. In this study, we propose three flexible procedures for analyzing microarray data. The first is a parametric method based on a flexible distribution, the Generalized Logistic Distribution of Type II (GLDII), for which an approximate likelihood ratio test (ALRT) is developed. Although the method analyzes the data gene by gene, the ALRT under the GLDII distributional assumption appears to provide a favourable fit to microarray data. The second is a test statistic for testing whether the area under the receiver operating characteristic curve (AUC) for each gene is greater than 0.5, allowing a different variance for each gene. This method is computationally less intensive and can identify genes that are reasonably stable with satisfactory prediction performance. The third method compares the AUCs of a pair of genes and is designed for selecting highly correlated genes in microarray datasets.
We propose a nonparametric procedure for selecting genes whose expression levels are correlated with that of a "seed" gene in microarray experiments. The test proposed by DeLong et al. (1988) is the conventional nonparametric procedure for comparing correlated AUCs. It uses a consistent variance estimator and relies on the asymptotic normality of the AUC estimator. Our proposed method incorporates DeLong's variance estimation technique when comparing pairs of genes and can identify genes with biologically sound implications. In this thesis, we focus on the primary step in the gene selection process, namely the ranking of genes with respect to a statistical measure of differential expression. We assess the proposed approaches through extensive simulation studies and demonstrate the methods on real datasets. The simulation study indicates that the parametric method performs well across the settings of variance, sample size and treatment effect considered. Importantly, the method is found to be less sensitive to contamination by noise. The proposed nonparametric methods do not involve complicated formulas and do not require advanced programming skills. Both nonparametric methods can also identify a large fraction of truly differentially expressed (DE) genes, especially when the sample size is large or outliers are present. We conclude that the proposed methods offer good choices of analytical tools for identifying DE genes for further biological and clinical analysis.
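The AUC-based ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's own test statistic, whose exact form the abstract does not give. It assumes the standard Mann-Whitney AUC estimate and the DeLong et al. (1988) variance estimator, and uses a normal approximation for both a one-gene test of AUC > 0.5 and a comparison of two genes measured on the same samples. The function names (`auc_test`, `compare_aucs`) and the toy data are hypothetical.

```python
import math
import numpy as np

def _components(cases, controls):
    """DeLong structural components for one gene.

    Returns (v10, v01): per-case and per-control average placement scores.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    diff = cases[:, None] - controls[None, :]
    # 1 if case > control, 0.5 for ties, 0 otherwise
    psi = (diff > 0) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)

def _upper_tail(z):
    """One-sided upper-tail p-value under a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def auc_test(cases, controls):
    """AUC, DeLong variance, z and one-sided p-value for H0: AUC = 0.5."""
    v10, v01 = _components(cases, controls)
    auc = float(v10.mean())
    var = float(v10.var(ddof=1) / len(v10) + v01.var(ddof=1) / len(v01))
    if var == 0.0:  # e.g. perfect separation: normal approximation breaks down
        return auc, var, math.nan, math.nan
    z = (auc - 0.5) / math.sqrt(var)
    return auc, var, z, _upper_tail(z)

def compare_aucs(cases_a, controls_a, cases_b, controls_b):
    """z statistic for AUC_a - AUC_b when both genes share the same samples."""
    v10a, v01a = _components(cases_a, controls_a)
    v10b, v01b = _components(cases_b, controls_b)
    m, n = len(v10a), len(v01a)
    # 2x2 covariance matrices of the components (np.cov uses ddof=1)
    s = np.cov(np.vstack([v10a, v10b])) / m + np.cov(np.vstack([v01a, v01b])) / n
    var_diff = float(s[0, 0] + s[1, 1] - 2.0 * s[0, 1])
    if var_diff <= 0.0:
        return math.nan
    return float((v10a.mean() - v10b.mean()) / math.sqrt(var_diff))

# Toy data (made up): two genes measured on 4 cases and 4 controls.
gene_a = ([3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.5, 2.5])
gene_b = ([2.0, 1.0, 4.0, 3.0], [1.5, 2.5, 0.5, 3.5])

auc_a, var_a, z_a, p_a = auc_test(*gene_a)
z_ab = compare_aucs(*gene_a, *gene_b)
```

In a gene-ranking setting, `auc_test` would be applied to every gene and the genes ordered by `z` or by p-value, while `compare_aucs` would be applied between a chosen seed gene and each remaining gene. With samples as small as in this toy example the normal approximation is unreliable; the numbers only show the mechanics.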
URI: http://hdl.handle.net/1807/29749
Appears in Collections: Doctoral

Files in This Item:

File: Hossain_Ahmed_201111_PhD_thesis.pdf.pdf
Size: 2.39 MB
Format: Adobe PDF

Items in T-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
