 Please use this identifier to cite or link to this item: http://hdl.handle.net/1807/29749
Title: Contribution to Statistical Techniques for Identifying Differentially Expressed Genes in Microarray Data
Authors: Hossain, Ahmed
Advisor: Beyene, Joseph; Willan, Andrew R.
Department: Dalla Lana School of Public Health
Keywords: Microarray gene expression; Generalized Logistic Distribution; Receiver operating characteristic curve; FDR
Issue Date: 30-Aug-2011
Abstract: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes (features or genomic biomarkers) simultaneously in a single experiment. Disease diagnosis and prognosis require robust and accurate gene selection methods that identify genes differentially expressed across samples. The problem of identifying significantly differentially expressed genes can be stated as follows: given gene expression measurements from an experiment with two (or more) conditions, find the subset of genes whose expression levels differ significantly across these conditions. Genomic data are challenging to analyze because they are high-dimensional and sample sizes are small.

Several mathematical and statistical methods currently exist for identifying significantly differentially expressed genes. These methods typically analyze one gene at a time within a parametric hypothesis-testing framework. In this study, we propose three flexible procedures for analyzing microarray data. The first is a parametric method based on a flexible distribution, the Generalized Logistic Distribution of Type II (GLDII), for which we develop an approximate likelihood ratio test (ALRT). Although the method analyzes genes one at a time, the ALRT under the GLDII distributional assumption appears to fit microarray data favourably. The second method is a test statistic for testing whether the area under the receiver operating characteristic curve (AUC) for each gene is greater than 0.5, allowing a different variance for each gene.
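The abstract does not give the form of the proposed AUC statistic or its variance estimator. As a rough illustration of the general idea only, the sketch below estimates one gene's AUC with the Mann-Whitney statistic and tests H0: AUC = 0.5 against AUC > 0.5 with a one-sided z-test. For the variance it swaps in the Hanley and McNeil (1982) approximation, which depends on each gene's own AUC and group sizes. This is an assumed choice and not necessarily the thesis's estimator. The function names and toy expression values are hypothetical.

```python
import math
from statistics import NormalDist


def auc_mann_whitney(cases, controls):
    """Empirical AUC = P(case > control) + 0.5 * P(tie), over all case/control pairs."""
    total = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(cases) * len(controls))


def hanley_mcneil_var(auc, n_cases, n_controls):
    """Hanley & McNeil (1982) approximate variance of the empirical AUC.

    Assumption: used here for illustration; the thesis may use a different estimator.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    return (auc * (1.0 - auc)
            + (n_cases - 1) * (q1 - auc * auc)
            + (n_controls - 1) * (q2 - auc * auc)) / (n_cases * n_controls)


def auc_greater_than_half(cases, controls):
    """One-sided z-test of H0: AUC = 0.5 vs H1: AUC > 0.5 for a single gene.

    Returns (auc, z, p_value).
    """
    auc = auc_mann_whitney(cases, controls)
    var = hanley_mcneil_var(auc, len(cases), len(controls))
    if var <= 0.0:
        # Perfect separation (AUC of 0 or 1) gives a zero variance estimate.
        z = math.inf if auc > 0.5 else -math.inf
    else:
        z = (auc - 0.5) / math.sqrt(var)
    p = 1.0 - NormalDist().cdf(z)
    return auc, z, p


# Hypothetical toy data: (case values, control values) for each gene.
genes = {
    "gene_a": ([5.1, 6.3, 7.2, 8.0], [1.0, 2.0, 3.0, 5.5]),
    "gene_b": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
}
# Rank genes by p-value, smallest (strongest evidence of AUC > 0.5) first.
ranked = sorted(genes, key=lambda g: auc_greater_than_half(*genes[g])[2])
# ranked == ["gene_a", "gene_b"]
```

Ranking by p-value reflects the ranking step the abstract emphasizes. In practice a multiple-testing adjustment such as FDR control would normally follow, and the sketch omits it.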
The AUC-based test is computationally less intensive and can identify genes that are reasonably stable and have satisfactory prediction performance. The third method compares the AUCs of a pair of genes and is designed to select highly correlated genes in microarray datasets: we propose a nonparametric procedure for selecting genes whose expression levels are correlated with that of a "seed" gene in microarray experiments. The test proposed by DeLong et al. (1988) is the conventional nonparametric procedure for comparing correlated AUCs; it uses a consistent variance estimator and relies on the asymptotic normality of the AUC estimator. Our proposed method incorporates DeLong's variance estimation technique when comparing a pair of genes and can identify genes with biologically sound implications.

In this thesis, we focus on the primary step in the gene selection process, namely the ranking of genes with respect to a statistical measure of differential expression. We assess the proposed approaches through extensive simulation studies and demonstrate the methods on real datasets. The simulation study indicates that the parametric method performs favourably across all the settings of variance, sample size and treatment effect examined. Importantly, the method is less sensitive to contamination by noise. The proposed nonparametric methods do not involve complicated formulas and do not require advanced programming skills. Both nonparametric methods can also identify a large fraction of truly differentially expressed (DE) genes, especially when sample sizes are large or outliers are present. We conclude that the proposed methods are good analytical tools for identifying DE genes for further biological and clinical analysis.
URI: http://hdl.handle.net/1807/29749
Appears in Collections: Doctoral
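The comparison step of the third method rests on the DeLong et al. (1988) test for two correlated AUCs. The sketch below is a generic version of that test. It assumes both genes are measured on the same case and control samples, which is what correlates the two AUC estimates, and it uses DeLong's structural components to estimate the variance of the AUC difference. The abstract does not describe how the thesis pairs a seed gene with candidate genes or turns the test into a selection rule. The function names and toy inputs are therefore hypothetical, and the code should not be read as the author's full procedure.

```python
import math
from statistics import NormalDist, variance


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 for a tie, 0 otherwise."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _components(cases, controls):
    """DeLong structural components: V10 (one value per case), V01 (one per control)."""
    v10 = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return v10, v01


def delong_test(cases1, controls1, cases2, controls2):
    """Two-sided DeLong test of H0: AUC1 == AUC2 for two genes on the same samples.

    Index i of cases1 and cases2 must refer to the same case sample (likewise for
    the controls); this pairing is what makes the two AUC estimates correlated.
    Returns (auc1, auc2, z, p_value).
    """
    m, n = len(cases1), len(controls1)
    if len(cases2) != m or len(controls2) != n or m < 2 or n < 2:
        raise ValueError("need paired samples with at least 2 cases and 2 controls")
    v10_1, v01_1 = _components(cases1, controls1)
    v10_2, v01_2 = _components(cases2, controls2)
    # The mean of the V10 components equals the Mann-Whitney AUC estimate.
    auc1, auc2 = sum(v10_1) / m, sum(v10_2) / m
    # DeLong: Var(AUC1 - AUC2) = (S10_11 + S10_22 - 2*S10_12)/m + (S01_11 + S01_22 - 2*S01_12)/n,
    # which equals the sample variance of the component differences over m (and n).
    d10 = [a - b for a, b in zip(v10_1, v10_2)]
    d01 = [a - b for a, b in zip(v01_1, v01_2)]
    var = variance(d10) / m + variance(d01) / n
    if var <= 0.0:
        # Degenerate case: component differences are constant (e.g. identical genes).
        z = 0.0 if auc1 == auc2 else math.copysign(math.inf, auc1 - auc2)
    else:
        z = (auc1 - auc2) / math.sqrt(var)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return auc1, auc2, z, p


# Hypothetical toy data: two genes measured on the same 4 cases and 4 controls.
gene_1 = ([5.1, 6.3, 7.2, 8.0], [1.0, 2.0, 3.0, 5.5])
gene_2 = ([2.0, 1.0, 4.0, 3.0], [3.5, 2.5, 1.5, 5.0])
result = delong_test(gene_1[0], gene_1[1], gene_2[0], gene_2[1])
```

Computing the variance through the differences of the structural components avoids building the full 2x2 covariance matrices. The result is algebraically the same quantity as DeLong's contrast-based variance.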