Genome-wide association studies examine genetic and phenotypic variation across a large number of individuals to identify the genetic loci responsible for increased disease susceptibility. Many complex disease syndromes arise from the interplay of a large number of genomic variations that perturb disease-related genes in the context of a regulatory network. Patient cohorts are now routinely surveyed for a large number of traits, such as hundreds of clinical phenotypes and genome-wide expression profiles covering thousands of genes. Association analysis of such large and complex datasets raises new computational challenges in identifying epistatic and pleiotropic interactions among genomes, transcriptomes, and phenomes.
In this talk, I will present a new framework, called structured genome-transcriptome-phenome association analysis, that goes beyond the conventional approach of examining the correlation between a single genetic marker and a single trait. The framework leverages various types of structural information in genomes, transcriptomes, and phenomes to detect association signals more effectively. I will discuss several algorithms within this framework that are based on sparse regression with penalty functions that encourage structured sparsity in the estimated regression parameters. I will also discuss an efficient learning algorithm for these methods that makes the analysis of genome-wide datasets feasible. Our results show that the new methods detect weak association signals more effectively than previous methods while producing few false positives.
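To make the idea of a structured-sparsity penalty concrete, here is a minimal sketch under stated assumptions; it is not the algorithm presented in the talk. It assumes one particular structure, an L1/L2 penalty that groups each marker's coefficients across all traits so that a marker is selected or dropped for all traits jointly, and it uses plain proximal gradient descent as a generic solver. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np


def row_group_soft_threshold(B, t):
    """Proximal operator of t * sum_j ||B[j, :]||_2: shrinks each row's norm by t."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Rows whose norm is at most t are set exactly to zero.
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale


def multitask_group_lasso(X, Y, lam, n_iter=500):
    """Minimize 0.5/n * ||Y - X B||_F^2 + lam * sum_j ||B[j, :]||_2.

    X: (n, p) marker matrix, Y: (n, k) trait matrix, B: (p, k) coefficients.
    Assumption: this row-wise L1/L2 penalty is only one example of a
    structured-sparsity penalty; the source does not specify its penalties.
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L is the Lipschitz constant of the smooth term.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = row_group_soft_threshold(B - step * grad, step * lam)
    return B


def simulate(n=200, p=50, k=5, causal=(3, 17), seed=0):
    """Hypothetical toy data: two markers that each affect all k traits."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
    X -= X.mean(axis=0)
    B_true = np.zeros((p, k))
    B_true[list(causal), :] = 1.0
    Y = X @ B_true + 0.5 * rng.standard_normal((n, k))
    return X, Y, B_true


if __name__ == "__main__":
    X, Y, B_true = simulate()
    B_hat = multitask_group_lasso(X, Y, lam=0.3)
    selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)
    print("markers with nonzero coefficients:", selected)
```

Because the penalty acts on whole rows of the coefficient matrix, evidence for a marker is pooled across traits, which is one way structural information can help recover signals that are weak in any single marker-trait pair.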