Active Centers of Excellence in Genomic Science Awards
Center for Personal Dynamic Regulomes
Despite the rapidly increasing capacity to sequence human genomes, our incomplete ability to read and interpret the information content in genomes and epigenomes remains a central challenge. A comprehensive set of regulatory events across a genome - the regulome - is needed to make full use of genomic information, but is currently out of reach for practically all clinical applications and many biological systems. The proposed Center will develop technologies that greatly increase the sensitivity, speed, and comprehensiveness of understanding genome regulation. We will develop new technologies to interrogate the transactions between the genome and regulatory factors, such as proteins and noncoding RNAs, and integrate variations in DNA sequences and chromatin states over time and across individuals. Novel molecular engineering and biosensor strategies are deployed to encapsulate the desired complex DNA transformations into the probe system, such that the probe system can be directly used on very small human clinical samples and capture genome-wide information in one or two steps. These technologies will be applied to clinical samples and workflows in real time to exercise their robustness and reveal for the first time epigenomic dynamics of human diseases during progression and treatment. These technologies will be broadly applicable to many biomedical investigations, and the Center will disseminate the technologies through training and other means.
Causal Transcriptional Consequences of Human Genetic Variation
George M. Church
Harvard University, Cambridge, Mass.
The Center for Transcriptional Consequences of Human Genetic Variation (CTCHGV) will develop innovative and powerful genetic engineering methods and use them to identify genetic variations that causally control gene transcription levels. Genome-Wide Association Studies (GWAS) find many variations associated with disease and other phenotypes, but the variations that may actually cause these conditions are hard to identify because nearby variations in the same haplotype blocks consistently co-occur with them in human populations, so that specifically causative ones cannot be distinguished. About 95% of GWAS variations are not in gene coding regions, and many of these presumably associate with altered gene expression levels. CTCHGV will identify the variations that directly control gene expression by engineering precise combinations of changes to gene regulatory regions that break down the haplotype blocks, allowing each variation's effect on gene expression to be discerned independently of the others. To perform this analysis, CTCHGV will extract ~100 kbp gene regulatory regions from human cell samples, create precise variations in them in E. coli, and re-introduce the altered regions into human cells, using zinc finger nucleases (ZFNs) to efficiently induce recombination. CTCHGV will target 1000 genes for this analysis (Aim 1), and will use human induced pluripotent stem (iPS) cells to study the effects of variations in diverse human cell types (Aim 2). To explore the effects of variations in complex human tissues, CTCHGV will develop methods of measuring gene expression at transcriptome-wide levels in many single cells, including in situ in structured tissues (Aim 3).
Finally, CTCHGV will develop novel advanced technologies that integrate DNA sequencing and synthesis to construct thousands of large DNA constructs from oligonucleotides, that enable very precise targeting and highly efficient performance of ZFNs, and that enable cells to be sorted on the basis of morphology as well as fluorescence and labeling (Aim 4). CTCHGV will also develop direct oligo-mediated engineering of human cells, and create "marked allele" iPS cells that will enable easy ascertainment of complete exon distributions for many pairs of gene alleles in many cell types.
Center for the Epigenetics of Common Human Disease
Andrew P. Feinberg
Johns Hopkins University, Baltimore
(co-funded by National Institute of Mental Health)
Epigenetics, the study of non-DNA sequence-related heredity, is at the epicenter of modern medicine because it can help to explain the relationship between an individual's genetic background, the environment, aging, and disease. The Center for the Epigenetics of Common Human Disease was created in 2004 to begin to develop the interface between epigenetics and epidemiologic-based phenotype studies, recognizing that epigenetics requires new ways of thinking about disease. We created a highly interdisciplinary group of faculty and trainees, including molecular biologists, biostatisticians, epidemiologists, and clinical investigators. We developed novel approaches to genome-wide DNA methylation (DNAm) analysis, allele-specific expression, and new statistical epigenetic tools. Using these tools, we discovered that most variable DNAm is in neither CpG islands nor promoters, but in what we term "CpG island shores," regions of lower CpG density up to several kb from islands, and we have found altered DNAm in these regions in cancer, depression and autism. In the renewal period, we will develop the novel field of epigenetic epidemiology, the relationship between epigenetic variation, genetic variation, environment and phenotype. We will continue to pioneer genome-wide epigenetic technology that is cost-effective for large-scale analysis of population-based samples, applying our knowledge from the current period to second-generation sequencing for epigenetic measurement, including DNAm and allele-specific methylation. We will continue to pioneer new statistical approaches for quantitative and binary DNAm assessment in populations, including an Epigenetic Barcode. We will develop Foundational Epigenetic Epidemiology by examining the time-dependence, heritability and environmental relationship of epigenetic marks; examining heritability in MZ and DZ twins; and developing an epigenetic transmission disequilibrium test.
We will then pioneer Etiologic Epigenetic Epidemiology, by integrating novel genome-wide methylation scans (GWMs) with existing Genome-Wide Association Study (GWAS) and epidemiologic phenotype data, a design we term Genome-Wide Integrated Susceptibility (GWIS), focusing on bipolar disorder, aging, and autism as paradigms for epigenetic studies of family-based samples, longitudinal analyses, and parent-of-origin effects, respectively. This work will be critical to realizing the full value of previous genetic and phenotypic studies, by developing and applying molecular and statistical tools necessary to integrate DNA sequence with epigenetic and environmental causes of disease.
Center Web Site: Center of Excellence in Genomic Science at Johns Hopkins
Neuropsychiatric Genome-Scale and RDOC Individualized Domains (N-GRID)
Isaac S. Kohane
Harvard Medical School
As a result of the accelerated pace of development of technologies for characterizing the human genome, the rate-limiting step for large scale genomic investigation in clinical populations is now phenotyping. This is particularly the case for neuropsychiatric (NP) illness, where phenotypes are complex, biomarkers are lacking, and the primary cell types of interest are difficult to access directly. It has become apparent that both rare and common genetic variation contributes to disease risk and that this risk crosses traditional diagnostic boundaries in psychiatry. Taking advantage of a large, already-established NP biobank could dramatically accelerate progress toward understanding the cross-disorder mechanism of action of disease liability genes. This study proposes novel applications of emerging technologies in informatics and cellular neurobiology to eliminate this phenotyping bottleneck. In doing so, it will accelerate investigation of clinical and cellular phenotypes for understanding single and multilocus/polygenic associations. Aim 1: Adapt and expand one of the largest NP cellular biobanks by parsing electronic health records with gold-standard assessment of cognition and other RDoC phenotypes. Aim 2: Define the genome-wide multidimensional functional genomics (MFG) landscape in NP disease into which the transcriptomic signature (RNA-seq) of each induced neuron (IN) representing a clinically characterized individual is projected. The projection provides the mapping from molecular to phenotypic characterization and a directionality towards healthful/neurotypical states used in Aim 3. Aim 3: Develop a probabilistic model of gene expression dependencies that will predict which small molecular perturbations are likely to shift the IN transcriptomic signature in a healthful direction in the MFG and to then update the model based on measured perturbations in the MFG. 
Aim 4: Select patient samples to study in greater detail for epigenetic (DNA methylation, histone marks and RNA editing) and transcriptional control, particularly with regard to activity-dependent changes that have been implicated in many NP diseases. Aim 5: Assess how well the clinical phenotypes are informed by the genome-wide characterizations and assess which is more robust.
Wisconsin Center of Excellence in Genomics Science
Medical College of Wisconsin, Milwaukee
The successful completion of the human genome and model organism sequences has ushered in a new era in biological research, with attention now focused on understanding the way in which genome sequence information is expressed and controlled. The focus of this proposed Wisconsin Center of Excellence in Genomics Science is to facilitate understanding of the complex and integrated regulatory mechanisms affecting gene transcription by developing novel technology for the comprehensive characterization and quantitative analysis of proteins interacting with DNA. This new technology will help provide for a genome-wide functional interpretation of the underlying mechanisms by which gene transcriptional regulation is altered during biological processes, development, disease, and in response to physiological, pharmacological, or environmental stressors. The development of chromatin immunoprecipitation approaches has allowed identification of the specific DNA sequences bound by proteins of interest. We propose to reverse this strategy and develop an entirely novel technology that will use oligonucleotide capture to pull down DNA sequences of interest, and mass spectrometry to identify and characterize the proteins and protein complexes bound and associated with particular DNA regions. This new approach will create an invaluable tool for deciphering the critical control processes regulating an essential biological function. The proposed interdisciplinary and multi-institutional Center of Excellence in Genomics Science combines specific expertise at the Medical College of Wisconsin, the University of Wisconsin Madison, and Marquette University. 
Technological developments in four specific areas will be pursued to develop this new approach: (1) cross-linking of proteins to DNA and fragmentation of chromatin; (2) capture of the protein-DNA complexes in a DNA sequence-specific manner; (3) mass spectrometry analysis to identify and quantify bound proteins; and (4) informatics to develop tools enabling the global analysis of the relationship between changes in protein-DNA interactions and gene expression. The Center will use carefully selected biological systems to develop and test the technology in an integrated genome-wide analysis platform that includes efficient data management and analysis tools. As part of the Center mission, we will combine our technology development efforts with an interdisciplinary training program for students and fellows designed to train qualified scientists experienced in cutting-edge genomics technology. Data, technology, and software will be widely disseminated by multiple mechanisms including licensing and commercialization activities.
Collaborating Institutions: University of Wisconsin-Madison, Marquette University
Center Web Site: Wisconsin Center of Excellence in Genomic Science
In this application, we propose a highly ambitious yet realistically attainable goal: to align existing expertise at UNC-Chapel Hill into a CEGS called CISGen. The overarching purpose of CISGen is to develop as a resource and to exploit the utility of the murine Collaborative Cross (CC) mouse model of the heterogeneous human population to delineate genetic and environmental determinants of complex phenotypes drawn from psychiatry, which are among the most intractable set of problems in all of biomedicine. Psychiatric disorders present a paradox - the associated morbidity, mortality, and costs are enormous and yet, despite over a century of scientific study, there are few hard facts about the etiology of the core diseases. Although our GWAS meta-analyses are in progress, early results suggest that strong and replicable findings may be elusive. Therefore, our proposal provides a complementary approach to the study of fundamental psychiatric phenotypes.
Fernando Pardo-Manuel de Villena
University of North Carolina, Chapel Hill
We propose a particularly challenging definition of success - we will identify high probability etiological models (which can be realistically complex) and then prove the predictive capacity of these models by generating novel strains of mice predicted to be at very high risk for the phenotype. Once validated, these high confidence models can then be tested in subsequent human studies - we do not propose human extension studies in CISGen but this is achievable for the investigators and their colleagues. Data collected in CISGen would be a valuable resource to the wider scientific community and could be applied to a large set of biological problems, and these data can rapidly add to the knowledge base for any new genome-wide association study (GWAS) finding. Delivery of sophisticated and user-friendly databases is a key component of CISGen.
Accomplishing this overarching goal requires an exceptional diversity of scientific expertise - psychiatry, human genetics, mouse behavior, mouse genetics, statistical genetics, computational biology, and systems biology. Experts in these disciplines are deeply involved in CISGen and are committed to the projects described herein. Successful integration of these diverse fields is non-trivial; however, all scientists on this application have had extensive interactions over the past five years, already know how to work together, and have a working knowledge of their colleagues' expertise. UNC-Chapel Hill has an intense commitment to interdisciplinary genomics research and provides a fertile backdrop for 21st century projects like CISGen.
Collaborating Institutions: The Jackson Laboratory, North Carolina State University, University of Texas at Arlington
Center Web Site: Center for Integrated Systems Genomics at UNC (CISGen)
Center for Cell Circuits
The Broad Institute, Inc., Cambridge, Mass.
Systematic reconstruction of genetic and molecular circuits in mammalian cells remains a significant, large-scale and unsolved challenge in genomics. The urgency to address it is underscored by the sizeable number of GWAS-derived disease genes whose functions remain largely obscure, limiting our progress towards biological understanding and therapeutic intervention. Recent advances in probing and manipulating cellular circuits on a genomic scale open the way for the development of a systematic method for circuit reconstruction. Here, we propose a Center for Cell Circuits to develop the reagents, technologies, algorithms, protocols and strategies needed to reconstruct molecular circuits. Our preliminary studies chart an initial path towards a universal strategy, which we will fully implement by developing a broad and integrated experimental and computational toolkit. We will develop methods for comprehensive profiling, genetic perturbations and mesoscale monitoring of diverse circuit layers (Aim 1). In parallel, we will develop a computational framework to analyze profiles, derive provisional models, use them to determine targets for perturbation and monitoring, and evaluate, refine and validate circuits based on those experiments (Aim 2). We will develop, test and refine this strategy in the context of two distinct and complementary mammalian circuits. First, we will produce an integrated, multi-layer circuit of the transcriptional response to pathogens in dendritic cells (Aim 3) as an example of an acute environmental response. Second, we will reconstruct the circuit of chromatin factors and non-coding RNAs that control chromatin organization and gene expression in mouse embryonic stem cells (Aim 4) as an example of the circuitry underlying stable cell states.
These detailed datasets and models will reveal general principles of circuit organization, provide a resource for scientists in these two important fields, and allow computational biologists to test and develop algorithms. We will broadly disseminate our tools and methods to the community, enabling researchers to dissect any cell circuit of interest at unprecedented detail. Our work will open the way for reconstructing cellular circuits in human disease and individuals, to improve the accuracy of both diagnosis and treatment.
Center Web Site: Center for Cell Circuits
Genomic Analysis of the Genotype-Phenotype Map
University of Southern California, Los Angeles
Our Center, which started in 2003, focused on implications of haplotype structure in the human genome. Since that time, there have been extraordinary advances in genomics: Genome-wide association studies using single nucleotide polymorphisms and copy number variants are now commonplace, and we are rapidly moving towards whole-genome sequence data for large samples of individuals. Our Center has undergone similar dramatic changes. While the underlying theme remains the same -- making sense of genetic variation -- our focus is now explicitly on how we can use the heterogeneous data produced by modern genomics technologies to achieve such an understanding. The overall goal of our proposal is to develop an intellectual framework, together with computational and statistical analysis tools, for illuminating the path from genotype to phenotype, and for predicting the latter from the former. We will address three broad questions related to this problem: 1) How do we infer mechanisms by which genetic variation leads to changes in phenotype? 2) How do we improve the design, understanding and interpretation of association studies by exploiting prior information? 3) How do we identify general principles about the genotype-phenotype map? We will approach these questions through a series of interrelated projects that combine computational and experimental methods, explored in Arabidopsis, Drosophila and human, and involve a wide range of researchers including molecular biologists, population geneticists, genetic epidemiologists, statisticians, computer scientists, and mathematicians.
Collaborating Institutions: University of Utah
Center Web Site: The USC Center of Excellence in Genomic Science
Genomic Analysis of Network Perturbations in Human Disease
Dana-Farber Cancer Institute, Boston
Genetic differences between individuals can greatly influence their susceptibility to disease. The information originating from the Human Genome Project (HGP), including the genome sequence and its annotation, together with projects such as the HapMap and the Human Cancer Genome Project (HCGP), has greatly accelerated our ability to find genetic variants and associate genes with a wide range of human diseases. Despite these advances, linking individual genes and their variations to disease remains a daunting challenge. Even where a causal variant has been identified, the biological insight that must precede a strategy for therapeutic intervention has generally been slow in coming. The primary reason for this is that the phenotypic effects of functional sequence variants are mediated by a dynamic network of gene products and metabolites, which exhibit emergent properties that cannot be understood one gene at a time. Our central hypothesis is that both human genetic variations and pathogens such as viruses influence local and global properties of networks to induce "disease states." Therefore, we propose a general approach to understanding cellular networks based on environmental and genetic perturbations of network structure and readout of the effects using interactome mapping, proteomic analysis, and transcriptional profiling. We have chosen a defined model system with a variety of disease outcomes: viral infection. We will explore the concept that one must understand changes in complex cellular networks to fully understand the link between genotype, environment, and phenotype.
We will integrate observations from network-level perturbations caused by particular viruses together with genome-wide human variation datasets for related human diseases with the goal of developing general principles for data integration and network prediction, instantiation of these in open-source software tools, and development of testable hypotheses that can be used to assess the value of our methods. Our plans to achieve these goals are summarized in the following specific aims: 1. Profile all viral-host protein-protein interactions for a group of viruses with related biological properties. 2. Profile the perturbations that viral proteins induce on the transcriptome of their host cells. 3. Combine the resulting interaction and perturbation data to derive cellular network-based models. 4. Use the developed models to interpret genome-wide genetic variations observed in human disease. 5. Integrate the bioinformatics resources developed by the various CCSG members within a Bioinformatics Core for data management and dissemination. 6. Building on existing education and outreach programs, develop a genomic- and network-centered educational program, with particular emphasis on providing access for underrepresented minorities to internships, workshops and scientific meetings.
Last Updated: September 22, 2014