Active Centers of Excellence in Genomic Science Awards
Center for In Toto Genomic Analysis of Vertebrate Development
California Institute of Technology, Pasadena, Calif.
This Center of Excellence in Genomic Science (CEGS) assembles a multidisciplinary group of investigators to develop innovative technologies with the goal of imaging and mutating every developmentally important vertebrate gene. Novel "in toto imaging" tools make it possible to use a systems-based approach for analysis of gene function in developing vertebrate embryos in real time and space. These tools can digitize in vivo data in a systematic, high-throughput, and quantitative fashion. Combining in toto imaging with novel gene traps provides a means to rapidly screen for developmentally relevant expression patterns, followed by the ability to immediately mutagenize genes of interest. Initially, key technologies will be developed and tested in the zebrafish embryo due to its transparency and the ability to obtain rapid feedback. Once validated, these techniques will be applied to an amniote, the avian embryo, due to several advantages including accessibility and similarity to human embryogenesis. Finally, to monitor alterations in gene expression in normal and mutant embryos, we will develop new techniques for in situ hybridization that permit simultaneous analysis of multiple marker genes in a sensitive and potentially quantitative manner. Our goal is to combine real-time analysis of gene expression on a genome-wide scale with the ability to mutate genes of interest and examine global alterations in gene expression as a result of gene loss. Much of the value will come from the development of new and broadly applicable technologies. In contrast to a typical technology development grant, however, there will be experimental fruit emerging from at least two vertebrate systems (zebrafish and avian).
The following aims will be pursued: Specific Aim 1: Real-time "in toto" image analysis of reporter gene expression; Specific Aim 2: Comprehensive spatiotemporal analysis of gene function of the developing vertebrate embryo using the FlipTrap approach for gene trapping; Specific Aim 3: Design of quantitative, multiplexed 'hybridization chain reaction' (HCR) amplifiers for in vivo imaging with active background suppression; Specific Aim 4: Data analysis and integration of data sets to produce a "digital" fish and a "digital" bird. The technologies and the resulting atlases will be made broadly available via electronic publication.
Causal Transcriptional Consequences of Human Genetic Variation
George M. Church
Harvard University, Cambridge, Mass.
The Center for Transcriptional Consequences of Human Genetic Variation (CTCHGV) will develop innovative and powerful genetic engineering methods and use them to identify genetic variations that causally control gene transcription levels. Genome-Wide Association Studies (GWAS) find many variations associated with disease and other phenotypes, but the variations that may actually cause these conditions are hard to identify because nearby variations in the same haplotype blocks consistently co-occur with them in human populations, so that specifically causative ones cannot be distinguished. About 95% of GWAS variations are not in gene coding regions, and many of these presumably associate with altered gene expression levels. CTCHGV will identify the variations that directly control gene expression by engineering precise combinations of changes to gene regulatory regions that break down the haplotype blocks, allowing each variation's effect on gene expression to be discerned independently of the others. To perform this analysis, CTCHGV will extract ~100 kbp gene regulatory regions from human cell samples, create precise variations in them in E. coli, and re-introduce the altered regions into human cells, using zinc finger nucleases (ZFNs) to efficiently induce recombination. CTCHGV will target 1000 genes for this analysis (Aim 1), and will use human induced pluripotent stem (iPS) cells to study the effects of variations in diverse human cell types (Aim 2). To explore the effects of variations in complex human tissues, CTCHGV will develop methods of measuring gene expression at transcriptome-wide levels in many single cells, including in situ in structured tissues (Aim 3).
Finally, CTCHGV will develop novel advanced technologies that integrate DNA sequencing and synthesis to construct thousands of large DNA constructs from oligonucleotides, that enable very precise targeting and highly efficient performance of ZFNs, and that enable cells to be sorted on the basis of morphology as well as fluorescence and labeling (Aim 4). CTCHGV will also develop direct oligo-mediated engineering of human cells, and create "marked allele" iPS that will enable easy ascertainment of complete exon distributions for many pairs of gene alleles in many cell types.
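The haplotype-block problem described above can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not a method taken from the Center's abstract: it computes the standard linkage disequilibrium statistic r^2 between two biallelic sites from phased haplotypes coded as 0/1. The function name and the toy haplotypes are hypothetical. When two variants always co-occur, r^2 is 1 and their effects cannot be separated; engineered haplotypes that carry every allele combination equally often bring r^2 to 0.

```python
def ld_r_squared(site_a, site_b):
    """Return r^2 between two biallelic sites.

    site_a and site_b are equal-length lists of 0/1 alleles, one entry
    per phased haplotype (hypothetical toy encoding for illustration).
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy population sample: the two variants always travel together.
natural_a = [0, 0, 1, 1, 0, 1, 0, 1]
natural_b = [0, 0, 1, 1, 0, 1, 0, 1]

# Toy engineered set: all four allele combinations, equally represented.
engineered_a = [0, 0, 1, 1, 0, 0, 1, 1]
engineered_b = [0, 1, 0, 1, 0, 1, 0, 1]

print(ld_r_squared(natural_a, natural_b))
print(ld_r_squared(engineered_a, engineered_b))
```

In the first case the two sites are statistically indistinguishable; in the second they vary independently, so an expression difference can be attributed to one site or the other.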
Center for the Epigenetics of Common Human Disease
Andrew P. Feinberg
Johns Hopkins University, Baltimore
(co-funded by National Institute of Mental Health)
Epigenetics, the study of non-DNA sequence-related heredity, is at the epicenter of modern medicine because it can help to explain the relationship between an individual's genetic background, the environment, aging, and disease. The Center for the Epigenetics of Common Human Disease was created in 2004 to begin to develop the interface between epigenetics and epidemiologic-based phenotype studies, recognizing that epigenetics requires new ways of thinking about disease. We created a highly interdisciplinary group of faculty and trainees, including molecular biologists, biostatisticians, epidemiologists, and clinical investigators. We developed novel approaches to genome-wide DNA methylation (DNAm) analysis, allele-specific expression, and new statistical epigenetic tools. Using these tools, we discovered that most variable DNAm is in neither CpG islands nor promoters, but in what we term "CpG island shores," regions of lower CpG density up to several kb from islands, and we have found altered DNAm in these regions in cancer, depression and autism. In the renewal period, we will develop the novel field of epigenetic epidemiology, the relationship between epigenetic variation, genetic variation, environment and phenotype. We will continue to pioneer genome-wide epigenetic technology that is cost effective for large scale analysis of population-based samples, applying our knowledge from the current period to second-generation sequencing for epigenetic measurement, including DNAm and allele-specific methylation. We will continue to pioneer new statistical approaches for quantitative and binary DNAm assessment in populations, including an Epigenetic Barcode. We will develop Foundational Epigenetic Epidemiology, examining: time-dependence, heritability and environmental relationship of epigenetic marks; heritability in MZ and DZ twins; and develop an epigenetic transmission disequilibrium test. 
We will then pioneer Etiologic Epigenetic Epidemiology, by integrating novel genome-wide methylation scans (GWMs) with existing Genome-Wide Association Study (GWAS) and epidemiologic phenotype data, a design we term Genome-Wide Integrated Susceptibility (GWIS), focusing on bipolar disorder, aging, and autism as paradigms for epigenetic studies of family-based samples, longitudinal analyses, and parent-of-origin effects, respectively. This work will be critical to realizing the full value of previous genetic and phenotypic studies, by developing and applying molecular and statistical tools necessary to integrate DNA sequence with epigenetic and environmental causes of disease.
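For readers unfamiliar with the classical statistics that the twin and transmission analyses above build on, here is a minimal sketch. It shows the textbook genetic versions only: Falconer's heritability estimate from monozygotic (MZ) and dizygotic (DZ) twin correlations, and the standard transmission disequilibrium test (TDT) statistic. The abstract does not specify how the Center's epigenetic analogues would be defined, so the function names and example numbers here are assumptions for illustration.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate: h^2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are trait correlations within MZ and DZ twin pairs.
    """
    return 2.0 * (r_mz - r_dz)


def tdt_chi_square(transmitted, untransmitted):
    """Classical TDT statistic for one allele.

    transmitted / untransmitted count how often heterozygous parents
    did or did not pass the allele to an affected child. Under the
    null hypothesis the statistic is approximately chi-square
    distributed with 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


# Hypothetical example values, not data from the Center.
print(falconer_heritability(0.8, 0.5))
print(tdt_chi_square(60, 40))
```

A larger gap between MZ and DZ correlations implies a larger heritable component, and a TDT value well above about 3.84 (the 5% critical value for 1 degree of freedom) suggests preferential transmission.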
Center Web Site: Center of Excellence in Genomic Science at Johns Hopkins
Genomic Basis of Vertebrate Diversity
David M. Kingsley
Stanford University, Stanford, Calif.
The long-term goal of this project is to understand the genomic mechanisms that generate phenotypic diversity in vertebrates. Rapid progress in genomics has provided nearly complete sequences for several organisms. Comparative analysis suggests many fundamental pathways and gene networks are conserved between organisms. And yet, the morphology, physiology, and behavior of different species are obviously and profoundly different. What are the mechanisms that generate these key differences? Are unique traits controlled by few or many genetic changes? What kinds of changes? Are there particular genes and mechanisms that are used repeatedly when organisms adapt to new environments? Can better understanding of these mechanisms help explain dramatic differences in disease susceptibility that also exist between groups? The Stanford CEGS will use an innovative combination of approaches in fish, mice, and humans to identify the molecular basis of major phenotypic change in natural populations of vertebrates. Specific aims include: 1) cross stickleback fish and develop a genome wide map of the chromosomes, genes, and mutations that control a broad range of new morphological, physiological, and behavioral traits in natural environments; 2) test which population genetic measures provide the most reliable "signatures of selection" surrounding genes that are known to have served as the basis of parallel adaptive change in many different natural populations around the world; 3) assemble the stickleback proto Y chromosome and test whether either sex or autosomal rearrangements play an important role in generating phenotypic diversity, or are enriched in genomic regions that control phenotypic change; 4) test whether particular genes and mechanisms are used repeatedly to control phenotypic change in many different vertebrates. 
Preliminary data suggest that mechanisms identified as the basis of adaptive change in natural fish populations may be broadly predictive of adaptive mechanisms across a surprisingly large range of animals, including humans. Genetic regions hypothesized to be under selection in humans will be compared to genetic regions under selection in fish. Regions predicted to play an important role in natural human variation and disease susceptibility will be modeled in mice, generating new model systems for confirming functional variants predicted from human population genetics and comparative genomics.
Center Web Site: Stanford Genome Evolution Center
Microscale Life Sciences Center
Deirdre R. Meldrum
Arizona State University, Tempe
It is increasingly apparent that understanding, predicting, and diagnosing disease states is confounded by the inherent heterogeneity of in situ cell populations. This variation in cell fate can be dramatic: for instance, one cell may live while an adjacent cell dies. Thus, in order to understand fundamental pathways involved in disease states, it is necessary to link preexisting cell state to cell fate in the disease process at the individual cell level.
The Microscale Life Sciences Center (MLSC) at the University of Washington is focused on solving this problem, by developing cutting-edge microscale technology for high throughput genomic-level and multi-parameter single-cell analysis, and applying that technology to fundamental problems of biology and health. Our vision is to address pathways to disease states directly at the individual cell level, at increasing levels of complexity that progressively move to an in vivo understanding of disease. We propose to apply MLSC technological innovations to questions that focus on the balance between cell proliferation and cell death. The top three killers in the United States, cancer, heart disease and stroke, all involve an imbalance in this cellular decision-making process. Because of intrinsic cellular heterogeneity in the live/die decision, this fundamental cellular biology problem is an example of one for which analysis of individual cells is essential for developing the link between genomics, cell function, and disease. The specific systems to be studied are proinflammatory cell death (pyroptosis) in a mouse macrophage model, and neoplastic progression in the Barrett's Esophagus (BE) precancerous model. In each case, diagnostic signatures for specific cell states will be determined by measuring both physiological (cell cycle, ploidy, respiration rate, membrane potential) and genomic (gene expression profiles by single-cell proteomics, qRT-PCR and transcriptomics; LOH by LATE-PCR) parameters. These will then be correlated with cell fate via the same sets of measurements after a challenge is administered, for instance, a cell death stimulus for pyroptosis or a predisposing risk factor challenge (acid reflux) for BE. Ultimately, time series will be taken to map out the pathways that underlie the live/die decision.
Finally, this information will be used as a platform to define cell-cell interactions at the single-cell level, to move information on disease pathways towards greater in vivo relevance. New technology will be developed and integrated into the existing MLSC Living Cell Analysis cassette system to support these ambitious biological goals including 1) automated systems for cell placement, off-chip device interconnects, and high throughput data analysis with user friendly interfaces; 2) new optical and electronic sensors based on a new detection platform, new dyes and nanowires; and 3) new micromodules for single-cell qRT-PCR, LATE-PCR for LOH including single-cell pyrosequencing, on-chip single-cell proteomics, and single-cell transcriptomics using barcoded nanobeads.
Collaborating Institutions: Fred Hutchinson Cancer Research Center, Brandeis University, University of Washington.
Center Web Site: Microscale Life Sciences Center
Wisconsin Center of Excellence in Genomics Science
Medical College of Wisconsin, Milwaukee
The successful completion of the human genome and model organism sequences has ushered in a new era in biological research, with attention now focused on understanding the way in which genome sequence information is expressed and controlled. The focus of this proposed Wisconsin Center of Excellence in Genomics Science is to facilitate understanding of the complex and integrated regulatory mechanisms affecting gene transcription by developing novel technology for the comprehensive characterization and quantitative analysis of proteins interacting with DNA. This new technology will help provide for a genome-wide functional interpretation of the underlying mechanisms by which gene transcriptional regulation is altered during biological processes, development, disease, and in response to physiological, pharmacological, or environmental stressors. The development of chromatin immunoprecipitation approaches has allowed identification of the specific DNA sequences bound by proteins of interest. We propose to reverse this strategy and develop an entirely novel technology that will use oligonucleotide capture to pull down DNA sequences of interest, and mass spectrometry to identify and characterize the proteins and protein complexes bound and associated with particular DNA regions. This new approach will create an invaluable tool for deciphering the critical control processes regulating an essential biological function. The proposed interdisciplinary and multi-institutional Center of Excellence in Genomics Science combines specific expertise at the Medical College of Wisconsin, the University of Wisconsin Madison, and Marquette University. 
Technological developments in four specific areas will be pursued to develop this new approach: (1) cross-linking of proteins to DNA and fragmentation of chromatin; (2) capture of the protein-DNA complexes in a DNA sequence-specific manner; (3) mass spectrometry analysis to identify and quantify bound proteins; and (4) informatics to develop tools enabling the global analysis of the relationship between changes in protein-DNA interactions and gene expression. The Center will use carefully selected biological systems to develop and test the technology in an integrated genome-wide analysis platform that includes efficient data management and analysis tools. As part of the Center mission, we will combine our technology development efforts with an interdisciplinary training program for students and fellows designed to train qualified scientists experienced in cutting-edge genomics technology. Data, technology, and software will be widely disseminated by multiple mechanisms including licensing and commercialization activities.
Collaborating Institutions: University of Wisconsin-Madison, Marquette University
Center Web Site: Wisconsin Center of Excellence in Genomic Science
In this application, we propose a highly ambitious yet realistically attainable goal: to align existing expertise at UNC-Chapel Hill into a CEGS called CISGen. The overarching purpose of CISGen is to develop as a resource and to exploit the utility of the murine Collaborative Cross (CC) mouse model of the heterogeneous human population to delineate genetic and environmental determinants of complex phenotypes drawn from psychiatry, which are among the most intractable set of problems in all of biomedicine. Psychiatric disorders present a paradox - the associated morbidity, mortality, and costs are enormous and yet, despite over a century of scientific study, there are few hard facts about the etiology of the core diseases. Although our GWAS meta-analyses are in progress, early results suggest that strong and replicable findings may be elusive. Therefore, our proposal provides a complementary approach to the study of fundamental psychiatric phenotypes.
Fernando Pardo-Manuel de Villena
University of North Carolina, Chapel Hill
We propose a particularly challenging definition of success - we will identify high probability etiological models (which can be realistically complex) and then prove the predictive capacity of these models by generating novel strains of mice predicted to be at very high risk for the phenotype. Once validated, these high confidence models can then be tested in subsequent human studies - we do not propose human extension studies in CISGen but this is achievable for the investigators and their colleagues. Data collected in CISGen would be a valuable resource to the wider scientific community and could be applied to a large set of biological problems. These data can also rapidly add to the knowledge base for any new genome-wide association study (GWAS) finding. Delivery of sophisticated and user-friendly databases is a key component of CISGen.
Accomplishing this overarching goal requires an exceptional diversity of scientific expertise - psychiatry, human genetics, mouse behavior, mouse genetics, statistical genetics, computational biology, and systems biology. Experts in these disciplines are deeply involved in CISGen and are committed to the projects described herein. Successful integration of these diverse fields is non-trivial; however, all scientists on this application have had extensive interactions over the past five years, already know how to work together, and have a working knowledge of their colleagues' expertise. UNC-Chapel Hill has an intense commitment to interdisciplinary genomics research and provides a fertile backdrop for 21st century projects like CISGen.
Collaborating Institutions: The Jackson Laboratory, North Carolina State University, University of Texas at Arlington
Center Web Site: Center for Integrated Systems Genomics at UNC (CISGen)
Center for Cell Circuits
The Broad Institute, Inc., Cambridge, Mass.
Systematic reconstruction of genetic and molecular circuits in mammalian cells remains a significant, large-scale and unsolved challenge in genomics. The urgency to address it is underscored by the sizeable number of GWAS-derived disease genes whose functions remain largely obscure, limiting our progress towards biological understanding and therapeutic intervention. Recent advances in probing and manipulating cellular circuits on a genomic scale open the way for the development of a systematic method for circuit reconstruction. Here, we propose a Center for Cell Circuits to develop the reagents, technologies, algorithms, protocols and strategies needed to reconstruct molecular circuits. Our preliminary studies chart an initial path towards a universal strategy, which we will fully implement by developing a broad and integrated experimental and computational toolkit. We will develop methods for comprehensive profiling, genetic perturbations and mesoscale monitoring of diverse circuit layers (Aim 1). In parallel, we will develop a computational framework to analyze profiles, derive provisional models, use them to determine targets for perturbation and monitoring, and evaluate, refine and validate circuits based on those experiments (Aim 2). We will develop, test and refine this strategy in the context of two distinct and complementary mammalian circuits. First, we will produce an integrated, multi-layer circuit of the transcriptional response to pathogens in dendritic cells (Aim 3) as an example of an acute environmental response. Second, we will reconstruct the circuit of chromatin factors and non-coding RNAs that control chromatin organization and gene expression in mouse embryonic stem cells (Aim 4) as an example of the circuitry underlying stable cell states.
These detailed datasets and models will reveal general principles of circuit organization, provide a resource for scientists in these two important fields, and allow computational biologists to test and develop algorithms. We will broadly disseminate our tools and methods to the community, enabling researchers to dissect any cell circuit of interest at unprecedented detail. Our work will open the way for reconstructing cellular circuits in human disease and individuals, to improve the accuracy of both diagnosis and treatment.
Analysis of Human Genome Using Integrated Technologies
Michael P. Snyder
Yale University, New Haven, Conn.
We propose to establish a center to build genomic DNA arrays and develop novel technologies that will use these arrays for the large-scale functional analysis of the human genome. Fragments of nonrepetitive DNA, 0.3-1.4 kb in length, from each of chromosomes 22, 21, 20, 19, 7, 17, and perhaps the X chromosome will be prepared by PCR and attached to microscope slides. The arrays will be used to develop technologies for the large-scale mapping of 1) transcribed sequences; 2) binding sites of chromosomal proteins; 3) origins of replication; and 4) genetic mutation and variation. A web-accessible database will be constructed to house the information generated in this study; data from other studies will also be integrated into the database. The arrays and technologies will be made available throughout both Yale University and the larger scientific community. They will be integrated into our training programs for postdoctoral fellows, graduate students and undergraduates at Yale. We expect these procedures to be applicable to the analysis of the entire human genome and the genomes of many other organisms.
Center Web Site: Yale University Center for Excellence in Genomic Science
Genomic Analysis of the Genotype-Phenotype Map
University of Southern California, Los Angeles
Our Center, which started in 2003, focused on implications of haplotype structure in the human genome. Since that time, there have been extraordinary advances in genomics: Genome-wide association studies using single nucleotide polymorphisms and copy number variants are now commonplace, and we are rapidly moving towards whole-genome sequence data for large samples of individuals. Our Center has undergone similar dramatic changes. While the underlying theme remains the same -- making sense of genetic variation -- our focus is now explicitly on how we can use the heterogeneous data produced by modern genomics technologies to achieve such an understanding. The overall goal of our proposal is to develop an intellectual framework, together with computational and statistical analysis tools, for illuminating the path from genotype to phenotype, and for predicting the latter from the former. We will address three broad questions related to this problem: 1) How do we infer mechanisms by which genetic variation leads to changes in phenotype? 2) How do we improve the design, understanding and interpretation of association studies by exploiting prior information? 3) How do we identify general principles about the genotype-phenotype map? We will approach these questions through a series of interrelated projects that combine computational and experimental methods, explored in Arabidopsis, Drosophila and human, and involve a wide range of researchers including molecular biologists, population geneticists, genetic epidemiologists, statisticians, computer scientists, and mathematicians.
Collaborating Institutions: University of Utah
Center Web Site: The USC Center of Excellence in Genomic Science
Genomic Analysis of Network Perturbations in Human Disease
Dana-Farber Cancer Institute, Boston
Genetic differences between individuals can greatly influence their susceptibility to disease. The information originating from the Human Genome Project (HGP), including the genome sequence and its annotation, together with projects such as the HapMap and the Human Cancer Genome Project (HCGP) have greatly accelerated our ability to find genetic variants and associate genes with a wide range of human diseases. Despite these advances, linking individual genes and their variations to disease remains a daunting challenge. Even where a causal variant has been identified, the biological insight that must precede a strategy for therapeutic intervention has generally been slow in coming. The primary reason for this is that the phenotypic effects of functional sequence variants are mediated by a dynamic network of gene products and metabolites, which exhibit emergent properties that cannot be understood one gene at a time. Our central hypothesis is that both human genetic variations and pathogens such as viruses influence local and global properties of networks to induce "disease states." Therefore, we propose a general approach to understanding cellular networks based on environmental and genetic perturbations of network structure and readout of the effects using interactome mapping, proteomic analysis, and transcriptional profiling. We have chosen a defined model system with a variety of disease outcomes: viral infection. We will explore the concept that one must understand changes in complex cellular networks to fully understand the link between genotype, environment, and phenotype. 
We will integrate observations from network-level perturbations caused by particular viruses together with genome-wide human variation datasets for related human diseases with the goal of developing general principles for data integration and network prediction, instantiation of these in open-source software tools, and development of testable hypotheses that can be used to assess the value of our methods. Our plans to achieve these goals are summarized in the following specific aims: 1. Profile all viral-host protein-protein interactions for a group of viruses with related biological properties. 2. Profile the perturbations that viral proteins induce on the transcriptome of their host cells. 3. Combine the resulting interaction and perturbation data to derive cellular network-based models. 4. Use the developed models to interpret genome-wide genetic variations observed in human disease. 5. Integrate the bioinformatics resources developed by the various CCSG members within a Bioinformatics Core for data management and dissemination. 6. Building on existing education and outreach programs, develop a genomics- and network-centered educational program, with particular emphasis on providing access for underrepresented minorities to internships, workshops and scientific meetings.
Last Updated: August 1, 2011