
Active CEGS Awards

The Center for Synthetic Regulatory Genomics

RM1 HG009491
Jef D. Boeke
New York University Langone Health

The Center for Synthetic Regulatory Genomics (SyRGe) is tasked with development and application of revolutionary technology for making dramatic, coordinated changes to extensive gene loci in the form of Big DNAs of 50-1000 kb, which enable broad investigation of the function of regulatory sequences and foster translational applications to biotechnology, personalized medicine and aggressively humanized mouse models. Specifically, we will (i) enhance our Design – Build – Deliver Technology pipeline to make it ever more efficient and precise, (ii) enable “Bottom up” fully synthetic regulatory landscapes, as well as multilocus engineering in both cells and animals, culminating in projects to look at brain function and behavior, (iii) deploy combinatorial genomics at the level of Big DNA, (iv) develop new tools to engineer megabase changes to mammalian genomes in a locus-agnostic manner and (v) develop imaging technology to “light up” the Big DNA molecules we are delivering. The Center will dramatically supersede present and predicted technologies for manipulation and assessment of Big DNAs. Some of this work will culminate in the production of extensively humanized animal models called Genomically Rewritten and Tailored Genetically Engineered Mouse Models (GREAT-GEMMs), an element of our program worthy of bioethical debate and discussion. The Center also features a unique and highly successful outreach program whereby undergraduates from diverse backgrounds play a crucial role in genome assembly, as well as a Fellows program to expose researchers and students from other fields to transformative new technology and facilitate its promulgation throughout the larger human genetics, model organism and genomics communities. Finally, mechanisms to ensure long-term sustainable access to the technology developed in this Center to anyone who wishes to deploy it have been put in place.


Center for Personal Dynamic Regulomes

RM1 HG007735
Howard Chang
Stanford University

Tens of thousands of human genomes have been sequenced, but the central challenge is their interpretation. A comprehensive set of regulatory events across a genome — the regulome — is needed to make full use of genomic information, but is currently out of reach for most clinical applications and biological systems. The Center will develop technologies that greatly increase the sensitivity, speed, and comprehensiveness of understanding genome regulation. We will develop new technologies to interrogate the transactions between the genome and regulatory factors, such as proteins and noncoding RNAs from single cells, and integrate variations in DNA sequences and chromatin states over time and across individuals. Novel molecular engineering and biosensor strategies are deployed to encapsulate the desired complex DNA transformations into the probe system, such that the probe system can be directly used on very small human clinical samples and capture genome-wide information in one or two steps. These technologies will be applied to clinical samples with genomic aberrations to exercise their robustness, and reveal for the first time epigenomic dynamics of human diseases during progression and treatment. These technologies will be broadly applicable to many biomedical investigations, and the Center will disseminate the technologies via training and diverse means.


Center for Genome Editing and Recording

RM1 HG009490
Jonathan Weissman
Whitehead Institute

Recent advances in DNA sequencing and bioinformatics have generated vast numbers of sequence variants associated with disease that, in principle, hold the keys to breakthroughs in preventive medicine and therapeutic intervention. However, realizing the promise of personalized medicine will require accurate manipulation of DNA sequences and gene expression as well as interrogation of the functional consequences of sequence variants at a scale and level of accuracy not currently possible. The Center for Genome Editing and Recording (CGER) will address these challenges by creating technologies to detect, alter and record the sequence and output of the genome in individual cells and tissues. The CGER will exploit the programmable DNA binding and nickase activity of CRISPR-Cas proteins as well as engineered zinc finger and TALE proteins to create a new generation of tools to precisely engineer the genome and epigenome. These platforms will enable high-precision engineering of both the nuclear and mitochondrial genomes as well as heritable silencing or activation of messenger RNAs, long noncoding RNA and a host of other regulatory elements such as enhancers and insulator regions. Critically, these alterations can be made without introducing double stranded breaks to DNA, thereby avoiding DNA toxicity and minimizing reliance on complex and difficult-to-control endogenous DNA repair pathways. Collectively, these technologies will usher in a new era of safer, high precision and multiplexed genome and epigenome editing. In addition, we will exploit these platforms to develop higher-level multichannel molecular recorders that will allow us to track and reconstruct the life history of cells in an in vivo setting. We will add the ability to follow cells in space and time as well as record the history of their past cellular states to the existing phylogenetic lineage tracing systems pioneered in the previous embodiment of this Center. 
The center will be led by a team that collectively has a remarkable track record of developing bold, impactful new tools to expand their precision, efficacy, safety and scope, and finally exploiting these new capabilities to develop novel strategies to explore fundamental biological and biomedical problems. Our multidisciplinary team has a rich history of working together, which has been greatly accelerated by the CEGS structure in a manner that would simply not be possible if each co-PI had been working on similar problems in isolation. Leveraging the capabilities of the Whitehead Institute, MIT, The Broad Institute, Harvard University, Harvard Medical School, The Lewis Sigler Institute for Integrative Genomics at Princeton and the Massachusetts General Hospital, we aim to create transformative capabilities and have access to state-of-the-art research facilities as well as resources for training, education and outreach that will attract diverse talent to the field of genomics research. PUBLIC HEALTH RELEVANCE: The Center for Genome Editing and Recording will create technologies to enable robust, comprehensive exploration of genes and genetic pathways responsible for human disease in addition to the development of higher-level multichannel molecular recorders that will allow us to track and reconstruct the life history of cells in an in vivo setting. Collectively, these technologies have profound implications for genome science, therapeutic strategies for somatic disorders, and genetic diseases as well as understanding normal development and disease processes such as tumor evolution and the mechanism of metastases and response to therapeutic challenges.
In addition, we aim to create a vibrant environment in which scholars across educational and socioeconomic levels can engage without barriers, where diverse students at every stage of development will be exposed to novel opportunities for training, research and skill development, and new ideas in a dynamic interdisciplinary research environment.


Center for the Multiplexed Assessment Of Phenotype

RM1 HG010461
Douglas Fowler
University of Washington

To date, millions of human genetic variants have been found, many in the coding or regulatory sequence of genes. However, for only a tiny fraction of these variants do we understand how the expression or function of the encoded product is affected. As a consequence, the promise of sequencing human genomes to understand human phenotypes – especially the risk for many diseases with genetic components – has gone largely unfulfilled. What is needed are facile, high-throughput methods for generating libraries of human cells bearing mutant sequence elements and for assessing these libraries to determine each variant's effect on molecular and cellular phenotypes. Thus, the Center for the Multiplexed Assessment of Phenotype, based largely in the University of Washington's Department of Genome Sciences, proposes to develop highly generalizable, reproducible and scalable technologies to generate, and assess the functional impact of, variants in human genes. In the first specific aim, the Center will establish two workhorse methods of mutagenesis to produce variants: saturation editing of genes at their endogenous loci in the human genome, and in vitro generation of variant libraries that are recombined into safe harbor sites. In the second specific aim, the Center will develop approaches to explore the impact of mutations in noncoding regions on versions of genes that have been minimized – pared down to partially remove intronic sequence but still capable of providing essential activity. Further, it will develop mass spectrometry methods to analyze variation in coding sequences for its effect on protein abundance, stability, interactions, turnover and aggregation. In the third specific aim, the Center will assess variant effects on cell morphology, behavior and internal organization by using a novel, microscopy-based phenotyping technology, and on global transcription by developing a massively parallel single-cell mRNA profiling method.
Center-developed technologies will be piloted on a set of human genes with disease relevance, enabling comparisons between each variant's functional effects and the effects of known pathogenic or benign variants. This effort will inform the use in the clinic of the large-scale functional data the Center's technologies will generate. Additionally, variants will be assessed under different conditions, such as in multiple cell lines, in combination with another mutation, or in the presence of a drug. The Center will also train early career experimentalists, clinical geneticists and data scientists to obtain and use large-scale functional data. This training will include internships in Center laboratories for one to three months, and apprenticeships for one to two years. These close interactions will generate medically- and biologically-relevant results and reveal the best paths for translating the vast amounts of Center-generated functional data for clinical use. Through these new technologies and their dissemination to the broader clinical community, the Center will advance the promise of the Human Genome Project by interpreting the vast landscape of human genetic variation.


A phenomics-first resource for interpretation of variants

RM1 HG010860
Melissa Haendel
University of Colorado, Denver

Genomics is key to precision medicine; however, despite the ease of sequencing, clinical interpretation is still thwarted because relevant data (disease, phenotype, and variant) is complex, heterogeneous, and disaggregated across sources. Moreover, this evidence is sometimes incomplete, conflicting, and erroneous. Consequently, clinicians face long lists of candidate diseases, genes, and countless variants of unknown significance. This situation will not improve without capturing and harmonizing the underlying phenotypic information; computability of this information is the bedrock for the emerging field of phenomics. From basic science to clinical care, communities need structured ways to represent and exchange phenotypes and disease definitions. Addressing these fundamental phenomics needs makes it possible to computationally assess and reveal links between diseases and variants. We have previously shown how the addition of phenotypic information using the Human Phenotype Ontology (HPO) can improve the diagnostic yield for hard-to-diagnose patients, and HPO is therefore now a global standard for “deep phenotyping”. We have demonstrated the applicability of deep phenotyping in the evaluation of rare diseases which have overlapping mechanistic underpinnings with common/complex diseases as well as evolutionarily conserved mechanisms in model organisms. Having coordinated the community and prototyped the underlying computational platforms, we will now align both phenotype ontologies and clinical terminologies, enabling better comparison and inference of phenotypes for improved diagnostic efficacy. We propose to develop a Phenomics-First Resource (PFR). Specifically, we will:
1. Create a community-driven framework of interoperable phenotype definitions across species (uPheno)
2. Harmonize human disease definitions with the MONDO disease alignment resource
3. Create a community-wide exchange standard for clinical and model-organism phenotypes (Phenopackets)
4. Develop an integrated phenomics platform to provide the research (e.g. BioLink) and clinical (FHIR) communities with programmatic access to phenomics ontologies, data, and algorithms
The dynamic suite of interlinked technologies will together leverage community-developed knowledge in order to make variant interpretation more reliable, better provenanced, and more clinically actionable.


Center for Live Cell Genomics

RM1 HG011543
David Haussler
University of California Santa Cruz

We will build new methodology and capacity for large-scale, long-term, inexpensive, modular, customizable, shared, Internet-of-Things-controlled, reproducible live cell culture and tissue-based experimental genomics disease models. Tissue models include traditional cell culture as well as organoid and primary tissue explants obtained from surgery or biopsy. Organoid factories supporting tissue growth and maintenance will be integrated with external and on-chip electro-optofluidic analytical modules to become part of an ecosystem that is modeled after open-source software. It will use commodity sensors, cameras, and computers linked in platforms flexibly designed using simple 3D printing, molding, etching and milling techniques potentially available at any institution. This will stimulate rapid innovation in experimental platforms for tissue culture. We will push this technology and use its best-in-class capabilities to make progress in neurodevelopment and pediatric cancer, addressing big questions. What genes contribute most importantly and specifically to human brain development? How do they go wrong in neurodevelopmental disease or brain injury? What specific molecular pathways are disrupted in individual pediatric cancer cases? How can we test pathway-specific treatments in a tissue model specific to each patient? Our education and outreach plans include a training program to develop a diverse and inclusive cohort of undergraduate students trained in genomic science through secondary school and community college outreach as well as coding workshops and research-based laboratory classes for UCSC undergraduates to develop core competencies. Participation in these activities serves as a basis for training graduate students and postdocs in inclusive pedagogy and mentorship.
We are also developing a one-stop information hub to form an online community and to share our technology through immersive webinars and tutorials aimed at a broader audience with a range of expertise from the general public to scientists and clinicians at research institutions. Our work will enable significant advances in neuroscience and cancer research and education, stimulate a new open-source culture in cell biology and genomics, and democratize scientific and educational access beyond elite institutions, extending sharing projects like NHGRI AnVIL beyond data and code to include experiments and Internet-connected experimental platforms. PUBLIC HEALTH RELEVANCE: We will introduce a new experimental platform paradigm to genomic studies of live cells and tissues that will allow scientists to be more creative, more productive, more collaborative and use fewer resources for their live cell genomics experiments. We will create new platforms that provide living cells growing in realistic 3D cultures that mimic actual tissues in the body and incorporate Internet-based remote control and analysis capabilities. We will deploy this technology to perform new, more robust and more reproducible experiments leading to discoveries in neurodevelopmental disease and cancer.


Center for Dynamic RNA Epitranscriptomes

RM1 HG008935
Chuan He
University of Chicago

Chemical modifications on mammalian messenger RNA (mRNA) have recently been shown to play critical and diverse regulatory roles in mRNA metabolism and translation. For example, the most abundant mRNA modification, N6-methyladenosine (m6A), is crucial for mammalian stem cell differentiation and tissue development in almost all systems tested so far. Dedicated proteins, many of which are essential in mammals, have evolved to install, recognize, and remove m6A marks (writers, readers, and erasers, respectively). Dysregulation of m6A methylation has been connected to a variety of human diseases and disorders. Functional roles have also been proposed for other internal modifications present in mammalian mRNA, including, but not limited to: pseudouridine (Ψ), 5-methylcytosine (m5C), 2’-O-methylation (Nm), N1-methyladenosine (m1A), N7-methylguanosine (m7G), and N3-methylcytosine (m3C). Our most recent research has uncovered modifications on chromosome-associated regulatory RNAs (carRNAs), such as promoter-associated RNA (paRNA), enhancer RNA (eRNA), and repeat RNAs, as well as frequent modifications in introns of pre-mRNA. The carRNA modifications have been shown to regulate chromatin state and transcription, and intron modifications may affect pre-mRNA processing. Despite rapid advances in the discovery and functional characterization of various RNA modifications and their effector proteins, a significant bottleneck limits the entire field of epitranscriptomics research: a dearth of quantitative sequencing methods that can comprehensively map most RNA modifications at base resolution with exact modification fraction information.
The availability of such methods is critical for assessing the importance of these modifications in different regions of mRNA, examining the effects of dynamic changes in modification fraction, assigning modifications to different writers and analyzing their functional relevance, identifying target transcripts and sites of demethylation and analyzing their functional relevance, discovering new effectors for RNA modifications by overlapping with known RBP-binding sites or genomics features, and evaluating the physiological consequences of RNA modifications in biological processes. We have established both nucleic acid chemistry and directed protein evolution platforms to invent new technologies that transform RNA modifications to be read out as mutations or deletions that are universally compatible with extant sequencing platforms. Computational pipelines and RNA modification databases will be built to support the new method development and epitranscriptome research in the broad community. These new technologies will be optimized to work on low-input samples, particularly neuronal and clinical samples. We will focus on integrating new methods into robust protocols to map multiple RNA modifications in single experiments. Our proposed research will deliver high-throughput, high-resolution, and high-sensitivity methods that simultaneously map multiple RNA modifications in all biological areas.


Center for Genomic Information Encoded by RNA Nucleotide Modifications

RM1 HG011563
Samie Jaffrey
Weill Medical College of Cornell University

Post-transcriptional mechanisms control gene expression in virtually every cell. A major mediator of post-transcriptional gene regulation is the translating ribosome, which comprises three different types of RNAs: rRNA, tRNA and mRNA. These RNAs, along with ribosomal and mRNA-binding proteins, form a multi-RNA/multi-protein complex that can markedly influence mRNA stability and translation. Importantly, this complex is not constitutive. Instead, rRNA-tRNA, rRNA-mRNA, and tRNA-mRNA interactions are highly regulated, although the mechanisms of their regulation are poorly understood. A potential mechanism may involve chemical modification of their nucleotides. Indeed, rRNA, tRNA and mRNA are subjected to diverse chemical modifications whose stoichiometry is highly regulated in different tissues or disease states. Our underlying hypothesis is that the regulated nucleotide modifications in rRNA, tRNA, and mRNA act as a “code” that controls these RNAs and their mutual interactions, thus encoding unique patterns of gene expression. Although rRNA, tRNA, and mRNA nucleotide modifications are poised to be critical regulators of gene expression, studying how these modifications influence each other to control gene expression has been difficult. In part this reflects the lack of scalable methods to quantify and profile nucleotide modifications in rRNA, tRNA, and mRNA. Another problem is that understanding the interactions of rRNA, tRNA, and mRNA requires specialized expertise in each of these three major areas of RNA biology. It is therefore critical for experts in rRNA, tRNA, and mRNA to work together to decipher the mutual interactions of these RNAs. The Center will bring together a team of experts in these diverse types of RNAs who will work together to develop novel techniques to probe nucleotide modifications and how they interact to orchestrate unique patterns of gene expression.
The Center will develop novel technologies for mapping and quantifying rRNA, tRNA, and mRNA modifications, identify the dynamic modification sites in tissues and disease, and determine the function of these dynamic modifications. The methods and datasets that will be developed in the Center will provide the foundational knowledge needed to accelerate new areas of epitranscriptomics research in rRNA, tRNA, and mRNA biology. The Center has a major outreach and educational mission. The outreach/educational opportunities will include sponsored undergraduate research, breakout project funding, project funding for underrepresented minority trainees, funding for training visiting outside investigators, and funding for an annual symposium. We will also develop a website that curates the epitranscriptomic mapping data generated by the Center to ensure rapid and easy access to the new data we generate. Overall, the Center’s mission is to serve as a hub for training researchers in epitranscriptomics, as well as to develop new enabling technologies, develop foundational datasets, and reveal fundamental principles of modified nucleotide function in rRNA, tRNA, and mRNA that are needed to open up new areas of epitranscriptomics research. PUBLIC HEALTH RELEVANCE: Ribosomal RNA (rRNA), transfer RNA (tRNA), and messenger RNA (mRNA) are subjected to regulated and dynamic tissue-specific, signaling-specific, or disease-specific chemical modifications, many of which appear to regulate the fate and function of these RNAs. Although N6-methyladenosine (m6A) in mRNA comprises the major focus of the scientific community, other highly prevalent, and possibly similarly important, modifications in rRNA, tRNA and mRNA have not been studied in similar depth due to the lack of tools to quantify their precise stoichiometries in a site-specific and high-throughput manner.
The Center for Quantitative and Site-Specific Analysis of RNA Modifications comprises an interdisciplinary and highly synergistic group of investigators who will develop a suite of enabling technologies that will provide the foundational tools, datasets, protocols, and concepts regarding modified nucleotide function, which will be needed to unravel the function of the still unresolved epitranscriptomic code in rRNA, tRNA, and mRNA.


Center for Sub-Cellular Genomics 

RM1 HG010023
Junhyong Kim
University of Pennsylvania

A cell is a highly complex system with distributed molecular physiologies in structured sub-cellular compartments whose interplay with the nuclear genome determines the functional characteristics of the cell. A classic example of distributed genomic processes is found in neurons. Learning and memory require modulation of individual synapses through RNA localization, localized translation, and localized metabolites such as those from dendritic mitochondria. Dendrites of neurons integrate distributed synaptic signals into both electrical and nuclear transcriptional response. Dysfunction of these distributed genomic functions in neurons can result in a broad spectrum of neuropsychiatric diseases such as bipolar and depressive disorders and autism, among others. Understanding complex genomic interactions within a single cell requires new technologies: we need nano-scale ability to make genome-wide measurements at highly localized compartments and to effect highly localized functional genomic manipulations, especially in live tissues. To address this need, we propose to establish a Center for Sub-Cellular Genomics using neurons as model systems. The center will develop new optical and nanotechnology approaches to isolate sub-cellular scale components for genomic, metabolomic, and lipidomic analyses. The center will also develop new mass spectrometry methods, molecular biology methods, and informatics models to create a platform technology for sub-cellular genomics.


Genetic & Social Determinants of Health: Center for Admixture Science and Technology

RM1 HG011558
Lucila Ohno-Machado
University of California San Diego

It is imperative to understand the underlying sources of the large health disparities among individuals from different racial and ethnic groups living in the United States (US). Complex relationships between genetics and social factors influence health outcomes. Approximately 33% of people in the US belong to an ethnic minority group and ~12.5% live below the federal poverty line. Historical and recent mixing of Europeans, Native Americans, Africans and Asians resulted in the US population having a relatively large number of admixed individuals who carry ancestry from outside their self-identified race. The All of Us (AoU) Program and the Million Veteran Program (MVP) include genetic, health and socioeconomic information on all participants, and therefore provide an opportunity to identify factors contributing to health disparities. However, the AoU program and MVP require their data to stay within local hosting sites; therefore, conducting joint analyses on these cohorts requires the development of algorithms that enable privacy-protecting distributed computing (i.e., without revealing individual-level data). There are three important gaps in understanding genetic determinants of health: 1) most studies have been dominated by European individuals, and while they control for global ancestry, there is no attempt to model the patchwork of local ancestry characteristic of admixed individuals; 2) GWAS are primarily conducted using SNPs, while important sources of ancestry-specific genetic variation (tandem repeats (TRs) and the major histocompatibility complex (MHC) interval) are not assayed; and 3) most GWAS do not adjust for socioeconomic factors.
The American College of Medical Genetics and Genomics (ACMG) has published a list of medically actionable cancer and cardiovascular genes recommended for return of incidental findings of pathogenic variants to reduce morbidity and mortality, but having minorities excluded from healthcare follow up due to common barriers (e.g., language and access) makes it difficult to distinguish between the genetic and socioeconomic factors that contribute to disparate health outcomes. The goal of the CAST (Center for Admixture Science and Technology) program is to improve the clinical utility of genetic information for all populations living in the US. In Aim 1, we will develop and apply multivariate models of disease risk prediction that incorporate local ancestry and complex variants (TRs and HLA types). In Aim 2, we will conduct scalable distributed computing using data from millions of individuals across the AoU and MVP compute enclaves. In Aim 3, we will develop new approaches to characterize phenotypes using electronic health records and surveys from AoU and MVP, assess the impact of including social determinants of health in our models, and prospectively evaluate them with new AoU and MVP participants. To achieve these goals, we assembled a highly interdisciplinary group of researchers with expertise in Genetics, Genome Biology, Data Sharing Policy and Technology, Health Disparities, Phenotyping, and Statistics. PUBLIC HEALTH RELEVANCE: It is imperative to understand the underlying sources of the large health disparities among individuals from different racial and ethnic groups living in the United States. Complex relationships between genetics, individual behavior, socioeconomic status and the environment influence health. The goal of the CAST (Center for Admixture Science and Technology) program is to improve the utility of genome science for all populations living in the United States.


The Duke FUNCTION Center: Pioneering the comprehensive identification of combinatorial noncoding causes of disease

RM1 HG011123
Tim Reddy
Duke University

Noncoding genetic variation that alters gene regulatory element activity has major impacts on health, disease, and evolution. Because measuring regulatory element activity has long been a major challenge, the mechanisms underlying thousands of genetic associations with disease remain unknown. Recent advances in high-throughput technologies have disruptively advanced the ability to measure the activity of individual regulatory elements, and the first population- and genome-scale uses of those methods are now underway. However, regulatory elements do not act alone. They interact with promoters, other regulatory elements, and the surrounding chromatin, all in ways that are complex and difficult to predict. Though there is now a plethora of technologies to measure the activity of individual regulatory elements, the ability to recapitulate the effects of combinations of regulatory elements is woefully inadequate and severely hinders efforts to establish the gene regulatory contributions to traits and diseases. The goal of the Duke FUNCTION Center of Excellence in Genomic Science is to make the study of the combinatorial activity of regulatory elements routine. Aim 1 is to develop a suite of new technologies to measure the combinatorial effects of regulatory elements in their endogenous genomic contexts. Those technologies will leverage very recent discoveries of CRISPR enzymes other than Cas9 that greatly expand the ability to manipulate the human genome. Aim 2 is to develop the matched computational, statistical, and evolutionary models needed to interpret and predict the measured effects of combinations of regulatory variants on human traits and diseases. Aim 3 is to demonstrate the broad applicability of the technologies developed through case studies of human diseases with prevalence ranging from common to ultra rare. Example case studies will include studies of schizophrenia, rare recessive disorders, and undiagnosed genetic disorders.
We will also use a nationwide request for applications to identify Pilot Projects that will expand applications to other disease areas. Aim 4 is to create an electronic platform for distributing results from functional studies of the noncoding genome to the broad research community. The platform will integrate our results with those from studies in other labs and consortia, such as ENCODE, and will enable researchers with diverse expertise to benefit from the Center. Finally, our Education and Outreach Aim is to expand genomics capacity locally and nationally, with a particular emphasis on increasing use of our new technologies for translational research. The expected outcome of this project will be a paradigm shift in human genetics and genomics in which it will become possible to finally understand the full regulatory complexity that controls the expression of human genes. We anticipate that this ability will be particularly powerful for translating genetic associations into disease mechanisms, thus creating a windfall of new knowledge about which genes contribute most to disease, and how to manipulate those genes for therapeutic benefit. Long term, we envision this work being critical to realizing the full potential of whole genome sequencing to detect causes of disease.


Center for Integrated Cellular Analysis

1 RM1 HG011014-01
Rahul Satija
New York Genome Center

While rapid advances in single-cell RNA-sequencing are yielding comprehensive taxonomies of cell states in the human body, understanding the complex molecular and environmental factors that regulate cell behavior remains a central challenge. New methods for simultaneous measurement of multiple molecular modalities, spatial context, and lineage relationships are needed to address this goal, but are outside the scope of present technologies, which largely focus on a single data type. We propose to create a Center for Integrated Cellular Analysis, with a mission to develop a comprehensive suite of technologies and analytical methods to measure and integrate the molecular and environmental determinants of cellular identity. To achieve these goals, we propose the following series of synergistic Aims that will be developed in parallel: 1) Develop massively parallel assays to simultaneously profile multiple molecular components across millions of cells; 2) Identify the spatial and environmental determinants of cellular state in complex interacting populations; 3) Develop scalable platforms to profile inherited molecular components, and determine the role of cell lineage in establishing molecular and phenotypic differences across cells; and 4) Develop methods to harmonize single cell profiles across distinct modalities, enabling the inference of cellular identity. Our Center will address critical challenges in data integration, and produce software and protocols that will be applicable to diverse biological systems. We will share these resources broadly with the community, alongside a broader educational focus to encourage New York City students from under-represented backgrounds to pursue academic training in Genomics and Systems Biology.


Center for Genome Imaging

RM1 HG011016
Ting Wu
Harvard Medical School

Three-dimensional (3D) genome organization is a major contributor to genome function, and yet, we are only at the very dawn of discovering the structural signatures that underlie that organization. Thus, the goal of the proposed studies is to develop and apply tools that will enable sequence-specific imaging of human genomes, in their entirety, with high genomic resolution. In particular, the proposed work will innovate methods for fixed and live cell imaging using diffraction-limited light microscopy and super-resolution microscopy as well as develop new tools for image analysis and genome modeling. To this end, it will involve the continued collaboration of four laboratories, whose collective breadth of expertise covers the fields of classical and molecular genetics, chromosome dynamics, imaging, Hi-C analysis, convolutional neural networks, and polymer physics-based and restraint-based modeling. An equally important objective of the proposed studies is to ensure a generation of researchers whose personal breadth of expertise will come to match that of the entire current team.

Health relatedness: Will a solid grasp of 3D genome organization have implications for understanding human development? Will it contribute to the protection of human health? Will it contribute to strategies for early diagnostics and perhaps even the development of new therapies? The answer to all these questions is almost certainly a resounding Yes, as knowledge of 3D genome organization will enhance our capacity to address both fundamental biological processes as well as disease.

Innovation: An abundance of studies argue that genomes function as integrated units and, yet, no extant technologies enable sequence-specific imaging of entire genomes at high genomic resolution. Thus, the capacity of researchers to fathom the interplay between 3D genome organization and genome function has been limited to disjointed snapshots of localized events. Accordingly, the first three aims will develop the next tier of tools to put entire genomes within reach. They will advance a new method, OligoFISSEQ, and then integrate it with OligoSTORM and OligoDNA-PAINT to finally achieve high-throughput imaging at both conventional and super-resolution. They will also tackle two genomic features that have been prohibitively difficult to capture – presence of homologs in diploid cells and highly repeated sequences – as well as innovate strategies for high-volume data storage, image processing and analysis, and modeling. Finally, a fourth aim will implement methods for disseminating our tools.

  1. Scaling technologies toward whole genome imaging
  2. Filling in gaps to visualize chromosomes end-to-end – tackling homologs and repeats
  3. Probe design, image analysis, modeling, and integration of epigenetic data
  4. Training, resources, and opportunities for engaging colleagues in whole genome imaging
     
  • Active CEGS Awards

    Center for Personal Dynamic Regulomes

    RM1 HG007735
    Howard Chang
    Stanford University

    Tens of thousands of human genomes have been sequenced, but the central challenge is their interpretation. A comprehensive set of regulatory events across a genome, the regulome, is needed to make full use of genomic information, but is currently out of reach for most clinical applications and biological systems. The Center will develop technologies that greatly increase the sensitivity, speed, and comprehensiveness of understanding genome regulation. We will develop new technologies to interrogate the transactions between the genome and regulatory factors, such as proteins and noncoding RNAs from single cells, and integrate variations in DNA sequences and chromatin states over time and across individuals. Novel molecular engineering and biosensor strategies are deployed to encapsulate the desired complex DNA transformations into the probe system, such that the probe system can be directly used on very small human clinical samples and capture genome-wide information in one or two steps. These technologies will be applied to clinical samples with genomic aberrations to test their robustness, and reveal for the first time epigenomic dynamics of human diseases during progression and treatment. These technologies will be broadly applicable to many biomedical investigations, and the Center will disseminate the technologies via training and other means.


    Center for Genome Editing and Recording

    RM1 HG009490
    Jonathan Weissman
    Whitehead Institute

    Recent advances in DNA sequencing and bioinformatics have generated vast numbers of sequence variants associated with disease that, in principle, hold the keys to breakthroughs in preventive medicine and therapeutic intervention. However, realizing the promise of personalized medicine will require accurate manipulation of DNA sequences and gene expression as well as interrogation of the functional consequences of sequence variants at a scale and level of accuracy not currently possible. The Center for Genome Editing and Recording (CGER) will address these challenges by creating technologies to detect, alter and record the sequence and output of the genome in individual cells and tissues. The CGER will exploit the programmable DNA binding and nickase activity of CRISPR-Cas proteins as well as engineered zinc finger and TALE proteins to create a new generation of tools to precisely engineer the genome and epigenome. These platforms will enable high-precision engineering of both the nuclear and mitochondrial genomes as well as heritable silencing or activation of messenger RNAs, long noncoding RNA and a host of other regulatory elements such as enhancers and insulator regions. Critically, these alterations can be made without introducing double stranded breaks to DNA, thereby avoiding DNA toxicity and minimizing reliance on complex and difficult-to-control endogenous DNA repair pathways. Collectively, these technologies will usher in a new era of safer, high precision and multiplexed genome and epigenome editing. In addition, we will exploit these platforms to develop higher-level multichannel molecular recorders that will allow us to track and reconstruct the life history of cells in an in vivo setting. We will add the ability to follow cells in space and time as well as record the history of their past cellular states to the existing phylogenetic lineage tracing systems pioneered in the previous embodiment of this Center. 
The center will be led by a team that collectively has a remarkable track record of developing bold, impactful new tools, expanding their precision, efficacy, safety and scope, and finally exploiting these new capabilities to develop novel strategies to explore fundamental biological and biomedical problems. Our multidisciplinary team has a rich history of working together, which has been greatly accelerated by the CEGS structure in a manner that would simply not be possible if each co-PI had been working on similar problems in isolation. Leveraging the capabilities of the Whitehead Institute, MIT, The Broad Institute, Harvard University, Harvard Medical School, The Lewis Sigler Institute for Integrative Genomics at Princeton and the Massachusetts General Hospital, we aim to create transformative capabilities and have access to state-of-the-art research facilities as well as resources for training, education and outreach that will attract diverse talent to the field of genomics research. PUBLIC HEALTH RELEVANCE: The Center for Genome Editing and Recording will create technologies to enable robust, comprehensive exploration of genes and genetic pathways responsible for human disease in addition to the development of higher-level multichannel molecular recorders that will allow us to track and reconstruct the life history of cells in an in vivo setting. Collectively, these technologies have profound implications for genome science, therapeutic strategies for somatic disorders, and genetic diseases as well as understanding normal development and disease processes such as tumor evolution and the mechanism of metastases and response to therapeutic challenges.
In addition, we aim to create a vibrant environment in which scholars across educational and socioeconomic levels can engage without barriers, where diverse students at every stage of development will be exposed to novel opportunities for training, research and skill development, and new ideas in a dynamic interdisciplinary research environment.


    Center for the Multiplexed Assessment Of Phenotype

    RM1 HG010461
    Douglas Fowler
    University of Washington

    To date, millions of human genetic variants have been found, many in the coding or regulatory sequence of genes. However, for only a tiny fraction of these variants do we understand how the expression or function of the encoded product is affected. As a consequence, the promise of sequencing human genomes to understand human phenotypes – especially the risk for many diseases with genetic components – has gone largely unfulfilled. What is needed are facile, high-throughput methods for generating libraries of human cells bearing mutant sequence elements and for assessing these libraries to determine each variant's effect on molecular and cellular phenotypes. Thus, the Center for the Multiplexed Assessment of Phenotype, based largely in the University of Washington's Department of Genome Sciences, proposes to develop highly generalizable, reproducible and scalable technologies to generate, and assess the functional impact of, variants in human genes. In the first specific aim, the Center will establish two workhorse methods of mutagenesis to produce variants: saturation editing of genes at their endogenous loci in the human genome, and in vitro generation of variant libraries that are recombined into safe harbor sites. In the second specific aim, the Center will develop approaches to explore the impact of mutations in noncoding regions on versions of genes that have been minimized – pared down to partially remove intronic sequence but still capable of providing essential activity. Further, it will develop mass spectrometry methods to analyze variation in coding sequences for its effect on protein abundance, stability, interactions, turnover and aggregation. In the third specific aim, the Center will assess variant effects on cell morphology, behavior and internal organization by using a novel, microscopy-based phenotyping technology, and on global transcription by developing a massively parallel single-cell mRNA profiling method.
Center-developed technologies will be piloted on a set of human genes with disease relevance, enabling comparisons between each variant's functional effects and the effects of known pathogenic or benign variants. This effort will inform the use in the clinic of the large-scale functional data the Center's technologies will generate. Additionally, variants will be assessed under different conditions, such as in multiple cell lines, in combination with another mutation, or in the presence of a drug. The Center will also train early career experimentalists, clinical geneticists and data scientists to obtain and use large-scale functional data. This training will include internships in Center laboratories for one to three months, and apprenticeships for one to two years. These close interactions will generate medically- and biologically-relevant results and reveal the best paths for translating the vast amounts of Center-generated functional data for clinical use. Through these new technologies and their dissemination to the broader clinical community, the Center will advance the promise of the Human Genome Project by interpreting the vast landscape of human genetic variation.


    A phenomics-first resource for interpretation of variants

    RM1 HG010860
    Melissa Haendel
    University of Colorado, Denver

    Genomics is key to precision medicine; however, despite the ease of sequencing, clinical interpretation is still thwarted because relevant data (disease, phenotype, and variant) is complex, heterogeneous, and disaggregated across sources. Moreover, this evidence is sometimes incomplete, conflicting, and erroneous. Consequently, clinicians face long lists of candidate diseases, genes, and countless variants of unknown significance. This situation will not improve without capturing and harmonizing the underlying phenotypic information; computability of this information is the bedrock for the emerging field of phenomics. From basic science to clinical care, communities need structured ways to represent and exchange phenotypes and disease definitions. Addressing these fundamental phenomics needs makes it possible to computationally assess and reveal links between diseases and variants. We have previously shown how the addition of phenotypic information using the Human Phenotype Ontology (HPO) can improve the diagnostic yield for hard-to-diagnose patients, and HPO is therefore now a global standard for “deep phenotyping”. We have demonstrated the applicability of deep phenotyping in the evaluation of rare diseases which have overlapping mechanistic underpinnings with common/complex diseases as well as evolutionarily conserved mechanisms in model organisms. Having coordinated the community and prototyped the underlying computational platforms, we will now align both phenotype ontologies and clinical terminologies, enabling better comparison and inference of phenotypes for improved diagnostic efficacy. We propose to develop a Phenomics-First Resource (PFR). Specifically, we will:

    1. Create a community-driven framework of interoperable phenotype definitions across species (uPheno)
    2. Harmonize human disease definitions with the MONDO disease alignment resource
    3. Create a community-wide exchange standard for clinical and model-organism phenotypes (Phenopackets)
    4. Develop an integrated phenomics platform to provide the research (e.g. BioLink) and clinical (FHIR) communities with programmatic access to phenomics ontologies, data, and algorithms

    The dynamic suite of interlinked technologies will together leverage community-developed knowledge in order to make variant interpretation more reliable, better provenanced, and more clinically actionable.


    Center for Live Cell Genomics

    RM1 HG011543
    David Haussler
    University of California Santa Cruz

    We will build new methodology and capacity for large-scale, long-term, inexpensive, modular, customizable, shared, Internet-of-Things-controlled, reproducible live cell culture and tissue-based experimental genomics disease models. Tissue models include traditional cell culture as well as organoid and primary tissue explants obtained from surgery or biopsy. Organoid factories supporting tissue growth and maintenance will be integrated with external and on-chip electro-optofluidic analytical modules to become part of an ecosystem that is modeled after open-source software. It will use commodity sensors, cameras, and computers linked in platforms flexibly designed using simple 3D printing, molding, etching and milling techniques potentially available at any institution. This will stimulate rapid innovation in experimental platforms for tissue culture. We will push this technology and use its best-in-class capabilities to make progress in neurodevelopment and pediatric cancer, addressing big questions. What genes contribute most importantly and specifically to human brain development? How do they go wrong in neurodevelopmental disease or brain injury? What specific molecular pathways are disrupted in individual pediatric cancer cases? How can we test pathway-specific treatments in a tissue model specific to each patient? Our education and outreach plans include a training program to develop a diverse and inclusive cohort of undergraduate students trained in genomic science through secondary school and community college outreach as well as coding workshops and research-based laboratory classes for UCSC undergraduates to develop core competencies. Participation in these activities serves as a basis for training graduate students and postdocs in inclusive pedagogy and mentorship.
We are also developing a one-stop information hub to form an online community and to share our technology through immersive webinars and tutorials aimed at a broader audience with a range of expertise from the general public to scientists and clinicians at research institutions. Our work will enable significant advances in neuroscience and cancer research and education, stimulate a new open-source culture in cell biology and genomics, and democratize scientific and educational access beyond elite institutions, extending sharing projects like NHGRI AnVIL beyond data and code to include experiments and Internet-connected experimental platforms. PUBLIC HEALTH RELEVANCE: We will introduce a new experimental platform paradigm to genomic studies of live cells and tissues that will allow scientists to be more creative, more productive, more collaborative and use fewer resources for their live cell genomics experiments. We will create new platforms that provide living cells growing in realistic 3D cultures that mimic actual tissues in the body and incorporate Internet-based remote control and analysis capabilities. We will deploy this technology to perform new, more robust and more reproducible experiments leading to discoveries in neurodevelopmental disease and cancer.


    Center for Dynamic RNA Epitranscriptomes

    RM1 HG008935
    Chuan He
    University of Chicago

    Chemical modifications on mammalian messenger RNA (mRNA) have recently been shown to play critical and diverse regulatory roles in mRNA metabolism and translation. For example, the most abundant mRNA modification, N6-methyladenosine (m6A), is crucial for mammalian stem cell differentiation and tissue development in almost all systems tested so far. Dedicated proteins, many of which are essential in mammals, have evolved to install, recognize, and remove m6A marks (writers, readers, and erasers, respectively). Dysregulation of m6A methylation has been connected to a variety of human diseases and disorders. Functional roles have also been proposed for other internal modifications present in mammalian mRNA, including, but not limited to: pseudouridine (Ψ), 5-methylcytosine (m5C), 2’-O-methylation (Nm), N1-methyladenosine (m1A), N7-methylguanosine (m7G), and N3-methylcytosine (m3C). Our most recent research has uncovered modifications on chromosome-associated regulatory RNAs (carRNAs), such as promoter-associated RNA (paRNA), enhancer RNA (eRNA), and repeat RNAs, as well as frequent modifications in introns of pre-mRNA. The carRNA modifications have been shown to regulate chromatin state and transcription, and intron modifications may affect pre-mRNA processing. Despite rapid advances in the discovery and functional characterization of various RNA modifications and their effector proteins, a significant bottleneck limits the entire field of epitranscriptomics research: a dearth of quantitative sequencing methods that can comprehensively map most RNA modifications at base resolution with exact modification fraction information.
The availability of such methods is critical for assessing the importance of these modifications in different regions of mRNA, examining the effects of dynamic changes in modification fraction, assigning modifications to different writers and analyzing their functional relevance, identifying target transcripts and sites of demethylation and analyzing their functional relevance, discovering new effectors for RNA modifications by overlapping with known RBP-binding sites or genomics features, and evaluating the physiological consequences of RNA modifications in biological processes. We have established both nucleic acid chemistry and directed protein evolution platforms to invent new technologies that transform RNA modifications to be read out as mutations or deletions that are universally compatible with extant sequencing platforms. Computational pipelines and RNA modification databases will be built to support the new method development and epitranscriptome research in the broad community. These new technologies will be optimized to work on low-input samples, particularly neuronal and clinical samples. We will focus on integrating new methods into robust protocols to map multiple RNA modifications in single experiments. Our proposed research will deliver high-throughput, high-resolution, and high-sensitivity methods that simultaneously map multiple RNA modifications in all biological areas.


    Center for Genomic Information Encoded by RNA Nucleotide Modifications

    RM1 HG011563
    Samie Jaffrey
    Weill Medical College of Cornell University

    Post-transcriptional mechanisms control gene expression in virtually every cell. A major mediator of post-transcriptional gene regulation is the translating ribosome, which comprises three different types of RNAs: rRNA, tRNA and mRNA. These RNAs, along with ribosomal and mRNA-binding proteins, form a multi-RNA/multi-protein complex that can markedly influence mRNA stability and translation. Importantly, this complex is not constitutive. Instead, rRNA-tRNA, rRNA-mRNA, and tRNA-mRNA interactions are highly regulated, although the mechanisms of this regulation are poorly understood. A potential mechanism may involve chemical modification of their nucleotides. Indeed, rRNA, tRNA and mRNA are subjected to diverse chemical modifications whose stoichiometry is highly regulated in different tissues or disease states. Our underlying hypothesis is that the regulated nucleotide modifications in rRNA, tRNA, and mRNA act as a “code” that controls these RNAs and their mutual interactions, thus encoding unique patterns of gene expression. Although rRNA, tRNA, and mRNA nucleotide modifications are poised to be critical regulators of gene expression, how these modifications influence each other to control gene expression has been difficult to explore. In part this reflects the lack of scalable methods to quantify and profile nucleotide modifications in rRNA, tRNA, and mRNA. Another problem is that understanding the interactions of rRNA, tRNA, and mRNA requires specialized expertise in each of these three major areas of RNA biology. It is therefore critical for experts in rRNA, tRNA, and mRNA to work together to decipher the mutual interactions of these RNAs. The Center will bring together a team of experts in these diverse types of RNAs who will work together to develop novel techniques to probe nucleotide modifications and how they interact to orchestrate unique patterns of gene expression.
The Center will develop novel technologies for mapping and quantifying rRNA, tRNA, and mRNA modifications, identify the dynamic modification sites in tissues and disease, and determine the function of these dynamic modifications. The methods and datasets that will be developed in the Center will provide the foundational knowledge needed to accelerate new areas of epitranscriptomics research in rRNA, tRNA, and mRNA biology. The Center has a major outreach and educational mission. The outreach/educational opportunities will include sponsored undergraduate research, breakout project funding, project funding for underrepresented minority trainees, funding for training visiting outside investigators, and funding for an annual symposium. We will also develop a website that curates the epitranscriptomic mapping data generated by the Center to ensure rapid and easy access to the new data we generate. Overall, the Center’s mission is to serve as a hub for training researchers in epitranscriptomics, as well as to develop new enabling technologies, develop foundational datasets, and reveal fundamental principles of modified nucleotide function in rRNA, tRNA, and mRNA that are needed to open up new areas of epitranscriptomics research. PUBLIC HEALTH RELEVANCE: Ribosomal RNA (rRNA), transfer RNA (tRNA), and messenger RNA (mRNA) are subjected to regulated and dynamic tissue-specific, signaling-specific, or disease-specific chemical modifications, many of which appear to regulate the fate and function of these RNAs. Although N6-methyladenosine (m6A) in mRNA comprises the major focus of the scientific community, other highly prevalent, and possibly similarly important, modifications in rRNA, tRNA and mRNA have not been studied in similar depth due to the lack of tools to quantify their precise stoichiometries in a site-specific and high-throughput manner.
The Center for Quantitative and Site-Specific Analysis of RNA Modifications comprises an interdisciplinary and highly synergistic group of investigators who will develop a suite of enabling technologies that will provide the foundational tools, datasets, protocols, and concepts regarding modified nucleotide function, which will be needed to unravel the function of the still unresolved epitranscriptomic code in rRNA, tRNA, and mRNA.


    Center for Sub-Cellular Genomics 

    RM1 HG010023
    Junhyong Kim
    University of Pennsylvania

    A cell is a highly complex system with distributed molecular physiologies in structured sub-cellular compartments whose interplay with the nuclear genome determines the functional characteristics of the cell. A classic example of distributed genomic processes is found in neurons. Learning and memory require modulation of individual synapses through RNA localization, localized translation, and localized metabolites such as those from dendritic mitochondria. Dendrites of neurons integrate distributed synaptic signals into both electrical and nuclear transcriptional response. Dysfunction of these distributed genomic functions in neurons can result in a broad spectrum of neuropsychiatric diseases such as bipolar and depressive disorders and autism, among others. Understanding complex genomic interactions within a single cell requires new technologies: we need nano-scale ability to make genome-wide measurements at highly localized compartments and to effect highly localized functional genomic manipulations, especially in live tissues. To address this need, we propose to establish a Center for Sub-Cellular Genomics using neurons as model systems. The center will develop new optical and nanotechnology approaches to isolate sub-cellular scale components for genomic, metabolomic, and lipidomic analyses. The center will also develop new mass spectrometry methods, molecular biology methods, and informatics models to create a platform technology for sub-cellular genomics.


    Genetic & Social Determinants of Health: Center for Admixture Science and Technology

    RM1 HG011558
    Lucila Ohno-Machado
    University of California San Diego

    It is imperative to understand the underlying sources of the large health disparities among individuals from different racial and ethnic groups living in the United States (US). Complex relationships between genetics and social factors influence health outcomes. Approximately 33% of people in the US belong to an ethnic minority group and ~12.5% live below the federal poverty line. Historical and recent mixing of Europeans, Native Americans, Africans and Asians resulted in the US population having a relatively large number of admixed individuals who carry ancestry from outside their self-identified race. The All of Us (AoU) Program and the Million Veteran Program (MVP) include genetic, health and socioeconomic information on all participants, and therefore provide an opportunity to identify factors contributing to health disparities. However, the AoU program and MVP require their data to stay within local hosting sites; therefore, conducting joint analyses on these cohorts requires the development of algorithms that enable privacy-protecting distributed computing (i.e., without revealing individual-level data). There are three important gaps in understanding genetic determinants of health: 1) most studies have been dominated by European individuals, and while they control for global ancestry, there is no attempt to model the patchwork of local ancestry characteristic of admixed individuals; 2) GWAS are primarily conducted using SNPs, while important sources of ancestry-specific genetic variation (tandem repeats (TRs) and the major histocompatibility complex (MHC) interval) are not assayed; and 3) most GWAS do not adjust for socioeconomic factors.
    The American College of Medical Genetics and Genomics (ACMG) has published a list of medically actionable cancer and cardiovascular genes recommended for return of incidental findings of pathogenic variants to reduce morbidity and mortality, but the exclusion of minorities from healthcare follow-up due to common barriers (e.g., language and access) makes it difficult to distinguish between the genetic and socioeconomic factors that contribute to disparate health outcomes. The goal of the CAST (Center for Admixture Science and Technology) program is to improve the clinical utility of genetic information for all populations living in the US. In Aim 1, we will develop and apply multivariate models of disease risk prediction that incorporate local ancestry and complex variants (TRs and HLA types). In Aim 2, we will conduct scalable distributed computing using data from millions of individuals across the AoU and MVP compute enclaves. In Aim 3, we will develop new approaches to characterize phenotypes using electronic health records and surveys from AoU and MVP, assess the impact of including social determinants of health in our models, and prospectively evaluate them with new AoU and MVP participants. To achieve these goals, we assembled a highly interdisciplinary group of researchers with expertise in Genetics, Genome Biology, Data Sharing Policy and Technology, Health Disparities, Phenotyping, and Statistics. PUBLIC HEALTH RELEVANCE: It is imperative to understand the underlying sources of the large health disparities among individuals from different racial and ethnic groups living in the United States. Complex relationships between genetics, individual behavior, socioeconomic status and the environment influence health. The goal of the CAST (Center for Admixture Science and Technology) program is to improve the utility of genome science for all populations living in the United States.


    The Duke FUNCTION Center: Pioneering the comprehensive identification of combinatorial noncoding causes of disease

    RM1 HG011123
    Tim Reddy
    Duke University

    Noncoding genetic variation that alters gene regulatory element activity has major impacts on health, disease, and evolution. Because measuring regulatory element activity has long been a major challenge, the mechanisms underlying thousands of genetic associations with disease remain unknown. Recent advances in high-throughput technologies have disruptively advanced the ability to measure the activity of individual regulatory elements, and the first population- and genome-scale uses of those methods are now underway. However, regulatory elements do not act alone. They interact with promoters, other regulatory elements, and the surrounding chromatin, all in ways that are complex and difficult to predict. Though there is now a plethora of technologies to measure the activity of individual regulatory elements, the ability to recapitulate the effects of combinations of regulatory elements is woefully inadequate and severely hinders efforts to establish the gene regulatory contributions to traits and diseases. The goal of the Duke FUNCTION Center of Excellence in Genomic Science is to make the study of the combinatorial activity of regulatory elements routine. Aim 1 is to develop a suite of new technologies to measure the combinatorial effects of regulatory elements in their endogenous genomic contexts. Those technologies will leverage very recent discoveries of CRISPR enzymes other than Cas9 that greatly expand the ability to manipulate the human genome. Aim 2 is to develop the matched computational, statistical, and evolutionary models needed to interpret and predict the measured effects of combinations of regulatory variants on human traits and diseases. Aim 3 is to demonstrate the broad applicability of the technologies developed through case studies of human diseases with prevalence ranging from common to ultra rare. Example case studies will include studies of schizophrenia, rare recessive disorders, and undiagnosed genetic disorders.
    We will also use a nationwide request for applications to identify Pilot Projects that will expand applications to other disease areas. Aim 4 is to create an electronic platform for distributing results from functional studies of the noncoding genome to the broad research community. The platform will integrate our results with those from studies in other labs and consortia, such as ENCODE, and will enable researchers with diverse expertise to benefit from the Center. Finally, our Education and Outreach Aim is to expand genomics capacity locally and nationally, with a particular emphasis on increasing use of our new technologies for translational research. The expected outcome of this project will be a paradigm shift in human genetics and genomics in which it will become possible to finally understand the full regulatory complexity that controls the expression of human genes. We anticipate that this ability will be particularly powerful for translating genetic associations into disease mechanisms, thus creating a windfall of new knowledge about which genes contribute most to disease, and how to manipulate those genes for therapeutic benefit. Long term, we envision this work being critical to realizing the full potential of whole genome sequencing to detect causes of disease.


    Center for Integrated Cellular Analysis

    RM1 HG011014
    Rahul Satija
    New York Genome Center

    While rapid advances in single-cell RNA-sequencing are yielding comprehensive taxonomies of cell states in the human body, understanding the complex molecular and environmental factors that regulate cell behavior remains a central challenge. New methods for simultaneous measurement of multiple molecular modalities, spatial context, and lineage relationships are needed to address this goal, but are currently beyond the reach of present technologies, which largely focus on a single data type. We propose to create a Center for Integrated Cellular Analysis, with a mission to develop a comprehensive suite of technologies and analytical methods to measure and integrate the molecular and environmental determinants of cellular identity. To achieve these goals, we propose the following series of synergistic Aims that will be developed in parallel: 1) Develop massively parallel assays to simultaneously profile multiple molecular components across millions of cells; 2) Identify the spatial and environmental determinants of cellular state in complex interacting populations; 3) Develop scalable platforms to profile inherited molecular components, and determine the role of cell lineage in establishing molecular and phenotypic differences across cells; and 4) Develop methods to harmonize single cell profiles across distinct modalities, enabling the inference of cellular identity. Our Center will address critical challenges in data integration, and produce software and protocols that will be applicable to diverse biological systems. We will share these resources broadly with the community, alongside a broader educational focus to encourage New York City students from under-represented backgrounds to pursue academic training in Genomics and Systems Biology.


    Center for Genome Imaging

    RM1 HG011016
    Ting Wu
    Harvard Medical School

    Three-dimensional (3D) genome organization is a major contributor to genome function, and yet, we are only at the very dawn of discovering the structural signatures that underlie that organization. Thus, the goal of the proposed studies is to develop and apply tools that will enable sequence-specific imaging of human genomes, in their entirety, with high genomic resolution. In particular, the proposed work will innovate methods for fixed and live cell imaging using diffraction-limited light microscopy and super-resolution microscopy as well as develop new tools for image analysis and genome modeling. To this end, it will involve the continued collaboration of four laboratories, whose collective breadth of expertise covers the fields of classical and molecular genetics, chromosome dynamics, imaging, Hi-C analysis, convolutional neural networks, and polymer physics-based and restraint-based modeling. An equally important objective of the proposed studies is to ensure a generation of researchers whose personal breadth of expertise will come to match that of the entire current team.

    Health relatedness: Will a solid grasp of 3D genome organization have implications for understanding human development? Will it contribute to the protection of human health? Will it contribute to strategies for early diagnostics and perhaps even the development of new therapies? The answer to all these questions is almost certainly a resounding Yes, as knowledge of 3D genome organization will enhance our capacity to address both fundamental biological processes and disease.

    Innovation: An abundance of studies argue that genomes function as integrated units and, yet, no extant technologies enable sequence-specific imaging of entire genomes at high genomic resolution. Thus, the capacity of researchers to fathom the interplay between 3D genome organization and genome function has been limited to disjointed snapshots of localized events. Accordingly, the first three aims will develop the next tier of tools to put entire genomes within reach. They will advance a new method, OligoFISSEQ, and then integrate it with OligoSTORM and OligoDNA-PAINT to finally achieve high-throughput imaging at both conventional and super-resolution. They will also tackle two genomic features that have been prohibitively difficult to capture (the presence of homologs in diploid cells and highly repeated sequences), as well as innovate strategies for high volume data storage, image processing and analysis, and modeling. Finally, a fourth aim will implement methods for disseminating our tools.

    1. Scaling technologies toward whole genome imaging
    2. Filling in gaps to visualize chromosomes end-to-end – tackling homologs and repeats
    3. Probe design, image analysis, modeling, and integration of epigenetic data
    4. Training, resources, and opportunities for engaging colleagues in whole genome imaging

Previous CEGS Awards

Below is a list of previous Centers of Excellence in Genomic Science (CEGS) grant awards. Grant numbers link to the NIH RePORT system, where abstracts, other information about the awards and resulting publications are included.

While active CEGS have their own institutional websites, some are not maintained once the grant ends. If a website still exists, it will be linked from the project title.

P50HG006193

Center for Cell Circuits
Aviv Regev
The Broad Institute, Cambridge, Massachusetts


RM1HG008525

Center for Genomically Engineered Organs
George M. Church
Harvard Medical School


RM1HG007743

Center for Photogenomics
John A. Stamatoyannopoulos
The Altius Institute


P50MH106933

Neuropsychiatric Genome-Scale and RDOC Individualized Domains (N-GRID)
Kohane, Isaac
Harvard Medical School


P50HG004233

Genomic Analysis of Network Perturbations in Human Disease
Vidal, Marc
Dana-Farber Cancer Institute, Boston


P50HG002351

Center for the Study of Natural Genetic Variation
Olson, Maynard V.
University of Washington


P50HG002357

Analysis of Human Genome Using Integrated Technologies
Snyder, Michael P.
Yale University / Stanford University


P50HG002360

CEGS: Microscale Life Sciences Center
Meldrum, Deirdre R.
University of Washington / Arizona State University-Tempe Campus


P50HG002370

Center for Genomic Experimentation and Computation
Brent, Roger
VTT/MSI Molecular Sciences Institute


P50HG002568

Genomic Basis of Vertebrate Diversity
Talbot, William S. / Kingsley, David M.
Stanford University


P50HG002790

Implications of Haplotype Structure in the Human Genome
Waterman, Michael S. / Tavaré, Simon
University of Southern California


P50HG002806

Genomic Approaches to Neuronal Diversity and Plasticity
Ju, Jingyue
Columbia University Health Sciences


P50HG003170

Molecular and Genomic Imaging Center
Church, George M.
Harvard Medical School


P50HG003233

Center for the Epigenetics of Common Human Disease
Feinberg, Andrew P.
Johns Hopkins University


P50HG004071

Center for in Toto Genomic Analysis of Vertebrate Development
Bronner-Fraser, Marianne / Fraser, Scott E.
California Institute of Technology


P50HG004952

Wisconsin Center of Excellence in Genomics Science
Olivier, Michael
Medical College of Wisconsin / Texas Biomedical Research Institute


P50HG005550

Causal Transcriptional Consequences of Human Genetic Variation
Church, George M.
Harvard Medical School


P50MH090338 / P50HG006582

An Interdisciplinary Program for Systems Genomics of Complex Behaviors
Pardo-Manuel de Villena, Fernando
University of North Carolina, Chapel Hill

Last updated: February 7, 2024