NIH to expand critical catalog for genomics research

Sheena Faherty, Ph.D., press contact
February 2, 2017

The National Institutes of Health (NIH) plans to expand its Encyclopedia of DNA Elements (ENCODE) Project, which is generating a fundamental genomics resource used by many scientists to study human health and disease.

A Fundamental Genomics Resource

"ENCODE has created high-quality and easily accessible sets of data, tools and analyses that are being used extensively in studies to interpret genome sequences and to understand the consequence of genomic variation," said Elise Feingold, Ph.D., a program director in the Division of Genome Sciences at NHGRI.

"These awards provide the opportunity to strengthen this foundation by expanding the breadth and depth of the resource."

Since launching in 2003, ENCODE has funded a network of researchers to develop and apply methods for mapping candidate functional elements in the genome, and to analyze the enormous database of generated genomic information. The data and tools generated by ENCODE are organized by two groups: a data coordinating center, which houses the data and provides access to the resource through an open-access portal, and a data analysis center, which synthesizes the data into an encyclopedia for use by the research community.

Pending the availability of funds, NHGRI plans to commit up to $31.5 million in the current fiscal year (FY17) for these awards. With this funding, ENCODE will expand the scope of these efforts to include characterization centers, which will study the biological role that candidate functional elements may play, and develop methods to determine how they contribute to gene regulation in a variety of cell types and model systems. Additionally, the project will enhance the ENCODE catalog by developing a way to incorporate data provided by the research community, and will use biological samples from research participants who have explicitly consented to unrestricted sharing of their genomic data.

At its core, ENCODE is about enabling the scientific community to make discoveries; that is, using basic science approaches to understand genomes at the most fundamental level. Its catalog of genomic information can be used for a variety of research projects - for example, generating hypotheses about what goes wrong in specific diseases or understanding the processes that determine how the same genome sequence is used in different parts of the body to make cells with specialized functions. More than 1,600 scientific publications by the research community have used ENCODE data or tools.

"We found that many of the people that are using the ENCODE resource are doing so for disease studies, and this attests to its translational value," said Mike Pazin, Ph.D., a program director in NHGRI's Division of Genome Sciences.

Identifying the genome's features: the mapping centers

ENCODE's mapping centers have been part of the project since its inception. These groups aim to pinpoint the genomic locations of genes and the regulatory elements that control them. With these new awards, the mapping centers will study a broader diversity of biological samples, including those from individuals with various diseases, as well as highly specialized cells, to expand the catalog of candidate functional elements in the human and mouse genomes.

"In the past, ENCODE has focused on identifying functional elements in healthy individuals; but gene expression may be regulated differently in people that are unhealthy versus those that are healthy," said Dr. Pazin. "Diseased tissues may help with the detection of new functional elements."

"An important aspect of the ENCODE Project is to identify collections of cell types for use in creating an incredibly detailed map of the genome and its features," said Erez Lieberman Aiden, Ph.D., assistant professor in the Department of Genetics at the Baylor College of Medicine and at Rice University, and a first-time ENCODE grantee who will run one of ENCODE's eight mapping centers. "If we put them all together, the whole is much more valuable than the sum of each of the parts."

Dr. Aiden's research focuses on how the genome folds up inside the nucleus in three dimensions. "Scientists tend to think of a chromosome as a long, linear string of letters. In actuality, the string folds up, forming loops and other shapes," he said.

For the roughly two meters of DNA to fit into a cell nucleus, it must be packed into chromatin. This tight packing can bring two parts of the genome - a gene and its regulatory element, for example - into close contact. A better understanding of where these loops occur will clarify the relationships between genomic features that were thought to lie far apart.

A more detailed look at genome function: the characterization centers

ENCODE will establish five characterization centers to investigate how large numbers of genomic elements function in specific biological settings. The ENCODE characterization centers will take advantage of newly developed technologies to characterize many elements at the same time.

"We want to try and breathe life into the functional aspects of the catalog that ENCODE has created," said Will Greenleaf, Ph.D., assistant professor in the Genetics Department at Stanford University, and a new ENCODE investigator who will run one of the characterization centers. "Understanding how regulatory elements work together to bring about gene expression is something we're really excited about."

Dr. Greenleaf's research involves changing various candidate regulatory elements through methods such as CRISPR-Cas9, a gene-editing technique that can precisely clip out sections of the genome. His group, in collaboration with the group of Michael Bassik, Ph.D., at Stanford University, will then characterize how the edited cells grow under a variety of conditions to document what happens when regulatory elements are missing from the genome.

"We've sequenced the human genome, but it's written in a language that we don't understand. ENCODE is a way to learn the logic and grammar of that language, so that we can unlock the power of sequencing the genome for understanding both human health and disease," he said.

Analyzing the catalog: the computational analysis projects

In a third facet of ENCODE, researchers will develop computational and statistical approaches that will make the ENCODE catalog even more useful for studying both disease mechanisms and fundamental biology.

"In any given cell type, you may have 30,000-50,000 sites in the genome that modulate gene expression. How do we even think about that? Or visualize that? Or work out which elements in the genome are regulating which genes - and when and how? You have to compute," said Christina Leslie, Ph.D., associate member of the Computational and Systems Biology Program at the Sloan Kettering Institute for Cancer Research, who has been awarded funds to work on one of six computational analysis projects.

Her research uses predictive models of gene regulation that incorporate data on which sites in the genome are accessible and decode the DNA signals at these sites. Her previous work treated the genome in a linear, or one-dimensional, form. Now, she is incorporating data from other ENCODE projects, such as Dr. Aiden's work on mapping chromatin loops in three dimensions, to further our understanding of genome function.

Bringing data to the community: the data coordinating and data analysis centers

Funded by the National Human Genome Research Institute (NHGRI), part of NIH, the ENCODE Project strives to catalog all the genes and regulatory elements - the parts of the genome that control whether genes are active or not - in humans and select model organisms. With four years of additional support, NHGRI builds on a long-standing commitment to developing freely available genomics resources for use by the scientific community.

The ENCODE Project brings together laboratories that generate vast amounts of data with groups that integrate these data through computational research. The data coordinating center and the data analysis center support ENCODE members by connecting all participants to the data and by creating avenues of easy access for the greater research community.

"As a community resource, ENCODE data must be rapidly and freely available to researchers so they can immediately put it to use in their own work. This is where the data coordinating center and data analysis center play such critical roles," said Dan Gilchrist, Ph.D., a program director in NHGRI's Division of Genome Sciences. "Thanks to these centers, ENCODE data are findable, accessible, interoperable and reusable - maximizing their utility to the research community."

Recipients of the awards are:

Mapping Centers

Bradley Bernstein, M.D., Ph.D. and Chad Nusbaum, Ph.D.; Broad Institute, Cambridge, Massachusetts
Erez Lieberman Aiden, Ph.D.; Baylor College of Medicine, Houston
Mats Ljungman, Ph.D.; University of Michigan, Ann Arbor
Richard Myers, Ph.D. and Eric Mendenhall, Ph.D.; HudsonAlpha Institute for Biotechnology, Huntsville, Alabama; University of Alabama in Huntsville
Yijun Ruan, Ph.D.; The Jackson Laboratory, Bar Harbor, Maine
Michael Snyder, Ph.D.; Stanford University, California
John Stamatoyannopoulos, M.D.; Altius Institute for Biomedical Sciences, Seattle, Washington
Barbara Wold, Ph.D. and Ali Mortazavi, Ph.D.; California Institute of Technology, Pasadena; University of California, Irvine

Characterization Centers

Nadav Ahituv, Ph.D. and Jay Shendure, M.D., Ph.D.; University of California, San Francisco; University of Washington, Seattle
William Greenleaf, Ph.D. and Michael Bassik, Ph.D.; Stanford University, California
John Lis, Ph.D. and Haiyuan Yu, Ph.D.; Cornell University, Ithaca, New York
Len Pennacchio, Ph.D. and Axel Visel, Ph.D.; University of California, Lawrence Berkeley National Laboratory, Berkeley, California
Yin Shen, Ph.D. and Bing Ren, Ph.D.; University of California, San Francisco; Ludwig Institute for Cancer Research/University of California, San Diego School of Medicine

Computational Analysis

Michael Beer, Ph.D.; Johns Hopkins University, Baltimore, Maryland
Christina Leslie, Ph.D.; Sloan Kettering Institute for Cancer Research, New York, New York
Alkes Price, Ph.D. and Soumya Raychaudhuri, M.D., Ph.D.; Harvard University, Cambridge, Massachusetts; Brigham and Women's Hospital, Boston, Massachusetts
Jonathan Pritchard, Ph.D.; Stanford University, California
Ting Wang, Ph.D., Barak Cohen, Ph.D. and Cedric Feschotte, Ph.D.; Washington University in St. Louis; University of Utah, Salt Lake City
Xinshu Grace Xiao, Ph.D.; University of California, Los Angeles

Data Coordinating Center

J. Michael Cherry, Ph.D.; Stanford University, California

Data Analysis Center

Zhiping Weng, Ph.D. and Mark Gerstein, Ph.D.; University of Massachusetts Medical School, Worcester; Yale University, New Haven, Connecticut
