ENCODE Consortium Publishes Scientific Strategy

New Technology Development Grants Will Aid Quest To Find All Functional Elements in Human DNA

BETHESDA, Md., Thurs., Oct. 21, 2004 - A research consortium organized by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), today published a paper in the journal Science detailing the scientific rationale and strategy behind its quest to produce a comprehensive catalog of all parts of the human genome crucial to biological function. Also today, NHGRI announced the award of $5.5 million in technology development grants to provide new tools for the pioneering effort.

In a peer-reviewed article published in the Oct. 22 issue of Science, the ENCyclopedia Of DNA Elements (ENCODE) consortium outlines its plans for achieving its ambitious goal of building a "parts list" of all sequence-based functional elements in the human DNA sequence. The list will include: protein-coding genes; non-protein-coding genes; regulatory elements involved in the control of gene transcription; and DNA sequences that mediate chromosomal structure and dynamics. The ENCODE researchers also anticipate they may uncover additional functional elements that have yet to be recognized.

"Creating this monumental reference work will help us mine and fully utilize the human genome sequence. Such knowledge will lead to a far deeper understanding of human biology and stimulate the development of new strategies for improving human health," said NHGRI Director Francis S. Collins, M.D., Ph.D.

While the completion of the Human Genome Project in April 2003, and the publication of the finished human genome sequence in Nature just this week, marked significant scientific achievements, these are only the first steps toward the ultimate goal of using information about the human genome sequence to diagnose, treat and prevent disease. Over the past several years, researchers have made major strides in using DNA sequence data to help find genes, which are the parts of the genome that code for proteins. The protein-coding component of these genes, however, makes up just a small fraction of the human genome - about 1.5 percent. There is strong evidence that other parts of the genome have important functions, but very little information exists about where these other "functional elements" are located and how they work. The ENCODE project aims to address this critical goal of genomics research.

Launched in September 2003, the ENCODE project is being implemented in three phases: a pilot phase, a technology development phase and a production phase. In the pilot phase, which is expected to last three years, ENCODE researchers are devising and testing high-throughput ways of efficiently applying known approaches to identify functional elements. Their collaborative efforts are centered on 44 DNA targets, which together cover about 1 percent of the human genome, or about 30 million base pairs. The target regions were strategically selected to provide a representative cross section of the entire human genome sequence. Simultaneously, in the technology development phase, the second phase of the ENCODE project, other research groups are striving to develop novel methods and technologies that can be applied to the project. Guided by the results of the first two phases, NHGRI will decide how to initiate the production phase and expand the ENCODE project to analyze the remaining 99 percent of the human genome.

"Major challenges lie ahead on the road to a complete encyclopedia of DNA elements," said Elise A. Feingold, Ph.D., NHGRI's program director in charge of the ENCODE project. "Such work is well beyond the scope of any single group. However, by bringing together researchers with a broad range of interests and expertise to work in a highly collaborative setting, we expect that the ENCODE consortium will have the power to achieve a goal of this magnitude."

Among the many hurdles facing the ENCODE consortium is the complexity of the problem. No single experimental approach can be used to identify all functional elements, and many current methods may not provide a cost-effective means of finding functional elements in a target as large as the human genome. Furthermore, many functional elements are active only in certain types of cells or at certain stages of development, which means it may be necessary to analyze many different types of human cells. In addition, if a truly comprehensive inventory is to be created, more work needs to be done to learn about functional elements not surveyed in the pilot project, including centromeres (the middles of chromosomes) and telomeres (the ends of chromosomes). In their Science article, ENCODE researchers set forth their plans for addressing these and other challenges.

NHGRI has designated the ENCODE project as a community resource project, which means that all data generated for this project will be deposited in free, public databases as soon as they are experimentally verified. "During the Human Genome Project, our policy of rapid data release enabled researchers to take advantage of human genomic sequence data as soon as they were produced. Similarly, the ENCODE consortium will make valuable data rapidly available for use by scientists around the world," said Mark S. Guyer, Ph.D., director of NHGRI's Division of Extramural Research.

Also today, NHGRI announced the award of a second set of ENCODE technology development grants, which are intended to complement the first set of technology development grants made in 2003 by adding more novel methods and technologies to the consortium's "tool box." "These grants are aimed at broadening the types of functional elements that we are studying under ENCODE and also expanding the portfolio of technologies that we can apply to them," said Peter Good, Ph.D., NHGRI's program director for genome informatics.

Recipients of the 2004 ENCODE Technology Development Grants and their total approximate funding are:

Joseph R. Ecker, Ph.D., The Salk Institute, La Jolla, Calif.
"Genome Wide Analysis of DNA Methylation" - $1.5 million (3 years)

Vishwanath Iyer, Ph.D., University of Texas, Austin
"Sequence Tag Analysis of Genomic Enrichment (STAGE) and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) for Regulatory Element Identification" - $1.3 million (3 years)

Yijun Ruan, Ph.D., Genome Institute of Singapore
"Di-tag Technologies for Complete Transcriptome Annotation" - $1 million (3 years)

Thomas Tullius, Ph.D., Boston University
"Structure of Genomic DNA at Single-Nucleotide Resolution" - $870,000 (3 years)

Madaiah Puttaraju, Ph.D., Intronn Inc., Gaithersburg, Md.
"Use of RNA Trans-splicing to Identify Splice Sites" - $420,000 (2 years)

Scott Tenenbaum, Ph.D., University at Albany, State University of New York
"Identifying Functional Regulatory Elements in RNA" - $410,000 (2 years)

The ENCODE consortium currently comprises several research teams in the United States, as well as groups in Canada, Singapore, Spain and the United Kingdom. The collaborative effort is open to all interested researchers in academia, government and industry who agree to abide by the consortium's guidelines.

For more detailed information on the ENCODE project, including a complete list of participants and the consortium's data release and accessibility policies, go to: www.genome.gov/ENCODE. ENCODE data that can be directly linked to genomic sequence will be made available at the University of California, Santa Cruz ENCODE Genome Browser (www.genome.ucsc.edu/ENCODE) and the ENSEMBL Browser (www.ensembl.org).

NHGRI is one of 27 institutes and centers at NIH, an agency of the Department of Health and Human Services. The NHGRI Division of Extramural Research supports grants for research and for training and career development at sites nationwide. Additional information about NHGRI can be found at: www.genome.gov.

Contact:

Geoff Spencer
NHGRI
(301) 402-0911
spencerg@mail.nih.gov

Last updated: March 12, 2010