2011 News Feature: New user's guide and tutorial helps disease researchers interpret human genome

National Human Genome Research Institute

National Institutes of Health
U.S. Department of Health and Human Services

New user's guide and tutorial helps disease researchers interpret human genome

If the human genome were a car, then the ENCyclopedia of DNA Elements (ENCODE) would be its 'parts list.' Instead of carburetors and gaskets, ENCODE catalogs the elements of the genome that control gene activity in cells in response to their circumstances. Many scientists believe that using precise tools such as ENCODE will help them understand how the genome functions and, in time, enable the discovery of much-needed cures and treatments. To make the ENCODE catalog a more researcher-friendly resource, the ENCODE research consortium, organized by the National Human Genome Research Institute (NHGRI), has published a user's guide and tutorial in the April 19, 2011 issue of PLoS Biology. The user's guide and tutorial work in tandem with online training materials at the ENCODE Portal [genome.ucsc.edu], developed by Open Helix and the University of California, Santa Cruz, which serves as the ENCODE Data Coordinating Center.

"With release of this guide and tutorial, ENCODE has taken an important step toward making its data more accessible to a wider range of researchers who are focused on the biology of human disease," said Eric D. Green, M.D., Ph.D., NHGRI director. "This is an essential step towards making fundamental genomic discoveries about the causes of disease and eventually developing treatments."

For many years, researchers thought that protein-coding regions of genes were the primary places to look for answers to the question of what causes diseases. However, as genome research has advanced, it has become clear that other regions outside of those that encode proteins are also important in influencing gene expression and disease.

ENCODE was launched to identify all of the functional elements in the human genome, first as a pilot project in 2003 focused on one percent of the genome and then, in 2007, as an expanded whole-genome analysis. In 2007, ENCODE findings from the pilot project challenged the traditional view that most of the interesting (consequential) "action" is located at or near the protein-coding regions of genes and revealed a striking complexity in the organization of the genome. The findings pointed to an intricate network in which genes, along with regulatory elements and other types of DNA sequences that do not code for proteins, interact in overlapping ways.

In this new publication, the team describes the data being generated on the entire human genome and provides examples of how these data are shining a new light on important biological questions. In particular, this work shows how the data can be immediately useful in interpreting associations between disease and single-nucleotide differences (differences in single DNA building blocks) among individuals. For example, ENCODE data confirm recent observations by other groups that variations in regions of DNA very far from the protein-coding region of the MYC gene change the binding of transcription factor proteins to a genetic control region, leading to changes in expression of the MYC gene and therefore to oncogenesis, the process by which normal cells are turned into cancerous cells. These findings suggest a mechanism, lying outside of the protein-coding sequence of a gene, for genetic association with multiple cancers.

Similar studies are now possible for the thousands of variants identified in genome-wide association studies, in which the genomes of different individuals (those with the disease being studied and healthy controls) are examined to identify variants associated with a disease. The ENCODE catalog can then be linked to genome-wide association study results to help identify the underlying biological changes that lead to disease. As researchers come to understand the individual cellular parts and biological pathways involved, they will be able to develop treatments that specifically target the deleterious changes.
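As a rough illustration of what linking a catalog of functional elements to association-study results can involve, the sketch below checks which annotated genomic intervals contain each variant position. It is a minimal example under stated assumptions, not the consortium's method: the `Element` and `Variant` types, the `chrDemo` chromosome name, and all coordinates and labels are invented placeholders, and real analyses would read annotation and variant files rather than hard-coded lists.

```python
from typing import Dict, List, NamedTuple


class Element(NamedTuple):
    """A hypothetical annotated interval (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int
    label: str


class Variant(NamedTuple):
    """A hypothetical disease-associated variant at a single 0-based position."""
    name: str
    chrom: str
    pos: int


def build_index(elements: List[Element]) -> Dict[str, List[Element]]:
    """Group elements by chromosome and sort each group by start coordinate."""
    index: Dict[str, List[Element]] = {}
    for element in elements:
        index.setdefault(element.chrom, []).append(element)
    for chrom_elements in index.values():
        chrom_elements.sort(key=lambda e: e.start)
    return index


def overlapping_labels(variant: Variant, index: Dict[str, List[Element]]) -> List[str]:
    """Return labels of all elements whose interval contains the variant position."""
    hits = []
    for element in index.get(variant.chrom, []):
        if element.start > variant.pos:
            break  # elements are sorted by start, so none of the rest can contain pos
        if variant.pos < element.end:
            hits.append(element.label)
    return hits


def annotate(variants: List[Variant], elements: List[Element]) -> Dict[str, List[str]]:
    """Map each variant name to the labels of the elements it falls within."""
    index = build_index(elements)
    return {v.name: overlapping_labels(v, index) for v in variants}


# Invented placeholder data, for illustration only.
elements = [
    Element("chrDemo", 100, 200, "enhancer_A"),
    Element("chrDemo", 150, 300, "tf_site_B"),
    Element("chrDemo", 500, 600, "promoter_C"),
]
variants = [
    Variant("var1", "chrDemo", 175),
    Variant("var2", "chrDemo", 400),
    Variant("var3", "chrOther", 120),
]

result = annotate(variants, elements)
for name, labels in result.items():
    print(name, labels if labels else "no annotated element")
```

A variant that falls inside one or more annotated elements becomes a candidate for follow-up, while a variant with no overlap stays unexplained by this particular set of annotations. The linear scan per chromosome keeps the sketch short; an interval tree or a dedicated genomics toolkit would be a more natural choice at genome scale.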

Key features of the data include comprehensive mapping of:

The new tutorials are expected to help researchers who are not experts in these areas of biological research use ENCODE's insights into cellular regulation of the genome, and into the networks of biological interactions among the parts it has identified, to advance their own studies.

"This project requires collaboration from multiple people all over the world at the cutting edge of their fields, working in a coordinated manner to figure out the function of our human genome," said Dr. Richard Myers, president and director of the HudsonAlpha Institute for Biotechnology and one of the 25 principal investigators of the project. "The importance extends beyond basic knowledge of who and what we are as humans and into understanding of human health and disease."

All ENCODE data are freely available for immediate use through downloadable data files [genome.ucsc.edu], visualization in the UCSC Genome Browser [genome.ucsc.edu] and data mining with the UCSC Table Browser [genome.ucsc.edu].

To read the study in PLoS Biology, please go to: A User's Guide to the Encyclopedia of DNA Elements (ENCODE) [plosbiology]


Last Reviewed: November 8, 2012