PLoS Biology, April 19, 2011
New user's guide and tutorial help disease researchers interpret human genome
If the human genome were a car, then the ENCyclopedia of DNA Elements (ENCODE) would be its 'parts list.' Instead of carburetors and gaskets, ENCODE catalogs the elements of the genome that control gene activity in cells in response to their circumstances. Many scientists believe that precise tools such as ENCODE will help them understand how the genome functions and, in time, enable the discovery of much-needed cures and treatments. To make the ENCODE catalog a more researcher-friendly resource, the ENCODE research consortium, organized by the National Human Genome Research Institute (NHGRI), has published a user's guide and tutorial in the April 19, 2011 issue of PLoS Biology. The user's guide and tutorial work in tandem with online training materials at the ENCODE Portal [genome.ucsc.edu], developed by Open Helix and the University of California, Santa Cruz, the ENCODE Data Coordinating Center.
"With release of this guide and tutorial, ENCODE has taken an important step toward making its data more accessible to a wider range of researchers who are focused on the biology of human disease," said Eric D. Green, M.D., Ph.D., NHGRI director. "This is an essential step towards making fundamental genomic discoveries about the causes of disease and eventually developing treatments."
For many years, researchers thought that the protein-coding regions of genes were the primary places to look for the causes of disease. However, as genome research has advanced, it has become clear that regions outside of those that encode proteins are also important in influencing gene expression and disease.
ENCODE was launched to identify all of the functional elements in the human genome, first as a pilot project in 2003 focused on one percent of the genome and then, in 2007, as an expanded whole-genome analysis. In 2007, ENCODE findings from the pilot project challenged the traditional view that most of the interesting (consequential) "action" is located at or near the protein-coding regions of genes and revealed a striking complexity in the organization of the genome. The findings pointed to an intricate network in which genes, along with regulatory elements and other types of DNA sequences that do not code for proteins, interact in overlapping ways.
In this new publication, the team describes the data being generated on the entire human genome and provides examples of how these data are shining a new light on important biological questions. In particular, this work shows how the data can be immediately useful in interpreting associations between disease and single-nucleotide differences (differences in single DNA building blocks) among individuals. For example, ENCODE data confirm recent observations by other groups that variations in regions of DNA very far from the protein-coding region of the MYC gene change the binding of transcription factor proteins to a genetic control region, leading to changes in expression of the MYC gene and therefore to oncogenesis, the process by which normal cells are turned into cancerous cells. These findings suggest that the mechanism behind the genetic association with multiple cancers lies outside of the protein-coding sequence of a gene.
Similar studies are now possible for the thousands of variants identified in genome-wide association studies, in which the genomes of different individuals (those with the disease being studied and healthy controls) are examined to identify variants associated with a disease. The ENCODE catalog can then be linked to genome-wide association study results to help identify the underlying biological changes that lead to disease. As the individual cellular parts and biological pathways become better understood, researchers will be able to develop treatments that specifically target the deleterious changes.
Key features of the data include comprehensive mapping of:
- Protein-coding genes — Proteins are molecules made of amino acids linked together in a specific sequence; the amino acid sequence is encoded by the sequence of DNA subunits called nucleotides that make up genes.
- Non-coding genes — Stretches of DNA that are read by the cell as if they were genes but do not encode proteins. These appear to help regulate the activity of the genome.
- Chromatin structure features — Complex physical structures made from a combination of DNA and binding proteins that make up the contents of the nucleus and affect genome function.
- Histone modifications — Histones are the proteins that make up the chromatin structures that help shape and control the genome. In addition, histone proteins can be modified by adding chemical groups, such as methyl groups, that further regulate genomic activity.
- DNA methylation — As with histones, methyl groups can be added to DNA itself, in a process called DNA methylation. Chemically attaching methyl groups to DNA physically changes the ability of enzymes to reach the DNA and thus alters the gene expression pattern in cells. Methylation helps cells "remember what they are doing" or alter levels of gene expression, and it is a crucial part of normal development and cellular differentiation in higher organisms.
- Transcription factor binding sites — Transcription factors are proteins that bind to specific DNA sequences, controlling the flow (or transcription) of genetic information from DNA to mRNA. Mapping the binding sites can help researchers understand how genomic activity is controlled.
The new tutorials are expected to help researchers who are not experts in these areas of biological research advance their own studies, using ENCODE's insights into cellular regulation of the genome and into the networks of biological interactions among the parts it has identified.
"This project requires collaboration from multiple people all over the world at the cutting edge of their fields, working in a coordinated manner to figure out the function of our human genome," said Dr. Richard Myers, president and director of the HudsonAlpha Institute for Biotechnology and one of the 25 principal investigators of the project. "The importance extends beyond basic knowledge of whom and what we are as humans and into understanding of human health and disease."
All ENCODE data are freely available for immediate use by downloading data files [genome.ucsc.edu], by visualization in the UCSC Genome Browser [genome.ucsc.edu], and by data mining with the UCSC Table Browser [genome.ucsc.edu].
To read the study in PLoS Biology, please go to: A User's Guide to the Encyclopedia of DNA Elements (ENCODE) [plosbiology]
Last Reviewed: November 8, 2012