Answering Big Questions

December 5, 2012

NHGRI could be called the institute of big questions - and answering big questions often takes big efforts.

The big question that led to the creation of the National Human Genome Research Institute (NHGRI) was: "What is the sequence of the human genome?" It was a hard question to contemplate in the mid-1980s, less than four decades after the structure of DNA had first been elucidated and at a time when DNA sequencing technologies were in their infancy. Answering the "3 billion letter" question seemed like a herculean task at the time. But NHGRI jumped in to answer that question through its leading role in the Human Genome Project - and the rest is history.

With a human genome sequence in hand, NHGRI then started tackling additional questions: What is the evolutionary history of the human genome? What genomic sequences are critical for genome function? How do genome sequences differ among people? What genomic sequence differences lead to human disease? Basically, how does the human genome work and what variants in our genomes make us more prone to illness?

Researchers had known the genetic code since the 1960s, but the first analyses of the human genome sequence revealed that protein-coding sequences comprise only a very small fraction of the human genome. What is all that other sequence - much of which had been mischaracterized as junk DNA - actually doing?

Meanwhile, the early studies of human genomic variation that followed the Human Genome Project quickly confirmed the relative similarity of genome sequences among individuals, estimating that something like one-tenth of one percent of the genome sequence differs between any two people. Yet logically, in that 0.1 percent - some 3-6 million bases per person - must reside many of the answers about people's differences in genetic susceptibility to disease. But what are those variants and which ones are relevant for human disease?

Big questions often require big efforts to find the big answers. Because NHGRI had been a funding agency and a guiding force for the Human Genome Project, the research community naturally looked to the institute to help organize the work. To tackle these next big questions, NHGRI sought to establish consortia of researchers with well-defined goals. Since big questions often need big numbers to achieve the statistical rigor that reliable answers require, these projects needed to be big, too.

To get started, NHGRI funded research as part of large, international consortium-style projects: the Encyclopedia of DNA Elements (or ENCODE) to catalog all the functional elements in the human genome, and a series of studies to examine genomic variation across the world's populations, the most recent of which is called the 1000 Genomes Project. Both ENCODE and 1000 Genomes produced landmark reports in the second half of 2012, and both have provided some remarkable insights into the biology of the human genome.

ENCODE, for example, found detectable activity for upwards of 80 percent of the human genome's sequence, the vast majority of which resides in parts of the genome that do not code for protein. The biggest role for all of these 'non-coding sequences' seems to be in regulating the very small percentage of DNA bases that do encode proteins. These exquisite controls help explain how roughly 20,000 genes can encode something as complex as the human body. It is notable that a recent paper reported the generation and analysis of the oyster genome sequence, finding that the oyster has about 28,000 genes in its genome - significantly more than the human genome. Undoubtedly, the complexity of the human genome (compared to that of, say, the oyster) stems from the complexity of the regulatory controls embedded throughout its non-coding sequences, something that ENCODE has greatly helped us to catalog and characterize.

To understand genomic variation and its role in health and disease, NHGRI began a series of studies that increased in complexity as technologies improved. In 2002, NHGRI funded the International HapMap Project to build a better catalog of genomic variation, especially single-nucleotide polymorphisms (or SNPs), in four populations around the world. With improving technologies for sequencing DNA and the plummeting costs of genome sequencing, the follow-on 1000 Genomes Project is now sequencing the genomes of around 2,500 people from two dozen populations around the world. In November 2012, the 1000 Genomes Project reported its findings from analyzing the genomes of the first 1,092 people from 14 populations. The researchers identified tens of millions of genomic variants, revealed important information about the positions of these variants relative to functional genomic elements, and established an important foundation for studies that aim to understand the genomic basis of disease.

Together, these projects are yielding detailed findings that are painting ever-clearer pictures of how the human genome works and how genomic variants play a role in health and disease. And that is helping NHGRI - and the research community - begin to focus on the next big question: How will genomics be used to advance medical care and improve human health? This is increasingly NHGRI's focus, for which we will continue to ask big questions.

Posted: March 21, 2013

Last updated: March 21, 2013