2001 Release: First Analysis of Human Genome

National Human Genome Research Institute

National Institutes of Health
U.S. Department of Health and Human Services


International Human Genome Sequencing Consortium Publishes Sequence and Analysis of the Human Genome

February 12, 2001

WASHINGTON, D.C. - The Human Genome Project international consortium today announced the publication of a draft sequence and initial analysis of the human genome - the genetic blueprint for a human being. The paper appears in the Feb. 15 issue of the journal Nature.

The draft sequence, which covers more than 90 percent of the human genome, represents the exact order of DNA's four chemical bases - commonly abbreviated as A, T, C and G - along the human chromosomes. This DNA text influences everything from eye color and height, to aging and disease.

The consortium's initial analysis of this text represents scientists' first global view of the human genomic landscape, with its extraordinary trove of information about human development, physiology, medicine and evolution.

The results reported in this week's Nature represent major progress for the human genome consortium. On June 26, 2000, the consortium announced that it had collected roughly 90 percent of the letters of the text for the "Book of Life." The consortium's new achievement represents a further compilation of these letters into the first draft of a readable text.

There are small gaps still remaining in this text, but scientists are already getting a good sense of what the genome landscape looks like and the surprising stories it has to tell. Below are highlights:

The sequence information from the consortium has been immediately and freely released to the world, with no restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as by commercial database companies, providing key information services to biotechnologists. Already, many tens of thousands of genes have been identified from the genome sequence, including more than 30 that play a direct role in human disease.

The scientific work reported here will serve as a basis for research and discovery in the coming decades. Such research will have profound long-term consequences for medicine. It will help elucidate the underlying molecular mechanisms of disease. This in turn will allow researchers to design better drugs and therapies for many illnesses.

But, as the authors of the Nature paper write, "the science is only part of the challenge. We must also involve society at large in the work ahead. We must set realistic expectations that the most important benefits will not be reaped overnight. Moreover, understanding and wisdom will be required to ensure that they are implemented broadly and equitably."

"We are standing at an extraordinary moment in scientific history. It's as though we have climbed to the top of the Himalayas. We can for the first time see the breathtaking vista of the human genome," said Eric Lander, director of the Whitehead Institute Center for Genome Research. "For many years to come, we will be exploring the intricate details of the terrain ahead. We've got a long way to go before we will ultimately understand all the secrets that the genome has to tell us."

"This remarkable achievement is a clear testament to the hard work of the hundreds of scientists in the sixteen genome centers that make up the Human Genome Project consortium," said Francis Collins, director of the National Human Genome Research Institute. "These scientists have proved to the world that they can work together toward a common human good. For, with the human genome sequence in hand, we can begin to build the tools we need to conquer the host of illnesses that cause untold human suffering and premature death."

What's Next?

The consortium's ultimate goal is to produce a completely "finished" sequence with no gaps and 99.99 percent accuracy. Although the near-finished version is adequate for most biomedical research, the HGP has made a commitment to filling all gaps and resolving all ambiguities in the sequence by 2003.

Production of genome sequence has skyrocketed over the past year, with more than 90 percent of the sequence having been produced in the past 15 months alone. Because of this increased capacity, the next phase is expected to move much more rapidly than previously anticipated.

The HGP also plans to sequence the genomes of many other species, because comparing genomes across species will provide researchers key tools for understanding the essential elements that evolution has designated as important to survival. This information will in turn translate into practical knowledge toward developing better therapies in the future.

As the authors of the Nature paper point out, the draft genome sequence has provided an initial look at the human gene content, but many ambiguities remain. One of the HGP's priorities will be to refine the data to accurately reflect every gene and every alternatively spliced form.

Several steps are needed to reach this ambitious goal, they report. Finishing the human sequence will help, but scientists will also need cross-species comparisons. A newly formed public-private consortium is speeding this effort, producing freely accessible data that can be readily used for cross-species comparison.

Comparative genomics will also offer scientists insights into important regions in the sequence that perform regulatory functions. Also among the future plans for HGP scientists is the sequencing of other large genomes, such as those of primates. Scientists also plan to complete the catalogue of human variations in the population and identify the genes that predispose individuals to risk for common diseases.

Finally, the sequence will serve as a foundation for a broad range of functional genomic tools to help biologists to probe the function of the genes in a more systematic manner. Development of such post-genomic tools will be one of the major thrusts for biologists in the next decade, according to the scientists.

The HGP sequencing consortium used a biocluster supplied by Compaq Computer Corporation, which provided one terabyte of secondary storage and assisted with annotation and analysis.

In a related announcement today, the biotech firm Celera Genomics announced that it had published its human genome sequence in the journal Science. The company used a combination of its own data and the consortium's data, available freely online, to assemble its sequence.

Background

Sequencing, which is determining the exact order of DNA's four chemical bases - commonly abbreviated A, T, C and G - has been expedited in the HGP by technological advances in deciphering DNA and the collaborative nature of the effort, which has drawn upon the talents of about 1,000 scientists worldwide.

The Human Genome Sequencing Project aims to determine the sequence of the euchromatic portion of the human genome. The "euchromatic" portion excludes certain regions consisting of long stretches of highly repetitive DNA that encode little genetic information. Such regions are said to be "heterochromatic." (Genomes contain long stretches of highly repetitive DNA. For example, the centers of chromosomes, called "centromeres," consist of heterochromatic DNA.)

The international Human Genome Sequencing Consortium includes scientists at 20 institutions located in France, Germany, Japan, China, Great Britain and the United States. The five largest centers are located at: Baylor College of Medicine, Houston, Texas; Joint Genome Institute in Walnut Creek, CA; Sanger Centre near Cambridge, England; Washington University School of Medicine, St. Louis; and Whitehead Institute, Cambridge, Massachusetts.

The project is funded by grants from government agencies and public charities in the various countries. These include the National Human Genome Research Institute at the U.S. National Institutes of Health (NIH), the Wellcome Trust in England, and the U.S. Department of Energy, as well as agencies in Japan, France, Germany and China.

The total cost for Phase One ("working draft") is approximately $300 million worldwide, with roughly half ($150 million) being funded by the NIH.

The HGP is sometimes reported to have a cost of $3 billion. However, this figure refers to the total projected funding over a 15-year period (1990-2005) for a wide range of scientific activities related to genomics. These include studies of human diseases and experimental organisms (such as bacteria, yeast, worms, flies and mice); development of new technologies for biological and medical research; computational methods to analyze genomes; and ethical, legal and social issues related to genetics. Human genome sequencing represents only a small fraction of the overall 15-year budget.

The institutions that form the International Human Genome Sequencing Consortium include:

  1. Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, MA, USA
  2. The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  3. Washington University Genome Sequencing Center, St. Louis, MO, USA
  4. US DOE Joint Genome Institute, Walnut Creek, CA, USA
  5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, TX, USA
  6. RIKEN Genomic Sciences Center, Yokohama-city, Japan
  7. Genoscope and CNRS UMR-8030, Evry Cedex, France
  8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, MA, USA
  9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
  10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
  11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, WA, USA
  12. Stanford Genome Technology Center, Stanford, CA, USA
  13. Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
  14. University of Washington Genome Center, Seattle, WA, USA
  15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
  16. University of Texas Southwestern Medical Center at Dallas, Dallas, TX, USA
  17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
  18. Max Planck Institute for Molecular Genetics, Berlin, Germany
  19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, NY, USA
  20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany

Contact:

Geoff Spencer
Phone: (301) 402-0911
E-mail: spencerg@mail.nih.gov

Last Reviewed: March 9, 2012