The Human Genome Project international consortium announced today that two billion of the three billion "letters" that constitute the genetic instruction book of humans have been deciphered and deposited into GenBank. GenBank, the public database of DNA sequence operated by the National Institutes of Health (NIH), is accessible freely and without restrictions to all scientists in industry and academia.
The two billionth "letter," or base pair, was deposited earlier this month by the Wellcome Trust's Sanger Centre in Great Britain. The "letter" was a "T," the abbreviation for thymine, one of the four chemicals or bases that make up DNA. The 2,178,076,000 unique base pairs now in GenBank have been mapped to their locations on the 24 human chromosomes.
The Human Genome Project (HGP) is on track to complete the "working draft" in June. The draft will include 90 percent of the human DNA sequence, with an accuracy of 99.9 percent. The HGP worldwide will invest an estimated $250 million in producing the working draft.
The finished, stand-the-test-of-time version of the human DNA sequence will be ready in 2003 or sooner. Just four months ago, the HGP reached the one-billionth base pair milestone.
"It's good news that we're moving so fast but it's even better news that researchers throughout the world are using this data now to investigate the genetic underpinnings of health and diseases ranging from Alzheimer's to diabetes," said Dr. Francis Collins, director the National Human Genome Research Institute (NHGRI) at National Institutes of Health (NIH) , in a speech today at the BIO 2000 annual international biotechnology conference in Boston.
Reaching the two billion base pair milestone is "a splendid achievement which will help doctors around the world in their quest to cure disease and advance knowledge," said Dr. Michael Morgan, chief executive of the Wellcome Trust Genome Campus in Cambridgeshire, United Kingdom, which hosts the UK contribution to the HGP.
"We are pleased to be contributing to the creation of the scientific infrastructure that will enable the next stage of the biotechnology revolution," said Dr. Ari Patrinos, director of the U.S. Department of Energy's Office of Biological and Environmental Research, which sponsors the Joint Genome Institute in Walnut Creek, Calif. Sequencing, which is determining the exact order of DNA's four chemical bases, commonly abbreviated A, T, C and G, has been expedited in the HGP by technological advances in deciphering DNA and the coalition's collaborative nature, which has resulted in about 1,000 scientists worldwide working together effectively.
Today the HGP assembles 12,000 bases every minute. Twenty years ago, sequencing that many bases would have required one year or more. Three years ago, when the HGP initiated pilot projects to evaluate the feasibility of large-scale sequencing, deciphering 12,000 bases required 20 minutes.
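The throughput figures above imply rough speedups that can be checked with simple arithmetic. The sketch below is illustrative only: it assumes "one year or more" means exactly one 365-day year (so the long-term ratio is a lower bound) and assumes round-the-clock operation for the per-day figures. All names in it are hypothetical.

```python
# Minutes needed to sequence 12,000 bases in each era, per the figures quoted above.
BASES = 12_000
MINUTES_PER_DAY = 24 * 60
# Assumption: "one year or more" is taken as exactly one 365-day year,
# so any speedup computed from it is a lower bound.
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY

minutes_needed = {
    "twenty_years_ago": MINUTES_PER_YEAR,
    "three_years_ago": 20,
    "today": 1,
}


def speedup(earlier: str, later: str) -> float:
    """How many times faster the later era is than the earlier one."""
    return minutes_needed[earlier] / minutes_needed[later]


def bases_per_day(era: str) -> float:
    """Bases sequenced per day at the era's rate (assumes nonstop operation)."""
    return BASES / minutes_needed[era] * MINUTES_PER_DAY


for era in minutes_needed:
    print(f"{era}: {bases_per_day(era):,.0f} bases/day, "
          f"today is {speedup(era, 'today'):,.0f}x faster")
```

Under these assumptions, the current rate is 20 times the pilot-project rate and at least 525,600 times the rate of twenty years ago.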
Scientists throughout the world already are using the human sequence data in GenBank for basic research and disease-related studies. Recently, the genes responsible for hereditary deafness and for cerebral cavernous malformations, an often-fatal vascular disease causing seizures and brain hemorrhages, were identified using data from GenBank.
Scientists are rapidly annotating the human DNA sequence in GenBank with information about the location of specific genes and the genetic variants (called single nucleotide polymorphisms, or SNPs) that can provide clues to various health disorders.
Almost 15 billion raw base pairs were sequenced to reach the two billion milestone. HGP scientists decipher each area of a chromosome at least four to five times to ensure that the data deposited into GenBank is accurate. The "depth of coverage," as this repeat sequencing is called, also helps the scientists assemble the long stretches of the A, T, C, and G bases. The finished version of the human DNA sequence that the HGP will complete in 2003 will have a greater depth of coverage, with at least eight- to nine-fold coverage for each chromosome region.
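The relationship between raw and unique base pairs can be made concrete with a short calculation. The sketch below is a rough illustration, not the consortium's own method: it treats "almost 15 billion" as 15 billion (an upper bound), uses the 2,178,076,000 unique base pairs and the three-billion genome size quoted earlier in this release, and assumes uniform coverage for the finished-sequence estimate. Function and variable names are hypothetical.

```python
RAW_BASES = 15_000_000_000    # "almost 15 billion" raw base pairs, so an upper bound
UNIQUE_BASES = 2_178_076_000  # unique base pairs in GenBank, as quoted in this release
GENOME_SIZE = 3_000_000_000   # approximate number of "letters" in the human genome


def average_coverage(raw_bases: int, unique_bases: int) -> float:
    """Average number of times each unique base was sequenced."""
    return raw_bases / unique_bases


def raw_bases_needed(genome_size: int, fold: int) -> int:
    """Raw bases required for a given fold coverage, assuming uniform coverage."""
    return genome_size * fold


draft_coverage = average_coverage(RAW_BASES, UNIQUE_BASES)
print(f"Average coverage so far: about {draft_coverage:.1f}-fold")

for fold in (8, 9):
    needed = raw_bases_needed(GENOME_SIZE, fold)
    print(f"{fold}-fold finished coverage: about {needed / 1e9:.0f} billion raw bases")
```

The average works out to a little under seven-fold, which is consistent with a per-region minimum of four- to five-fold, since an average can exceed the minimum when some regions are covered more deeply than others. By the same arithmetic, eight- to nine-fold coverage of the whole genome would take roughly 24 to 27 billion raw bases.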
The international HGP consortium includes scientists at 16 institutions in France, Germany, Japan, China, Great Britain and the U.S. The five institutions that generate the most sequence are Baylor College of Medicine, Houston; Washington University School of Medicine, St. Louis; Whitehead Institute, Cambridge, Mass.; the Joint Genome Institute, Walnut Creek, Calif.; and the Sanger Centre in Great Britain. NHGRI funds the sequencing centers at Baylor, Washington University and Whitehead.
Last Reviewed: September 15, 2006