On April 14, 2003, the National Human Genome Research Institute (NHGRI), the Department of Energy (DOE) and their partners in the International Human Genome Sequencing Consortium announced the successful completion of the Human Genome Project.
A genome is an organism's complete set of deoxyribonucleic acid (DNA), a chemical compound that contains the genetic instructions needed to develop and direct the activities of every organism. DNA molecules are made of two twisting, paired strands. Each strand is made of four chemical units, called nucleotide bases. The bases are adenine (A), thymine (T), guanine (G) and cytosine (C). Bases on opposite strands pair specifically; an A always pairs with a T, and a C always with a G.
The human genome contains approximately 3 billion of these base pairs, which reside in the 23 pairs of chromosomes within the nucleus of all our cells. Each chromosome contains hundreds to thousands of genes, which carry the instructions for making proteins. Each of the estimated 30,000 genes in the human genome makes an average of three proteins.
Sequencing means determining the exact order of the base pairs in a segment of DNA. Human chromosomes range in size from about 50,000,000 to 300,000,000 base pairs. Because the bases exist as pairs, and the identity of one of the bases in the pair determines the other member of the pair, scientists do not have to report both bases of the pair.
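The pairing rule is what makes one strand enough. As a minimal sketch in Python (the function names and the example sequence are illustrative assumptions, not part of the project's software), the unreported strand can be derived from the reported one. Reading the opposite strand in reverse reflects the fact that the two strands of DNA run in opposite directions, a detail the text above does not cover.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each position of `strand`."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own conventional direction."""
    return complement(strand)[::-1]


reported = "ATGCCG"  # hypothetical six-base example
print(complement(reported))          # TACGGC
print(reverse_complement(reported))  # CGGCAT
```

Applying `complement` twice returns the original sequence, which is another way of saying that either strand fully determines the other.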
The primary method used by the HGP to produce the finished version of the human genetic code is map-based, or BAC-based, sequencing. BAC is the acronym for "bacterial artificial chromosome." Human DNA is fragmented into pieces that are relatively large but still manageable in size (between 150,000 and 200,000 base pairs). The fragments are cloned in bacteria, which store and replicate the human DNA so that it can be prepared in quantities large enough for sequencing. If the clones are carefully chosen to minimize overlap, about 20,000 different BAC clones are needed to contain the 3 billion base pairs of the human genome. A collection of BAC clones containing the entire human genome is called a "BAC library."
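The figure of about 20,000 clones follows from simple division. The sketch below, a rough check rather than a description of how clones were actually selected, assumes no overlap at all between clones and uses the genome size and fragment sizes quoted above; the names are illustrative.

```python
GENOME_BP = 3_000_000_000  # approximate genome size from the text
BAC_MIN_BP = 150_000       # smallest fragment size quoted above
BAC_MAX_BP = 200_000       # largest fragment size quoted above


def min_clones(genome_bp: int, clone_bp: int) -> int:
    """Fewest non-overlapping clones of size clone_bp that span genome_bp."""
    return -(-genome_bp // clone_bp)  # integer ceiling division


fewest = min_clones(GENOME_BP, BAC_MAX_BP)  # every clone at 200,000 bp
most = min_clones(GENOME_BP, BAC_MIN_BP)    # every clone at 150,000 bp
print(fewest, most)  # 15000 20000
```

In practice neighboring clones must overlap somewhat so that they can be ordered, which is consistent with the count sitting at the upper end of this range.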
In the BAC-based method, each BAC clone is "mapped" to determine where in the human genome its DNA comes from. Using this approach ensures that scientists know both the precise location of the DNA letters that are sequenced from each clone and their spatial relation to sequenced human DNA in other BAC clones.
For sequencing, each BAC clone is cut into still smaller fragments that are about 2,000 bases in length. These pieces are called "subclones." A "sequencing reaction" is carried out on these subclones. The products of the sequencing reaction are then loaded into the sequencing machine (sequencer). The sequencer generates about 500 to 800 base pairs of A, T, C and G from each sequencing reaction, so that each base is sequenced about 10 times. A computer then assembles these short sequences into contiguous stretches of sequence representing the human DNA in the BAC clone.
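The assembly step can be illustrated with a toy example. The sketch below is a simple greedy overlap merge written for illustration only; it is not the software the sequencing centers used, it ignores sequencing errors and repeated sequence, and the function names, the `min_len` parameter and the short reads are all assumptions.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 when the longest such match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads taken from the made-up sequence ATGCGTACGTTAGC.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Sequencing each base about 10 times, as described above, is what provides enough overlapping reads for merges like these to join up into long contiguous stretches.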
The identity of the people whose DNA was sequenced is intentionally not known, in order to protect the volunteers who provided DNA samples for sequencing. The sequence is derived from the DNA of several volunteers. To ensure that the identities of the volunteers cannot be revealed, a careful process was developed to recruit the volunteers and to collect and maintain the blood samples that were the source of the DNA.
The volunteers responded to local public advertisements near the laboratories where the DNA "libraries" were prepared. Candidates were recruited from a diverse population. The volunteers provided blood samples after being extensively counseled and then giving their informed consent. About 5 to 10 times as many volunteers donated blood as were eventually used, so that not even the volunteers would know whether their sample was used. All labels were removed before the actual samples were chosen.
The main goals of the Human Genome Project were first articulated in 1988 by a special committee of the U.S. National Academy of Sciences, and later adopted through a detailed series of five-year plans jointly written by the National Institutes of Health and the Department of Energy. At this time, the principal goals laid out by the National Academy of Sciences have been achieved, including the essential completion of a high-quality version of the human sequence. Other goals included the creation of physical and genetic maps of the human genome, which were accomplished in the mid-1990s, as well as the mapping and sequencing of a set of five model organisms, including the mouse. All of these goals have been achieved within the time frame and budget first estimated by the NAS committee.
Notably, quite a number of additional goals not considered possible in 1988 have been added along the way and successfully achieved. Examples include advanced drafts of the sequences of the mouse and rat genomes, as well as a catalog of variable bases in the human genome.
Yes - within the limits of today's technology, the human genome is as complete as it can be. Small gaps that cannot be recovered by any current sequencing method remain, amounting to about 1 percent of the gene-containing portion of the genome, or euchromatin. New technologies will have to be invented to obtain the sequence of these regions.
However, the gene-containing portion of the genome is complete in nearly every functional way for the purposes of scientific research and is freely and publicly available. Even though the Human Genome Project is now completed, scientists will continue to develop and apply new technologies to the few remaining refractory problems. For its part, NHGRI will continue to support a wide range of research to develop new sequencing technologies, to interpret the human sequence and to use the newfound understanding of the human genome to improve human health.
On June 26, 2000, the International Human Genome Sequencing Consortium announced the production of a rough draft of the human genome sequence. In April 2003, the consortium is announcing an essentially finished version of the human genome sequence. This version, which is available to the public, provides nearly all the information needed to do research using the whole genome.
The difference between the draft and finished versions is defined by coverage, the number of gaps and the error rate. The draft sequence covered 90 percent of the genome at an error rate of one in 1,000 base pairs, but there were more than 150,000 gaps and only 28 percent of the genome had reached the finished standard. In the April 2003 version, there are fewer than 400 gaps, and 99 percent of the genome is finished, with an error rate of less than one error per 10,000 base pairs. The differences between the two versions are significant for scientists using the sequence to conduct research.
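To give a feel for what these error rates mean at genome scale, the rough calculation below applies each rate to the share of a 3-billion-base-pair genome covered by each version. It is a back-of-the-envelope sketch that assumes errors are spread evenly; the function name is illustrative, and the finished-version figure is an upper bound because the stated rate is "less than" one error per 10,000 base pairs.

```python
GENOME_BP = 3_000_000_000  # approximate genome size used throughout the text


def expected_errors(genome_bp: int, covered_percent: int, bp_per_error: int) -> int:
    """Errors expected if the covered bases carry one error per bp_per_error."""
    covered_bp = genome_bp * covered_percent // 100
    return covered_bp // bp_per_error


draft = expected_errors(GENOME_BP, 90, 1_000)      # June 2000 draft
finished = expected_errors(GENOME_BP, 99, 10_000)  # April 2003, upper bound
print(draft, finished)  # 2700000 297000
```

Under these assumptions the finished sequence contains roughly one-ninth as many errors as the draft while covering more of the genome.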
Every part of the genome sequenced by the Human Genome Project was made public immediately - in fact, new data on the genome is posted every 24 hours. It is true that private companies have filed thousands of patents on human genes over the past several years. We don't know how many such patents have been filed, whether the patents will be awarded or if they're enforceable. Most of the patent applications have not been acted upon, so we really don't know how much, if any, of the genome can be used freely for commercial purposes.
The Human Genome Project could not have been completed as quickly and as effectively without the strong participation of international institutions. In the United States, contributors to the effort include the National Institutes of Health (NIH), which began participation in 1988 when it created the Office for Human Genome Research, later upgraded to the National Center for Human Genome Research in 1990 and then the National Human Genome Research Institute (NHGRI) in 1997; and the U.S. Department of Energy (DOE), where HGP discussions began as early as 1984. However, almost all of the actual sequencing of the genome was conducted at numerous universities and research centers throughout the United States, the United Kingdom, France, Germany, Japan and China.
The International Human Genome Sequencing Consortium includes:
- The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
- Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
- United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.
- Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, Tex., U.S.
- RIKEN Genomic Sciences Center, Yokohama, Japan
- Genoscope and CNRS UMR-8030, Evry, France
- GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
- Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
- Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
- Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.
- Stanford Genome Technology Center, Stanford, Calif., U.S.
- Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif., U.S.
- University of Washington Genome Center, Seattle, Wash., U.S.
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.
- University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, Okla., U.S.
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y., U.S.
- GBF - German Research Centre for Biotechnology, Braunschweig, Germany
In 1990, Congress established funding for the Human Genome Project and set a target completion date of 2005. Although estimates suggested that the project would cost a total of $3 billion over this period, the project ended up costing less than expected, about $2.7 billion in FY 1991 dollars. Additionally, the project is being completed more than two years ahead of schedule.
It is also important to consider that the Human Genome Project will likely pay for itself many times over on an economic basis - if one considers that genome-based research will play an important role in seeding biotechnology and drug development industries, not to mention improvements in human health.
Since the beginning of the Human Genome Project, it has been clear that science's expanding knowledge of the genome would have a profound impact upon humanity. To maximize the potential for beneficial effects while minimizing the risk of detrimental effects, it was essential that research be conducted to investigate a wide range of issues related to the acquisition and use of genomic information.
Five percent of the annual budget of the NHGRI is dedicated to examining ethical, legal and social implications (ELSI) related to human genome research, incorporating specific recommendations into the activities of NHGRI and providing guidance to policymakers and the public. The ELSI program at NHGRI, which is considered unprecedented in biomedical science in terms of scope and level of priority, provides an effective basis from which to assess the implications of genome research, and has resulted in several notable improvements to the HGP.
An example is the decision to sequence the DNA of several anonymous individuals, rather than a known individual, in order to protect privacy. Another example is the development of widely used genetic privacy guidelines and draft legislation. The ELSI program at NHGRI now serves as a model for large, publicly funded science efforts.
Having the essentially complete sequence of the human genome is similar to having all the pages of a manual needed to make the human body. The challenge to researchers and scientists now is to determine how to read the contents of all these pages and then understand how the parts work together and to discover the genetic basis for health and the pathology of human disease. In this respect, genome-based research will eventually enable medical science to develop highly effective diagnostic tools, to better understand the health needs of people based on their individual genetic make-ups, and to design new and highly effective treatments for disease.
Individualized analysis based on each person's genome will lead to a very powerful form of preventive medicine. We'll be able to learn about risks of future illness based on DNA analysis. Physicians, nurses, genetic counselors and other health-care professionals will be able to work with individuals to focus efforts on the things that are most likely to maintain health for a particular individual. That might mean diet or lifestyle changes, or it might mean medical surveillance. But there will be a personalized aspect to what we do to keep ourselves healthy. Then, through our understanding at the molecular level of how things like diabetes or heart disease or schizophrenia come about, we should see a whole new generation of interventions, many of which will be drugs that are much more effective and precise than those available today.
It's important to be careful about raising expectations. Most new drugs based on the completed genome are still perhaps 10 to 15 years in the future, although more than 350 biotech products - many based on genetic research - are currently in clinical trials, according to the Biotechnology Industry Organization. It usually takes more than a decade for a company to conduct the kinds of clinical studies needed to win marketing approval from the Food and Drug Administration.
Testing, however, will arrive more quickly, especially the ability to predict individual future health risks, and the ability to implement an enhanced approach to preventive medicine. In the next decade, we may also be better able to determine which drugs work best for individuals, based on their genetic make-up.
Biological research has traditionally been a very individualistic enterprise, with researchers pursuing medical investigations more or less independently. The magnitude of both the technological challenge and the necessary financial investment prompted the Human Genome Project to assemble interdisciplinary teams, encompassing engineering and informatics as well as biology; automate procedures wherever possible; and concentrate research in major centers to maximize economies of scale.
As a result, research involving other genome-related projects (e.g., the International HapMap Project to study human genetic variation and the Encyclopedia of DNA Elements, or ENCODE, project) is now characterized by large-scale, cooperative efforts involving many institutions, often from many different nations, working collaboratively. The era of team-oriented research in biology is here.
In addition to introducing large-scale approaches to biology, the Human Genome Project has produced all sorts of new tools and technologies that can be used by individual scientists to carry out smaller scale research in a much more effective manner.
Yes. We are entering a new age of discovery that will transform human health. Our eventual knowledge about the workings of the genome has the potential to fundamentally change our most basic perceptions of our biological world. It is difficult to predict what will be learned and how future knowledge will be applied, but there can be little doubt that understanding the genome will revolutionize our concept of health and improve the human condition in remarkable ways.
NHGRI's vision for the future, which is being published April 24, 2003 in the journal Nature, details a diverse and exciting landscape of new possibilities. NHGRI will particularly focus on opportunities to translate the results of the Human Genome Project into advances in medicine, including projects that build upon the completed human genome sequence. This is particularly true of projects of a large international scope that require extensive coordination and public investment to ensure that results and discoveries remain freely available in the public domain.
An example is NHGRI's genetic variation mapping project, or HapMap, which will speed the discovery of genes related to common illnesses such as asthma, cancer, diabetes and heart disease. The HapMap should also be a powerful resource for studying the genetic factors contributing to variation in response to environmental influences, in susceptibility to infection, and in the effectiveness of drugs and vaccines. Another example is the ENCODE project, which aims to create a comprehensive encyclopedia of the functional elements encoded in the DNA sequence, by cataloging the identity and precise location of all of the protein-encoding and non-protein-encoding genes within the genome.
For more detailed information on NHGRI, the Human Genome Project and the future of genomics, go to:
- The NHGRI Web site: www.genome.gov
- The Human Genome Project: www.genome.gov/10001772
- The ENCODE Project: www.genome.gov/ENCODE
- The HapMap Project: www.genome.gov/HapMap
- Sequencing Projects: The NHGRI Genome Sequencing Program (GSP)
- The Celebration of the Genome: www.genome.gov/About/April2003
- Genetic Terms and Definitions: www.genome.gov/glossary.cfm
- Ethical, Legal and Social Implications Research: www.genome.gov/10001618
For more detailed information on DOE's Human Genome Program and the future of genomics, go to:
Last Updated: October 30, 2010