Explore frequently asked questions and answers about the Human Genome Project and its impact on the field of genomics.
What is a genome?
A genome is an organism's complete set of deoxyribonucleic acid (DNA), a chemical compound that contains the genetic instructions needed to develop and direct the activities of every organism. DNA molecules are made of two twisting, paired strands. Each strand is made of four chemical units, called nucleotide bases. The bases are adenine (A), thymine (T), guanine (G) and cytosine (C). Bases on opposite strands pair specifically; an A always pairs with a T, and a C always with a G.
The human genome contains approximately 3 billion of these base pairs, which reside in the 23 pairs of chromosomes within the nucleus of all our cells. Each chromosome contains hundreds to thousands of genes, which carry the instructions for making proteins. Each of the estimated 30,000 genes in the human genome makes an average of three proteins.
What is DNA sequencing?
Sequencing means determining the exact order of the base pairs in a segment of DNA. Human chromosomes range in size from about 50,000,000 to 300,000,000 base pairs. Because the bases exist as pairs, and the identity of one of the bases in the pair determines the other member of the pair, scientists do not have to report both bases of the pair.
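The pairing rule is what makes one strand enough to report. A minimal sketch in Python, assuming a sequence is held as a plain string of the letters A, T, G and C (the name `paired_strand` is hypothetical and used only for illustration):

```python
# Pairing rules from the text: A always pairs with T, and C always pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def paired_strand(strand: str) -> str:
    """Return the base found opposite each position of the reported strand."""
    return "".join(PAIRS[base] for base in strand.upper())


reported = "ATGCGTAC"
print(paired_strand(reported))  # TACGCATG
```

Applying the function twice returns the original string, which is another way of saying that either strand fully determines the other.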
The primary method used by the HGP to produce the finished version of the human genetic code was map-based, or BAC-based, sequencing. BAC is the acronym for "bacterial artificial chromosome." Human DNA is fragmented into pieces that are relatively large but still manageable in size (between 150,000 and 200,000 base pairs). The fragments are cloned in bacteria, which store and replicate the human DNA so that it can be prepared in quantities large enough for sequencing. If carefully chosen to minimize overlap, it takes about 20,000 different BAC clones to contain the 3 billion pairs of bases of the human genome. A collection of BAC clones containing the entire human genome is called a "BAC library."
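The clone count quoted above can be checked with simple arithmetic. The sketch below is a rough illustration only: it assumes clones that tile the genome end to end with no overlap at all, which is an idealization, and the function name `min_clones` is hypothetical.

```python
import math

GENOME_BP = 3_000_000_000                 # approximate genome size from the text
BAC_INSERT_RANGE_BP = (150_000, 200_000)  # BAC fragment sizes from the text


def min_clones(genome_bp: int, insert_bp: int) -> int:
    """Fewest clones of a given size that could cover the genome with zero overlap."""
    return math.ceil(genome_bp / insert_bp)


for insert_bp in BAC_INSERT_RANGE_BP:
    print(insert_bp, min_clones(GENOME_BP, insert_bp))
# 150000 20000
# 200000 15000
```

Under these assumptions the idealized count falls between 15,000 and 20,000 clones, which is consistent with the figure of about 20,000 once some overlap between neighboring clones is allowed for.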
In the BAC-based method, each BAC clone is "mapped" to determine where in the human genome its DNA comes from. This approach ensures that scientists know both the precise location of the DNA letters that are sequenced from each clone and their spatial relation to sequenced human DNA in other BAC clones.
For sequencing, each BAC clone is cut into still smaller fragments that are about 2,000 bases in length. These pieces are called "subclones." A "sequencing reaction" is carried out on these subclones. The products of the sequencing reaction are then loaded into the sequencing machine (sequencer). The sequencer generates about 500 to 800 base pairs of A, T, C and G from each sequencing reaction, so that each base is sequenced about 10 times. A computer then assembles these short sequences into contiguous stretches of sequence representing the human DNA in the BAC clone.
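Two ideas in the paragraph above lend themselves to a small illustration: how many reads are needed for each base to be sequenced about 10 times, and how overlapping reads can be joined into a contiguous stretch. The Python sketch below is a toy, not the assembly software used by the project. It assumes error-free reads from a single strand and uses a simple greedy merge of the longest overlaps; the names `reads_for_coverage`, `overlap` and `greedy_assemble`, and the `min_len` parameter, are hypothetical.

```python
import math


def reads_for_coverage(region_bp: int, read_bp: int, coverage: int) -> int:
    """Reads needed so that each base is covered `coverage` times on average."""
    return math.ceil(coverage * region_bp / read_bp)


def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, -1, -1)  # (overlap length, index of left, index of right)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs  # nothing left to join
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# A 150,000-base BAC clone read at 10-fold coverage, using figures from the text.
print(reads_for_coverage(150_000, 500, 10))  # 3000
print(reads_for_coverage(150_000, 800, 10))  # 1875

# Three short overlapping reads, given out of order.
print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))  # ['ATGCGTACGTTAGC']
```

Real data is harder: reads contain errors, come from both strands, and include repeated sequences that a greedy merge can join incorrectly. Knowing each clone's mapped position in the genome helps limit how far such a mistake can spread.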
Whose DNA was sequenced?
This is intentionally not known, in order to protect the volunteers who provided DNA samples for sequencing. The sequence is derived from the DNA of several volunteers. To ensure that the identities of the volunteers cannot be revealed, a careful process was developed to recruit the volunteers and to collect and maintain the blood samples that were the source of the DNA.
The volunteers responded to local public advertisements near the laboratories where the DNA "libraries" were prepared. Candidates were recruited from a diverse population. The volunteers provided blood samples after being extensively counseled and then giving their informed consent. About 5 to 10 times as many volunteers donated blood as were eventually used, so that not even the volunteers would know whether their sample was used. All labels were removed before the actual samples were chosen.
What were the goals?
The main goals of the Human Genome Project were first articulated in 1988 by a special committee of the U.S. National Academy of Sciences, and later adopted through a detailed series of five-year plans jointly written by the National Institutes of Health and the Department of Energy. The principal goals laid out by the National Academy of Sciences were achieved, including the essential completion of a high-quality version of the human sequence. Other goals included the creation of physical and genetic maps of the human genome, which were accomplished in the mid-1990s, as well as the mapping and sequencing of a set of five model organisms, including the mouse. All of these goals were achieved within the time frame and budget first estimated by the NAS committee.
Notably, quite a number of additional goals not considered possible in 1988 have been added along the way and successfully achieved. Examples include advanced drafts of the sequences of the mouse and rat genomes, as well as a catalog of variable bases in the human genome.
What is a draft vs. finished genome sequence?
On June 26, 2000, the International Human Genome Sequencing Consortium announced the production of a rough draft of the human genome sequence. In April 2003, the consortium announced an essentially finished version of the human genome sequence. This version, which is available to the public, provides nearly all the information needed to do research using the whole genome.
The difference between the draft and finished versions is defined by coverage, the number of gaps and the error rate. The draft sequence covered 90 percent of the genome at an error rate of one in 1,000 base pairs, but there were more than 150,000 gaps and only 28 percent of the genome had reached the finished standard. In the April 2003 version, there are fewer than 400 gaps and 99 percent of the euchromatic genome is finished, with an error rate of less than one error per 10,000 base pairs. The differences between the two versions are significant for scientists using the sequence to conduct research.
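To make the two error rates concrete, the Python sketch below counts the errors expected in a stretch of one million bases. The region size is an arbitrary choice for illustration, and the function name `expected_errors` is hypothetical. Because the finished standard is stated as less than one error per 10,000 base pairs, the second figure is an upper bound.

```python
def expected_errors(bases: int, bases_per_error: int) -> float:
    """Expected number of wrong bases at a rate of one error per `bases_per_error`."""
    return bases / bases_per_error


REGION_BP = 1_000_000  # illustrative region size, not a figure from the text

print(expected_errors(REGION_BP, 1_000))   # draft: 1000.0
print(expected_errors(REGION_BP, 10_000))  # finished, upper bound: 100.0
```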
Who owns the human genome?
Every part of the genome sequenced by the Human Genome Project was made public immediately, and new information about the genome is posted almost every day in freely accessible databases or published in scientific journals (which may or may not be freely available to the public).
The U.S. Supreme Court ruled in 2013 that naturally occurring human genes are not an invention and therefore cannot be patented. However, private companies can apply for patents on edited or synthetic genes that have been altered significantly enough from their natural versions to count as a new, patentable product.
The Human Genome Project could not have been completed as quickly and as effectively without the strong participation of international institutions. In the United States, contributors to the effort include the National Institutes of Health (NIH), which began participation in 1988 when it created the Office for Human Genome Research, later upgraded to the National Center for Human Genome Research in 1990 and then the National Human Genome Research Institute (NHGRI) in 1997; and the U.S. Department of Energy (DOE), where HGP discussions began as early as 1984. However, almost all of the actual sequencing of the genome was conducted at numerous universities and research centers throughout the United States, the United Kingdom, France, Germany, Japan and China.
The International Human Genome Sequencing Consortium included:
- The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
- Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
- United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.
- Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, Tex., U.S.
- RIKEN Genomic Sciences Center, Yokohama, Japan
- Genoscope and CNRS UMR-8030, Evry, France
- GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
- Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
- Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
- Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.
- Stanford Genome Technology Center, Stanford, Calif., U.S.
- Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif., U.S.
- University of Washington Genome Center, Seattle, Wash., U.S.
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.
- University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, Okla., U.S.
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y., U.S.
- GBF - German Research Centre for Biotechnology, Braunschweig, Germany
How much did it cost?
In 1990, Congress established funding for the Human Genome Project and set a target completion date of 2005. Although estimates suggested that the project would cost a total of $3 billion over this period, the project ended up costing less than expected, about $2.7 billion in FY 1991 dollars. Additionally, the project was completed more than two years ahead of schedule.
It is also important to consider that the Human Genome Project will likely pay for itself many times over on an economic basis, given that genome-based research will play an important role in seeding the biotechnology and drug development industries, not to mention improvements in human health.
Why does NHGRI study ethical issues?
Since the beginning of the Human Genome Project, it has been clear that expanding our knowledge of the genome would have a profound impact on individuals and society. The leaders of the Human Genome Project recognized that it would be important to address a wide range of ethical and social issues related to the acquisition and use of genomic information, in order to balance the potential risks and benefits of incorporating this new knowledge into research and clinical care. The Ethical, Legal, and Social Implications (ELSI) program at NHGRI was established in 1990 to oversee research in these areas.
The United States Congress mandates that no less than five percent of the annual NHGRI budget be dedicated to studying the ethical, legal and social implications of human genome research, as well as recommending policy solutions and stimulating public discussion. The ELSI program at NHGRI, which is unprecedented in biomedical science in terms of scope and level of priority, provides an effective basis from which to assess the implications of genome research.
Since its inception the ELSI program at NHGRI has made several notable contributions to the genomics field. Among these are major changes to the way investigators and institutional review boards handle the consent process for genomics studies. Another is key guidance on the NIH’s genomic data sharing policy, notably the need to balance open science with personal privacy and autonomy. The ELSI program has been effective in promoting dialogue about the implications of genomics, and shaping the culture around the approach to genomics in research, medical, and community settings.
What is the future of medical science?
Having the essentially complete sequence of the human genome is similar to having all the pages of a manual needed to make the human body. The challenge to researchers and scientists now is to determine how to read the contents of all these pages, to understand how the parts work together, and to discover the genetic basis for health and the pathology of human disease. In this respect, genome-based research will eventually enable medical science to develop highly effective diagnostic tools, to better understand the health needs of people based on their individual genetic make-ups, and to design new and highly effective treatments for disease.
Individualized analysis based on each person's genome will lead to a very powerful form of preventive medicine. We'll be able to learn about risks of future illness based on DNA analysis. Physicians, nurses, genetic counselors and other health-care professionals will be able to work with individuals to focus efforts on the things that are most likely to maintain health for a particular individual. That might mean diet or lifestyle changes, or it might mean medical surveillance. But there will be a personalized aspect to what we do to keep ourselves healthy. Then, through our understanding at the molecular level of how things like diabetes or heart disease or schizophrenia come about, we should see a whole new generation of interventions, many of which will be drugs that are much more effective and precise than those available today.
How did it impact research?
Biological research has traditionally been a very individualistic enterprise, with researchers pursuing medical investigations more or less independently. The magnitude of both the technological challenge and the necessary financial investment prompted the Human Genome Project to assemble interdisciplinary teams, encompassing engineering and informatics as well as biology; automate procedures wherever possible; and concentrate research in major centers to maximize economies of scale.
As a result, research involving other genome-related projects (e.g., the International HapMap Project to study human genetic variation and the Encyclopedia of DNA Elements, or ENCODE, project) is now characterized by large-scale, cooperative efforts involving many institutions, often from many different nations, working collaboratively. The era of team-oriented research in biology is here.
In addition to introducing large-scale approaches to biology, the Human Genome Project has produced all sorts of new tools and technologies that can be used by individual scientists to carry out smaller scale research in a much more effective manner.
Last updated: February 24, 2020