The United States Human Genome Project
The First Five Years: Fiscal Years 1991-1995
The Human Genome Initiative is a worldwide research effort that has the goal of analyzing the structure of human DNA and determining the location of the estimated 100,000 human genes. In parallel with this effort the DNA of a set of model organisms will be studied to provide the comparative information necessary for understanding the functioning of the human genome. The information generated by the human genome project is expected to be the source book for biomedical science in the 21st century and will be of immense benefit to the field of medicine. It will help us to understand and eventually treat many of the more than 4000 genetic diseases that afflict mankind, as well as the many multifactorial diseases in which genetic predisposition plays an important role.
A centrally coordinated project focused on specific objectives is believed to be the most efficient and least expensive way of obtaining this information. In the course of the project much new technology will be developed that will facilitate biomedical and a broad range of biological research, bring down the cost of many experiments, and find application in numerous other fields. The basic data produced will be collected in electronic databases that will make the information readily accessible in convenient form to all who need it.
This report describes the plans for the United States Human Genome Project (HGP) and updates those originally prepared by the Office of Technology Assessment (OTA) and the National Research Council (NRC) in 1988. In the intervening two years, improvements in technology for almost every aspect of genomics research have taken place. As a result, more specific goals can now be set for the first five years.
The plan presented here was prepared jointly by the National Institutes of Health (NIH) and the Department of Energy (DOE), the two agencies that have received earmarked funding for the HGP. Over the last two years, these agencies have developed a highly synergistic and well-integrated approach to carrying out this initiative, as evidenced by the adoption of this common plan.
In order to achieve the scientific goals set out in this report, a number of administrative measures have been put in place. In addition, a newsletter, an electronic bulletin board and a comprehensive administrative database are being set up to facilitate the communication and tracking of progress.
Research centers will be established to promote the collaboration of investigators from diverse disciplines on a major task of the genome program. DOE has already established three large centers in its National Laboratories and NIH will establish 10 to 20 additional centers over the next five years. The centers will become foci for collaboration with investigators at other locations and with industrial organizations that want to develop applications of the research results, thereby creating networks of interrelated projects.
Meetings and workshops will be organized to bring together investigators with common research objectives and to encourage collaboration, exchange of materials and use of common starting materials or protocols wherever these are appropriate. It is expected that mapping and sequencing groups will coalesce around individual human chromosomes or around particular model organisms.
NIH and DOE will continue their synergistic working relationship and will also interact closely with other interested agencies, as well as with genome mapping programs in other countries as they get organized. Close ties with industry and with the medical community have been established, and will continue to be encouraged, to ensure efficient technology transfer. The private sector is involved in this project at all levels from participation in the advisory committees to receipt of grants and contracts.
The overall budget needs for the effort are still anticipated to be the same as those identified by the OTA and the NRC, namely about $200 million per year for approximately 15 years. Fiscal years 1988 to 1990 have been a period for getting organized and getting research under way. The five-year goals that have been specified in this plan are for the period FY 1991 through FY 1995 and assume the program will rapidly reach the level of funding specified above.
In addition to the long-term objectives of the HGP, this plan sets out specific scientific goals to be achieved in the first five years together with the rationale for each goal. Five-year goals have been identified for the following areas that together encompass the HGP:
The specific goals will be reviewed annually and updated as further advances in the underlying technology occur.
The Human Genome Initiative is a worldwide research effort that has the goal of analyzing the structure of human DNA and determining the location of all human genes. In parallel with this effort, the DNA of a set of model organisms will be studied to provide the comparative information necessary for understanding the functioning of the human genome. The information generated by the human genome project is expected to be the source book for biomedical science in the 21st century. It will have a profound impact on and expedite progress in a variety of biological fields, including those such as developmental biology and neurobiology, where scientists are just beginning to understand the underlying molecular mechanisms. The analysis and interpretation of the information will occupy scientists for many years to come. Thus, the maximal benefit of the human genome project will only be achieved if it is surrounded by research efforts that are focused on understanding and taking advantage of the human genetic information.
The human genome project is expected to be of immense benefit to medical science. It will help us to understand and eventually treat many of the more than 4000 genetic diseases that afflict mankind, as well as the many multifactorial diseases in which genetic predisposition plays an important role. New technologies emanating from the genome project will also find application in other fields such as agriculture and the environmental sciences. They will be valuable for assessing the effects of radiation and other environmental factors on human genetic material. It is anticipated that the private sector will derive great benefit from the trained manpower, the data and the techniques developed by the human genome program and will develop many useful applications based on the new knowledge that is produced. Within a few years, DNA sequence information will undoubtedly be a major tool in most areas of basic and applied biological research.
As a result of the enormous strides in basic research on molecular and medical genetics in the last thirty to forty years, technology has advanced to a stage of development at which such a project can realistically be contemplated. Because of the farsighted investment in basic research by the federal government over this time period, the United States is clearly the leader in this field. Pursuit of the HGP will allow the U.S. to remain at the forefront of biomedical science and to train the scientific manpower that will be able to take advantage of the immense opportunities for research and innovation that will emanate from this project.
The possibility of initiating such a major and significant research program was extensively discussed in the scientific community during 1986 and 1987. In the spring of 1987, a Report on the Human Genome Initiative was prepared by the Health and Environmental Research Advisory Committee (HERAC) of the Department of Energy (DOE). In early 1988, further discussion culminated in the publication of two additional widely circulated, influential reports. The Office of Technology Assessment (OTA) report presented a comprehensive and detailed analysis of the scientific developments that had led to the promise of actually being able to "map and sequence" the human genome and presented a number of options as to how the U.S. might pursue such a project. The National Research Council (NRC) report recommended that the U.S. support the research effort and presented an outline for a multi-phase research plan for accomplishing the goal of sequencing human DNA over the course of the following two decades. A report to the director of NIH by the Ad Hoc Advisory Committee on Complex Genomes, also prepared in 1988, essentially concurred with the NRC Report.
In Fiscal Year 1988, the Congress of the U.S. launched the HGP by appropriating funds to both the Department of Energy (DOE) and the National Institutes of Health (NIH) specifically for support of research efforts to determine the structure of complex genomes. In the report accompanying the Senate appropriations bill for FY 1989, the Congress requested the NIH to prepare, by early 1990, a report on the optimal strategy for the conduct of the human genome program. The FY 1990 House Appropriations Committee report also asked the NIH for a comprehensive spending plan by the time of the FY 1991 appropriations hearings. Prepared in response to those requests, the present report contains a summary of the progress that has been made in the field of genome research since the preparation of the OTA and NRC reports and presents a plan for the HGP, with emphasis on the next five year period. Because the two agencies have been collaborating closely for the last two years in the management of the program, this plan was prepared jointly by the NIH and the DOE. The agencies plan to revise the plan approximately annually, based on the latest scientific developments.
It is generally agreed that the overall goal of the Human Genome Initiative is to acquire fundamental information needed to further our basic scientific understanding of human genetics and of the role of various genes in health and disease. The premise is that this can be done much more efficiently, and in a more cost-effective manner, as a targeted and coordinated program. Thus, we obtain valuable basic information in the least expensive way while increasing the "benefit to cost" ratio for genetics research in general.
As refined through the discussions over the last half of the 1980's and defined in the NRC report, the Human Genome Initiative has several interrelated goals:
At the time the NRC and OTA reports were written, the consensus of the scientific community was that state of the art technology was sufficient for the development of detailed genetic and limited physical maps. That technology, however, was not considered sufficient for completion of the physical map of a genome as large and complex as that of the human. Nor was the technology available for DNA sequencing considered to be adequate for the task of sequencing the 3,000,000,000 base pairs of human DNA. At that time, the largest continuous human sequence that had been determined was that of the human growth hormone gene, only 67,000 nucleotides long.
Thus, the NRC committee and others recommended a multi-phase program, in which the initial phase would consist of:
In these recommendations, the task of sequencing the complete human DNA was reserved to a later phase, one that would only be embarked upon if methods could be developed that would allow the sequence to be obtained at a reasonable cost. The overall program was expected to take at least fifteen years to complete. Technology development was to be an integral part of the project throughout.
This general plan is still appropriate, but some of the details must be changed as improvements in the technology have occurred in the last two years. In order to prepare the present report, advisors to, and staff of, the NIH and the DOE have joined forces to examine the state of the science and develop the plan to be followed over the next five years. This document represents the consensus of the two agencies regarding the conduct of genome research and will be updated periodically.
The rosters of the various advisory groups that participated in the plan's development are appended. These are the Health and Environmental Research Advisory Committee (HERAC) of the DOE (Appendix 1), and the Program Advisory Committee on the Human Genome (PACHG) of the NIH (Appendix 2). The primary working group for these committees was the joint subcommittee of the HERAC and the PACHG (Appendix 3) that was specified in the NIH/DOE Memorandum of Understanding (see below and Appendix 4), supplemented by additional experts (Appendix 5).
The plan addresses specific scientific goals to be achieved in the next five years in the following areas:
Also presented are the implementation strategies that will be used to achieve these goals with respect to:
Finally, the report addresses budget projections. The five year period covered by the plan will begin with FY 1991. It is assumed that funding levels for the combined NIH and DOE programs will rapidly reach approximately the level recommended by the NRC report, namely $200 million per year, adjusted for inflation. Although five year goals are presented with some specificity, and although substantial progress has been made in technology development in the past two years, it must be stressed that the next five years will still be a time in which rapid advances in methods and strategies will be necessary if the program is to meet the goals outlined. Extreme flexibility and diligence on the part of program management, for both the research and its administration, will be needed during this period of experimentation and technological development.
The human genome consists of 50,000 to 100,000 genes located on 23 pairs of chromosomes. One chromosome in each pair is inherited from the mother, the other from the father. Each chromosome contains a long molecule of DNA, the chemical of which genes are made. The DNA, in turn, is a double-stranded molecule in which each strand is a linear array of units called nucleotides or bases. There are four different bases, called A, T, G and C, and the bases on one strand of DNA are precisely paired with the bases on the other strand, so that an A is always opposite T and G opposite C. The order of the four units on the DNA strand determines the information content of a particular gene or piece of DNA. Genes are of differing length, ranging in size from roughly 2,000 to as many as 2,000,000 base pairs. Mapping is the process of determining the position and spacing of genes, or other landmarks, on the chromosomes relative to one another. There are basically two types of maps, genetic and physical, which differ in the methods used to construct them and in the metric that is used to measure the distance between genes. Sequencing is the process of determining the order of the nucleotides, or base pairs, in a DNA molecule.
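The pairing rule described above (A opposite T, G opposite C) can be illustrated with a minimal sketch. The function name and the example strand are illustrative assumptions, not part of the plan.

```python
# Base pairing as stated in the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each base of the given strand."""
    return "".join(PAIRS[base] for base in strand)


# Hypothetical example strand, chosen only for illustration.
example = "ATGC"
partner = complement(example)
```

Because the pairing is one-to-one, applying the rule twice returns the original strand, which is why the sequence of one strand fully determines the other.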
Although mapping of human genes began early in the twentieth century, it has been intensively pursued only for the past two decades. For most of this period the methods that were developed, though original and ingenious, have been inadequate for comprehensive mapping and have only allowed the construction of relatively crude maps with very little detail. Recently, much more effective technology has been introduced. To date, about 1,700 of the estimated 50,000 to 100,000 human genes (less than 2 percent) have been mapped.
A frequently asked question is: whose genome will be sequenced? The answer is: no one's. The first complete human genome to be sequenced will be a composite of sequences from many sources, most of these being cell lines that have existed in laboratories all over the world for some time. The sequence will thus be a generic sequence representative of humans in general and not of any particular individual. The complete sequence will provide a standard against which other partial sequences can be compared.
It has been suggested that, due to the great variability between individual human beings, a single sequence would not be very useful. While it is true that much valuable insight will come from comparing many different human sequences, the presumption is that functionally important DNA is conserved among humans as it is between humans and mice in those areas that have been studied. DNA regions of particular interest, such as genes involved in genetic disease, will be sequenced from many individuals in the course of research on those diseases. As more information about the extent of human polymorphism accumulates from these and other studies in the next few years, it will be evaluated to determine the impact on strategy for the HGP.
1. Genetic Map
Genetic maps have many uses, including identification of the genes associated with genetic diseases and other biological properties. Genetic maps also form an essential backbone or scaffold that is needed to guide a physical mapping effort.
Genetic maps are constructed by determining how frequently two "markers," such as a physical trait, a particular medical syndrome, or a detectable DNA sequence, are inherited together. Genes which lie close together on a chromosome have a much higher chance of being inherited together than genes that lie farther apart. Genetic studies of families, to determine how frequently two traits are inherited together, lead to the production of "genetic maps" in which distance between genes is measured in centimorgans in honor of the American geneticist Thomas Hunt Morgan. Two markers are one centimorgan apart if they are separated during transmission from parents to children one percent of the time. A centimorgan corresponds to a highly variable physical distance, but the genome-wide average is believed to be roughly one million base pairs.
Among the technical advances that led to the Human Genome Initiative, the development of new genetic mapping tools is prominent. One of the most important innovations has been the introduction of DNA markers such as restriction fragment length polymorphisms, or RFLPs, to detect genetic variation among individuals. Such markers are relatively easy to find in large numbers and have been used to construct genetic maps. In the past two years, advances have continued to be made in this area. New types of DNA markers have been defined and techniques, such as denaturing gradient gel electrophoresis, have been adapted to detect subtle variation in DNA sequences. As a result, the number of useful markers has increased in the past two years. It is estimated that 3000 well-spaced and informative markers will be needed to achieve the goal of a completely linked map where markers are an average of one centimorgan apart, as recommended by the NRC. For the first five years, the genome program has set as its goal the creation of a 2-5 centimorgan map, which would require 600 to 1500 such markers. Each marker should be identified by an STS, as defined in the section on physical mapping. A working group has been established to develop a plan for achieving this goal.
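The marker counts quoted above follow from simple division, as the sketch below shows. It assumes a total genetic length of about 3000 centimorgans, a figure that is not stated directly in the text but is implied by the estimate of 3000 markers at one-centimorgan spacing. The function names are illustrative.

```python
# Assumption implied by the text: 3000 markers at ~1 cM spacing,
# so the total genetic length is taken to be about 3000 cM.
GENETIC_LENGTH_CM = 3000

# Genome-wide average quoted in the text: one centimorgan is roughly
# one million base pairs (locally this is highly variable).
AVERAGE_BP_PER_CM = 1_000_000


def markers_needed(spacing_cm: float) -> int:
    """Number of evenly spaced markers for a map of the given resolution."""
    return round(GENETIC_LENGTH_CM / spacing_cm)


def average_physical_spacing_bp(spacing_cm: float) -> int:
    """Approximate average physical distance between adjacent markers."""
    return round(spacing_cm * AVERAGE_BP_PER_CM)
```

Under these assumptions a 2 centimorgan map needs 1500 markers and a 5 centimorgan map needs 600, matching the range given for the five-year goal.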
2. Physical Map
The distance between sites on physical maps is measured in units of physical length, such as numbers of nucleotide pairs. Physical maps can be constructed in a variety of different ways. They are used as the basis for the characterization and isolation of individual genes or other DNA regions of interest, as well as to provide the starting material for DNA sequencing. The ability to construct physical maps derives from recombinant DNA techniques that allow the isolation and cloning of DNA fragments, the identification of specific sequence markers on DNA, and the determination of the order of and distance between such markers on a chromosome.
There are several kinds of physical maps, which can be conveniently categorized into two general types. One type describes the order and spacing of markers on a DNA molecule. The cytogenetic map is a map of this type. Based on microscopic analysis, cytogenetic maps record the location of genes or DNA markers relative to visible landmarks on the chromosomes. This is the oldest type of physical map and the resolution (precision in locating markers) is rather low, on the order of 10 million base pairs. Nevertheless, the cytogenetic map is still an extremely valuable tool and markers continue to be mapped in this way. At the recent 10th Human Gene Mapping Workshop, the number of mapped markers was reported to be 4362, as opposed to 2057 only two years ago. Another example of this type of physical map is the long range restriction map, which records the order of and distance between specific sequences, known as restriction sites, on chromosomes. The resolution of long-range restriction maps is between 100,000 and 2 million base pairs.
The second type of physical map consists of a collection of cloned pieces of DNA that represent a complete chromosome or chromosomal segment, together with information about the order of the cloned pieces. There are a variety of techniques for cloning DNA and a number of methods for determining the order of the clones. The technology for constructing overlapping clone sets (known as "contigs") is continually improving. At present, a collection of ordered clones is typically the starting material for sequencing. However, novel approaches that do not require cloning, but still allow the investigator access to the DNA to be sequenced, are under development.
In the past two years, improvements in several techniques, such as pulsed field gel electrophoresis, yeast artificial chromosome cloning, the polymerase chain reaction (PCR), fluorescence in situ hybridization, and radiation hybrid analysis have made the initial stages in the construction of physical maps of large genomes significantly easier and more rapid than was predictable at the time of the NRC recommendations. Currently, in the U.S., there are federally supported research projects to physically map the DNA of all or parts of eleven of the 24 human chromosomes (there are 23 pairs of chromosomes, but the X and the Y sex chromosomes are not like each other, resulting in 24 different chromosomes).
NIH is supporting, through its extramural grants program, projects for physical mapping of three chromosomes (3, 4, 18), the DOE is supporting projects in the Los Alamos and Livermore National Laboratories to produce complete overlapping clone maps of two others (16, 19), and the two agencies are funding separate but complementary physical mapping efforts on another six chromosomes (5, 11, 17, 21, 22, X). These projects involve the construction of physical maps of both categories, using both state of the art techniques and new methods under development. The DOE also supports the preparation of clone libraries representing the various chromosomes at Los Alamos and Livermore.
There are still several technological barriers to the rapid, inexpensive and routine construction of physical maps. One is that the length of DNA over which a continuous, or uninterrupted, set of overlapping clones can be readily established is limited. Typically contigs are small, consisting of between two and six cosmid clones (a cosmid is a type of vector that can carry a maximum of 40 thousand base pairs). To be more than minimally useful, the length of DNA over which the physical map shows continuity, or "connectivity," must be considerably longer.
A challenging, but reasonable, goal for physical mapping research projects is to extend the length of a DNA segment that can be covered by a single contig or spanned by a set of closely spaced, ordered markers, to about 2 million base pairs. If the physical mapping of human chromosomes is to be achieved within the next five years, it is important that current physical mapping efforts give their highest priority to the problem of completing maps, i.e., of achieving uninterrupted continuity of physical mapping data over large regions of DNA.
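The gap between current contig sizes and the 2 million base pair goal can be made concrete with a rough calculation. The sketch below treats every cosmid as carrying the 40 thousand base pair maximum and ignores overlap between neighboring clones, so the spans are upper bounds and the clone count is a lower bound. The function names are illustrative.

```python
import math

COSMID_MAX_BP = 40_000        # maximum cosmid insert size quoted in the text
TARGET_CONTIG_BP = 2_000_000  # five-year connectivity goal quoted in the text


def max_span_bp(n_clones: int, insert_bp: int = COSMID_MAX_BP) -> int:
    """Upper bound on contig length: clones laid end to end with no overlap."""
    return n_clones * insert_bp


def min_clones_for(target_bp: int, insert_bp: int = COSMID_MAX_BP) -> int:
    """Lower bound on the number of clones needed to span target_bp."""
    return math.ceil(target_bp / insert_bp)
```

On these simplified assumptions a typical contig of two to six cosmids spans at most 80,000 to 240,000 base pairs, while a 2 million base pair contig needs at least 50 cosmids. Real contigs require overlapping clones, so the true number would be higher.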
Another difficulty faced by those trying to assemble physical maps of chromosomes has been the inability to compare the results of one mapping method directly with those of another and to combine maps constructed by two different techniques into a single map. This problem is addressed by a recent proposal that presents a new concept or definition of a useful physical map. According to the proposed system, data from any of a variety of physical mapping techniques can be reported in a common "language." In this system, each mapped element (individual clone, contig, or sequenced region) is defined by a unique "sequence-tagged site" or STS, which is basically a short DNA sequence that has been shown to be unique. A map is then constructed showing the order and spacing of the STS's.
The STS system, as proposed, does appear to have several advantages. The STS map can be represented electronically and stored in a database that is publicly available and contains sufficient information to enable any scientist to recover de novo any mapped chromosomal region in his/her own laboratory. Thus, the proposed STS system will facilitate the scientific community's access to the human physical map. Quality control and project accountability will also be improved because the mapping results reported by any individual laboratory can readily be checked elsewhere. Access to mapped DNA through the information in the STS database will obviate the need for an expensive, long term, centralized repository of clones, although it will not eliminate the need to generate and map such clones nor the need to store them in and distribute them from the laboratory in which they are produced. The proposed STS system will also facilitate the integration of results from different laboratories, regardless of the methods used, to produce a single, useful physical map and will establish a uniform criterion for determining how complete the map of a particular region is. Finally, an STS map may in the future be the appropriate starting point for sequencing of DNA.
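One way to picture an STS map stored electronically is as a list of named sites with positions, from which order and spacing are derived. The following is only a sketch: the field names, the use of a single base pair coordinate, and the example values are hypothetical and are not taken from the STS proposal itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STS:
    name: str         # hypothetical identifier for the sequence-tagged site
    position_bp: int  # hypothetical coordinate along one chromosome


def ordered(sites):
    """Return the sites in map order."""
    return sorted(sites, key=lambda site: site.position_bp)


def spacings(sites):
    """Distances in base pairs between each pair of adjacent sites."""
    positions = [site.position_bp for site in ordered(sites)]
    return [b - a for a, b in zip(positions, positions[1:])]


def largest_gap(sites):
    """Widest interval between adjacent sites; 0 if fewer than two sites."""
    gaps = spacings(sites)
    return max(gaps) if gaps else 0


# Hypothetical sites reported in arbitrary order, for example by different laboratories.
sites = [STS("sts_b", 250_000), STS("sts_a", 100_000), STS("sts_c", 330_000)]
```

A measure such as the largest gap is one possible way to express how complete the map of a region is, in the spirit of the uniform criterion mentioned above.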
The STS proposal is still under discussion in the scientific community and as yet, few, if any, mapping projects have started to use the STS system. Another uncertainty is the additional cost of generating STS markers. The NIH and DOE have established a joint working group to develop more detailed plans for testing and implementing the STS approach to physical mapping.
Over the next five years, in addition to generation of STS maps, efforts should be continued to generate complete contig maps of large regions of the human genome. Because current technology is not yet sufficient for this task, however, it is unclear what fraction of the genome can be cloned and ordered during this time. An STS map, with one STS characterized approximately every 100 thousand base pairs, is an achievable goal. Such a map will assist continued efforts to isolate the intervening DNA.
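The size of the STS mapping goal can be estimated from figures given in this report: a genome of about 3 billion base pairs and one STS approximately every 100 thousand base pairs. The sketch below assumes evenly spaced sites, which is a simplification.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size quoted in the text
STS_SPACING_BP = 100_000   # target average spacing quoted in the text


def sts_count(genome_bp: int = GENOME_BP, spacing_bp: int = STS_SPACING_BP) -> int:
    """Approximate number of evenly spaced STSs needed to cover the genome."""
    return genome_bp // spacing_bp
```

Under these assumptions the goal corresponds to roughly 30,000 sequence-tagged sites across the genome.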
Three decades ago when Francis Crick and James Watson elucidated the double helix structure of DNA, there was no way of determining the sequence of even short molecules of DNA. Only years later, with the advent of recombinant DNA technology in the early 1970's, was it possible to think of isolating individual genes. That breakthrough, combined with the development of powerful DNA sequencing techniques, provided the technological basis for the Human Genome Initiative.
To date, the only organisms for which a complete DNA sequence has been determined are viruses. The largest published viral genome sequence is that of the Epstein-Barr virus (EBV), a sequence of 170,000 base pairs. Scientists are now attempting to sequence the DNA of certain bacteria, approximately 4,500,000 base pairs long. The size and complexity of human DNA, however, still make the sequencing of the human genome awesome to contemplate. While many short stretches of human DNA have been sequenced, slightly more than 5 million base pairs in the aggregate, the human genome comprises about 3 billion base pairs of DNA and is nearly 1,000 times larger than a bacterial genome.
If such a large amount of DNA is to be sequenced, a substantial increase in the speed and reduction in the cost of sequencing technology will be required. The current cost of DNA sequencing, in laboratories that do it routinely, is estimated to be about $2 to $5 per base pair of finished sequence, that is, sequence whose accuracy has been adequately confirmed. In laboratories that sequence DNA only occasionally, the costs are much higher. The costs of DNA preparation, salaries and overhead are included in these figures. These costs must be reduced below $0.50 a base pair before large scale sequencing will be cost effective.
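The effect of the per-base cost on the total can be worked out directly. The sketch below multiplies the approximate genome size of 3 billion base pairs by the cost figures quoted above and compares the results with the overall budget described elsewhere in this plan (about $200 million per year for approximately 15 years). It ignores inflation and any redundancy in sequencing, and the function name is illustrative.

```python
GENOME_BP = 3_000_000_000               # approximate human genome size
PROGRAM_BUDGET = 200_000_000 * 15       # about $200 million per year for ~15 years


def sequencing_cost(dollars_per_bp: float, genome_bp: int = GENOME_BP) -> float:
    """Total cost of one pass of finished sequence at the given unit cost."""
    return genome_bp * dollars_per_bp


low_current = sequencing_cost(2.00)   # lower end of the current estimate
high_current = sequencing_cost(5.00)  # upper end of the current estimate
at_threshold = sequencing_cost(0.50)  # threshold named in the text
```

At current costs, sequencing alone would come to $6 billion to $15 billion, which exceeds the entire projected program budget of $3 billion. At $0.50 per base pair the figure falls to $1.5 billion.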
Sequencing technology has improved significantly in the last two years. Machines that automatically perform the basic steps of identifying the order of the base pairs in appropriately prepared DNA samples are now readily available. In the most advanced laboratories, using these machines, it is possible for one individual to generate about 2000 base pairs of finished DNA sequence per day per machine, starting with properly prepared cloned DNA. One approach to lowering the cost of DNA sequencing is further automation. The maximum reduction in cost of current sequencing technology will come from the creation of a fully automated assembly line for rapid DNA sequencing. Efforts are underway in both DOE and NIH-sponsored projects, as well as in private companies, to automate most of the preparatory steps in the sequencing process through the development of high speed robotic workstations for sample handling.
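The throughput figure above shows why further automation is needed. The sketch below uses the quoted rate of about 2000 base pairs of finished sequence per day per machine and the approximate genome size of 3 billion base pairs. The fleet of 100 machines in the usage note is a hypothetical number chosen only for illustration.

```python
GENOME_BP = 3_000_000_000    # approximate human genome size
BP_PER_MACHINE_DAY = 2000    # finished sequence per day per machine, from the text


def machine_days(total_bp: int, rate: int = BP_PER_MACHINE_DAY) -> float:
    """Machine-days needed to produce total_bp of finished sequence."""
    return total_bp / rate


def calendar_years(total_bp: int, machines: int,
                   rate: int = BP_PER_MACHINE_DAY,
                   days_per_year: int = 365) -> float:
    """Years needed if the given number of machines run every day."""
    return machine_days(total_bp, rate) / machines / days_per_year
```

At this rate the genome represents about 1.5 million machine-days of work. A hypothetical fleet of 100 machines running every day would need roughly 41 years, so the rate per machine, not only the number of machines, has to improve.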
During the next five years, pilot projects will be undertaken in order to test strategies and develop technologies for larger sequencing projects, with the aim of reducing costs to well below $1 per base pair by the end of the first five year period. The sequencing projects should analyze biologically interesting regions in the size range of 200,000 to 1 million base pairs. In these developmental efforts, it will be more important to complete the sequence of chosen segments than merely to obtain a very high number of base pairs of sequence comprising many smaller segments. This approach will maximize the possibility of successfully identifying and developing the technology needed to proceed with large-scale genomic analysis. In addition, the amount of biological information obtained in the sequencing of human DNA in the course of these developmental research programs will be significantly increased if parallel efforts to sequence equivalent regions in the mouse are undertaken. Such comparative approaches will be encouraged.
In order to keep the costs of the HGP within the original estimates, the cost of routine large scale sequencing will ultimately have to be reduced well below $0.50 per base pair. Therefore, sequencing projects larger than these pilot projects, such as the sequencing of an entire human chromosome, will not be considered until the cost of sequencing is reduced to that level. The cost of sequencing will be assessed in five years and a recommendation made as to further technological developments needed before large-scale sequencing projects are undertaken.
It is by no means certain that enhancement of current technology, as described above, will bring the cost of sequencing down sufficiently. Therefore, entirely new approaches to DNA sequencing will also be encouraged. There are a number of techniques that hold some promise, including the use of capillary gel electrophoresis, the use of stable isotopes and mass spectrometry, and new imaging techniques such as scanning tunneling or atomic force microscopy and X-ray imaging. Projects of this sort are being pursued under support from the DOE and the NIH, as well as in private industry.
Determine the sequence of an aggregate of 10 million base pairs of human DNA in large continuous stretches in the course of technology development and validation.
Experience has shown many times over that information derived from studies of the biology of model organisms is essential to interpreting data obtained in studies of humans and in understanding human biology. Research involving microbial, animal, and plant models will continue to provide a basis for analyzing normal gene regulation, genetic diseases and evolutionary processes. For this reason, the human genome program will support mapping and sequencing of the genomes of a select number of non-human organisms.
Research projects that use model organisms will also be valuable to technology development. Since the genomes of these organisms are smaller and simpler than that of the human, they represent excellent systems for the development and testing of procedures needed for the much more complex human genome.
A number of organisms have already been identified as particularly useful models for comparative genetic analyses because a large amount of information about their genetics and molecular biology has already been accumulated. These organisms are bacteria (e.g., Escherichia coli), yeast (e.g., Saccharomyces cerevisiae), the fruit fly Drosophila melanogaster, the worm Caenorhabditis elegans and the laboratory mouse. However, it is fully expected that research projects involving other model organisms will also contribute significantly to the Human Genome Initiative.
Complete physical maps, both long-range restriction maps and an overlapping clone set, are already available for E. coli. Long-range restriction maps are available for several other bacteria and overlapping clone sets are being assembled. Extensive overlapping clone sets have also been assembled for both S. cerevisiae and C. elegans. Projects to sequence E. coli DNA have been initiated in both the U.S. and Japan. Sequencing of the DNA of another bacterium, B. subtilis, has begun in a consortium of European laboratories. Another European consortium and an American-Japanese collaborative project have each begun to sequence one of the chromosomes of S. cerevisiae. Finally, a collaborative project, involving a laboratory in the U.S. and one in the United Kingdom, is planned to begin the sequencing of C. elegans DNA.
While the mouse genome is not simpler than that of man, it is particularly useful for comparisons because of the many biological similarities between the mouse and man. The genetic map of the mouse, based on morphological markers, has already led to many insights into human genetics. There is every reason to believe that a physical map of the mouse genome will be equally useful. In order to prepare a physical map of the mouse, a genetic map based on DNA markers will need to be created. This will then lead to the development of a physical map that can be directly compared to the human physical map.
The general methodology used in studying model organisms will be similar to that described under human mapping and sequencing. The same criteria for the need to achieve long range continuity of physical mapping and sequencing projects apply, as do the requirements for reducing costs.
Sequence an aggregate of about 20 million base pairs of DNA from a variety of model organisms, focusing on stretches that are one million base pairs long, in the course of the development and validation of new and/or improved DNA sequencing technology.
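To give a sense of scale for the two sequencing goals, the sketch below works out how many contiguous segments each aggregate target implies. It uses only the figures stated in the plan (10 million base pairs of human DNA in regions of 200,000 to 1 million base pairs, and about 20 million base pairs of model-organism DNA in stretches of about one million base pairs). The function name is hypothetical and the result is a rough count, not a project schedule.

```python
# Sequencing goals stated in the plan (aggregate base pairs).
HUMAN_GOAL_BP = 10_000_000
MODEL_ORGANISM_GOAL_BP = 20_000_000


def segments_needed(goal_bp: int, segment_bp: int) -> int:
    """Number of contiguous segments of `segment_bp` needed to reach `goal_bp` (rounded up)."""
    return -(-goal_bp // segment_bp)  # ceiling division on integers


# Human pilot regions are expected to span 200,000 to 1 million base pairs.
print(segments_needed(HUMAN_GOAL_BP, 1_000_000))  # 10
print(segments_needed(HUMAN_GOAL_BP, 200_000))    # 50
# Model-organism stretches are described as about one million base pairs long.
print(segments_needed(MODEL_ORGANISM_GOAL_BP, 1_000_000))  # 20
```

The human goal therefore corresponds to somewhere between 10 and 50 completed regions, and the model-organism goal to about 20 stretches.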
The main products of the Human Genome Initiative will be genome maps and DNA sequences. For maximum utility, it will be critical to develop appropriate computer tools and information systems for the collection, storage and distribution of the immense amounts of mapping and sequencing data that will be generated in the course of the program. At present, it is not clear whether the most useful product of the Human Genome Initiative will be a single large database or a distributed set of smaller, networked databases. It is also unclear how genome databases will be structured in the future and whether existing databases can be adapted to meet the overall, long-term needs of the Human Genome Initiative, or whether new systems will have to be developed. However, it is certain that genome databases will need to be comprehensive and up-to-date, and, if there are several databases, it will be imperative that they be effectively linked with one another.
In addition to database development, it will be vital to develop new methods and tools for the analysis and interpretation of genome maps and DNA sequences. Successfully addressing both of these areas of genome informatics will require the development of a coordinated national program to make the information and analysis tools from this project readily available to the widest possible range of scientists and physicians in the most useful, timely and cost-effective manner.
While it is currently possible to describe the informatics goals of the Human Genome Initiative in broad terms, considerable refinement will be necessary as this program develops and informatics technology improves over time. A Joint Informatics Task Force (JITF) has been established by NIH and DOE to help the agencies develop detailed informatics programs. The report recommending the establishment of the JITF is included as Appendix 6.
The responsibilities of the JITF will include identification of the uses to which the data will be put and establishment of priorities for both technical objectives and policy areas. Specific issues to be addressed will include: genome database structures, management and services; development of algorithms, software, and hardware for organization and analysis of data; data exchange standards; electronic networks for collection and distribution of genome information; training and education of informatics personnel; and coordination of genome informatics activities among laboratories and agencies. The JITF will also serve as a national focus for interaction with international activities related to genome informatics.
The challenge will be not only to design databases that can meet the growing needs for access and for increasingly sophisticated search capabilities, but also to keep up with the voluminous amount of information that will be produced at ever faster rates. A number of research efforts are in progress to improve database design, software for database access and data entry procedures.
Recently, the National Center for Biotechnology Information (NCBI) was established at the National Library of Medicine to create automated systems for knowledge about molecular biology, biochemistry and genetics; and to pursue research in biological information handling, particularly with respect to human molecular biology. Thus, the mission of the NCBI supports, in part, that of the Human Genome Initiative. Consequently, the efforts of the NCBI will be closely coordinated with the human genome program through the JITF and by frequent staff interactions with the NIH and the DOE.
The plan to map and sequence the entire human genome is predicated on the belief that humankind will benefit immensely from attendant advances in medicine, biological research and biotechnology. Yet, as with any new technology, controversial uses of the information and capabilities that will flow from the Human Genome Initiative also may emerge. Ethical, legal and social issues arise in regard to ways of ensuring that this information is used in the most responsible manner.
Some of the questions that must be considered concern individual privacy and confidentiality. Should information about an individual's genetic makeup become available to others without that person's knowledge and permission? How can we assure that genetic information does not lead to stigmatization or to discrimination in areas such as insurance or employment?
Concerns also arise in connection with the medical applications resulting from the genome program, such as with the anticipated ability to predict a person's future health. Initially, at least, there will be a lapse, in many cases of years, between the ability to diagnose certain genetic disorders and the ability to treat them. How will an individual cope with a devastating diagnosis when no treatment is available? What issues does such a situation raise?
These questions are not new. Physicians and counselors are facing them today when treating patients with genetic and other diseases. However, the greatly increased flow of information about human genetics will make the need to deal with these issues more compelling.
The NIH and DOE human genome programs will provide for the support of studies that investigate concerns such as these. About 3 percent of the genome budget will be available for activities that address ethical, social and legal issues related to the project.
A series of specific recommendations for the research agenda and related activities in the ethics component of the human genome program has been developed by a joint DOE/NIH working group on ethics. These recommendations will guide the program over the next five years and will continue to be refined as the program proceeds. A complete report of the ethics working group is attached (Appendix 7).
The purpose of the ethics component of the human genome program is to:
The program will endeavor to anticipate problems before they arise and develop suggestions for dealing with them that would forestall adverse effects. The approach to accomplishing these objectives will be to:
The Human Genome Initiative is creating the need for a considerable number of scientists and other trained personnel who have the skills to pursue the research goals and apply the information generated by the program. The ability of the U.S. research establishment and industry to take advantage of the products of the human genome project will require highly trained individuals. Scientists with diverse expertise are required: geneticists and molecular biologists, as well as investigators from fields such as physics, chemistry, engineering, mathematics, and computer science. Critically needed are scientists with interdisciplinary skills - those who understand the biological problem at hand and can find solutions by applying skills from other disciplines. Many more technicians also will be required to operate the large amount of technology that the genome program will employ.
The Ad Hoc Program Advisory Committee on Complex Genomes of NIH recommended that research training be an integral part of the human genome program. This recommendation has been reinforced by the current Program Advisory Committee on the Human Genome. In response to these recommendations, the following initiatives have been put in place:
Pre-doctoral training grants in genome research will support training of scientists with the skills needed to carry out basic and applied research related to the goals of the Human Genome Initiative and to apply that knowledge in solving important biomedical research problems. The focus of this training will be interdisciplinary, intended to give students a deeper understanding of how the methods and principles of one or more of the non-biological sciences can interact with those of biology to address research problems related to genomic analysis.
Post-doctoral fellowships in genome research will provide support for training at the post-graduate level. In addition to the customary training for Ph.D. and M.D. degree holders in molecular biology and other areas relevant to genomics research, there will be an effort to attract individuals who wish to pursue interdisciplinary training. Candidates for these grants who are trained in mathematics, computer science, chemistry, physics, or engineering and who want to augment their skills in those fields with training in biological science to enable them to pursue genome research will be encouraged. Conversely, biologists who want to acquire research training in biocomputation, instrumentation, biophysics, or other areas related to genome research will be desirable candidates. There also will be fellowship support for individuals interested in the ethical, legal and social implications of genome research.
Senior fellowships will be available to experienced investigators in physics, mathematics, engineering, and biological, chemical or computer science who want to acquire training and experience in another discipline. It is expected that these senior fellows subsequently will use this additional training to develop and broaden their research interests to include problems related to genome analysis.
Training at National Laboratories will be supported by DOE and will be available for both pre-doctoral and post-doctoral individuals who want to learn techniques of genome research.
Short courses will also be needed to provide in-depth training in a defined area. These courses could address the need of individuals to enhance their skills in molecular techniques, computational sciences, and ethics or legal studies. The NIH and the DOE are currently studying the best ways to meet such needs.
Although considerable strides have been made in technology development since the publication of the NRC and OTA reports, there is still a need for further innovation to adapt the technology to large-scale projects and to bring the costs down. During the next five years, there will be an emphasis on technology development in all areas of the program. Automation, optimization, cost reduction and other improvements will be supported in areas such as cloning technology, robotics, DNA sequencing, gel technology, software tools, and instrument development. Equally important will be the support of completely novel approaches such as the use of scanning tunneling microscopy or mass spectrometry for sequencing. The ultimate technology that will be used to sequence the human genome may turn out to be a method that is still on the drawing board.
Rapid transfer of the technology developed under the human genome program to industries that can develop economically and medically useful applications is a major goal of the project. This will occur in a variety of ways ranging from direct federally funded research at private companies to expedited transfer of new technology into the private sector. The human genome project is certain to spawn and nurture parallel efforts on a host of other plant and animal genomes that are of direct commercial interest. Rapid provision of technology and trained personnel will play a most critical role in driving these efforts.
Industry will benefit directly from the availability of scientists trained by the human genome project and by the availability of databases that provide access to the data generated by the project. These databases will be used in many diverse ways to design products for medical and industrial applications. In the coming year, a plan will be developed for technology transfer in regard to inventions produced by the genome project. A variety of mechanisms will be explored for facilitating this transfer, for improving information flow and for identifying potential blocks to efficient transfer. The National Laboratories of the DOE are already working with private sector interests to establish cooperative ventures. The NIH intramural laboratories have similarly developed a system of cooperative research and development agreements with industry.
The biotechnology industry in the U.S. is strong and innovative and has very close ties to the scientists who are doing the research. Indeed, this industry has been a strong participant in all aspects of the project from the beginning. Representatives from industry sit on the advisory committees and industrial scientists have received numerous grants from both NIH and DOE. It is expected that industrial involvement will increase as the project proceeds, especially during the phase of large-scale sequencing.
Transfer of the technology into medical applications will be facilitated where necessary, but will also occur naturally. Many of the scientists supported by the NIH human genome program are physicians or work closely with physicians who are involved in patient care. The various institutes of the NIH all support research on diseases that result from genetic variation and a variety of mechanisms will be used to assure that information is transferred efficiently from the National Center for Human Genome Research to these institutes. A coordinating committee has already been established for this purpose. The National Center for Human Genome Research will be particularly alert to the need to stimulate preparation of reagents for use in the diagnosis and treatment of rare genetic diseases, as such reagents may not be commercially viable.
The National Institutes of Health has a natural interest in the Human Genome Initiative in view of its long history of supporting research in genetics and molecular biology as an integral part of its mission to improve the health of all Americans. The Department of Energy has a long-standing program of genetic research directed at improving the ability to assess the effects of radiation and energy-related chemicals on human health. In recognition of these complementary interests, NIH and DOE have agreed to coordinate their individual genome activities.
NIH: The human genome program of the National Institutes of Health was formally established after Congress appropriated earmarked funds to NIH in fiscal year 1988 to conduct research on mapping and sequencing of the human genome.
In October 1988, the Office of Human Genome Research was established to plan and coordinate NIH genome activities in cooperation with other federal agencies, industry, academia and international groups. As of October 1, 1989, the office became an independent funding unit within the NIH with authority to award grants and contracts and was renamed the National Center for Human Genome Research.
To provide ongoing advice from scientific experts and industry representatives, NIH established a permanent Program Advisory Committee on the Human Genome (PACHG) and, because virtually all of the institutes of NIH are involved in research that interacts with the human genome program, an internal NIH Coordinating Committee on the Human Genome also was formed.
While most of the research supported by the NIH genome program will take place at academic, non-profit and for-profit institutions across the country, relevant intramural studies also will be considered for funding under the program.
DOE: The genome program of the Department of Energy started in fiscal year 1987 on a small scale and received earmarked funds for the first time in the fiscal year 1988 appropriation.
DOE's genome activities are represented mainly by multidisciplinary programs under way at three National Laboratories: Lawrence Berkeley Laboratory, Los Alamos National Laboratory and Lawrence Livermore National Laboratory. Additional projects are supported at other National Laboratories, at universities and in the private sector.
Oversight of DOE human genome activities is provided by the Health and Environmental Research Advisory Committee (HERAC). The Office of Health and Environmental Research (OHER), assisted by a steering committee representing the three National Laboratories and extramural grantees, manages the program and administers grants and contracts.
Memorandum of Understanding: Mechanisms for the coordination of human genome activities between DOE and NIH are specified in a 1988 Memorandum of Understanding. A joint advisory subcommittee was established to monitor and coordinate programs. Furthermore, there is extensive formal and informal interagency contact between program administrators. Panels convened by DOE or NIH to review genome research proposals, to assist in program coordination or to provide advice, are attended by representatives of both agencies, and regular joint workshops and meetings on genome-related issues are held.
The NIH and the DOE have had an excellent working relationship with regard to the human genome program in the past and expect that this relationship will become even closer and more useful in the future. The establishment of a joint informatics working group, a joint ethics working group, and a joint mapping working group, in addition to the joint advisory subcommittees called for in the MOU, attests to the close cooperation. Additional joint working groups will be established as needed. Each of these groups will provide information and advice to the parent advisory committees of both agencies.
In August, 1989, a group of NIH and DOE advisors met together with selected other experts to develop a joint plan for the genome project for the next five years. This plan was presented to the advisory committees of both agencies for approval and is contained in the present report to Congress. Each agency will implement its genome program according to this overall scheme. Because of the success of the joint planning exercise and the need for frequent updates, the two agencies will repeat this process at regular intervals to assure continued close coordination.
Although there are areas of overlapping interest between DOE and NIH, there are also clear areas of distinction, based on the respective agency's interests and strengths. The following is a list of major similarities and differences.
While the list of similarities and differences is instructive for showing the great diversity of activities that are included in the genome project, the key to the DOE-NIH relationship is the fact that both agencies are working from the same blueprint. Over the past two years a great deal of synergism has developed between the two agencies, with productive collaborations established between DOE laboratories, NIH supported investigators and industry.
Because the information to be derived from mapping and sequencing the human genome will be of very broad interest and applicability, it is natural that a number of other federal agencies are involved in funding and carrying out activities related to the Human Genome Initiative.
National Science Foundation: The National Science Foundation is interested in the support of projects focused on the scientific infrastructure for genome-related activities. Specific NSF activities have included funding in FY 1989 of a science and technology center dedicated to new technologies for DNA and protein chemistry. NSF is also involved in development of new software and algorithms for database searching and development of special purpose hardware to increase the speed of biological database searches. Recently, the NSF decided to start a program for mapping and sequencing the genome of the model plant system Arabidopsis thaliana in collaboration with NIH and other agencies. This system will be an excellent one for developing and testing technology.
NSF representatives regularly attend the NIH advisory committee and DOE steering committee meetings as liaison members to assure coordination of the programs.
U.S. Department of Agriculture: A growing interest in mapping and sequencing the genomes of plants important to agriculture and forestry led the USDA to establish an Office of Genome Mapping after a planning conference in December 1988. A coordinating committee was formed to devise the goals and scope of USDA's plant genome efforts, which are planned to extend over 10 years at an estimated cost of $500 million. Plant genes that display pest and disease resistance as well as drought tolerance, along with other gene systems of economic importance, will be selected for mapping and sequencing.
The USDA's Agricultural Research Service also has an active animal science division that is interested in genome research. This is expected to be a growing area within USDA.
A liaison member from the USDA attends the NIH Program Advisory Committee meetings and the DOE Human Genome Steering Committee meetings and NIH and DOE staff have attended the various USDA planning meetings. As the USDA program proceeds, closer ties will be established.
The Howard Hughes Medical Institute (HHMI) has played an important role in supporting research and databases of importance to the genome project. Both DOE and NIH have worked with HHMI to coordinate activities and a representative of HHMI has attended almost all functions sponsored by one or both of the agencies. HHMI has been able to identify a role for itself in areas that are difficult for federal agencies to support, such as the critical funding provided to help the Human Genome Organization (HUGO) get started (also see below).
The Human Genome Initiative is not limited to the United States. Many countries are interested in participating in the project and all are interested in the outcome. Programs with funding are currently underway in the United Kingdom (UK), Italy and the Soviet Union. Funding is expected in the near future from the Commission of the European Community (EC), France and Japan. However, all these programs are small compared to the U.S. program and are currently still in the early stages of organization.
An association of interested scientists from across the world has been formed and incorporated as the Human Genome Organization (HUGO). This organization plans to assist with the international coordination of the various national programs and to develop a number of activities to facilitate this.
While NIH and DOE support HUGO and believe it could be most helpful as a facilitator, international interaction is already proceeding well. Individual investigators have formed numerous collaborations across national lines, almost all genome meetings are international in scope, and the staff responsible for the management of the various national programs have established good lines of communication. For example, NIH staff has been represented at meetings of the EC Working Party and planning meetings in the UK. Both NIH and DOE representatives have attended planning meetings in Italy, Spain, the USSR, and Japan. The EC Working Party and the Medical Research Council in the UK, as well as Canada, have sent representatives to the NIH Program Advisory Committee meetings.
NIH and DOE have a policy of welcoming international collaboration in the basic research aspects of the human genome project. Because it is desirable to encourage other countries to contribute financially to this project, the agencies have decided that they will, in general, not fund a foreign research project unless it will make a unique contribution that cannot readily be duplicated in the U.S. The agencies will, however, fund joint research projects between the U.S. and another country if there is also joint funding from the other country.
There are many opportunities where international collaboration could enhance progress on the Human Genome Initiative. Currently, the U.S. is in a leadership position with respect to scientific accomplishment and organization of the genome program. However, as other nations organize and initiate their programs, the U.S. will stand to gain by international collaboration as much as the other countries involved.
The budget for the two agencies follows:
The original cost estimates by the National Academy of Sciences and the Office of Technology Assessment were that a level of funding of approximately $200 million per year for about 15 years would be needed to complete the human genome project. No effort has been made at this early stage to revise or update these figures. Fiscal years 1988 through 1990 have been a period of getting organized and getting research underway while the budget for genome research has been ramping up. The five-year goals proposed in this document are for the period FY 1991 through FY 1995 and assume that a funding level of $200 million per year with inflationary increases can be reached rapidly. Only at this level will the critical mass of people and research projects be achieved that can move the human genome program forward at an optimal rate.
Contributions to the program by other countries are welcome. However, such contributions should not be viewed as decreasing the need for a critical level of activity in the U.S. Rather, they will shorten the time needed to complete the project. The funding levels originally recommended by the OTA and the NRC are required to provide optimal benefit to the American research enterprise and to American industry.
The need for money for new construction was identified in the NRC report, although no specific figure was given. NIH has estimated that a sum of $100 million over five years will be needed to make available space for expansion of research center activities. DOE estimates that $21 million will be needed to construct additional space at its centers.
Since technology and costs are still changing rapidly, it is hazardous to assign precise costs to specific areas. However, an approximate breakdown into categories over the next five years is currently estimated as follows:
It would be counterproductive to fix a particular budget distribution at this time in such a rapidly moving field, where relative costs are also changing constantly. Flexibility will be essential so that unexpected opportunities can be pursued effectively. Every effort will be made to complete the project as economically as possible.
Last Reviewed: October 1, 2012