
Department of Health and Human Services

Statement by

Dr. Francis S. Collins
Director, National Human Genome Research Institute


Fiscal Year 2001 President's Budget Request
for the National Human Genome Research Institute

Mr. Chairman and Members of the Committee:

I am pleased to present the President's non-AIDS budget request for the National Human Genome Research Institute (NHGRI) for Fiscal Year 2001, a sum of $353.4 million, which reflects an increase of $21.8 million over the comparable Fiscal Year 2000 appropriation. Including the estimated allocation for AIDS, total support requested for the NHGRI is $357.7 million, an increase of $21.9 million over the Fiscal Year 2000 appropriation. Funds for the NHGRI efforts in AIDS research are included within the Office of AIDS Research budget request. The National Institutes of Health (NIH) budget request includes the performance information required by the Government Performance and Results Act (GPRA) of 1993. Prominent in the performance data is NIH's first performance report that compares our FY 1999 results to the goals in our FY 1999 performance plan. As our performance measures mature and performance trends emerge, the GPRA data will serve as indicators to support the identification of strategies and objectives to continuously improve programs across the NIH and the Department.

This is my seventh appearance before this Subcommittee. I am again pleased to report that the Human Genome Project continues to be ahead of schedule and under budget. When I appeared before you last February, Human Genome Project scientists had just completed sequencing the DNA of the worm known as C. elegans, laying out the entire genetic code of an animal for the first time. At that time, 405 million base pairs of human DNA sequence had been deposited in GenBank by the large-scale human DNA sequencing pilot projects that were initiated in 1996. These projects tested new ways to apply sequencing strategies to the large and complex human genome.

Human DNA Sequencing

A lot has happened in a year. Following the success of the pilot projects, the NHGRI, the Department of Energy, and our international partners (the U.K., France, Germany, Japan and China) last March initiated full-scale production sequencing of the 3 billion bases that make up the human genetic instruction book. The newly tested sequencing strategies, coupled with advances in sequencing technology, provided the necessary foundation to begin full-scale production.

Later this year, this international consortium will produce a "working draft" of the human genome sequence, an essential resource for the whole research community. The working draft will provide 90 percent coverage of the human genome with an accuracy of 99.9 percent. We will then move on to complete the final, highly accurate, finished human genome sequence in 2003 or sooner, two years ahead of the original schedule.

All sequence data produced by the international consortium is deposited every 24 hours in GenBank, where it is freely available to any researcher with an internet connection, without restrictions on use. The rapid public availability of the sequence is invaluable to academic scientists studying the molecular basis of human health and disease, as well as corporate researchers engaged in drug development. By November 17, 1999, the consortium had deposited the sequence of one billion bases in the human genome. Today, over half of the sequence, approximately 1.7 billion base pairs of non-redundant sequence, resides in GenBank. This marks the production of over a billion base pairs of human DNA since last year's hearing.

To reach this milestone, Human Genome Project participants actually had to sequence over 12 billion base pairs of human DNA in overlapping pieces. As project manager, I know this could not have been done without the tireless work of the hundreds of dedicated scientists and technicians at the major sequencing centers. The five largest of these, referred to as the G-5 (the Whitehead Institute at MIT, the Washington University School of Medicine in St. Louis, the Baylor College of Medicine in Houston, the DOE's Joint Genome Institute in California, and the Sanger Centre in the U.K.), will together do about 85 percent of the work.

Chromosome 22

The Human Genome Project achieved another historic milestone this year when an international scientific team announced the unraveling of the genetic code of an entire human chromosome for the first time. The sequence of the 33.5 million base pairs of chromosome 22 was published in the December 2, 1999 issue of the journal Nature. Research will now focus on determining what it all means. Sequencing and mapping efforts have already revealed that genes on chromosome 22 are implicated in the workings of the immune system, congenital heart disease, schizophrenia, mental retardation, birth defects, and several cancers including leukemia, but many more secrets will be discovered in this decoded text. The results of this work give scientists insights into the way genes are arranged along the DNA molecule and pave the way for major advances in the diagnosis and treatment of disease.

Until last year, scientists were uncertain about whether an entire human chromosome could be sequenced in this manner. For example, they did not know whether insurmountable problems would prevent the assembly of large stretches of contiguous sequence. The work done on chromosome 22 not only resolved any doubts about the ability to sequence a chromosome, it also validated the strategy being pursued by the publicly supported Human Genome Project in sequencing the entire human genome.

Beyond the Human Sequence

While laying out the precise sequence of the 3 billion letters of the human genome is an awesome and audacious undertaking, it is but one of the many important objectives of the Human Genome Project. The 5-year research plan published in the October 23, 1998 issue of Science outlines seven other ambitious goals critical to the success of the project. One of these goals is a catalog of common genetic variants.

Human Genetic Variation

Any two human beings, regardless of ethnic or racial self-identity, are 99.9 percent the same at the genetic level. But certain changes in the sequence, some as subtle as a single letter change, contribute to disease or disease risk. Today, to find the misspelling, or misspellings, that contribute to common diseases, such as cancer, Parkinson's disease, asthma, depression, or heart disease, researchers must study pedigrees and search through large chromosome "neighborhoods" using the genetic map. But having the reference sequence, and new technologies for finding those places in the genome that vary among us, means that assembling a catalog of common genetic variants is now possible, and will greatly speed the process of disease gene discovery.

Most variants will be single letter differences, known as SNPs or single nucleotide polymorphisms. Any SNPs found to be associated with a disease will provide targets for further study to understand the biological processes underlying health and disease and facilitate development of diagnostic tests. This understanding will in turn fuel development of improved prevention and treatment strategies. Because genetic variants can also contribute to individual differences in response to drugs, the identification and understanding of these variants will allow doctors to choose the most effective drug based on a patient's particular genetic makeup.

In FY 1999, with contributions from 16 NIH institutes, the NHGRI began an initiative to discover and catalog common variants in human DNA. In the next two years, NIH-supported researchers expect to find about 100,000 SNPs. Over the past year, this initiative has been complemented by an innovative collaboration in the private sector. Last April 15, a group of 10 large pharmaceutical companies, IBM, Motorola and the Wellcome Trust announced the formation of The SNP Consortium (TSC). The Consortium's goal is to identify an additional 310,000 SNPs. All SNPs identified by either the NIH or TSC are regularly deposited into the publicly available SNP database. This collaboration between the public and private sectors has already produced and deposited 25,000 SNPs into the public database.

Sequencing the Laboratory Mouse

Last fall, NHGRI began sequencing the genome of the laboratory mouse, one of the most frequently used mammals in biomedical research. Ten laboratories, now referred to as the Mouse Genome Sequencing Network (MGSN), collectively received funding. All mouse sequence produced will fall under the same data release principles adhered to for the sequencing of the human genome, i.e., assemblies greater than 2,000 base pairs will be released to public databases within 24 hours.

Mice and humans are approximately 70 percent identical at the genetic level. Both genomes contain approximately 3 billion base pairs and encode an estimated 100,000 genes. The invaluable contribution of mouse models toward a better understanding of human disease has long been recognized in biomedical research. For example, mouse models provide scientists with unprecedented insights into the molecular basis of disease and the response to potential therapeutic agents. Intramural scientists at NHGRI are developing and using mouse models to study a diverse array of human diseases. These include brain disorders such as Huntington's disease and Parkinson's disease, neural crest disorders, and blood disorders such as acute myeloid leukemia.

Sequencing the mouse is a priority for a wide spectrum of biomedical scientists. Every institute at NIH, with support of the NIH Office of the Director, made a contribution to the first year of funding. NHGRI has assumed responsibility for funding the mouse sequencing network in the second year and beyond. A significant fraction of NHGRI's FY 2000 increase is dedicated to support of mouse sequencing.

Finishing the Fly Genome

Looking ahead, achievement of another significant milestone is just around the corner. Publication of the complete sequence of the fruit fly, Drosophila melanogaster, is expected within a matter of weeks. The fruit fly is another useful model organism for studying genetics, with a genome of 160 million base pairs of DNA. Providing this research tool is important because the role of a gene in the human body can often be clarified by comparing its DNA code to that of other organisms.

NHGRI-supported scientists at the University of California at Berkeley and the Baylor College of Medicine carried out the initial scaffold sequencing of the fruit fly genome. In 1998, encouraged by NHGRI, Celera Genomics began a collaboration with these groups. To facilitate the work in both sectors, a Memorandum of Understanding (MOU) was prepared between the publicly funded scientists and Celera Genomics to outline the respective roles of each of the partners. The MOU maintained the public sequencing effort's commitment to seeing that complete, accurate sequence for this important model organism is made freely accessible to all scientists by requiring that the annotated sequence be released to GenBank upon publication.

Tools for Understanding the Human Genome

Once we have the sequences of the human and key model organisms in hand, we will need the tools to explore and understand their significance in health and disease. While this exploration will take many years, it will be aided by tools now in development by the Human Genome Project: tools that enable researchers to study the entire genome and all its genes in a single experiment.

NHGRI has launched a number of initiatives, which will grow in coming years, to develop tools for understanding gene function. One such initiative is the Mammalian Gene Collection, led jointly by NHGRI and NCI. This initiative will create a complete collection of cloned and sequenced genes for humans and other mammals. In the future, scientists will be able to go to the freezer and pull out any gene they want to study. In parallel, new technologies such as microarrays are being developed that can measure and compare the extent to which a gene is active under various conditions and in various tissues. The NHGRI intramural program is one of the world leaders in this technology. Many other clever approaches to studying gene function are being explored, and the field is expanding rapidly.

Both genomic sequencing and these new functional studies generate vast amounts of data that must be organized, stored and analyzed to allow scientists to pursue new leads in medical research. One significant outcome of the Human Genome Project has been the transformation of biology into a data-rich field, which has spawned a new discipline called computational biology. New tools for handling data to make it readily accessible to scientists, as well as new approaches for understanding the significance of the data, are urgently needed. In view of this need, NHGRI plans to place a major emphasis on funding computational genomics studies in the future. In Fiscal Year 2001, NHGRI will launch a new Genome Centers of Excellence program to support the development of novel technology and computational approaches for studying the function of genomes. In addition to funding innovative science, these Centers will also provide an environment in which a new generation of genomic scientists can be trained. The concept for the Centers is similar to that recommended by an Advisory Committee to the NIH Director for "Programs of Excellence in Biomedical Computing." The NHGRI anticipates that these Genome Centers of Excellence will meet many of the objectives outlined in the Committee's report, known as the "BISTI" (Biomedical Information Science and Technology Initiative) report.

Safeguarding the Fair Use of Genetic Information

From the outset of the Human Genome Project, the NHGRI has supported research into the ethical, legal, and social implications (ELSI) of genomic research and fostered the development of relevant policy recommendations. We have a fundamental obligation to assess and deal with concerns such as protecting the privacy and fair use of genetic information, and the integration of new genetic technologies into health care. If we do not, and the public is fearful of obtaining or disclosing genetic information or has limited access to genetic technologies, the promise of genetic medicine will not be realized and we will have achieved little.

Progress on safeguarding the fair use of genetic information was made just in the last few weeks. On February 8, 2000, President Clinton signed an Executive Order to protect federal workers from discrimination based upon their genetic information. The Order is built on the bedrock principle that an individual's predictive genetic information should be used for that individual's benefit and not for harm. A variety of important organizations, such as the American Medical Association, Hadassah, the Genetic Alliance, the American College of Medical Genetics, the Biotechnology Industry Organization (BIO) and the National Society of Genetic Counselors, immediately expressed their support for the President's action.

The Executive Order, which built upon the recommendations published by the NIH-DOE ELSI Working Group and the National Action Plan on Breast Cancer, is an important step toward assuring federal workers that their genetic information will be kept private and will not be used against them by their employer. It also provides federal and state legislators with a useful template for extending protections to all workers. We hope to see this step built upon in 2000 by the passage of effective federal legislation barring the discriminatory use of predictive genetic information in health insurance and employment.


The dramatic progress of the Human Genome Project has exceeded the expectations of even the most optimistic observers of just a few years ago. In a matter of months, the majority of the fundamental "Book of Life", the human sequence, will be in hand. Having this virtual guidebook to the human genome will open many exciting opportunities. Combining it with the catalog of human variation, and with new tools and technologies developed by the Human Genome Project, will lead to unlocking the mysteries of diseases such as diabetes, Parkinson's, schizophrenia, and common forms of cancer. That in turn will allow new approaches to prevention based on each individual's disease risk factors. And we can predict that, a few years hence, a host of new gene-based therapies will be specifically designed to fit an individual's genetic makeup.

Mr. Chairman and Members of the Committee, it has truly been a privilege to be a part of this historic effort known as the Human Genome Project. At the beginning of the new millennium, genetics has come to encompass nearly every aspect of health research and will surely transform how we diagnose and treat disease in the future. It will also enhance our concepts of shared humanity, regardless of racial or ethnic identity.

My colleagues and I will be happy to respond to any questions you may have.


Francis S. Collins, M.D., Ph.D.
Director, National Human Genome Research Institute

Born:

April 14, 1950, Staunton, Virginia

Education:

University of Virginia, 1970 - B.S. (with Highest Honors); Yale University, 1972 - M.S.; Yale University, 1974 - Ph.D.; University of North Carolina School of Medicine, 1977 - M.D. (with Honors)

Professional History:

1977-1981, Intern, Resident, Chief Resident in Medicine, North Carolina Memorial Hospital, Chapel Hill, North Carolina. 1981-1984, Fellow in Human Genetics and Pediatrics, Yale University School of Medicine, New Haven, Connecticut. 1984-1993, Assistant, Associate and then Full Professor of Internal Medicine and Human Genetics, University of Michigan, Ann Arbor, Michigan. 1987-1993, Assistant, Associate and then Full Investigator, Howard Hughes Medical Institute. 1993 to present, Director, National Human Genome Research Institute, NIH, Bethesda, Maryland.

Awards and Honors:

Morehead Foundation Fellow, 1973-1977; Alpha Omega Alpha, elected junior year, President of UNC chapter, 1976-1977; Hartford Foundation Fellowship, 1985-1987; Paul di Sant'Agnese Award of the Cystic Fibrosis Foundation, 1989; Gairdner Foundation International Award, 1990; National Medical Research Award, National Health Council, 1991; American Academy of Achievement Golden Plate Award, 1994; The Baxter Award for Distinguished Research in Biomedical Sciences, Association of American Medical Colleges, 1994; Susan G. Komen Breast Cancer Foundation National Award for Scientific Distinction, 1995; Breath of Life Award, Cystic Fibrosis Foundation, 1997; Mendel Medal, Villanova University, 1998; Champions of Pediatric Research Award, Children's National Medical Center, 1998; Shattuck Lecture, Massachusetts Medical Society, 1999; Arthur S. Flemming Public Service Award, 1999.

Honorary Doctoral Degrees:

Emory University, Mary Baldwin College, Yale University, Mount Sinai School of Medicine, University of North Carolina, George Washington University, University of Pennsylvania.


Last updated: September 21, 2007