Fiscal Year 2002 Budget Request

Department of Health and Human Services
National Institutes of Health

Francis S. Collins, M.D., Ph.D.
Director, National Human Genome Research Institute

appearing before the
House Subcommittee on Labor-HHS-Education Appropriations

May 16, 2001


appearing before the
Senate Subcommittee on Labor-HHS-Education Appropriations

May 23, 2001


Statement by Dr. Francis S. Collins
Director, National Human Genome Research Institute
Fiscal Year 2002 President's Budget Request
for the National Human Genome Research Institute

Mr. Chairman and Members of the Committee:

During FY 2000, Human Genome Project scientists capped the achievements of the last decade with a historic milestone - the complete initial reading of the text of our genetic instruction book. At present, roughly 93 percent of the 3.1 billion bases of the human genome are freely available in public databases. This is an awesome step toward a comprehensive view of the essential elements of human life, a perspective that inaugurates a new era in medicine where we will have a more profound understanding of the biological basis of disease and develop more effective ways to diagnose, treat, and prevent illness.

Between March 1999 and June 2000, the production of human genome sequence skyrocketed. During this time, the international collaborators in the Human Genome Project sequenced DNA at a rate of 1000 bases per second, seven days a week, 24 hours a day. After completing the working draft of the human genome sequence in June of 2000, Human Genome Project scientists and computational experts began to scour the sequence for insights. They reported the first key discoveries in the February 15, 2001 issue of the journal Nature. Among the findings were the following:

  • Humans are likely to have only 30,000 to 35,000 genes, just twice as many as a fruit fly, and far fewer than the 80,000 to 150,000 that had been widely predicted.
  • Genes are unevenly distributed across the genomic landscape; they are crowded in some regions and spread out widely in others.
  • Individual human genes are commonly able to produce several different proteins.
  • More than 200 human genes arrived in the genome of some ancestor directly from bacteria.
  • The repetitive DNA sequences that make up much of our genome, and that are commonly regarded as "junk," have been important for evolutionary flexibility, allowing genes to be shuffled and new ones to be created. The repetitive DNA may also perform other important functions.

Finishing the Human Genome Sequence

Because of the enormous value of DNA sequence information to researchers around the world, in academia and industry, NHGRI has always been committed to the principle of free, rapid access to genomic information through well-organized, annotated databases. Databases housing the human genome sequence are being visited an average of more than 50,000 times a day. In FY 2002, NHGRI will increase the usefulness of the human genome sequence to the world's researchers by finishing the sequencing to match the project's long-standing goals for completeness and stringent accuracy. More than a third of the draft sequence already has been finished into a highly accurate form - containing no more than 1 error per 10,000 bases. Finished sequence for the entire genome is expected by 2003. Finished sequence is already available for the entire lengths of chromosomes 21 and 22. Genes on chromosome 21 are involved in Down syndrome, Alzheimer's disease, certain cancers, and manic depressive illness, while those on chromosome 22 are implicated in the workings of the immune system, in congenital heart disease, schizophrenia, mental retardation and several cancers, including leukemia. Researchers can now study the molecular bases of the conditions linked to these chromosomes systematically and comprehensively, and the same high standard of completeness will be achieved for the other 22 human chromosomes over the next two years.

Genome Sequences of Non-Human Species

In the coming year, NHGRI and its partners will sequence the genomes of important model organisms, including the mouse and rat. The Human Genome Project's goals always included the analysis of the genomes of species that have been important to laboratory research. Having genome sequence from additional species is one of the most efficient tools for interpreting the human sequence, because many of the most important elements in our genome - including genes and the regions that regulate their expression - are conserved in the genomes of other species. Genome sequences from the well-studied laboratory mouse and rat will be especially useful because, as mammals, their genomes are relatively similar to the human genome and because they have long provided insights into the molecular basis of disease.

The Mouse Sequencing Consortium formed in October 2000 and in April 2001 produced a publicly accessible draft sequence covering 95 percent of the mouse genome, a result of the collaborative efforts of three private companies, six institutes of the NIH, and a British charity, the Wellcome Trust. The Consortium will now go through the more arduous process of filling in gaps in the draft and will produce high-quality finished sequence no later than 2005. Already, the mouse data is saving researchers a great deal of time. For example, researchers at Merck recently found a mouse relative of a human gene implicated in schizophrenia by scanning the newly available mouse genome sequence. Alterations in the human gene were found in a large Scottish family where schizophrenia correlates with a chromosomal rearrangement. Researchers had searched without success for years for the related gene in mouse, but the mouse genome sequence readily revealed the corresponding mouse gene in a computer search taking only seconds. The researchers can now test the effects of inactivating the gene on the mouse brain, perhaps giving clues to the molecular basis of schizophrenia in humans.

Meanwhile, the laboratory rat, long used for a wide range of medical research, including studies on high blood pressure, cancer, and drug metabolism, is getting its share of attention. In February, NHGRI and the National Heart, Lung and Blood Institute announced a plan for sequencing the rat genome. The institutes will fund private companies as well as academic labs; all have agreed to release data weekly into public databases.

Other model organisms' genomes are undergoing study as well. NHGRI is funding scientists at the University of California at Berkeley and the Baylor College of Medicine to close the gaps in the fruit fly genome sequence and to ensure that it meets the quality standards for finished sequence data. In FY 1999, NHGRI and the National Cancer Institute, leading 15 other NIH institutes, launched the Mammalian Gene Collection, whose goals are to develop analysis tools and to produce a collection of full-length copies of genes, which can be sent to researchers on demand. So far, nearly 20,000 full-length gene copies have been identified and are slated for sequencing.

Human Genetic Variation

To understand the basis of common diseases with complex origins, like heart disease, Alzheimer's disease, and diabetes, it is important to catalog genetic variations and how they correlate with disease risk. Between any two people, an average of one DNA spelling variation - or SNP - exists in every 1000 bases. With a draft of the human genome sequence in hand, the pace of SNP discovery has increased dramatically. In FY 1999, NHGRI organized the DNA Polymorphism Discovery Resource, consisting of 450 DNA samples collected from anonymous American donors with diverse ethnic backgrounds. NHGRI has funded studies looking for SNPs in these samples. The non-profit SNP Consortium came into being in April 1999, with the goal of developing a high-quality SNP map of the human genome and of releasing the information freely. Consortium members include the Wellcome Trust, a dozen companies (mostly pharmaceutical companies), and three academic centers; they have looked for SNPs in DNA from a subset of the samples in the DNA Polymorphism Discovery Resource. In July 2000, the NHGRI and The SNP Consortium announced a collaboration that has allowed the contribution of 5 times more SNPs to the public domain than the consortium originally planned. As of March 28, the public database that serves as a central repository for SNPs has received 2,840,707 SNP submissions.

With the increased knowledge about human variation, the genetic underpinnings of various diseases, including diabetes, are being discovered. The recent discovery of a gene, calpain-10, whose disruption contributes to diabetes, resulted from studies linking diabetes with genetic variations across the whole genome and then in a specific part of chromosome 2. The newly-discovered gene suggests that a previously unknown biochemical process is involved in the regulation of blood sugar levels. Diabetes is also one of the areas of focus for intramural research at NHGRI.

Investigators from Howard University and NHGRI are engaged in a project looking for genetic risk factors for diabetes in West Africans. This is part of a wider collaboration between the two institutions to study the genetic basis of diseases that disproportionately affect African-Americans. The diabetes study focuses on West Africans since they are thought to be the population from which modern African-Americans are largely descended, and the Africans are not exposed to the same dietary risk factors as Americans. Study recruitment centers were opened in Nigeria and Ghana; in the fall of 2000, researchers met their goal of recruiting 400 pairs of siblings affected with diabetes. Genetic typing of the collected tissue samples is in progress at NHGRI's Center for Inherited Disease Research in Baltimore to search for genetic variations that increase susceptibility to diabetes.

Meanwhile, other intramural investigators are part of a consortium in which researchers pool a wide range of data about the genetic factors underlying diabetes. One of the studies, called FUSION (Finnish U.S. investigation of non-insulin dependent diabetes mellitus), has collected DNA samples and clinical data from 5000 Finnish people who have diabetes; many of the individuals are related. A genome-wide search among these people for genes related to diabetes risk has so far identified two areas on chromosome 20 that are likely to contain crucial genes.

Gene Expression

The new-found abundance of genomic information and technology is propelling scientists out of the pattern of studying individual genes and into studying thousands at a time. Large-scale analyses of when genes are on or off (gene expression) can be used, for example, to study the molecular changes in tumor cells. This exciting new approach combines recombinant DNA and computer chip technologies to produce microarrays or DNA chips. Classifying cancer on a molecular level offers the possibility of more accurate and precise diagnosis and treatment. Intramural researchers at NHGRI have used large-scale expression studies to discover genetic signatures that can distinguish the danger from different skin cancers and that can distinguish between hereditary and sporadic forms of breast cancer.

Protein Structure, Function and Interaction

With a global view of human genes now possible, scientists are eager to obtain a similarly comprehensive view of human proteins, a field called proteomics in analogy to genomics. Researchers want to know the functions of proteins and how the proteins work together in cells. Only a subset of all possible proteins is present in any given cell at any given time. To study protein function on a wide scale, various groups of researchers plan to identify the locations of proteins, their levels in different cells, their structures, the interactions among different proteins, and how they are modified. NHGRI is contributing to this field by developing technologies for efficient, large-scale analyses, particularly for determining protein interactions and measuring protein abundance in different cells.

Centers of Excellence in Genomic Science

In FY 2001, NHGRI will award the first grants under a new program to bring cross-disciplinary teams of researchers together with shared resources and a unified goal. The Centers of Excellence in Genomic Science are designed to develop new genomic approaches for analyzing the molecules of life systematically, to integrate technical developments into biomedical research, and to expand training opportunities. Additional centers will be funded in FY 2002 to develop new ways of undertaking genome-wide analyses in areas like the regulation of gene expression, protein expression and interaction, human genetic variation, and the storage and analysis of the flood of new data. These centers are expected to be a source of creative approaches addressing previously unanticipated questions. As training centers, they will give high priority to the training of people from racial or ethnic minority groups, women, and people with disabilities.

Promise for New Treatments and Prevention

Genetic testing will become increasingly important for assessing individual risk of disease and prompting programs of prevention. An example of how this may work involves the disease hereditary hemochromatosis (HH), a disorder of iron metabolism affecting about one in 200 to 400 Americans. Those with the condition accumulate too much iron in their bodies, leading to problems like heart and liver disease and diabetes. The gene causing the condition has been identified, allowing early identification of those in whom HH may develop. Once people at risk are identified, they can easily be treated by periodically removing some blood.

Genetic testing is also being used to tailor medicines to fit individual genetic profiles, since drugs that are effective in some people are less effective in others and, in some, cause severe side effects. These differences in drug response are largely genetically determined. Customizing medicine to a patient's likely response is a promising new field known as pharmacogenomics. A recent publication in the journal Hypertension showed how pharmacogenomics applies to high blood pressure. Researchers found a variation in a particular gene that affects how patients respond to a commonly used high blood pressure drug, hydrochlorothiazide. Other recent studies reveal that doctors should avoid using high doses of a common chemotherapy treatment (6-mercaptopurine) in a small proportion of children with leukemia. Children with a particular form of a gene (TPMT) suffer serious, sometimes fatal, side effects from the drug.

Genomics is also fueling the development of new medicines. Several drugs now showing promising results in clinical trials grew out of genomics-related studies. One example is Glivec (previously called STI571), produced by Novartis for treating chronic myelogenous leukemia (CML). In CML, an abnormal gene fusion creates an abnormally activated protein. Novartis designed a small molecule that specifically inactivates the protein. In phase I clinical trials, this drug caused favorable responses in 53 of 54 patients, while side effects were minimal no matter how high the dosage. Meanwhile, in January 2001, Bayer and Millennium announced the development of another cancer drug born of genomics. GlaxoSmithKline is testing a new genomics-derived heart disease drug that targets a protein involved in fat metabolism. Johnson & Johnson is testing a drug targeting a brain receptor involved with memory and attention. Human Genome Sciences has four clinical trials in progress to test gene-based drug candidates.

Ethical, Legal and Social Implications

From its inception, NHGRI recognized its responsibility to address the broader implications of having access to genetic information and technology. Since 1991, it has committed 5 percent of its budget to studying the ethical, legal, and social implications (ELSI) of genome research. Study of human genetic variations raises many ELSI issues. The case of hemochromatosis brings up some of these issues. Given the devastating complications from HH and the simple treatment, some have proposed widespread genetic testing to find those predisposed to HH. But considerable uncertainty remains about how strong the link is between particular gene variants and the presence and severity of HH disease. In FY 2000, NHGRI and the National Heart, Lung and Blood Institute began a 5-year, $30 million epidemiological study among 100,000 adults to gauge, among other things, the prevalence and the genetic and environmental causes of HH. NHGRI is funding an examination of the ethical, legal, and social issues related to implementing a widespread screening program. Information from the study should yield insights not only for HH but also for other treatable adult-onset genetic disorders.

Many ELSI issues have policy implications; one is how to deal with potential employment discrimination. Two years ago, a Time/CNN poll showed that 95 percent of those polled thought employers should not have access to genetic information about employees without their permission. A recent case involving the Burlington Northern Santa Fe (BNSF) railroad shows what can happen. In March 2000, BNSF added a genetic test to the medical evaluation of employees who file workers' compensation claims for carpal tunnel syndrome, to test whether the carpal tunnel syndrome was "work-related." The test examines a gene (PMP22) that may be the cause of carpal tunnel syndrome in a small proportion of people with the disorder. Employees were not told that their blood would be submitted for a genetic test. In February, the Equal Employment Opportunity Commission and the workers' union filed suit against BNSF. The company has now stopped genetic testing and agreed to seek approval from the union before doing any genetic testing in the future. While this is a happy ending for this particular case, comprehensive public policy will be required as genetic tools become more widespread. The ELSI program at NHGRI will continue to form policy recommendations that balance the need to protect individuals with the needs of the research community and the healthcare industry.

Finally, as part of its mission of education, NHGRI has produced a free educational kit, "The Human Genome Project: Exploring our Molecular Selves," that was released when the human sequence analysis was published in February. The kit includes a multimedia CD-ROM, an award-winning video documentary, and an informational brochure. The kit is designed to give science teachers and classrooms, particularly at the high school level, better access to the latest information about genome science and its implications, but it is expected to be used more broadly, by college students, voluntary health organizations, and the general public. Backing from the Howard Hughes Medical Institute and the Pharmaceutical Research and Manufacturers of America (PhRMA) ensured that the kit, sponsored by the NIH and DOE, would be available for free. Nearly 40,000 kits have been requested in just two months.

I am pleased to present the President's budget request for the National Human Genome Research Institute (NHGRI) for Fiscal Year 2002, a sum of $426,739,000, which reflects an increase of $44,627,000 over the comparable Fiscal Year 2001 appropriation. The NIH budget request includes the performance information required by the Government Performance and Results Act (GPRA) of 1993. Prominent in the performance data is NIH's second annual performance report which compares our FY 2000 results to the goals in our FY 2000 performance plan. As performance trends on research outcomes emerge, the GPRA data will help NIH to identify strategies and objectives to continuously improve its programs.

Director, National Human Genome Research Institute

Date and Place of Birth:
April 14, 1950. Staunton, Virginia

Education:
University of Virginia, 1970 - B.S. (with Highest Honors); Yale University, 1972 - M.S.; Yale University, 1974 - Ph.D.; University of North Carolina School of Medicine, 1977 - M.D. (with Honors)

Professional History:
1977-1981, Intern, Resident, Chief Resident in Medicine, North Carolina Memorial Hospital, Chapel Hill, North Carolina. 1981-1984, Fellow in Human Genetics and Pediatrics, Yale University School of Medicine, New Haven, Connecticut. 1984-1993, Assistant, Associate and then Full Professor of Internal Medicine and Human Genetics, University of Michigan, Ann Arbor, Michigan. 1987-1993, Assistant, Associate, and then Full Investigator, Howard Hughes Medical Institute. 1993 to present, Director, National Human Genome Research Institute, NIH, Bethesda, Maryland.

Professional Organizations:
American Society of Human Genetics; American Society for Clinical Investigation; Association of American Physicians; Institute of Medicine; National Academy of Sciences; American Academy of Arts and Sciences.

Awards and Honors:
Morehead Foundation Fellow, 1973-1977; Alpha Omega Alpha, elected Junior year, President of UNC chapter, 1976-1977; Hartford Foundation Fellowship, 1985-1987; Paul di Sant'Agnese Award of the Cystic Fibrosis Foundation, 1989; Gairdner Foundation International Award, 1990; National Medical Research Award, National Health Council, 1991; American Academy of Achievement Golden Plate Award, 1994; The Baxter Award for Distinguished Research in Biomedical Sciences, Association of American Medical Colleges, 1994; Susan G. Komen Breast Cancer Foundation National Award for Scientific Distinction, 1995; Breath of Life Award, Cystic Fibrosis Foundation, 1997; Mendel Medal, Villanova University, 1998; Champions of Pediatric Research Award, Children's National Medical Center, 1998; Shattuck Lecture, Massachusetts Medical Society, 1999; Arthur S. Flemming Public Service Award, 1999; Association of American Physicians, George M. Kober Lecture Award, 2000; Carter Lecture, British Society for Human Genetics, 2000; Scientist of the Year, National Disease Research Interchange, 2000

Honorary Doctoral Degrees:
Emory University, Mary Baldwin College, Yale University, Mount Sinai School of Medicine, University of North Carolina, George Washington University, University of Pennsylvania, Brown University

Last Updated: September 18, 2009