
The NHGRI Genome Sequencing Program (GSP)




Overview and Purpose 

The NHGRI Genome Sequencing Program (GSP) has evolved from NIH's participation in the International Human Genome Sequencing Project (HGP). In addition to creating an essential resource for biomedical research, the HGP helped define NHGRI's niche: developing general paradigms and approaches, and creating data resources and tools. The specific activities of the Genome Sequencing Program since the end of the HGP reflect changes in the scientific questions that could be addressed as the technology (cost, quality) changed over time.

The GSP aims to use genome sequencing to identify genes and genomic variants underlying human inherited disease across its full spectrum, including rare diseases likely to be due to rare variants with strong effects (Mendelian), and common, genetically complex diseases that are caused by many variants. The GSP will also develop methods, tools, and knowledge intended to enhance the ability of the community to pursue other human inherited diseases.

The NHGRI Genome Sequencing Program (GSP) is funded through multiple funding opportunity announcements (FOAs).

The GSP benefits from collaboration with, and in some cases co-funding from, other NIH institutes for work in particular disease areas. At present, the National Heart, Lung and Blood Institute (NHLBI) co-funds both the Centers for Common Disease Genomics (CCDGs) and the Centers for Mendelian Genomics (CMGs); the National Eye Institute (NEI) co-funds the CMG program. The CCDGs and CMGs regularly collaborate on new disease areas.

The Centers for Common Disease Genomics are studying a subset of cardiovascular, neuropsychiatric, and immune-mediated diseases, using a range of study designs and presumed underlying genomic architectures. Secondarily, the CCDGs will develop resources for the community and develop and optimize technical and project design approaches for using genome sequencing to understand common disease.

The Centers for Mendelian Genomics aim to identify genes and variants underlying hundreds of Mendelian conditions. They also develop tools and the know-how relevant to applying genome sequencing to Mendelian conditions. 

The Genome Sequencing Program Analysis Centers undertake novel, investigator-initiated computational analyses of the data produced by the CCDGs and CMGs. They also intend to develop improved and novel analysis methods and study designs across the entire GSP.

The previous iteration of the GSP included two additional components:

The Clinical Sequencing Exploratory Research (CSER) program. This program is still ongoing; NHGRI is considering what a future version of this program would be. However, the CSER program's mission is now distinct enough that, going forward, it will collaborate with the GSP but will no longer be formally managed as part of it.

The Genome Sequencing Informatics Tools (GS-IT) program, which is ending. The overall goals of the GS-IT program will be considered as part of the broader NHGRI Informatics Program.

For a summary of the process that led to the current program structure, see: NIH genome sequencing program targets the genomic bases of common, rare disease (January 14, 2016)

The solicitations for the current program elements are:

A brief history of the sequencing program is available at: The NHGRI Genome Sequencing Program History

A chart including information on sequencing costs can be found here: DNA Sequencing Costs: Data

Program Management

All components of the NHGRI Genome Sequencing Program are under continuous programmatic evaluation. NHGRI staff consult quarterly with a group of scientific advisors to the program about individual grantee and program performance. Program grantees submit quarterly reports that describe progress, including detailed production summaries for the more production-oriented programs. All major program decisions are vetted by the National Advisory Council for Human Genome Research.

In addition, the grantees are organized into research networks; within and between programs, these networks collaborate on matters such as best practices, standards, policies, methods development, data analysis, technology adaptation, and other common interests. All four programs meet with NHGRI staff and the program's scientific advisors (the SAP) at an annual meeting.

A coordinating center has been funded and established at Rutgers University through RFA-HG-15-019: Genome Sequencing Program Coordinating Center (U24). The GSP Coordinating Center (GSPCC) will help NHGRI coordinate across all GSP activities (the Centers for Common Disease Genomics (CCDG), the Centers for Mendelian Genomics (CMG), and any other program components) and facilitate cross-study activities to increase the integration and efficiency of the program as a whole.

See: Genome Sequencing Program Coordinating Center (GSPCC)

Data Release and Access Policies

NHGRI data release policies for genome sequence data evolved from the original Bermuda and Ft. Lauderdale policies, which were suited to the Human Genome Project data and organismal sequence data. With the advent of projects involving large numbers of samples from human subjects, this area is under continuous evaluation, much of it at the NIH level rather than the NHGRI level.

See: for a discussion of the latest NIH policy proposals in this area.

Data Release Policies for the NHGRI GSP

NHGRI requires all programs and projects to release sequence and related data to appropriate databases (dbGaP). NHGRI requires rapid pre-publication release of all organismal sequence data and assemblies, and human data in a manner consistent with the terms of consent under which samples were collected. Data from the Centers for Mendelian Genomics is also available in Matchmaker Exchange.

Data Access Policies for the NHGRI GSP

Human data access policies vary depending on the repository and consent terms; however, as above, NHGRI requires rapid deposition of data in repositories to which all members of the scientific community have equal access.


The NHGRI Genome Sequencing Program (GSP) was initially created to sequence the human genome as part of the Human Genome Project (HGP), an international collaboration involving more than a dozen centers worldwide. (See: All About the Human Genome Project for an overview of the HGP.) In the process of building the technical and intellectual infrastructure to sequence the 3 Gb human genome, the genomes of a small number of widely-used biomedical model systems were sequenced, including those of Escherichia coli, baker's yeast (Saccharomyces cerevisiae), the roundworm (Caenorhabditis elegans), and the fruit fly (Drosophila melanogaster).

The HGP firmly established the advantage of the rapid and unfettered release of sequence data to the community. With the completion of the HGP in 2003, two things were apparent in addition to the basic insight gained from having these genome sequences. First, the development and application of large-scale genome sequencing had resulted in significant gains in efficiency, with approximately 2-fold decreases in cost (per amount of effort) attained every ~20 months. Second, it was apparent that there would be an increasing demand for genome sequence over time to attain goals of major significance to the research community.
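As a rough illustration of what a 2-fold cost decrease every ~20 months implies, the sketch below computes cost relative to an arbitrary starting point. It assumes a perfectly steady exponential decline, which is a simplification; the function name and the 20-month default are illustrative choices and not part of any NHGRI tool or dataset.

```python
def relative_cost(months_elapsed: float, halving_period_months: float = 20.0) -> float:
    """Return cost as a fraction of the starting cost.

    Assumes (hypothetically) a steady 2-fold decrease in cost every
    `halving_period_months` months; real cost curves are not this smooth.
    """
    if halving_period_months <= 0:
        raise ValueError("halving_period_months must be positive")
    return 0.5 ** (months_elapsed / halving_period_months)


if __name__ == "__main__":
    # Print the relative cost at a few illustrative time points.
    for months in (0, 20, 40, 60, 120):
        print(f"{months:>4} months: {relative_cost(months):.4f} x starting cost")
```

Under this assumption, cost halves three times over five years (60 months), leaving one eighth of the starting cost.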

The reasons that widely-used biomedical model systems were sequenced were varied. For example:

  • Sequences were obtained from additional mammals and other vertebrates to use sequence comparisons to delineate regions of the genome that have been conserved by evolution. Comparing sequence among mammals (and other vertebrates) is still one of the most effective ways to distinguish regions that are likely to have important function (about 5% of the genome) from the other 95%.
  • Organisms more distantly related to humans were sequenced to understand the origins of genes and gene families.
  • Organisms were sequenced to provide added value to all major research model systems, such as mouse, rat, dog, and others.
  • Sequence information aids basic research into human disease pathogens and their vectors.
  • Sequence from clusters of organisms related to major experimental model systems (e.g. Drosophila) or pathogens and vectors of human disease helps annotate those model systems to maximize the benefit of their use to the scientific community, to help provide basic biological insight, and to provide simpler systems to test ideas about using comparative sequence to annotate more complex genomes, such as our own.
  • Sequence from clusters of vectors/pathogens and related nonpathogenic or vectoring strains allows researchers to discern genes responsible for pathogenic or vectoring properties.

More recently, it has become evident that sequencing costs have dropped and capacity has risen. This allows researchers to undertake projects in Human Variation (See: Survey of Human Structural Variation) and Medical Sequencing (See: Medical Sequencing Program and Current Initiatives), where large numbers of human genomes are partially or (eventually) fully sequenced in order to find the genetic variants that underlie human disease.

Truly facing this challenge will require even more sequencing capacity, at significantly improved efficiencies of the kind that currently can be realized only by maintaining very high-throughput, large-scale sequencing capacity. We anticipate that the program will drive, and will be driven by, the advent of new sequencing technologies. This will enable us to approach very significant questions in ways or at scales that could not previously be approached, such as:

  • What are the sequences and sequence variants that contribute to human health and disease?
  • What is the range of human genetic variation?
  • What are the population frequencies of gene alleles that contribute to common diseases such as heart disease? What are the relative contributions of those alleles to disease?
  • How do somatic mutations correspond to the etiology and behavior of tumors? What are the new somatic mutations that occur during tumorigenesis?

Medical sequence information will lead to the identification of variants responsible for human disease and will facilitate disease stratification, diagnosis, prognosis, treatment response/pharmacogenomics, and the identification of both critical molecular pathways in health and disease and new drug targets. Ultimately, continued development of medical sequencing, along with improvements to sequencing technology (See: Genome Technology Program), will lead to the ability to use DNA sequencing as part of the routine standard of care in diagnosis, prognosis and treatment of many human diseases.

Each of the large-scale genome sequencing programs described here has, as its motivation, one or more of the rationales described above, and others that are described in the specific initiative descriptions. It is likely that as capacity grows, and as scientists begin to draw conclusions from the current data, new opportunities for the use of large-scale genome sequence data will be proposed. To take advantage of newly arising opportunities, the overall NHGRI Large-Scale Genome Sequencing Program is continually evaluated by NHGRI program staff and the National Advisory Council for Human Genome Research with regard to new opportunities and overall effectiveness.

A chart including information on historical sequencing costs can be found at: DNA Sequencing Costs: Data.

Past Workshops

Scientific Advisors to the NHGRI Genome Sequencing Program

Name | Title | Institution
Ewan Birney, Ph.D. | Joint Associate Director of the EMBL-EBI & Senior Scientist | European Bioinformatics Institute
Rex Chisholm, Ph.D. | Professor of Medical Genetics (Professor in Cell and Molecular Biology, Center for Genetic Medicine and Surgery) | Northwestern University
Andy Clark, Ph.D. | Professor of Population Genetics | Cornell University
Daniele Fallin, Ph.D. | Professor and Chair Department of Mental Health | Johns Hopkins University
Jonathan Haines, Ph.D. | Director of the Institute for Computational Biology | Case Western Reserve University
Monica Justice, Ph.D. | Head and Senior Scientist, Genetics & Genome Biology | The Hospital for Sick Children
Rod McInnes, M.D./Ph.D. | Director of Lady Davis Research Institute | McGill University
Len Pennacchio, Ph.D. | Deputy of Genomic Technologies, DOE Joint Genome Institute & Senior Staff Scientist | Lawrence Berkeley National Laboratory, DOE


Program Contacts

Adam Felsenfeld, Ph.D.
(Centers for Common Disease Genomics and Genome Sequencing Program Coordinating Center)

Lisa Chadwick, Ph.D.
(Centers for Mendelian Genomics)

Carolyn Hutter, Ph.D.
(Clinical Sequencing Exploratory Research)

Chris Wellington, B.S.
(Centers for Mendelian Genomics)

Kris Wetterstrand, M.S.
(Sequencing Costs)

Scientific Program Analysts

Taylorlyn Stephan, B.A.
(Scientific Program Analyst)



National Human Genome Research Institute
National Institutes of Health
5635 Fishers Lane
Suite 4076, MSC 9305
Bethesda, MD 20892-9305

Phone: (301) 496-7531
Fax: (301) 480-2770

Last Updated: October 5, 2018

See Also:

Workshop and Priority Setting Reports from the Extramural Research Program