Human Genome Reference Program

The human genome reference is used by essentially all researchers who need to align and assemble experimental or patient genome sequence data. It also serves as a consensus coordinate system for reporting results.

Participants and Structure

Human Genome Reference Center

  • Washington University, St. Louis
    Principal Investigators (PI): Ting Wang (Contact), Paul Flicek, Ira Hall, Benedict Paten

    The Human Genome Reference Center at Washington University in St. Louis serves as the coordinating center. The center maintains and updates the reference sequence; supports state-of-the-art reference representations; and educates and coordinates with the research community (including clinicians and basic research scientists).
     

High Quality Reference Genomes

  • University of California, Santa Cruz
    Principal Investigators (PI): David Haussler (Contact), Evan Eichler, Ira Hall

    The High-Quality Human Reference Genomes Center at the University of California, Santa Cruz collects additional DNA samples from populations not represented in the current reference, including the creation of cell lines. They will generate at least 350 high-quality reference genome sequences, a subset of which will be finished, telomere-to-telomere genome sequences. The center also disseminates the data and works closely with the other Human Genome Reference Program components.
     

Genome Reference Representations

  • Dana-Farber Cancer Institute
    Principal Investigators (PI): Heng Li (Contact), Benedict Paten
    Project Title: The construction and utility of reference pan-genome graphs
     
  • University of Southern California
    Principal Investigators (PI): Mark Chaisson (Contact), Evan Eichler, Tobias Marschall
    Project Title: Representing structural haplotypes and complex genetic variation in pan-genome graphs
     
  • Stanford University
    Principal Investigators (PI): Hanlee Ji (Contact), Tsachy Weissman
    Project Title: K-mer indexing for pan-genome reference annotation
     

The Genome Reference Representations (GRR) projects support research and development for a next-generation genome reference representation that can capture all human genome variation and support research on the full diversity of populations.

Informatics Tools for the Pangenome

  • Pending
     
  • Purpose: To develop informatics tools that can apply the new pangenome representation for analysis and enable use of the high-quality genome reference by clinical and basic researchers.
     

Technology Development for Complete Genome Sequencing

  • NHGRI will accept applications for Technology Development for Complete Genome Sequencing on an ongoing basis (see NOT-HG-19-011)
     
  • Purpose: Develop technologies for complete de novo sequencing of phased diploid human genomes.
     

Overview

Ever since the human reference originated with the completion of the International Human Genome Project, there has been a need to maintain and improve it and to make it available to the community. This work has included resolving error reports, adding information to the reference from new high-quality genomes as they became available, and developing ways to represent alternative haplotype information derived from them. Improved or updated reference versions are curated and released to the community.

On March 1, 2018, NHGRI convened a web meeting of over 65 basic research, clinical, and bioinformatic scientists to discuss scientific opportunities for the genome reference. The meeting addressed key research and resource opportunities for improving the human reference; activities necessary to keep the reference relevant and useful; clinical and research community needs (including education); related resources; and collaborations.

The high-level conclusion of the meeting was that the current version of the human reference does not adequately represent human haplotype variation, that the existing tools for including alternative haplotype information in analyses are not widely used, and that there is an opportunity to significantly improve the human reference by developing it into a “pan-genome”. One goal of a pan-genome reference is to represent as much human haplotype variation as possible, so that any newly sequenced experimental or patient haplotype will be readily alignable to the reference. The reference would include the multiple types of human genomic variation, phased in chromosomal regions. Achieving this would require adding many more high-quality human genome assemblies chosen to maximize haplotype diversity, for instance by incorporating samples collected under the 1000 Genomes Project. It would also require adopting better ways of representing the data (e.g., as a genome graph), along with developing new informatics tools to make use of the new reference.

As a result of these discussions, NHGRI will re-organize and re-focus its contribution to the genome reference to create a multi-component Human Genome Reference Program (HGRP) intended to enable an improved human genome reference for the community, and to foster its long-term sustainability and improvement.

Based on the Concept for this program presented to the National Council on Human Genome Research, the components will be:

  1. A Human Genome Reference Center (HGRC; RFA-HG-19-004)
  2. High Quality Human Reference Genomes (HQRG; RFA-HG-19-002)
  3. Genome Reference Representations (GRR; RFA-HG-19-003)
  4. Informatics tools for use of the human genome reference (see Concept documents)
  5. Technology development for complete sequencing of genomes (NOT-HG-19-011)

Program Management

NHGRI manages the HGRP as a consortium. Grantees for the Human Genome Reference Center, High Quality Reference Genomes, and Genome Reference Representations components interact closely on several aspects of the program, including prioritizing new samples, resolving reference errors or ambiguities, establishing quality metrics, and transitioning to graph representations or new reference “builds”.

NHGRI believes that the human reference will be more broadly useful if it can be integrated with, or is part of an effective ecosystem with, other existing databases and resources that present human variation information in different contexts (e.g., ClinVar, EGA, the Human Genome Structural Variation Consortium, gnomAD, and Bravo).

Data Release and Access Policies

NHGRI data release policies for genome sequence data evolved from the original Bermuda and Ft. Lauderdale policies, which were suited to Human Genome Project data and organismal sequence data. With the advent of projects involving large numbers of samples from human subjects, this area is under continuous evaluation, much of it at the NIH level rather than the NHGRI level.

See NOT-OD-13-119 for a discussion of the latest NIH policy proposals in this area.


Select Working Groups

  • Assembly Team
    Chairs: Evan Eichler, Karen Miga, Ira Hall, Benedict Paten, Erich Jarvis, Kerstin Howe
    Role: Generate “high quality production grade” assemblies; generate “finished” T2T assemblies; QC and validate assemblies; develop methods and pipelines

  • Pangenomes
    Chairs: Ira Hall, Benedict Paten, Heng Li
    Role: Variant calling; pangenome framework, construction, and tools

  • Resource Improvement and Maintenance
    Chairs: Paul Flicek, Valerie Schneider, Tina Lindsay
    Role: Functional annotation; handling error reports; resolving errors through targeted re-assembly and/or sequencing

  • Resource Sharing and Outreach
    Chairs: Ting Wang, David Haussler
    Role: Resource sharing; outreach & education; browsers

  • Samples
    Chairs: Eimear Kenny, Karen Miga
    Role: Collect, identify, and prioritize samples for inclusion in the project

  • Technology and Production
    Chairs: Bob Fulton, Karen Miga
    Role: Coordinate data production across sites; develop, optimize, troubleshoot, and share protocols; engage with technology companies; test and adopt new technologies and protocols

 


Funding Opportunities

  • RFA-HG-19-004 Human Genome Reference Center (HGRC) (U41 Clinical Trial Not Allowed) (Expired)
    Expiration Date: Apr 03, 2019

  • RFA-HG-19-002 High Quality Human Reference Genomes (HQRG) (U01 Clinical Trial Not Allowed) (Expired)
    Expiration Date: Apr 03, 2019

  • RFA-HG-19-003 Research and Development for Genome Reference Representations (GRR) (U01 Clinical Trial Not Allowed) (Expired)
    Expiration Date: Apr 03, 2019

  • NOT-HG-19-011 Notice of Change: Emphasizing Opportunity for Developing Comprehensive Human Genome Sequencing Methodologies in Response to NHGRI Novel Nucleic Acid Sequencing Technology Development FOAs

Contact Staff

Adam Felsenfeld, Ph.D.
  • Program Director
  • Division of Genome Sciences
Mike Smith, Ph.D.
  • Program Director, Genome Technology Program
  • Division of Genome Sciences
Heidi J. Sofia, Ph.D.
  • Program Director
  • Division of Genomic Medicine
Taylorlyn Stephan
  • Scientific Program Analyst
  • Division of Genome Sciences
Baergen Schultz
  • Scientific Program Analyst
  • Division of Genomic Medicine

Last updated: June 12, 2019