Human Genome Reference Program (HGRP)

The human genome reference is used by essentially all researchers who need to align and assemble experimental or patient genome sequence data. It also serves as a consensus coordinate system for reporting results.

Overview

The human genome reference is used by essentially all researchers who need to align and assemble experimental or patient genome sequence data. It also serves as a consensus coordinate system for reporting results. The genome reference is therefore a critical resource for the genomics community. The NHGRI Human Genome Reference Program (HGRP) provides funding for efforts to maintain and improve the human genome reference resource.

Recently it has become technically feasible to produce very high-quality (haplotype phased, nearly contiguous) genome assemblies at scale. This has led to the implementation of a “pangenome reference”, which will include genome assemblies from hundreds of individuals from genetically diverse worldwide populations, along with the computational tools to enable the scientific community to use it.

A pangenome reference ideally will faithfully represent human haplotype variation that exists worldwide and whose frequencies vary across populations, including variants that would not readily be detected using a genome reference that includes only one or a few individuals. With a fully realized pangenome reference, any newly sequenced experimental or patient haplotype will be readily alignable to the reference. This will make the human genome reference much more useful for essentially all genome sequence analyses done worldwide, and less likely to result in differences across populations in the ability to detect variation, which could lead to health disparities. For more on the human pangenome, see “The Human Pangenome”.
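The alignability argument can be illustrated with a toy sketch. This is not an HPRC tool: the sequences, the function names (`kmers`, `kmer_coverage`) and the choice of k = 5 are all invented for illustration, and shared k-mers are only a crude stand-in for real alignment. The sketch measures what fraction of a new sample's k-mers already occur in a reference set, first with one sequence and then with a second haplotype added.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_coverage(query, references, k=5):
    """Fraction of the query's distinct k-mers found in any reference sequence."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    reference_kmers = set()
    for ref in references:
        reference_kmers |= kmers(ref, k)
    return len(query_kmers & reference_kmers) / len(query_kmers)


# Invented toy sequences (assumptions, not real genomic data).
# other_haplotype carries a 6-base insertion (GGCCAA) absent from single_reference.
single_reference = "ACGTACGTTTAGCCATG"
other_haplotype = "ACGTACGTGGCCAATTAGCCATG"

# A newly sequenced sample that happens to carry the same insertion.
new_sample = "GTACGTGGCCAATTAGCC"

coverage_single = kmer_coverage(new_sample, [single_reference])
coverage_pan = kmer_coverage(new_sample, [single_reference, other_haplotype])
print(coverage_single, coverage_pan)
```

In this toy case the sample is fully covered (1.0) by the two-haplotype set, while coverage against the single sequence is below 1.0 because the k-mers spanning the insertion have nothing to match.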

Phase 1 of the pangenome reference resource effort is now complete. Its first release, in 2023, was a pangenome resource including genomes from nearly 50 individuals. As of December 2024, data and assemblies had been acquired for another ~300 individuals. These will be incorporated into the pangenome reference resource and released to the public in stages through mid-2025.

Phase 2 of NHGRI funding for the pangenome reference resource began in December 2024. The effort will add assemblies from another ~200 individuals selected for the likelihood that their genome assemblies will contribute additional variation to the resource. Phase 2 will also emphasize outreach and community adoption and will include development of informatics tools that will enable the wider community to use the pangenome reference resource to improve their research.

Organization and Components

For Phase 2 of the HGRP, three components were funded. Together they are called the Human Pangenome Reference Consortium (HPRC). They are:

1.    A Human Genome Reference Center (HGRC; RFA-HG-23-025)
2.    High Quality Human Reference Genomes (HQRG; RFA-HG-23-024)
3.    Informatics tools for use of the human genome reference (ITPG; RFA-HG-25-007)

Phase 1 of the project also included efforts to develop computational representations of the pangenome and an effort to continue developing technology for sequencing complete, high-quality genomes. See the table below.

NHGRI will also fund separate SBIRs for pangenome informatics tools (see PAR-25-308 and PAR-25-309).

Ethical, Legal, and Social Implications

The Genome Center award also supports a team of researchers dedicated to the ethical, legal, and social implications (ELSI) of the human pangenome reference. The ELSI team is embedded within the larger HPRC project and charged with identifying and addressing both known and emerging ELSI issues related to the HPRC using a variety of research methodologies.

Collaborations and Outreach

The HPRC interacts across borders and at multiple levels:

The HPRC is an international effort, including investigators in the US and Europe, and interactions with labs in Australia, Italy, and Japan
The HPRC is a GA4GH Driver Project
The HPRC is a member of the Human Pangenome Project

Current information is available at humanpangenome.org.

Resource Availability

Sequencing Data

The HGRP consortium generates sequencing data on a range of sequencing platforms (Illumina, Oxford Nanopore Technologies, and Pacific Biosciences). The data include short-read genome sequence and chromatin conformation data, as well as long-read genome sequence, methylation, and transcription data. After quality assessment, the data are deposited in an AWS S3 bucket (https://registry.opendata.aws/hpgp-data/) and mirrored at AnVIL (https://anvilproject.org/data/consortia/HPRC). They are also made available at SRA, ENA, and DDBJ.

Genome Assemblies and Annotations

Sequencing data are used to generate high-quality diploid genome assemblies, which are deposited in an AWS S3 bucket (https://registry.opendata.aws/hpgp-data/), mirrored at AnVIL (https://anvilproject.org/data/consortia/HPRC), and deposited at GenBank.

The following annotations accompany the assemblies:

Gene annotations from Comparative Annotation Toolkit (CAT) and Ensembl
Segmental Duplications
Tandem Repeats
Transposable Elements

Reference Pangenome Graphs

High-quality diploid assemblies deposited at GenBank are used to derive reference pangenome graphs using the following approaches:

Minigraph
Minigraph-Cactus
Pangenome Graph Builder
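Graph builders of this kind generally write their output in GFA-family text formats, in which segments carry sequence, links connect segments, and paths spell out individual haplotypes. The sketch below is a minimal reader for a hand-made GFA 1 fragment, offered as an illustration and not as any consortium pipeline. The toy graph, the haplotype names `hapA` and `hapB`, and the helper names `parse_gfa` and `path_sequence` are assumptions made for this example. It also assumes blunt (`0M`) overlaps, so a path's sequence is the plain concatenation of its segments.

```python
def parse_gfa(text):
    """Parse S (segment), L (link) and P (path) records from GFA 1 text."""
    segments, links, paths = {}, [], {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # (from segment, from orientation, to segment, to orientation)
            links.append((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # Each step is a segment name followed by its orientation (+ or -).
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, links, paths


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def path_sequence(segments, steps):
    """Spell out a path; '-' steps contribute the reverse complement."""
    parts = []
    for name, orientation in steps:
        seq = segments[name]
        if orientation == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        parts.append(seq)
    return "".join(parts)


# Hypothetical toy graph: two haplotypes that differ by one SNP (G vs T).
TOY_GFA = "\n".join("\t".join(record) for record in [
    ("H", "VN:Z:1.0"),
    ("S", "1", "ACGT"),
    ("S", "2", "G"),
    ("S", "3", "T"),
    ("S", "4", "CCA"),
    ("L", "1", "+", "2", "+", "0M"),
    ("L", "1", "+", "3", "+", "0M"),
    ("L", "2", "+", "4", "+", "0M"),
    ("L", "3", "+", "4", "+", "0M"),
    ("P", "hapA", "1+,2+,4+", "*"),
    ("P", "hapB", "1+,3+,4+", "*"),
])

segments, links, paths = parse_gfa(TOY_GFA)
for name, steps in paths.items():
    print(name, path_sequence(segments, steps))
```

Here the SNP appears as a "bubble": segments 2 and 3 are alternative routes between segments 1 and 4, and each haplotype path takes one of them. Real pangenome graphs are far larger and are normally handled with dedicated toolkits, not ad hoc parsers.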

All the raw and processed data generated by the consortium are publicly available after quality assessment. Computational workflows and quality assessment pipelines are available through Dockstore.

Participants

Awardee	Institution	Title	Award Number	Status
Coordinating Center Award
Ting Wang, Heng Li, Benedict Paten, Fergal Martin, Ira Hall	Washington University	The Human Pangenome Reference Consortium Coordination Center	U41HG010972	Active
Genome Center Award
Karen Miga, Eimear Kenny, Ting Wang, Erich Jarvis, Robert Cook-Deegan, Evan Eichler	University of California, Santa Cruz	Center for Human Genome Reference Diversity	UM1HG010971	Active
Tool Development Awards
Erik Garrison	University of Tennessee Health Science Center	Building Tools and Community to Make Pangenomes Accessible	U01HG013760	Active
Melissa Gymrek	University of California, San Diego	Integrating the reference pangenome with biobank-scale data for complex trait analysis	U01HG013755	Active
Benedict Paten, Heng Li, Tobias Marschall	University of California, Santa Cruz	Tools for comprehensive variant characterization using the pangenome	U01HG013748	Active
Andrew Stergachis	University of Washington	Tooling for accurately studying the epigenome along the human pangenome reference	U01HG013744	Active
Reference Representation Awards
Heng Li, Benedict Paten	Dana-Farber Cancer Institute	The construction and utility of reference pan-genome graphs	U01HG010961	Expired
Mark Chaisson, Evan Eichler, Tobias Marschall	University of Southern California	Representing structural haplotypes and complex genetic variation in pan-genome graphs	U01HG010973	Expired
Hanlee Ji, Tsachy Weissman	Stanford University	K-mer indexing for pan-genome reference annotation	U01HG010963	Expired
Technology Development Awards
Karen Miga	University of California, Santa Cruz	Improving throughput of long reads with high consensus base accuracy to resolve repetitive DNAs	R21HG010548	Expired

Social Media

Visit

Human Pangenome Reference Consortium Website

Follow

Human Pangenome Reference Consortium on Twitter

Funding Opportunities

Active

At this time, there are no active funding opportunities.

Expired

PAR-25-308 Small Business Informatics Tools for the Pangenome (R43 Clinical Trial Not Allowed)
Expiration Date: March 4, 2025
PAR-25-309 Small Business Informatics Tools for the Pangenome (R41 Clinical Trial Not Allowed)
Expiration Date: March 4, 2025
RFA-HG-25-007 Informatics Tools for the Pangenome (U01 Clinical Trial Not Allowed)
Expiration Date: March 4, 2025
RFA-HG-23-024 Limited Competition: High Quality Reference Genomes (UM1 Clinical Trial Not Allowed)
Expiration Date: August 16, 2023
RFA-HG-23-025 Limited Competition: Human Pangenome Coordinating Center (U41 Clinical Trial Not Allowed)
Expiration Date: August 16, 2023
RFA-HG-19-004 Human Genome Reference Center (HGRC) (U41 Clinical Trial Not Allowed)
Expiration Date: April 3, 2019
RFA-HG-19-002 High Quality Human Reference Genomes (HQRG) (U01 Clinical Trial Not Allowed)
Expiration Date: April 3, 2019
RFA-HG-19-003 Research and Development for Genome Reference Representations (GRR) (U01 Clinical Trial Not Allowed)
Expiration Date: April 3, 2019
NOT-HG-19-011 Notice of Change: Emphasizing Opportunity for Developing Comprehensive Human Genome Sequencing Methodologies in Response to NHGRI Novel Nucleic Acid Sequencing Technology Development FOAs