Data Sharing Policies and Expectations

This webpage and the associated FAQs describe the various expectations for data sharing that are specific to NHGRI-supported studies. The NHGRI Data Sharing Governance Committee oversees the institute's implementation and maintenance of NIH and NHGRI data sharing policies. For general NIH data sharing policy information, please visit NIH's Scientific Data Sharing website.

Data Management and Sharing

Note: The NIH Data Management and Sharing (DMS) Policy (NOT-OD-21-013) went into effect on January 25, 2023.

Broad data sharing promotes maximum public benefit from federally funded research, as well as rigor and reproducibility. For studies involving humans, responsible data sharing is important for maximizing the contributions of research participants and promoting trust. NHGRI supports the broadest appropriate data sharing with timely data release through widely accessible data repositories. These repositories may be open access (unrestricted) or, if more appropriate, controlled access. NHGRI is also committed to ensuring that publicly shared datasets are comprehensive and Findable, Accessible, Interoperable and Reusable (FAIR).

For information on the NIH Data Management and Sharing (DMS) Policy (NOT-OD-21-013) and the NIH Genomic Data Sharing Policy (NOT-OD-14-124), see the NIH Scientific Data Sharing website.

For more on NHGRI’s expectations, see below.

Where to Submit Data

  1. When determining where to submit data, investigators should first determine whether the Notice of Funding Opportunity (NOFO) to which they are applying includes specific repository expectations.
     
  2. If not, AnVIL will serve as the primary repository for NHGRI-funded data, metadata and associated documentation. AnVIL supports submission of a variety of data types (not limited to genomic data or to data derived from human research participants) and accepts both controlled-access and unrestricted data.
     

You may propose an alternative to AnVIL in your DMS Plan, which will be assessed by NHGRI prior to funding. The NIH maintains a list of repositories for sharing scientific data.
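
The decision order described above can be sketched in a few lines of Python. This is only an illustration, not an NHGRI tool: the function name `choose_repository`, the `RepositoryChoice` class, and its field names are hypothetical, and the sketch assumes that a NOFO-specified repository takes precedence over any alternative proposed in the DMS Plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepositoryChoice:
    """Hypothetical result type: where to submit and whether NHGRI must assess it."""
    repository: str
    needs_nhgri_assessment: bool


def choose_repository(
    nofo_repository: Optional[str] = None,
    proposed_alternative: Optional[str] = None,
) -> RepositoryChoice:
    """Pick a repository following the order described in this section.

    Assumption: a repository named in the NOFO overrides everything else.
    """
    # Step 1: the NOFO may set a specific repository expectation.
    if nofo_repository:
        return RepositoryChoice(nofo_repository, needs_nhgri_assessment=False)
    # An alternative to AnVIL may be proposed in the DMS Plan;
    # NHGRI assesses it prior to funding.
    if proposed_alternative:
        return RepositoryChoice(proposed_alternative, needs_nhgri_assessment=True)
    # Step 2: otherwise AnVIL is the primary repository.
    return RepositoryChoice("AnVIL", needs_nhgri_assessment=False)
```

For example, calling `choose_repository()` with no arguments returns AnVIL as the default, while passing only `proposed_alternative` flags the choice for NHGRI assessment.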

You may submit to one or more repositories. However, NHGRI expects researchers to share study data, associated documentation, and metadata together in a single location whenever possible. Metadata, documentation, and code/software/tools should generally be shared through open access. For more information on expectations for sharing comprehensive and standardized metadata and phenotypic data, see “Metadata and Phenotypic Data Sharing Expectations” below.

NHGRI will update this guidance over time to account for the changing landscape of data resources.

Note: Per the NIH Clinical Trials Policy, NIH-funded clinical trials are expected to register and submit results to ClinicalTrials.gov. Sharing summary data via ClinicalTrials.gov may be a component of one’s DMS Plan but does not by itself fulfill the requirements of the NIH DMS Policy.

How to Register Controlled-Access Studies

 

Study registration in dbGaP is required for large-scale human genomic studies, including those submitting data to AnVIL and studies with an Alternative Data Sharing Plan.

Follow the standard process outlined in How to Register and Submit Your Study in dbGaP (Steps 1 – 6) to register your study in dbGaP. Next, see the AnVIL Submission Guide for instructions on submitting the data (Step 7).

For Step 2: Please complete the relevant template below or send the basic study information needed for study registration to the NHGRI GPA (nhgrigpa@mail.nih.gov):

 

 

Investigators seeking to submit non-NIH funded data to an NIH-designated data repository (e.g., AnVIL or dbGaP) should follow these instructions.

Timelines for Submitting and Releasing Data

NHGRI follows the NIH’s expectation for submission and release of scientific data, with the following exception: for genomic data, NHGRI expects non-human genomic data that are subject to the NIH GDS Policy to be submitted and released on the same timeline as human genomic data.

 

Timeline for submitting human data

Timeline for submitting non-human data

 

Key

  • Level 0: Raw data generated directly from the instrument platform
  • Level 1: Initial sequence reads, the most fundamental form of the data after the basic translation of raw input
  • Level 2: Data after an initial round of analysis or computation to clean the data and assess basic quality measures
  • Level 3: Analysis to identify genetic variants, gene expression patterns, or other features of the data set
  • Level 4: Final analysis that relates the genomic data to phenotype or other biological states

Metadata and Phenotypic Data Sharing Expectations

Per NOT-HG-21-022, NHGRI-funded and supported researchers are expected to:

  1. Share the metadata and phenotypic data associated with the study.
  2. Use standardized data collection protocols and survey instruments for capturing data, as appropriate.
  3. Use standardized notation for metadata (e.g., controlled vocabularies or ontologies) to enable the harmonization of datasets for secondary research analyses.

Investigators should outline plans for comprehensive sharing of metadata, phenotypic data, and other descriptive information (e.g., protocols or methodologies used) in the DMS Plan of the grant application.

NHGRI strongly encourages the use of existing data standards and ontologies that are generally endorsed by the community of your research area, although it does not require the use of any particular one. Investigators should use data standard(s) and ontologies that facilitate comparison across similar studies within their research field.

For ideas on where to find a standard that aligns with your research domain, see NHGRI’s Metadata and Phenotypic Data Expectations FAQs, and specifically “Where should I start?”

Considerations for Sharing Scientific Data from Human Research Participants

NHGRI supports many projects that generate scientific data, both genomic and non-genomic, from human research participants. In these instances, researchers must balance the expectation to share data with participant privacy considerations. Though the expectation is for investigators to maximize data sharing, the NIH DMS Policy recognizes that there may be justifiable limitations to sharing scientific data.

Generally, the scientific data derived from human research participants, including qualitative data, should be adequately de-identified prior to sharing to ensure protection of research participants, maintain privacy, and mitigate risk, especially for vulnerable or marginalized groups. NHGRI recognizes that methods for de-identification of qualitative and other data types are developing, and existing tools may have limitations. Investigators are encouraged to review and consider the merits of various tools for data de-identification, such as those listed by the Johns Hopkins Libraries or the UK Data Archive.

Certain studies, such as qualitative or mixed-methods projects, may generate scientific data that are challenging to de-identify or that still pose privacy risks after de-identification because they contain information that allows inferences about a research participant’s identity. For example, imaging data, rich clinical/phenotypic data, transcripts from focus groups or in-depth interviews, ethnographic observations, audio recordings of deliberative community-based engagements, and social media posts may need special protections to ensure participant privacy. In these instances, a controlled-access mechanism may be most appropriate to ensure the protection of participants in the study.

Investigators are not obligated to share scientific data if 1) sharing would compromise the privacy or safety of research participants or place them at greater risk of re-identification or harm, and 2) protective measures such as de-identification and Certificates of Confidentiality would be insufficient. NHGRI encourages investigators to engage with communities affected by the research and the sharing of sensitive data to discuss approaches for informed consent, appropriate use, risk mitigation and benefit sharing.

Genomic Data Sharing

For information on NIH’s genomic data sharing requirements, see the Genomic Data Sharing Policy webpage. For more on NHGRI’s expectations, see below.

Applicability of the Genomic Data Sharing Policy

The NIH Genomic Data Sharing (GDS) Policy (NOT-OD-14-124) and NHGRI’s implementation of the policy apply particularly to single nucleotide polymorphism (SNP) array data, genome sequence data, transcriptomic data, epigenomic data or other molecular data produced by array-based or high-throughput sequencing technologies.

NHGRI's Expectation Under the Policy

NHGRI values and encourages the sharing of data from smaller projects that do not meet the definition of “large-scale” according to the NIH guidance regarding scope of the GDS Policy. Investigators should consult with appropriate NHGRI program directors as early as possible to determine whether the GDS Policy applies to their research study. See the Notice of Plans for NHGRI Implementation of NIH Genomic Data Sharing Policy (NOT-HG-20-011 and NOT-HG-15-038) for more information about NHGRI’s expectations for genomic data sharing.

Informed Consent Requirements

Per the NIH GDS Policy, for studies that started after January 25, 2015 (the NIH GDS Policy effective date), informed consent documents for prospective data collection should state what data types will be shared (e.g., genomic, phenotype, health information) and for what purposes (e.g., general research use, disease-specific research use) and whether sharing will occur through open (unrestricted) or controlled-access databases.

The information that NIH expects to be conveyed in informed consent documents is defined in the NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy.

For more in-depth discussion of principles and best practices for drafting informed consent documents for genomics research, see the NHGRI Informed Consent Resource.

NHGRI's Expectations Under the Policy

  1. NHGRI strongly encourages studies that propose to derive genomic data from human specimens and cell lines to obtain participant consent either for general research use through controlled access or for unrestricted access. Similarly, consent language should avoid both restrictions on the types of users who may access the data and restrictions that add requirements to the access request process. NHGRI acknowledges that this will not always be possible or appropriate. In addition, individual participants who do not consent to future research use or broad sharing of their data (i.e., submission of their data to a publicly accessible data repository) may still participate in the primary study if consistent with study design.
     
  2. As of January 25, 2021, NHGRI expects that all human data generated by NHGRI-funded or supported research will be derived from biospecimens or cell lines for which explicit consent for future research use and broad data sharing can be documented. This NHGRI expectation goes beyond those of the NIH GDS Policy; the NIH GDS Policy does not require explicit consent for future use and broad data sharing when specimens or cell lines were created or collected before January 25, 2015, but NHGRI’s expectation contains no such clause. Research that proposes to use specimens and cell lines that lack explicit consent for future research use and broad data sharing should be accompanied by a request for an exception that describes the scientific reason(s) for using the specified data sources.

    The need for an exception should be documented in the DMS Plan of the grant application. Requests for exceptions should be submitted via the template for Requesting an Exception for Samples Lacking Explicit Consent for Future Research Use and Broad Data Sharing. (See the “How to Register Controlled-Access Studies” section.)

    For more information on the explicit consent requirement, visit our NHGRI-Specific GDS Policy FAQ.
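
The difference between the NIH GDS Policy and NHGRI’s expectation in item 2 can be illustrated with a small Python sketch. It is a simplified reading of the text above, not an official compliance check: the function names are hypothetical, and treating specimens collected on the effective date itself as covered by the GDS Policy is an assumption made for the example.

```python
from datetime import date

# NIH GDS Policy effective date, as stated in this section.
GDS_EFFECTIVE_DATE = date(2015, 1, 25)


def gds_expects_explicit_consent(collected_on: date) -> bool:
    """NIH GDS Policy: no explicit-consent expectation for specimens or
    cell lines created or collected before the effective date.

    Assumption: collection on the effective date itself counts as covered.
    """
    return collected_on >= GDS_EFFECTIVE_DATE


def nhgri_exception_request_needed(has_explicit_consent: bool) -> bool:
    """NHGRI expectation (as of January 25, 2021): explicit consent is
    expected regardless of collection date, so any specimen lacking it
    calls for an exception request."""
    return not has_explicit_consent


# A cell line collected in 2010 without explicit consent: the GDS Policy
# alone would not expect explicit consent, but NHGRI expects an exception
# request.
legacy_collection = date(2010, 6, 1)
print(gds_expects_explicit_consent(legacy_collection))
print(nhgri_exception_request_needed(has_explicit_consent=False))
```

The two functions are kept separate on purpose, since the NHGRI expectation does not depend on the collection date at all.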

Alternatives to the GDS Expectations

When consistent with program priorities, NHGRI may accept well-justified data sharing plans that:

  • Are unable to deposit genomic data in an NIH-designated data repository.
  • Propose to share genomic data via a non-NIH-designated data repository.
     

A detailed explanation of the alternative mechanism for data sharing should be documented in the DMS Plan of the grant application and via the Alternative Data Sharing Plan Template. (See the “How to Register Controlled-Access Studies” section.)

Notices

NOT-HG-21-022: Notice Announcing the National Human Genome Research Institute’s Expectation for Sharing Quality Metadata and Phenotypic Data

NOT-HG-15-038: Notice of Plans for NHGRI Implementation of NIH Genomic Data Sharing Policy

NOT-HG-20-011: NHGRI Implementation of the NIH Genomic Data Sharing Policy

Contacts

For Specific GDS Policy Compliance Questions:

Barbara Thomas, Ph.D.
  • NHGRI Data Access Committee (DAC) Chair
  • Scientific Review Branch
Kris A. Wetterstrand, M.S.
  • Acting Lead Genomic Program Administrator
  • National Human Genome Research Institute

For Policy Questions:

Elena M. Ghanaim, M.A.
  • Policy Advisor for Data Science and Sharing
  • Office of Genomic Data Science

Resources for Intramural Staff

NHGRI intramural staff should refer to specific instructions for NHGRI forms and submitting plans on the NHGRI Intranet's Genomic Data Sharing Policy Resources page (requires NIH login).

Last updated: January 9, 2024