Data Sharing Policies and Expectations
This webpage and the associated FAQs describe the various expectations for data sharing that are specific to NHGRI-supported studies. For general NIH data sharing policy information, please visit NIH's Scientific Data Sharing website.
Data Management and Sharing
Note: The NIH Data Management and Sharing (DMS) Policy (NOT-OD-21-013) goes into effect on January 25, 2023.
Broad data sharing promotes maximum public benefit from federally funded research, as well as rigor and reproducibility. For studies involving humans, responsible data sharing is important for maximizing the contributions of research participants and promoting trust. NHGRI supports the broadest appropriate data sharing with timely data release through widely accessible data repositories. These repositories may be open access (unrestricted) or, if more appropriate, controlled access. NHGRI is also committed to ensuring that publicly shared datasets are comprehensive and Findable, Accessible, Interoperable and Reusable (FAIR).
For information on NIH’s data management and sharing requirements, see NIH’s Data Management and Sharing webpage. For more on NHGRI’s expectations, see below.
Where to Submit Data
When determining where to submit data, investigators should first determine whether the Funding Opportunity Announcement (FOA) to which they are applying includes specific repository expectations.
- If not, AnVIL will serve as the primary repository for NHGRI-funded data, metadata and associated documentation. AnVIL supports submission of a variety of data types (not limited to genomic data or to data derived from human research participants) and accepts both controlled-access and unrestricted data.
You may propose an alternative to AnVIL in your Data Management and Sharing (DMS) Plan, which will be assessed by NHGRI prior to funding. The NIH maintains a list of repositories for sharing scientific data.
You may submit to one or more repositories. However, NHGRI expects researchers to share study data, associated documentation and metadata together in a single location whenever possible. Metadata, documentation and code/software/tools should generally be shared through open access. For more information on expectations for sharing comprehensive and standardized metadata and phenotypic data, see below.
NHGRI will update this guidance over time to account for the changing landscape of data resources.
Note: Per the NIH Clinical Trials Policy, NIH-funded clinical trials are expected to register and submit results to ClinicalTrials.gov. Sharing summary data via ClinicalTrials.gov may be a component of one’s DMS Plan but does not by itself fulfill the requirements of the NIH DMS Policy.
How to Register Controlled-Access Studies
Note: After January 25, 2023, NHGRI's Genomic Data Sharing Policy (GDSP) templates will be replaced by a new document for capturing the information needed to register a controlled-access study. Signing official approval will no longer be needed. For application receipt dates on or after January 25, 2023, requests for an Alternative Data Sharing Plan should be documented in the DMS Plan.
Follow the process outlined in How to Register and Submit Your Study in dbGaP (Steps 1 – 6) to register your study in dbGaP.
Note: Study registration in dbGaP is required for large-scale human genomic studies submitting data to AnVIL and studies with an Alternative Data Sharing Plan.
For Step 2: Please complete the relevant template or send the basic study information needed for study registration to the NHGRI GPA (Jennifer.email@example.com).
Investigators seeking to submit non-NIH-funded data to an NIH-designated data repository (e.g., AnVIL or dbGaP) should follow these instructions.
NHGRI follows NIH’s expectations for the submission and release of scientific data, with one exception for genomic data: NHGRI expects non-human genomic data that are subject to the NIH GDS Policy to be submitted and released on the same timeline as human genomic data.
The NIH GDS Policy describes genomic data by level of processing:
- Level 0: Raw data generated directly from the instrument platform
- Level 1: Initial sequence reads, the most fundamental form of the data after the basic translation of raw input
- Level 2: Data after an initial round of analysis or computation to clean the data and assess basic quality measures
- Level 3: Analysis to identify genetic variants, gene expression patterns, or other features of the data set
- Level 4: Final analysis that relates the genomic data to phenotype or other biological states
Metadata and Phenotypic Data Sharing Expectations
Note: After the effective date of the NIH DMS Policy, investigators will outline plans for sharing of metadata, phenotypic data and other descriptive information in the DMS Plan rather than in the Resource Sharing Plan.
Per NOT-HG-21-022, NHGRI-funded and supported researchers are expected to:
- Share the metadata and phenotypic data associated with the study.
- Use standardized data collection protocols and survey instruments for capturing data, as appropriate.
- Use standardized notation for metadata (e.g., controlled vocabularies or ontologies) to enable the harmonization of datasets for secondary research analyses.
Investigators should outline plans for comprehensive sharing of metadata, phenotypic data and other descriptive information (e.g., protocols or methodologies used) in the Resource Sharing Plan of the grant application.
NHGRI strongly encourages the use of existing data standards and ontologies that are generally endorsed by the relevant research community, although it does not require the use of any particular standard. Investigators should use data standards and ontologies that facilitate comparison across similar studies within their research field.
For ideas of where to find a standard that aligns with your research domain, see the FAQ “Where should I start?”
Considerations for Sharing Non-Genomic Data from Human Research Participants
NHGRI supports various projects for which there may be justifiable limitations to sharing scientific data under the DMS Policy. These qualitative or mixed-methods projects use non-genomic data that may be difficult to de-identify, or that may pose privacy risks even after de-identification because they contain information from which a research participant’s identity could be inferred (e.g., transcripts from focus groups or in-depth interviews, ethnographic observations, audio recordings of deliberative community-based engagements, social media posts).
All data derived from human research participants, including qualitative data, should be adequately de-identified prior to sharing to protect research participants, maintain privacy and mitigate risk, especially for vulnerable or marginalized groups. NHGRI recognizes that methods for de-identifying qualitative data are still developing and that existing tools may have limitations. Investigators are encouraged to review and consider the merits of various data de-identification tools, such as those listed by the Johns Hopkins Libraries or the UK Data Archive. Investigators are also encouraged to review NIH’s resource on Repositories for Sharing Scientific Data, which includes a list of generalist repositories that accept all data types, as well as Nature’s Data Repository Guidance and the global Registry of Research Data Repositories.
NHGRI acknowledges that technical, legal, informed consent or ethical factors may necessitate limited sharing or controlled access for qualitative or sensitive data. Investigators should consider whether there are justifiable limitations to sharing data under the DMS Policy and how de-identification or limited sharing would affect scientific utility. Investigators are not expected to share data if 1) sharing would compromise the privacy or safety of research participants or place them at greater risk of re-identification or harm, and 2) protective measures such as de-identification and Certificates of Confidentiality would be insufficient. NHGRI encourages investigators to engage with communities affected by the research and by the sharing of sensitive data to discuss approaches to informed consent, appropriate use, risk mitigation and benefit sharing.
- NOT-OD-22-213 Guidance on Protecting Privacy When Sharing Human Research Participant Data, September 2022
- NIH Informed Consent for Secondary Research with Data and Biospecimens: Points to Consider and Sample Language for Future Use and/or Sharing, May 2022
Genomic Data Sharing
Note: After the effective date of the NIH DMS Policy, investigators will indicate an Alternative Data Sharing Plan or a request for an exception to the NHGRI expectation for explicit consent in their DMS Plan rather than in the Resource Sharing Plan.
For information on NIH’s genomic data sharing requirements, see the Genomic Data Sharing Policy webpage. For more on NHGRI’s expectations, see below.
Applicability of the Genomic Data Sharing Policy
The NIH Genomic Data Sharing (GDS) Policy (NOT-OD-14-124) and NHGRI’s implementation of the policy apply particularly to single nucleotide polymorphism (SNP) array data, genome sequence data, transcriptomic data, epigenomic data and other molecular data produced by array-based or high-throughput sequencing technologies.
NHGRI's Expectation Under the Policy
NHGRI values and encourages the sharing of data from smaller projects that do not meet the definition of “large-scale” under the NIH guidance on the scope of the GDS Policy. Investigators should consult the appropriate NHGRI program directors as early as possible to determine whether the GDS Policy applies to their research study. For more information about NHGRI’s expectations for genomic data sharing, see NOT-HG-15-038 (Notice of Plans for NHGRI Implementation of NIH Genomic Data Sharing Policy) and NOT-HG-20-011 (NHGRI Implementation of the NIH Genomic Data Sharing Policy).
Informed Consent Requirements
Per the NIH GDS Policy, for studies that started after January 25, 2015 (the NIH GDS Policy effective date), informed consent documents for prospective data collection should state what data types will be shared (e.g., genomic, phenotype, health information) and for what purposes (e.g., general research use, disease-specific research use) and whether sharing will occur through open (unrestricted) or controlled-access databases.
The information that NIH expects informed consent documents to convey is defined in the NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy.
For more in-depth discussion of principles and best practices for drafting informed consent documents for genomics research, see the NHGRI Informed Consent Resource.
NHGRI's Expectations Under the Policy
- NHGRI strongly encourages studies that propose to derive genomic data from human specimens and cell lines to obtain participant consent either for general research use through controlled access or for unrestricted access. Similarly, consent language should avoid both restrictions on the types of users who may access the data and restrictions that add additional requirements to the access request process. NHGRI acknowledges that this will not always be possible or appropriate. In addition, individual participants who do not consent to future research use or broad sharing of their data (i.e., submission of their data to a publicly accessible data repository) may still participate in the primary study if consistent with study design.
- As of January 25, 2021, NHGRI expects that all human data generated by NHGRI-funded or supported research will be derived from biospecimens or cell lines for which explicit consent for future research use and broad data sharing can be documented. This NHGRI expectation goes beyond those of the NIH GDS Policy; the NIH GDS Policy does not require explicit consent for future use and broad data sharing when specimens or cell lines were created or collected before January 25, 2015, but NHGRI’s expectation contains no such clause. Research that proposes to use specimens and cell lines that lack explicit consent for future research use and broad data sharing should be accompanied by a request for an exception that describes the scientific reason(s) for using the specified data sources.
The need for an exception should be documented in the Resource Sharing Plan of the grant application. Requests for exceptions should be submitted via Part VI (“Request for an Exception for Samples Lacking Explicit Consent for Future Research Use and Broad Data Sharing”) of the NHGRI Genomic Data Sharing Plan (GDSP) template. (See the “How to Register Controlled-Access Studies” section.)
For more information on the explicit consent requirement, visit our FAQ.
Alternatives to the GDS Expectations
When consistent with program priorities, NHGRI may accept well-justified data sharing plans that:
- Are unable to deposit genomic data in an NIH-designated data repository.
- Propose to share genomic data via a non-NIH-designated data repository.
A detailed explanation of the alternative mechanism for data sharing should be documented in the Resource Sharing Plan of the grant application and via the Alternative Data Sharing Plan Template. (See the “How to Register Controlled-Access Studies” section.)
NOT-HG-21-022: Notice Announcing the National Human Genome Research Institute’s Expectation for Sharing Quality Metadata and Phenotypic Data
NOT-HG-15-038: Notice of Plans for NHGRI Implementation of NIH Genomic Data Sharing Policy
NOT-HG-20-011: NHGRI Implementation of the NIH Genomic Data Sharing Policy
Resources for Intramural Staff
NHGRI intramural staff should refer to specific instructions for NHGRI forms and submitting plans on the NHGRI Intranet's Genomic Data Sharing Policy Resources page (requires NIH login).
Last updated: January 9, 2023