The National Human Genome Research Institute (NHGRI) supports and complies with all National Institutes of Health (NIH) data sharing policies. Information about general NHGRI expectations for implementation of the NIH Genomic Data Sharing (GDS) Policy is provided below.
If applicable, Funding Opportunity Announcements (FOAs) will specify additional data sharing expectations for specific programs. Information about the NIH GDS Policy and expectations is available through the NIH GDS Policy website. NHGRI will update this implementation plan as needed to maintain consistency with program priorities, agency policies, or any trans-NIH implementation guidance.
Broad data sharing promotes maximum public benefit from federally funded genomics research. NHGRI supports the broadest appropriate genomic data sharing with timely data release through widely accessible data repositories. These repositories may be open access (unrestricted) or, if more appropriate, controlled access (see the NIH list of data repository examples for guidance).
Whenever possible, NHGRI studies involving human data should use data generated from sources with participant consent for unrestricted access or for general research uses through controlled access. Similarly, consent language should avoid restrictions on the types of users who may access the data. NHGRI acknowledges that this will not always be possible or appropriate. In addition, individual participants who do not consent to future use or broad data sharing may still participate in the primary study, if consistent with study design.
NHGRI encourages sharing of all data types. However, at this time the NIH GDS Policy and NHGRI implementation plans apply particularly to single nucleotide polymorphism (SNP) array data, genome sequence data, transcriptomic data, epigenomic data, or other molecular data produced by array-based technologies or high-throughput sequencing technologies.
Data pertinent to the interpretation of genomic data, such as associated phenotype data (e.g., clinical information relevant to the disease under study), exposure data, and descriptive information (e.g., protocols or methodologies used), are expected to be shared. All data sets should include the appropriate metadata to allow efficient sharing and integration with other data sets.
Examples of research or research-related activities funded or supported by NHGRI that are outside the scope of the NIH GDS Policy include, but are not limited to, projects that do not meet the criteria specified in the NIH GDS Policy Supplemental Information.
Per the NIH GDS Policy, informed consent documents for prospective data collection after January 25, 2015, should state what data types will be shared (e.g., genomic, phenotype, or health information), for what purposes (e.g., general research use or disease-specific research use), and whether sharing will occur through open (unrestricted) or controlled access databases (or an approved alternative sharing plan). This and other information that NIH expects to be conveyed in documents obtaining explicit consent for future research use and broad data sharing are defined in the NIH Guidance on Consent for Future Research Use and Broad Sharing of Human Genomic and Phenotypic Data Subject to the NIH Genomic Data Sharing Policy. These expectations also apply to data to be produced from cell lines or clinically derived samples.
For research involving samples collected prior to January 25, 2015, NHGRI recognizes that informed consent processes may not have explicitly anticipated future broad data sharing or research use. In these instances, submitting institutions should ensure that the future research use and data sharing plans are not inconsistent with the informed consent provided by study participants. Relevant issues to consider in these situations are reviewed in the NIH Points to Consider for Institutions and IRBs (part 2, page 12) regarding genomic data sharing. Please note that NIH will be updating this document.
If established or commercially available cell lines or clinical specimens created prior to January 25, 2015, are included as data sources in a study, investigators should seek whenever possible to use samples where consent for future research use and data sharing can be documented.
NIH and NHGRI acknowledge that broad data sharing may not always be appropriate. In these instances, investigators should request an exception from data deposition in an NIH-designated data repository prior to initiating research activities, if appropriate samples with broader data sharing consent are not available. Exceptions from data deposition should be justified through a data sharing plan submitted with the funding request (see Data Sharing Plans below).
Similarly, there may be cases involving cell lines or specimens collected after January 25, 2015, where requesting explicit consent for future research use and broad data sharing was not possible but where there are compelling scientific reasons to conduct the research with those data sources. In those cases, and consistent with any NIH guidance issued, an exception from obtaining explicit consent may be requested from NHGRI in a data sharing plan.
Investigators should note the following NHGRI expectation, which goes beyond the basic NIH expectation with regard to grandfathered data sources. NHGRI expects that by January 25, 2020, all human data used by NHGRI-funded or -supported research will be generated from specimens or cell lines for which explicit consent for future research use and broad data sharing can be documented. Research proposing to use samples lacking such consent should be accompanied by an alternative data sharing plan supported by a compelling scientific reason for using the specified data sources. Exceptions to this expectation will continue to be granted when there is a compelling scientific reason, as provided for in the NIH GDS Policy.
For more in-depth discussion of principles and best practices for drafting informed consent documents for genomics research, see the NHGRI Informed Consent Resource.
Resources regarding the expected elements of data sharing plans are provided on the NIH GDS Policy website, including information on how these plans are considered during peer review. After peer review, NHGRI will assess the potential value of the dataset for use in secondary analyses to confirm findings, explore different research questions, or develop or refine analytic methodologies or programs. In addition, funds and other resources needed for data deposition, management, or access will be considered.
For studies involving human data, NHGRI also will consider Institutional Review Board (IRB) assessments of informed consent processes and consent documents as noted in the NIH GDS Policy. Other participant protection issues for the proposed study population (e.g., particular privacy concerns or a potential for group harm) or related to the scientific design (e.g., isolated geographic population or small family studies), as evaluated by an IRB and consistent with program priorities, will also inform data sharing plan review.
As specified in the NIH GDS Policy, data submitting institutions (including the NHGRI Intramural Research Program) should submit to the appropriate NHGRI Program Director or NHGRI Genomic Program Administrator (GPA) an Institutional Certification document signed by an appropriate Institutional Signing Official, for studies that require this document.
Per the NIH GDS Policy, NHGRI will consider requests for exceptions to standard data sharing plan expectations. When consistent with program priorities, NHGRI may accept well-justified data sharing plans that do not include broad data sharing or that include more narrow data use limitations for future research.
Basic criteria that NHGRI will use to assess exception requests include an IRB or equivalent determination that informed consent materials preclude broad data sharing, or an IRB assessment that there are additional participant protection concerns related to the nature or character of the study population (e.g., geographical location or small study designs focused on a rare disease).
Investigators may also submit a justification within a data sharing plan demonstrating that data sharing costs (e.g., financial or personnel resources) outweigh the potential for broad scientific value of access to the data.
In all cases where alternative data sharing plans are determined to be appropriate, information on how to request access to the data and a basic summary of the study and study data will be listed in dbGaP (or other appropriate NIH-designated data repository). Timelines for data submission and access under alternative data sharing plans should be consistent with those for standard data sharing under the NIH GDS Policy.
Information about any additional elements to consider in requesting exceptions from data deposition in NIH-designated data repositories will be added to this page as they are developed.
All final datasets (human or non-human, including microbial data) generated through large-scale genomic projects, not just those datasets generated to support a publication, should be submitted to appropriate data repositories or made available through NHGRI-approved alternative data sharing plans.
All metadata and descriptive information (e.g., protocols or methodologies used) needed to support future use of the data should be submitted. As much de-identified phenotype data as is practicable should be submitted. In this context, phenotype data refers to clinical data, environmental data, demographic variables, and any other non-genotype data. When appropriate, relevant phenotype data from non-human studies also should be shared through open (unrestricted access) community resource data repositories.
Large resource projects (e.g., 1000 Genomes) should share their raw data (e.g., reads), intermediate data (e.g., assemblies), and processed data (e.g., variant calls, genotypes, haplotypes). When possible, investigators should use standard formats and vocabularies/ontologies to describe data elements (e.g., sequence data, variants, or phenotypes).
Clear milestones for the timing of data deposition should be established for each project to provide a timeline by which to assess progress toward meeting data submission expectations. Milestones should adhere to standard data release timelines outlined in the NIH GDS Policy Supplemental Information and the NHGRI Guidance for Data Submission and Data Release table below, and should be agreed upon prior to the start of research projects. Large resource projects may develop project-specific timelines for data release, in conjunction with program officers or NHGRI intramural leadership, that exceed the minimum expectations specified in the NIH GDS Policy Supplemental Information and the NHGRI Guidance for Data Submission and Data Release table below.
Unless otherwise specified by project funding announcements, analyses by submitting investigators that are conducted subsequent to the initial data submission, final data sets, or any data updates should be submitted for release concurrent with the first publication analyzing the dataset.
Investigators should note the following NHGRI data release expectation for non-human genomic data that differs from the NIH expectation. Data sharing plans for NHGRI-funded or -supported projects to generate non-human genomic data proposed after January 25, 2016, should include pre-publication timelines for data submission and release consistent with NIH GDS Policy expectations for human genomic data (including a possible holding period before data release not to exceed six months).
Data sharing progress reports will be expected consistent with trans-NIH processes as they are implemented, or through other NHGRI consortia reporting mechanisms, as applicable. Program directors will monitor progress against the timelines established through the data sharing plans.
Last Updated: October 3, 2017