What Information and Data Are Submitted?
All final datasets (human or non-human, including microbial data) generated through large-scale genomic projects, not just those datasets generated to support a publication, should be submitted to appropriate data repositories or made available through NHGRI-approved alternative data sharing plans. NHGRI finds value in and encourages the sharing of data from smaller projects that do not meet the definition of ‘large-scale’ according to the NIH guidance regarding the scope of the GDS Policy. Investigators should consult with appropriate NIH Program Officers as early as possible to determine whether the GDS Policy applies to their research study.
For more information on the scope of the NIH GDS Policy, see Section B of the NIH Genomic Data Sharing FAQs. For more information on how NHGRI’s expectations differ from the NIH expectations, see the NHGRI-Specific GDS Policy FAQs.
All metadata and descriptive information (e.g., protocols or methodologies used) needed to support future use of the data should be submitted. As much de-identified phenotype data as is practicable should be submitted. In this context, phenotype data refers to clinical data, environmental data, demographic variables, and any non-genotype data. When appropriate, relevant phenotype data from non-human studies also should be shared through open (unrestricted access) community resource data repositories.
Large resource projects (e.g., 1000 Genomes) should share their raw data (e.g., reads), intermediate data (e.g., assemblies), and processed data (e.g., variant calls, genotypes, haplotypes). When possible, investigators should use standard formats and vocabularies/ontologies to describe data elements (e.g., sequence data, variants, or phenotypes).
Data Sharing Plans
Resources regarding the expected elements of data sharing plans are provided on the NIH GDS Policy website, including information on how these plans are considered during peer review. After peer review, NHGRI will assess the potential value of the dataset for use in secondary analyses to confirm findings, explore different research questions, develop or refine analytic methodologies or programs, etc. In addition, funds and other resources requested for data deposition, management, or access will be considered.
For studies involving human data, NHGRI also will consider Institutional Review Board (IRB) assessments of informed consent processes and consent documents as noted in the NIH GDS Policy. An IRB should be consulted during the process of developing a Data Sharing Plan for studies generating data from human subjects. Participant protection issues for the proposed study population (e.g., particular privacy concerns or a potential for group harm) or related to the scientific design (e.g., isolated geographic population or small family studies), as evaluated by an IRB and consistent with program priorities, will also inform data sharing plan review.
The Institutional Certification asserts that plans for the submission of human genomic data to the NIH meet the expectations of the NIH GDS Policy, and describes the Data Use Limitations (DULs) associated with the data set. DULs are developed by the submitting institution and are based on the terms of the informed consent of the study participants from whom the genomic data are being generated, or otherwise stipulated by the submitting institution.
As specified in the NIH GDS Policy, data submitting institutions (including the NHGRI Intramural Research Program) should submit to the pertinent NHGRI Program Director or NHGRI Genomic Program Administrator (GPA), as appropriate, an Institutional Certification document signed by an appropriate Institutional Signing Official, for studies that require this document.
For information about exceptions and alternatives to the NHGRI genomic data sharing expectations, see Exceptions and Alternatives.
On November 1, 2019, NIH updated the management of Genomic Summary Results (GSR) to allow unrestricted access to GSR from most studies deposited in NIH-designated data repositories.
Genomic Data Sharing Plan Forms
NHGRI requires that investigators complete an NHGRI Genomic Data Sharing Plan (GDSP) form as part of their Just-in-Time information. Investigators should work with their Program Director and the NHGRI GPA to finalize their Just-in-Time information and have it approved.
Process for Submitting and Releasing Data
Clear milestones for the timing of data deposition should be established for each project and included in the Data Sharing Plan to provide a timeline by which to assess progress toward meeting data submission expectations. Milestones should adhere to standard data release timelines outlined in the NHGRI Genomic Data Sharing (GDS) Policy: Data Standards and the NHGRI Guidance for Data Submission and Data Release instructions below, and should be discussed with the relevant Program Director prior to the start of research projects. Large resource projects may develop project-specific timelines for data release, in conjunction with program officers or NHGRI intramural leadership, that exceed the minimum expectations specified in the NIH GDS Policy Supplemental Information and the NHGRI Guidance for Data Submission and Data Release instructions below.
Unless otherwise specified by funding opportunity announcements, analyses by submitting investigators that are conducted subsequent to the initial data submission, final data sets, or any data updates should be submitted for release concurrent with the first publication analyzing the dataset.
Data sharing progress reports will be expected, consistent with trans-NIH processes, as they are implemented, or through other NHGRI consortia reporting mechanisms, as applicable. Program directors will monitor progress against the timelines established through the data sharing plans.
NHGRI Guidance for Data Submission and Data Release
Investigators should note the following NHGRI data release expectation for non-human genomic data that differs from the NIH expectation. Data sharing plans for NHGRI-funded or -supported projects to generate non-human genomic data proposed after January 25, 2016 should include pre-publication timelines for data submission and release consistent with NIH GDS Policy expectations for human genomic data (including a possible holding period before data release not to exceed six months).
For more detailed information on NHGRI’s expectations for data sharing, see the NHGRI Genomic Data Sharing (GDS) Policy: Data Standards.
| Level | Description of Data Processing | Example Data Types | Data Submission Expectation | Data Release |
|---|---|---|---|---|
| 0 | Raw data generated directly from the instrument platform | Instrument image data | Human data: Not expected. Non-human data: Not expected. | Human data: NA. Non-human data: NA. |
| 1 | The basic data after the initial processing of raw input data | DNA sequence reads, ChIP-Seq reads, RNA-Seq reads, SNP array data, array CGH data | Human data: Not expected. Non-human data*: Not expected, except for de novo sequence data (unless it is included with Level 2 aligned sequence files). Submission of de novo sequence data is expected no later than the time of initial publication. | Human data: NA. Non-human data: No later than the time of initial publication; an earlier release date may be designated for certain data types or NIH projects. |
| 2 | Data after an initial round of processing or computation to clean the data and assess basic quality measures | DNA sequence alignments to a reference sequence or de novo assembly, RNA expression profiling | Human data: After data cleaning and quality control, which is generally within 3 months after data were generated. Project specific. Non-human data*: Data submission expected at the time of initial publication; an earlier submission date may be designated for certain data types or NIH projects. | Human data: Up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first. Non-human data: Data released at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects. |
| 3 | Analysis to identify genetic variants, gene expression patterns, or other features of the dataset | SNP or structural variant calls, genotypes, expression levels, epigenomic features | Human data: After cleaning and quality control, which is generally within 3 months after data have been generated. Project specific. Non-human data*: Data submission expected at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects. | Human data: Up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first. Non-human data: Data released at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects. |
| 4 | Final analysis that relates the genomic data to phenotype or other biological states | Genotype-phenotype relationships, relationships of RNA expression or epigenomic patterns to biological state | Human data: Data submitted as analyses are completed. Non-human data*: Data submission expected at the time of initial publication. | Human data: Data released with publication. Non-human data: Data released at the time of initial publication. |
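As an informal illustration of the "whichever occurs first" rule for human Level 2 and Level 3 data, the sketch below computes the latest release date from the date submission was initiated and, if known, the date the initial publication was accepted. The function names are hypothetical, and treating "6 months" as six calendar months is an assumption made here for illustration; the policy text does not define the counting method, and project-specific timelines agreed with a Program Director take precedence.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month is shorter, the day is clamped to its last day
    (an assumption; the policy does not specify how to count months).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def latest_human_release_date(
    submission_initiated: date,
    publication_accepted: Optional[date] = None,
) -> date:
    """Latest release date for human Level 2/3 data (hypothetical helper).

    Release is expected up to 6 months after submission is initiated or
    at acceptance of the initial publication, whichever occurs first.
    """
    holding_period_end = add_months(submission_initiated, 6)
    if publication_accepted is None:
        # No accepted publication yet: only the holding period applies.
        return holding_period_end
    return min(holding_period_end, publication_accepted)
```

For example, under these assumptions a submission initiated on January 15, 2019 with a publication accepted on April 1, 2019 would be due for release on April 1, 2019; without an accepted publication, the date would be July 15, 2019.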
Last updated: July 11, 2019