Genomic Data Sharing (GDS) Policy 

Data Submission

Data Sharing Expectations: What Information and Data Are Submitted?

All final datasets (human or non-human, including microbial data) generated through large-scale genomic projects, not just those datasets generated to support a publication, should be submitted to appropriate data repositories or made available through NHGRI-approved alternative data sharing plans.

All metadata and descriptive information (e.g., protocols or methodologies used) needed to support future use of the data should be submitted. As much de-identified phenotype data as is practicable should be submitted. In this context, phenotype data refers to clinical data, environmental data, demographic variables, and any non-genotype data. When appropriate, relevant phenotype data from non-human studies also should be shared through open (unrestricted access) community resource data repositories.

Large resource projects (e.g., 1000 Genomes) should share their raw data (e.g., reads), intermediate data (e.g., assemblies), and processed data (e.g., variant calls, genotypes, haplotypes). When possible, investigators should use standard formats and vocabularies/ontologies to describe data elements (e.g., sequence data, variants, or phenotypes).

Data Sharing Plan

Note: intramural investigators should refer to specific instructions for NHGRI forms and submitting plans on the Genomic Data Sharing Policy Resources page. (Coming Soon)

Resources regarding the expected elements of data sharing plans are provided on the NIH GDS Policy website, including information on how these plans are considered during peer review. After peer review, NHGRI will assess the potential value of the dataset for use in secondary analyses to confirm findings, explore different research questions, develop or refine analytic methodologies or programs, etc. In addition, funds and other resources requested for data deposition, management, or access will be considered.

For studies involving human data, NHGRI also will consider Institutional Review Board (IRB) assessments of informed consent processes and consent documents as noted in the NIH GDS Policy. An IRB should be consulted during the process of developing a Data Sharing Plan. Participant protection issues for the proposed study population (e.g., particular privacy concerns or a potential for group harm) or related to the scientific design (e.g., isolated geographic population or small family studies), as evaluated by an IRB and consistent with program priorities, will also inform data sharing plan review.

As specified in the NIH GDS Policy, data submitting institutions (including the NHGRI Intramural Research Program) should submit an Institutional Certification document, signed by an appropriate Institutional Signing Official, to the NHGRI Program Director or NHGRI Genomic Program Administrator (GPA), as appropriate, for studies that require this document. The Institutional Certification asserts that the Data Sharing Plan and the informed consent are compatible, and it finalizes approval of the Data Sharing Plan.

Exceptions to Data Deposition and Alternative Data Sharing Plans

For more information, refer to the Exceptions to the Policy section.

Process for Submitting and Releasing Data

Clear milestones for the timing of data deposition should be established for each project and included in the Data Sharing Plan to provide a timeline by which to assess progress toward meeting data submission expectations. Milestones should adhere to the standard data release timelines outlined in the NIH GDS Policy Supplemental Information and the NHGRI Guidance for Data Submission and Data Release table below, and should be agreed upon before research projects start. Large resource projects may develop project-specific timelines for data release, in conjunction with program officers or NHGRI intramural leadership, that exceed the minimum expectations specified in those two sources.

Unless otherwise specified by project funding announcements, analyses by submitting investigators that are conducted subsequent to the initial data submission, final data sets, or any data updates should be submitted for release concurrent with the first publication analyzing the dataset.

Data sharing progress reports will be expected consistent with trans-NIH processes as they are implemented, or through other NHGRI consortia reporting mechanisms, as applicable. Program directors will monitor progress against the timelines established through the data sharing plans.

NHGRI-Specific Expectations

Investigators should note the following NHGRI data release expectation for non-human genomic data, which differs from the NIH expectation. Data sharing plans for NHGRI-funded or -supported projects to generate non-human genomic data proposed after January 25, 2016, should include pre-publication timelines for data submission and release consistent with NIH GDS Policy expectations for human genomic data (including a possible holding period before data release not to exceed six months).

NHGRI Guidance for Data Submission and Data Release
Level 0
General Description of Data Processing: Raw data generated directly from the instrument platform
Example Data Types: Instrument image data
Data Submission Expectation
  Human data: Not expected.
  Non-human data: Not expected.
Data Release Timeline
  Human data: NA.
  Non-human data: NA.

Level 1
General Description of Data Processing: The basic data after the initial processing of raw input data
Example Data Types: DNA sequence reads, ChIP-Seq reads, RNA-Seq reads, SNP array data, array CGH data
Data Submission Expectation
  Human data: Not expected.
  Non-human data*: Not expected, except for de novo sequence data (unless it is included with Level 2 aligned sequence files). Submission of de novo sequence data is expected no later than the time of initial publication.
Data Release Timeline
  Human data: NA.
  Non-human data: No later than the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.

Level 2
General Description of Data Processing: Data after an initial round of processing or computation to clean the data and assess basic quality measures
Example Data Types: DNA sequence alignments to a reference sequence or de novo assembly, RNA expression profiling
Data Submission Expectation
  Human data: After data cleaning and quality control, which is generally within 3 months after data were generated. Project specific.
  Non-human data*: Data submission expected at the time of initial publication; an earlier submission date may be designated for certain data types or NIH projects.
Data Release Timeline
  Human data: Up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first.
  Non-human data: Data released at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.

Level 3
General Description of Data Processing: Analysis to identify genetic variants, gene expression patterns, or other features of the dataset
Example Data Types: SNP or structural variant calls, genotypes, expression levels, epigenomic features
Data Submission Expectation
  Human data: After cleaning and quality control, which is generally within 3 months after data have been generated. Project specific.
  Non-human data*: Data submission expected at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.
Data Release Timeline
  Human data: Up to 6 months after data submission is initiated or at the time of acceptance of initial publication, whichever occurs first.
  Non-human data: Data released at the time of initial publication; an earlier release date may be designated for certain data types or NIH projects.

Level 4
General Description of Data Processing: Final analysis that relates the genomic data to phenotype or other biological states
Example Data Types: Genotype-phenotype relationships, relationships of RNA expression or epigenomic patterns to biological state
Data Submission Expectation
  Human data: Data submitted as analyses are completed.
  Non-human data*: Data submission expected at the time of initial publication.
Data Release Timeline
  Human data: Data released with publication.
  Non-human data: Data released at the time of initial publication.

* Investigators should note the following NHGRI data release expectation for non-human genomic data that differs from the NIH expectation. Data sharing plans for NHGRI-funded or -supported projects to generate non-human genomic data proposed after January 25, 2016, should include pre-publication timelines for data submission and release consistent with NIH GDS Policy expectations for human genomic data (including a possible holding period before data release not to exceed six months).
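
As an informal illustration of the Level 2 and Level 3 release rule for human data in the table above (release up to 6 months after data submission is initiated, or at acceptance of the initial publication, whichever occurs first), the following Python sketch computes the latest release date. It is not an official NHGRI tool. The function names `add_months` and `latest_release_date` are hypothetical, and counting the holding period in calendar months (with the day clamped to the end of shorter months) is an assumption, since the guidance does not define how months are counted.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter than the start day,
    the day is clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release_date(
    submission_initiated: date,
    publication_accepted: Optional[date] = None,
    holding_months: int = 6,
) -> date:
    """Latest release date for human Level 2/3 data under the table's rule.

    The result is the end of the holding period or the acceptance date of
    the initial publication, whichever occurs first. If no publication has
    been accepted yet, only the holding period applies.
    """
    holding_end = add_months(submission_initiated, holding_months)
    if publication_accepted is None:
        return holding_end
    return min(holding_end, publication_accepted)


if __name__ == "__main__":
    # Submission initiated March 15, 2017, with no accepted publication yet.
    print(latest_release_date(date(2017, 3, 15)))
    # Same submission, but the initial publication is accepted June 1, 2017.
    print(latest_release_date(date(2017, 3, 15), date(2017, 6, 1)))
```

Projects with project-specific timelines, or funding announcements that specify otherwise, would need different parameters. The sketch covers only the default rule stated in the table.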
 

Posted: October 3, 2017