Reaffirmation and Extension of NHGRI Rapid Data Release Policies

National Human Genome Research Institute

National Institutes of Health
U.S. Department of Health and Human Services


Reaffirmation and Extension of NHGRI Rapid Data Release Policies:
Large-scale Sequencing and Other Community Resource Projects

February 2003

Background

A guiding principle of the Human Genome Project has been that the data and resources it has generated should rapidly be made available to the entire scientific community. In practical terms, this has involved the release of data and materials prior to publication, that is, much more rapidly than is traditional in the scientific community.

This restriction attracted little attention until early 2002, when a community debate began about the merits of any limitation on the use of whole genome assemblies that have been submitted to the public sequence databanks (GenBank, EMBL and DDBJ). To discuss the issue and try to resolve the differences, the Wellcome Trust convened an international group of data producers, users, database personnel, journal editors and funding agency representatives in Fort Lauderdale, Florida, in January 2003.

The meeting attendees unanimously agreed that pre-publication release of large-scale genome sequence data has been of tremendous benefit to the scientific research community, and that it is very important to ensure that such rapid release of sequence data continues. The group therefore reaffirmed the Bermuda Principles and recommended that they be extended to all types of sequence data.

Furthermore, the attendees at the meeting recognized that other large efforts, designated as "community resource projects," would increasingly be generating data and other resources that should also be released rapidly to the community in an unrestricted manner. A "community resource project" was defined as a research project specifically devised and implemented to create a set of data, reagents or other material whose primary utility will be as a resource for the broad scientific community. To ensure the continuing effectiveness of the system of rapid, pre-publication release of data from community resource projects, the meeting attendees concluded that each of the three stakeholders in the system (data producers, data users and funding agencies) has an active role to play in promulgating this tradition of open and rapid data release.

In response to the recommendations of the Fort Lauderdale meeting, the NHGRI is proposing to modify its data release policy to implement the system of "tripartite responsibility."

Proposed Update of the NHGRI Policy for the Release of Large-Scale Genomic DNA Sequence Data

The NHGRI reaffirms and extends its commitment to the Bermuda Principles for all types of large-scale DNA sequence data sets, including those that were not considered when the Bermuda Principles were originally devised.

Specifically:

The deposited data should be available for all to use without restriction.

The NHGRI recognizes that the successful maintenance of the system of rapid, unrestricted, pre-publication data release requires constructive behavior on the part of both sequence producers and users. Sequence producers are in a unique, central position. The community is dependent on the success of their efforts, and they typically face relatively little direct competition. However, it is not possible to guarantee them the standard scientific incentive of publishing the initial analysis of the data they generate without applying restrictions that might inhibit the broadest possible use of the data by the scientific community. Accordingly, the sequence producers must recognize that, even if the sequence data are occasionally used in ways that violate normal standards of scientific etiquette, unconditional release of sequence data from large-scale sequence production centers is a necessary risk set against the considerable benefits of immediate data release.

Sequence users, in turn, must accept that they have significant responsibilities consistent with standard scientific norms. Users of unpublished genomic sequence data are expected to acknowledge the source of the sequence data through the use of appropriate citations. Users also need to recognize that the sequence producers have a legitimate interest in publishing peer-reviewed reports describing and analyzing the sequence that they have produced. Data depositions in the public sequence databanks are not the equivalent of such publications. The entire scientific community can also help ensure that the system works fairly for all participants through the peer review systems of both journals and funding agencies.

The NHGRI will encourage the sequence producers to publish a project description for each new genome sequencing project, beginning with new projects initiated in 2003. The purpose of the project description, which will be a new type of scientific publication, is to inform the scientific community about the sequencing project at its inception, and to provide a citation that can be used to reference the source of the sequence. A project description should describe the plans for and scope of the project, as well as any analyses that the sequence producer intends to undertake. It should also include a timeline for sequence production goals and data release. However, the NHGRI does not consider the project description to be the equivalent of the first peer-reviewed published analysis of the results of the sequencing project.

NHGRI strongly encourages the entire scientific community to recognize that the continued success of the system of pre-publication data release requires active community-wide support. There should be no restrictions on the use of the genomic sequence data, but the best interests of the community are served when all act responsibly to promote the highest standards of respect for the scientific contribution of others.

Other Community Resource Projects

Large resource data sets are becoming an increasingly critical component of biomedical and biological research and, as such, will more frequently be produced specifically as community resources. As an integral component of developing the new community resources it will support, NHGRI will encourage planners and participants to devise appropriate approaches for implementing the principle, and achieving the advantages, of rapid pre-publication data release. Developing effective means of achieving the objectives of the community resource concept, while addressing such important considerations as data quality standards, data storage and dissemination modes, protection from parasitic intellectual property claims, and producer and user interests, will maximize the benefit to the entire scientific community and to research.


Last Reviewed: October 1, 2012