ENCODE Project Data Release Policy (2003-2007)
Available now for public comment: Draft ENCODE Consortia Data Release Policy 2008-2009
As recommended at the Ft. Lauderdale meeting for a community resource project, the ENCODE Consortium has published an initial manuscript, a so-called "marker paper", describing the goals of the project, its data release practices, and the publication policies that it intends to follow.
As noted, the main goal of the ENCODE pilot project is to compare the ability of a set of research methods to identify comprehensively all sequence-based functional elements in genomic DNA. Thus, the final product of the Consortium, which it intends to publish in a peer-reviewed journal, is planned to be an overall analysis of the different methods tested by the Consortium members, an annotated version of the full set of selected ENCODE target sequences, with all of the functional elements identified by the Project, and a recommendation for how to expand the ENCODE project to annotate the entire human genome. The Consortium expects to submit this manuscript or manuscripts for publication within six months of the end of the pilot project. In addition to group publication(s), all of the individual research groups in the ENCODE Consortium are free to publish the results of their own efforts in independent publications at any time. In these individual papers, Consortium participants will not be restricted to describing the methods developed for the project, but can and should expand into describing biological insights that arise from their analyses. To facilitate comparison of data between different groups involved in ENCODE, all publications by Consortium members should, when possible, include data on a common reference set of reagents agreed upon by the Consortium, e.g., a common cell line or a common antibody, as applicable.
Users of Consortium data, whether members of the Consortium or not, should be aware of the publication status of the data they use and treat them accordingly. For example, all investigators, including other Consortium members, should obtain the consent of the data producers before using unpublished data in their individual publications.
Consortium members will not have privileged access to data from other members of the Consortium. Rather, all data shared by the Consortium members will be obtained from the data that have been released to public databases.
Investigators outside of the ENCODE Consortium are free to use the ENCODE Consortium data, either en masse or specific subsets, but are asked to follow the guidelines developed at the Ft. Lauderdale meeting. Specifically, data users should cite the source of the data (referencing the initial ENCODE marker paper) and should acknowledge the data producers from the ENCODE Consortium. In addition, the data users are asked to recognize the interests of the data producers to publish reports on the generation and analysis of their data. The ENCODE data are released to public databases as pre-publication data and remain unpublished until they appear in peer-reviewed publications. Outside investigators who perform an in-depth analysis of data from the ENCODE Consortium and are interested in publishing a report before the data producers do so should discuss their results with the data producer(s) and are encouraged to establish collaborations. However, the ENCODE Consortium members are not required to collaborate with any outside investigators. All investigators, through their roles as journal and grant reviewers, should enforce a high standard of respect for the scientific contribution of the data producers.
This discussion of the ENCODE data release policy has been primarily directed at issues concerning the use of ENCODE data in scientific publications. The intent of the policy is to accelerate the use of the data by the scientific community. To facilitate this goal, the data producers agree not to restrict the use of the data by others while the data users are encouraged to act in a manner that is consistent with this unrestricted access policy. The associated issue of intellectual property as it pertains to the ENCODE data is addressed in Appendix B.
All of the data generated by the ENCODE project will be linked to the human genome sequence. Data from the ENCODE Project that can be directly displayed on the human genome sequence will be stored and delivered by the University of California, Santa Cruz (UCSC) Genome Browser; other Project data will be stored and delivered by the appropriate databases to be coordinated by the NHGRI Genome Technology Branch. All ENCODE data must have the associated information on how the experiment was performed and how the raw data were analyzed to generate the conclusions (i.e., sequence elements) to be displayed. As data are deposited into public databases, individual tracks will be created to display these data on the UCSC Browser. Where applicable, the primary data underlying any sequence elements will be linked directly to the browser track. Participating labs are encouraged to submit their data rapidly even if they conflict with data from other groups. As additional data validations are performed, the investigators can modify the submitted data or even withdraw the data if further tests call into question the validity of the released data. All data will be accompanied by prominent caveats to notify users of the level of verification of the data and that frequent data release and updates will be forthcoming as further validation and analyses are performed.
The Bayh-Dole Act of 1980 provides a statutory mandate to NIH grantees and contractors to seek patent protection, when appropriate, on inventions made using government funds and to license those inventions with the goal of promoting their utilization, commercialization and public accessibility. While the NHGRI has, in accordance with that law, encouraged grantees to seek patent protection for genomic technologies that have been developed with grant funds, the Institute has been concerned about the claims and exercises of those claims in the case of large-scale genomic data sets because of the Institute's belief that broad accessibility to the data is of paramount importance, and that such data are generally pre-competitive, i.e., a considerable amount of work would need to be performed beyond the initial data production to demonstrate utility. For genomic sequence data, for example, NHGRI indicated its opinion that raw data, in the absence of additional experimental biological information, lack demonstrated specific utility and therefore are inappropriate materials for patent filing. The grantees participating in the NHGRI large-scale sequencing program have been monitored for whether they filed patent claims and, to date, none have.
In the case of the HapMap Project, the participants (including the NHGRI grantees) agreed not to file for patents on the bulk data from the Project. However, there was a complication because the raw data produced by the Project (SNPs and individual genotypes) had to be processed to generate the Project's ultimate output (haplotypes). In considering the issue of data release, HapMap participants were concerned about the possibility that researchers outside of the Project could add some of their own data to the raw Project data, develop haplotypes prior to the Project's ability to do so, file patent claims based on the combined data, and then potentially restrict access by others to the HapMap data (a so-called parasitic patent). To deal with this concern, a click-wrap license was imposed on the individual genotype data; to gain access to the data, researchers were required to agree not to restrict the access of others to the data and not to share the data with anyone who has not agreed to the click-wrap license. In December of 2004, this click-wrap license restriction was lifted to allow HapMap data to be incorporated into other public genomic databases.
In some respects, the cases of genomic sequence data and haplotype data were relatively easy to deal with because the data themselves do not have "utility" (in the patent law sense of the term). As a result, grantees did not express concern about the NHGRI policies on data release. In the case of the ENCODE Project, however, the applicability of this argument is not as obvious. The ENCODE Consortium will include both members funded by NHGRI ENCODE grants and those funded by other sources. The purpose of the ENCODE Project is to generate data that identify or define genomic DNA sequence elements that have biological function, and therefore might be considered to have utility and be able to be patented. Therefore, the use of patents in ways that might restrict access to large amounts or broad categories of data, e.g., all transcription factor binding sites, is an issue that needs to be addressed.
NHGRI's primary interest is to ensure the widespread availability of all information and any inventions that are generated during the ENCODE Project. NHGRI, therefore, encourages all ENCODE data producers to consider placing all information generated from their project-related efforts in the public domain and to address the NIH guidelines [ott.od.nih.gov] on the sharing of research tools. In the cases in which the Consortium members elect to exercise their intellectual property rights, NHGRI encourages consideration of maximal use of non-exclusive licensing of patents to allow for broad access and stimulate the development of multiple products. As a criterion for joining the ENCODE Consortium, investigators have agreed to abide by the Project's data release policy.
NHGRI also encourages users of the ENCODE data to act responsibly and share the effort involved in maintaining unrestricted access to the data. Thus, for example, if a data user were to incorporate ENCODE data into an invention, the subsequent license should not restrict the access of others to the ENCODE data. For this purpose, the term "data users" is meant to include both researchers who are members of the ENCODE Consortium and researchers who are not.
The ENCODE pilot phase, during which time data corresponding to only 1% of the human genome will be produced, will provide NHGRI with an opportunity to observe data producer and data user practices with respect to intellectual property and the ENCODE Project. NHGRI grantees are reminded that the grantee institution is required to disclose each subject invention to the Federal Agency providing research funds within two months after the inventor discloses it in writing to grantee institution personnel responsible for patent matters. NHGRI will monitor grantee activity in this area to learn whether or not attempts are being made to patent large amounts of information derived from the ENCODE Project. If, in the future, circumstances arise that convince NIH that additional measures are needed to achieve the goal of widespread access to the results of the Project, the Institute reserves the right to consider a determination of exceptional circumstance to restrict or eliminate the right of parties, under future grants, to elect to retain title. Similarly, NHGRI will monitor the activity of data users to attempt to determine whether access to the ENCODE data is being encumbered by any restrictive licenses. If the policy of reliance on data user responsibility to maintain unrestricted data access is not effective, the NHGRI will consider adopting a click-wrap license similar to that used by the HapMap Project to protect the ENCODE data and to ensure unrestricted access to the use of these data.
Last Updated: September 12, 2008