William W. Lowrance, Ph.D., Project Leader
Genomic research is now being broadened to include complex population-based studies, and the results of medical sequencing projects are being assembled into databases. While all this holds great promise to further the understanding of health and disease, it also brings potential threats both to the privacy of the people whose genomes are being studied and to public trust in the burgeoning genomic research enterprise.
Concerned about the risks, the National Human Genome Research Institute (NHGRI) asked Dr. William Lowrance, a consultant on health research ethics and policy, to prepare a white paper (Privacy, Confidentiality, and Identifiability in Genomic Research) and to co-chair, with NHGRI director Dr. Francis Collins, a workshop to discuss the issues. The workshop involved an experienced group of scientists, NIH and other government officials and staff, ethicists, lawyers, lay advocates, and leaders of several large research projects.
The workshop generally accepted the analysis of the white paper, which should be considered part of this summary, and the following conclusions emerged informally.
Increasing amounts of highly detailed genomic data, often linked with health and other data, are being released in both research and non-research settings, and this trend will continue. Although the chances of identity disclosure and of possible negative consequences for the data-subjects, researchers, or institutions are difficult to estimate because they depend heavily on circumstances, they are probably small. But it is clear that access to more data by more diverse accessors for more varied purposes will inevitably increase the risks.
Matching: If identified or straightforwardly identifiable reference genotype data are available, matching can be performed with very high reliability.
Linking: When genomic data are associated with such clues as diagnosis, locale, health care or payment information, treatment dates, and so on, there is a possibility that the data can be searched against administrative or other identified data-sets and lead to identification of individuals.
Profiling/Describing: As the phenotypic manifestations of various genes become known, it will increasingly become possible to construct probabilistic descriptions of persons from genomic data. Already a small number of physical attributes and proxies for ethnicity are inferable; soon many chronic-disease susceptibilities will be; and before long some behavioral tendencies will be.
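The linking route described above can be illustrated with a minimal sketch. It assumes two small, entirely made-up data-sets and hypothetical field names (diagnosis, zip3, treated); it is not drawn from the white paper or from any real project. A research record whose combination of clues matches exactly one identified administrative record is the case of concern.

```python
from collections import defaultdict

# Hypothetical, made-up records; field names are assumptions for illustration.
research = [
    {"sample": "S1", "diagnosis": "T2D", "zip3": "208", "treated": "2005-03"},
    {"sample": "S2", "diagnosis": "asthma", "zip3": "208", "treated": "2005-03"},
    {"sample": "S3", "diagnosis": "T2D", "zip3": "100", "treated": "2005-04"},
]
admin = [
    {"name": "Person A", "diagnosis": "T2D", "zip3": "208", "treated": "2005-03"},
    {"name": "Person B", "diagnosis": "asthma", "zip3": "208", "treated": "2005-03"},
    {"name": "Person C", "diagnosis": "asthma", "zip3": "208", "treated": "2005-03"},
]

# The clues shared by both data-sets (quasi-identifiers).
QUASI = ("diagnosis", "zip3", "treated")


def link(research_rows, admin_rows, keys=QUASI):
    """Return, for each research sample, the identified names sharing its clues."""
    index = defaultdict(list)
    for row in admin_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])
    return {
        r["sample"]: index.get(tuple(r[k] for k in keys), [])
        for r in research_rows
    }


candidates = link(research, admin)
for sample, names in candidates.items():
    status = "unique match" if len(names) == 1 else f"{len(names)} candidates"
    print(sample, status)
```

In this toy case only S1 links to a single named person; S2 is shielded by a second person with the same clues, and S3 has no counterpart in the identified data-set.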
Limiting the amount of genomic information released from each sample.
Technically this is easy to do and is often done. Precautions can be taken to make sure that individual genotypes or separate sequence-reads from a sample cannot be reassembled into a dataset that might be unique to the individual. But releasing too few SNPs or too-short snippets of sequence may limit research usefulness.
Statistically degrading data before releasing.
Techniques such as micro-aggregating ("binning"), scrambling, and masking can be employed, and they may be acceptable for some analyses, but generally these degrade usefulness.
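As a rough illustration of micro-aggregating, the sketch below replaces each value with the mean of a group of at least k neighboring values. The function name, the choice of k, and the ages are assumptions made for this example only, and real projects may use quite different procedures.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k sorted neighbors."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        # Fold a too-small remainder into the current group so that
        # every group holds at least k values (when n >= k).
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
        start = end
    return out


ages = [34, 71, 36, 35, 70, 69, 52]  # made-up ages
binned = microaggregate(ages, k=3)
print(binned)
```

The person aged 52 is released as 65.5 here, which shows concretely how the protection is paid for with lost analytic precision.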
Sequestering identifiers via key-coding.
This is a pivotally important safeguard. It does not entirely eliminate the possibility of identification via matching or profiling, but that possibility can be much reduced by carefully removing strongly identifying data from data-sets before key-coding and releasing them.
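Key-coding can be sketched roughly as follows, assuming hypothetical field names (name, mrn) and made-up records. Direct identifiers are moved into a key table that the data custodian holds separately, and the released rows carry only a random code. The released genotype itself is unchanged, so matching against an identified reference genotype remains possible in principle.

```python
import secrets

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = ("name", "mrn")


def key_code(records, identifiers=DIRECT_IDENTIFIERS):
    """Split records into releasable rows and a separately held key table."""
    key_table = {}
    released = []
    for rec in records:
        code = secrets.token_hex(8)  # random code, not derived from the identifiers
        while code in key_table:  # regenerate on the (unlikely) collision
            code = secrets.token_hex(8)
        key_table[code] = {f: rec[f] for f in identifiers if f in rec}
        row = {f: v for f, v in rec.items() if f not in identifiers}
        row["code"] = code
        released.append(row)
    return released, key_table


records = [  # made-up records
    {"name": "Person A", "mrn": "0001", "genotype": "AG", "diagnosis": "T2D"},
    {"name": "Person B", "mrn": "0002", "genotype": "GG", "diagnosis": "asthma"},
]
released, key_table = key_code(records)
print(released)
```

Using a random code rather than a hash of the identifiers is a deliberate choice in this sketch: a code derived from a name or record number could be recomputed by anyone who guesses the input.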
Based on the preceding points, it is clear that although a lot of sequence and related data can still be made freely accessible, such as by being posted on publicly accessible websites, increasingly projects will have to manage access via controlled release arrangements in which, among other things, accessors commit to protecting privacy and confidentiality.
Freely open data-release is acceptable only if either: (a) consent to it is ethically and legally legitimate, and granted; or (b) the data are for all practical purposes non-identifiable.
Generally the experience with controlled release has been positive. But the scale and potential international accessibility of many new projects will test the robustness and enforceability of access arrangements.
Last Reviewed: March 13, 2012