
The Beacon Project works towards privacy protections

Data sharing is key to the success of medical research, and participants in medical research are often those most eager to see their donation of data put to use.

Advances in Privacy Technology

However, striking the right balance between protecting sensitive genomic information and making it useful to researchers has been a challenge. Recent advances in privacy technology have begun to shift that balance. A paper published in the Journal of the American Medical Informatics Association compares three practical strategies for reducing the risk of re-identification, the process by which anonymized personal genomic data are matched to the person they came from.

The paper is the result of the Beacon Project, initiated by the Global Alliance for Genomics & Health (GA4GH) and funded in part by the National Human Genome Research Institute (NHGRI). The Beacon Project tests the willingness of international sites to share genomic data. The data are stored on servers, known as beacons, at institutions participating in the project, and users can query each beacon about the genomic information it holds. Essentially, the system allows a user to ask whether a specific nucleotide (an A, T, C or G) exists at a particular chromosome location in any genome in a given beacon, but keeps all other sequence data concealed. This would allow a clinician to check whether a patient's mutation had been discovered in other patients without needing access to those other patients' genomes.
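To make the query model concrete, here is a minimal Python sketch of a beacon that answers only the yes-or-no question described above. It is an illustration under stated assumptions, not the GA4GH implementation: the `Beacon` and `Variant` names, the `query` method, and the toy genomes are all hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """A single nucleotide observed at a chromosome location (hypothetical type)."""
    chromosome: str
    position: int
    allele: str


class Beacon:
    """Toy beacon: pools variants from many genomes and reveals only presence."""

    def __init__(self, genomes: Iterable[Set[Variant]]) -> None:
        # The pooled set is private; callers never see whose genome holds a variant.
        self._pooled: Set[Variant] = set()
        for genome in genomes:
            self._pooled.update(genome)

    def query(self, chromosome: str, position: int, allele: str) -> bool:
        """Return True if any stored genome has this allele at this location."""
        if allele not in {"A", "C", "G", "T"}:
            raise ValueError("allele must be one of A, C, G or T")
        return Variant(chromosome, position, allele) in self._pooled


# Two made-up genomes, for illustration only.
beacon = Beacon([
    {Variant("1", 1000, "A"), Variant("2", 2000, "T")},
    {Variant("1", 1000, "A"), Variant("3", 3000, "G")},
])

print(beacon.query("1", 1000, "A"))  # True: at least one genome carries it
print(beacon.query("1", 1000, "C"))  # False: no genome carries it
```

The caller learns only whether the allele is present somewhere in the beacon, not how many genomes carry it or which ones.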

Data servers for genomic data

In 2015, Beacon Project researchers at Stanford University showed that privacy can be compromised by repeatedly querying a beacon for variants present in an individual's genome. Although tens of thousands of queries may be needed, similar to playing many thousands of rounds of Twenty Questions, computers can be programmed to carry out such an attack. Potential re-identification of a research participant, as well as identification of their family members, was therefore treated as a real threat.
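The intuition behind the repeated-query attack can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method the Stanford researchers used: it only counts "yes" answers, whereas a realistic attack would need a statistical test that accounts for how common each variant is in the population. The `make_beacon` and `yes_fraction` helpers and the toy genomes are hypothetical.

```python
def make_beacon(genomes):
    """Return a yes/no query function over the pooled variants of all genomes."""
    pooled = set().union(*genomes)

    def query(chromosome, position, allele):
        return (chromosome, position, allele) in pooled

    return query


def yes_fraction(query, target_variants):
    """Ask the beacon about every variant in a target genome.

    Returns the fraction of queries answered 'yes'. A person whose genome
    is in the beacon gets 'yes' for all of their variants; an outsider
    gets 'yes' only for variants they happen to share with someone inside.
    """
    answers = [query(*variant) for variant in target_variants]
    return sum(answers) / len(answers)


# Made-up genomes as sets of (chromosome, position, allele) tuples.
member = {("1", 100, "A"), ("2", 200, "T"), ("3", 300, "G")}
other = {("1", 100, "A"), ("4", 400, "C")}
outsider = {("1", 100, "A"), ("5", 500, "G"), ("6", 600, "T")}

query = make_beacon([member, other])

print(yes_fraction(query, member))    # every variant is found
print(yes_fraction(query, outsider))  # only the one shared variant is found
```

With three variants the gap between a member and an outsider is obvious but not convincing evidence. Repeating the comparison over the many thousands of variants mentioned above is what would let an attacker grow confident that a specific person's genome is in the beacon.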

The newest publication details three strategies to reduce the risks of re-identification using advances in privacy technology that are based on cryptography, the mathematics of information. The study authors evaluated how well these strategies worked using real data and a variety of scenarios.

Work like this is important because the human genetics community needs protocols that enable secure sharing of genomic data from participants in genetic research, and research participants need to feel secure that their information will not be shared without their consent.

Last updated: May 10, 2017