Frequently Asked Questions regarding the Update to NIH Management of Genomic Summary Results Access.

What are Genomic Summary Results?

Genomic summary results (GSR) are the output of analyses of genomic data across the many individual participants included within a specific study’s dataset or across many studies. For most studies in NIH-designated data repositories, this means that GSR summarize information generated from hundreds or thousands of research participants. There are two broad classes of GSR information: (1) allele frequency information and (2) association analysis statistics.

  1. An allele frequency is the proportion of a specific allele, or variation in the DNA code, relative to other possible alleles at the same position in the code in a given population or, in some cases, an entire species. Allele frequency information is used in the fields of genomics, population genetics, and clinical genetics to help interpret the potential for links between the presence of specific alleles and observed “outcomes”, such as physical traits or disease risks.
     
  2. In genomics, association analysis statistics are the information generated when investigators evaluate the correlation of genotype to phenotype. Phenotypes studied may be diseases (e.g., diabetes), traits (e.g., height), or molecular traits (e.g., mRNA or protein expression levels). Examples of these statistics include p-values, regression beta values, odds ratios, and effect sizes.
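
The two classes of GSR described above can be illustrated with a minimal Python sketch. It assumes a single biallelic variant in diploid participants, and the function names and counts are hypothetical, chosen only for illustration. It is not the method used by any particular study or repository.

```python
def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Proportion of alternate alleles among all alleles at one position.

    Assumes diploid participants and a biallelic variant: each participant
    contributes two alleles, heterozygotes carry one alternate allele and
    alternate homozygotes carry two.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped participants")
    return (n_het + 2 * n_hom_alt) / total_alleles


def allelic_odds_ratio(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds ratio from a 2x2 table of allele counts (cases vs. controls)."""
    if case_ref == 0 or ctrl_alt == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Hypothetical counts for 1,000 participants: 640 reference homozygotes,
# 320 heterozygotes, 40 alternate homozygotes.
frequency = allele_frequency(640, 320, 40)

# Hypothetical allele counts in a disease group and a control group.
ratio = allelic_odds_ratio(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800)
```

Both values are calculated from group totals. Neither function takes any single participant's DNA sequence as input, which is the sense in which GSR are summary information.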

How are GSR different from individual-level genomic data?

“Individual-level data” provide the specific DNA sequence for a single research participant and are usually only available through controlled-access pathways. The privacy risks for individual-level data are greater than those for GSR because they refer to the unique pattern in the DNA code of a single participant, rather than calculations about the patterns seen across a group of people.

How are GSR shared or used?

Currently, some GSR are included by investigators in the manuscripts that they publish to share the key findings from their research studies with the scientific community.

After May 1, 2019, GSR from most studies that are shared through NIH-designated data repositories, such as the database of Genotypes and Phenotypes (dbGaP), will be shared through open access (unrestricted) pathways. This means that dbGaP and other NIH-designated data repositories may begin to publicly share more of the statistical findings for most of the studies they host. This will allow more GSR to be used by the broader scientific community to advance scientific and health-related research and to promote health. Investigators requesting access to individual-level data through controlled-access will continue to be able to share GSR calculations that they generate through their research for others to use (e.g., through a publication). However, if investigators wish to disseminate GSR more broadly (e.g., through an online resource), this should be described in a data access request, which will be reviewed by the Data Access Committee.

Accessibility of GSR is beneficial because these analyses can be used to assess the validity and potential significance of results seen in other studies. They can also be useful for assessing the frequency of an individual genomic variant in different populations and for interpreting the possible pathologic importance of specific genomic test results in patients. While publications only share a small number of GSR relevant to the specific research questions discussed, sharing the complete set of GSR across a dataset or many datasets creates the opportunity for the information to be used to answer many different research questions.

Why did the NIH change the way it manages access to GSR?

NIH has carefully considered the risks and benefits of access to GSR since 2008, when it was first described that individuals could potentially be ‘re-identified’ through the use of GSR. Specifically, the agency held public workshops and solicited stakeholder comments through requests for information on the risks and benefits of different models of GSR access.

Public input over the years increasingly noted that the benefits of expanded access to GSR from most genomic studies outweighed the potential risks. Respondents highlighted the significant scientific value of GSR and the fact that there would be minimal risk to most participants if GSR were to be moved from controlled-access to an unrestricted access model. Based on this input, NIH changed the data access model for most GSR to make it more proportional to the risks for this type of information. However, because there are some studies where there might be additional privacy concerns, such as those that include populations from isolated geographic areas or with rare or stigmatizing traits, the access model includes a pathway for GSR from some studies to have additional protections.

What are the privacy risks associated with sharing GSR?

GSR can be used to determine whether an individual was in a particular group of a study (e.g., the disease group vs. the control group) but ONLY IF someone already has access to the research participant’s genomic information. While the risk is very low, it is possible that knowing that a person is part of a group (e.g., a disease group) could reveal sensitive information that was not already known from the individual-level genomic information itself.

It is possible that certain study populations may be more vulnerable to this privacy risk if they are from a small or isolated population or have a rare condition or trait. In other cases, the potential stigma of certain conditions or traits included in a study population may also increase privacy concerns.
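
To show why this risk depends on already holding a participant's genomic information, the following is a minimal Python sketch of one distance-based comparison of the kind discussed in the published literature on this risk. The function name, the inputs, and the example numbers are hypothetical, and a real analysis would need many thousands of variants and a formal statistical test, so this is an illustration rather than a working re-identification method.

```python
def membership_score(genotype, study_freq, reference_freq):
    """Compare one person's genotype with two sets of allele frequencies.

    genotype:       alternate-allele count (0, 1, or 2) per variant for ONE
                    person; this must already be in the analyst's hands
    study_freq:     alternate-allele frequency per variant in the study group
                    (this is the GSR)
    reference_freq: alternate-allele frequency per variant in a reference
                    population

    Returns the sum over variants of |y - reference| - |y - study|, where y is
    the person's own allele frequency (count / 2). A positive total means the
    genotype sits closer to the study group than to the reference population.
    """
    if not (len(genotype) == len(study_freq) == len(reference_freq)):
        raise ValueError("all inputs must cover the same variants")
    score = 0.0
    for count, f_study, f_ref in zip(genotype, study_freq, reference_freq):
        y = count / 2
        score += abs(y - f_ref) - abs(y - f_study)
    return score


# Hypothetical three-variant example (far too few variants for a real test).
score = membership_score(
    genotype=[2, 0, 1],
    study_freq=[0.6, 0.1, 0.5],
    reference_freq=[0.4, 0.3, 0.3],
)
```

The `genotype` argument is required: without a specific person's individual-level data there is nothing to compare against the summary frequencies, which is why GSR alone do not identify anyone.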

What are the benefits of sharing GSR through open (unrestricted) access?

Sharing GSR through openly accessible mechanisms means that these summary findings can be used to address many different research questions or to inform the interpretation of clinical test results by health care providers. When GSR are available through unrestricted access, scientists from a range of fields can also more easily use them to develop new methods for interpreting genomic information and its connection to phenotypes. In addition, since GSR can be used to assess the validity or potential significance of results seen in other studies, the need to request individual-level data from a study will potentially decrease, thereby limiting access to that depth of data about individual participants to those secondary studies that truly require it.

Who will calculate GSR and how?

Any GSR shared through dbGaP for each individual study will be generated by the study Principal Investigator(s) (PIs). NIH Institutes and Centers currently vary in what summary statistics they expect PIs to submit. dbGaP also plans to calculate allele frequencies across all non-sensitive datasets within the repository (displayed by population) and share the results through unrestricted access on dbSNP.
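
As an illustration of calculating allele frequencies across non-sensitive datasets by population, here is a minimal Python sketch. The record layout, field names, and counts are hypothetical. The actual dbGaP calculation is not described in this FAQ and may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StudyCounts:
    """Allele counts for one variant in one study (hypothetical layout)."""
    study_id: str
    population: str
    sensitive: bool
    alt_alleles: int
    total_alleles: int


def pooled_frequencies(studies):
    """Pool allele counts by population, skipping studies marked sensitive."""
    alt = defaultdict(int)
    total = defaultdict(int)
    for study in studies:
        if study.sensitive:
            continue  # sensitive datasets are left out of the pooled result
        alt[study.population] += study.alt_alleles
        total[study.population] += study.total_alleles
    return {pop: alt[pop] / total[pop] for pop in total if total[pop] > 0}


# Hypothetical studies and counts.
studies = [
    StudyCounts("study-A", "population-1", False, 100, 1000),
    StudyCounts("study-B", "population-1", False, 300, 1000),
    StudyCounts("study-C", "population-1", True, 900, 1000),
    StudyCounts("study-D", "population-2", False, 50, 500),
]
frequencies = pooled_frequencies(studies)
```

Counts are summed before dividing, so larger studies carry proportionally more weight than they would if per-study frequencies were simply averaged.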

What are the options under the NIH GDS Policy for sharing GSR?

If an institution determines that there are substantive individual privacy or group harm concerns for a particular study population, it may designate the study as “sensitive” when the data sharing plan and Institutional Certification for the study are submitted to NIH during the Just-in-Time process. If an institution designates GSR as “sensitive,” the GSR will be shared only through controlled-access, in conjunction with and under the same terms of access and use as the individual-level data for the study.

For studies that were already submitted to or registered in an NIH-designated data repository, institutions had until May 1, 2019, to notify NIH that they either 1) needed additional time to consider whether a study’s dataset should be designated as sensitive, or 2) had made a sensitive designation, so that GSR should not be made available through unrestricted access. If the institution did not contact NIH before May 1, 2019, GSR were moved to unrestricted access.

Can approved data users share GSR that they derive from individual-level genomic data in NIH-designated data repositories?

For individual-level human genomic data in NIH-designated data repositories, which are usually only available through controlled-access, a data access request that is reviewed by a Data Access Committee (DAC) is always required. For non-sensitive datasets, data requesters who wish to post GSR more broadly than publication within the scientific literature (where GSR serve as an intrinsic piece of evidence to support a study’s conclusions) can indicate plans to generate and disseminate GSR in their research use statement, and a DAC may approve this. Requesters do not need to indicate what specific GSR they plan to generate and disseminate.

For datasets that are designated as sensitive, DACs will not approve research use statements that indicate plans to disseminate GSR more broadly than publication within the scientific literature to support a study’s conclusions.

Last updated: December 16, 2019