Genomics' daunting challenge: Identifying variants that matter
While the latest genome sequencing technologies can generate detailed catalogs of genomic variants, researchers still face the challenge of distinguishing variants that cause disease from those that do not. Scientists estimate that each person's genome contains between three and four million genomic variants, which are specific changes in DNA sequence.
"Deciding which genomic variants are important players in disease is probably the most difficult challenge that we face in trying to implement genomic data in medicine," said James Evans, M.D., Ph.D., Bryson Distinguished Professor of Genetics and Medicine at the University of North Carolina at Chapel Hill. "It's difficult to implicate specific variants as having an effect on disease because there are millions of variants in the human genome, and most are rare and do not have a big impact on health. This will likely be a long-term challenge."
All too often, clinicians and researchers studying a family with a rare disease find DNA mutations or variants that appear to be responsible, only to discover other people who carry the same mutation but do not have the disease, or are affected to a lesser degree.
In a paper published April 24, 2014, in the journal Nature, the authors recommended a set of genomic approaches for implicating rare, inherited variants in one or a handful of genes that have large effects on an individual's risk of developing rare diseases. The same approaches might help researchers identify variants that affect complex diseases such as cancer and cardiovascular disease.
The stakes for identifying variants that matter can be high. "Mistakes are happening in the clinic based on questionable evidence of an association," said co-author Teri Manolio, M.D., Ph.D., director of the Division of Genomic Medicine at the National Human Genome Research Institute (NHGRI). "People are jumping to the conclusion that if a patient has the same variant as was previously implicated in a disease, then they must also have the same disease. Medical treatment decisions are then being based on this information, sometimes to the detriment of the patient."
Misinterpreting variants can lead to misdiagnoses and to unnecessary screening tests and treatments, including surgery. Conversely, people may miss out on diagnostic tests or therapies they need.
Based on the proceedings of a workshop convened by NHGRI in September 2012, the recommendations cover several key areas, including study design, gene- and variant-level implication, databases and implications for diagnosis. Gene-level implication refers to finding evidence that an alteration in a gene may cause or contribute to a disease. Variant-level implication, which connects a specific variant to a disease, is often harder to establish, because a gene usually carries many different variants.
Workshop participants recommended that studies be large enough to determine whether a gene or variant is likely to affect a disease.
"Our discussion found two different axes to determine that a variant is implicated in disease," said Daniel MacArthur, M.D., Ph.D., first author and an assistant professor of medicine at Massachusetts General Hospital in Boston. "First, there needs to be some statistical evidence that a gene is associated with the disease process. Second, in that particular patient for that specific variant, how much evidence is there that the variant caused disease?"
Dr. MacArthur added that scientists who design studies need to think about the underlying genetic basis of a disease. "Will it be driven by common or rare variations?" he said.
The paper stressed the need to weigh evidence carefully when analyzing genes and variants for their potential roles in disease. Finding new mutations in the same gene in a group of people does not necessarily mean those mutations cause a condition. For example, several recent studies of 945 families, each with a child with autism, found four mutations in TTN, an extremely large gene. Yet the investigators did not believe that mutations in the gene contributed to autism. Using a statistical model that accounted for the size of the gene, the mutation rate and other factors, they determined that the mutations could have occurred by chance.
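The article does not describe the investigators' model in detail. As a rough, hypothetical illustration of the idea, the Python sketch below treats the number of new mutations a gene picks up by chance as Poisson-distributed, with a mean that grows with gene length and the per-base mutation rate. The gene lengths, the rate of 1.2e-8 per base per generation and the function names are assumptions chosen for illustration, not figures or methods from the studies.

```python
import math

# Illustrative assumption: an average per-base, per-generation mutation rate.
# More careful models also adjust for sequence context, which this sketch ignores.
ASSUMED_RATE_PER_BP = 1.2e-8


def expected_de_novo(n_families: int, gene_bp: int,
                     rate_per_bp: float = ASSUMED_RATE_PER_BP) -> float:
    """Expected number of new mutations in one gene across all affected children.

    Each child receives two genomes (one from each parent), and either can
    carry a new mutation, hence the factor of 2.
    """
    return 2 * n_families * gene_bp * rate_per_bp


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) when X ~ Poisson(expected)."""
    # Subtract the probability of seeing 0 .. observed-1 mutations.
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical gene sizes: a very large gene versus a small one,
# each with the same four new mutations observed across 945 families.
for label, gene_bp in [("large gene (100 kb, hypothetical)", 100_000),
                       ("small gene (2 kb, hypothetical)", 2_000)]:
    lam = expected_de_novo(945, gene_bp)
    print(f"{label}: expected {lam:.3f}, P(>=4 by chance) = {poisson_tail(4, lam):.2e}")
```

With these assumed numbers, the large gene is expected to acquire about two new mutations across 945 children by chance alone, so observing four is unremarkable (a tail probability of roughly 0.2). The same four mutations in a small gene would be very unlikely to arise by chance.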
"These guidelines will be useful for the research community as a starting point in addressing the challenges of implicating variants in disease," said Leslie Biesecker, M.D., co-author and chief of NHGRI's Medical Genomics and Metabolic Genetics Branch. "The peer review community can use these guidelines to set benchmarks for the kinds of proof they want in papers that claim an association of a gene or variant with a disease."
Difficulty implicating variants
The authors noted that, as individual genome sequences become more widely available, they may be more open to misinterpretation than ever before, because each person carries so many mutations and variants. On average, a person's genome differs from what is considered the "reference genome sequence" at several million positions, according to Dr. Manolio. While most of these differences turn out to have little effect, many "may suggest a potentially convincing story about how the variant may influence the trait," she said.
Finding variants and associating them with disease is not the problem, Dr. Biesecker said. "The disease gene identification business works pretty well, and we have the tools to reliably identify genomic variants," he added. One such tool, the genome-wide association study (GWAS), in which researchers compare variants found in the genomes of people with a disease to those found in people without it, is useful for establishing associations. But such studies often leave a large gap between an association and an understanding of what causes disease.
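At its simplest, a GWAS repeats a case-control comparison like the one below at each of many variants. This is a minimal sketch under stated assumptions: it uses a basic allele-count chi-square test (real GWAS pipelines typically use regression models with covariates), the allele counts are hypothetical, and the function name is illustrative. Even a clear result only measures association; it does not say which variant, if any, causes disease.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_association(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Compare allele counts at one variant between cases and controls.

    Returns (odds_ratio, chi_square, p_value) for a 2x2 allele-count table,
    using Pearson's chi-square test with 1 degree of freedom and no
    continuity correction.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    # Odds of carrying the alternative allele in cases relative to controls.
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each).
or_, chi2, p = allelic_association(case_alt=1400, case_ref=2600,
                                   ctrl_alt=1200, ctrl_ref=2800)
print(f"odds ratio {or_:.2f}, chi-square {chi2:.1f}, p = {p:.1e}")
print("genome-wide significant" if p < GENOME_WIDE_ALPHA else "not genome-wide significant")
```

With these made-up counts the variant shows a modest odds ratio of about 1.26 and a p-value on the order of 1e-6: suggestive, but short of the conventional genome-wide threshold.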
Oncology researchers know, for example, that alterations in the MSH2 gene confer a fivefold increased risk of colon cancer. But they hesitate when asked which variants in that gene are enough to cause disease, Dr. Evans pointed out. Co-author David Valle, M.D., Ph.D., director of the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University in Baltimore, sorts variants into three broad categories. At one end sit those that are harmful and directly cause disease, such as the gene mutation responsible for cystic fibrosis. At the other extreme are benign variants.
"Then we have a group in the middle and those we are not sure about," he said. "Some are risk variants. They are not absolutely pathologic and their effect on health depends on the context in which they occur, meaning the presence of other genetic variants and the genetic and environmental background of the individual."
"Studies that conclude that rare variants in aggregate in a gene are associated with this disease almost always get it right," said Dr. Biesecker. "But trying to determine which of 20 variants in a gene are robustly associated with a disease is a challenge. If we want to do predictive medicine, we'll want to do it at the variant level."
New analytical and predictive modeling methods, along with high-throughput DNA sequencing technologies, are needed to improve the assessment of rare variants, he said.
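Dr. Biesecker's contrast between aggregate and variant-level evidence can be illustrated with a simple burden (collapsing) test. This is a minimal sketch, not the authors' method: it assumes a one-sided Fisher's exact test, a Bonferroni correction for the per-variant tests, made-up carrier counts for 20 rare variants in one gene among 1,000 hypothetical cases and 1,000 controls, and that no person carries more than one of these variants. The function and variable names are hypothetical.

```python
import math


def fisher_one_sided(case_carriers: int, n_cases: int,
                     ctrl_carriers: int, n_ctrls: int) -> float:
    """One-sided Fisher's exact test: P(at least this many carriers among cases).

    Under the null hypothesis, carriers are split between cases and controls
    at random, so the case count follows a hypergeometric distribution.
    """
    carriers = case_carriers + ctrl_carriers
    total = n_cases + n_ctrls
    numerator = sum(
        math.comb(n_cases, k) * math.comb(n_ctrls, carriers - k)
        for k in range(case_carriers, min(n_cases, carriers) + 1)
    )
    return numerator / math.comb(total, carriers)


N_CASES = N_CTRLS = 1000

# Hypothetical (case carriers, control carriers) for 20 rare variants in one gene.
VARIANTS = [(2, 0), (1, 0), (3, 1), (1, 0), (2, 0), (1, 0), (2, 0), (1, 0), (1, 0), (2, 1),
            (1, 0), (1, 0), (2, 0), (1, 0), (2, 0), (1, 0), (1, 0), (2, 1), (1, 0), (2, 0)]

# Variant level: test each variant separately against a Bonferroni threshold.
threshold = 0.05 / len(VARIANTS)
per_variant = [fisher_one_sided(a, N_CASES, c, N_CTRLS) for a, c in VARIANTS]
print("variants passing threshold:", sum(p < threshold for p in per_variant))

# Gene level: collapse all variants into one carrier count per group.
# Assumes no individual carries more than one of these rare variants.
case_total = sum(a for a, _ in VARIANTS)
ctrl_total = sum(c for _, c in VARIANTS)
p_gene = fisher_one_sided(case_total, N_CASES, ctrl_total, N_CTRLS)
print(f"aggregate: {case_total} vs {ctrl_total} carriers, p = {p_gene:.1e}")
```

No single variant comes close to significance, because each is seen in only a handful of people, yet pooling them gives a very small p-value. In this sense a gene-level claim can be well supported while the role of any individual variant remains uncertain.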
The authors were careful to distinguish between genes that cause a disease and variants that are thought to contribute to one. Causality, they noted, is difficult to define: few rare disease genes are fully expressed, or expressed in the same way, in everyone who carries a mutation in them. For this reason, the authors preferred the term "pathogenic" to "disease-causing," to convey that a gene or variant may be sufficient to cause disease but may also play only a supporting role.
"We were limited in laying out rigorous guidelines and specific definitions on what is considered pathogenic for genomic variants," Dr. MacArthur said. "We don't know enough about diseases and variants yet. We called for more rigorous statistical approaches when examining variant causality, and focused on a set of general principles that we regard as being important to consider. This is not a problem that can be solved easily."