NIH's new automated toolset detects disease-causing genes in undiagnosed patients

Researchers with the National Institutes of Health Undiagnosed Diseases Program (UDP) have developed a powerful new toolset for finding potential disease-causing gene variants in undiagnosed patients. The analysis is carried out automatically by computers, with no human interpretation or bias, and takes about three hours to work through an individual's exome, the protein-coding portion of the genome.

Solving Medical Mysteries

The UDP is part of the Undiagnosed Diseases Network, which brings together clinical and research experts to solve the most challenging medical mysteries using advanced technologies. The NIH study was published February 1 in Genetics in Medicine.

"One of our goals was to improve upon the 25 percent diagnosis rate for unknown diseases," said Thomas Markello, M.D., Ph.D., the study's corresponding author and clinical staff physician in the Office of the Clinical Director, National Human Genome Research Institute. "The toolset we developed significantly narrows down the number of genomic changes that could be responsible for an individual's undiagnosed disease, thereby increasing the likelihood of a diagnosis."

The researchers compared the exomes of 97 undiagnosed patients with those of their 113 healthy siblings, using a menu of techniques to detect potentially harmful gene variants. On average, they found 6.6 faulty genes in people with undiagnosed diseases compared with 5.8 faulty genes in their healthy brothers and sisters. They then investigated which of these potentially bad gene variants contributed to the patients' undiagnosed diseases. Laboratory researchers must still make major commitments to investigate the almost seven bad gene variants per patient, but that is an improvement over previous approaches, which yielded as many as 100 to 500 bad gene variants, Dr. Markello said.

The exome analysis included programs that detect gene variants specific to certain ethnicities and populations, identify rare disease genes and correct errors in the raw data.

Many of the potentially bad gene variants were located in regions of the genome that historically have been difficult to sequence and were not part of the original human reference genome. The researchers found some of the gene variants in places where the DNA structure bunches up like cooked spaghetti, Dr. Markello said. Cells can get confused when they copy their own DNA in these places, so they are among the best places to look for gene variants that cause rare diseases, he said.

The researchers plan to sequence the entire genomes of 50 families using the automated toolset. "We are very optimistic that we're going to find new candidates in the less-explored areas of exomes and genomes," Dr. Markello said. "We're leaving the low-hanging fruit days."

Last updated: February 4, 2019