New technique promises more accurate genomes by sequencing families
Researchers at the National Institutes of Health (NIH) and the United States Department of Agriculture (USDA) have developed a technique that enables more accurate reconstruction of human genomes by determining which sections of the genome come from each parent. The technique, published in the journal Nature Biotechnology, will also allow researchers to identify more of the complexity within any type of genome, from plants to animals, and to produce more precise reference genomes for research databases than are currently available.
The Trio Binning Method
Genome assembly computationally reconstructs a genome from the much smaller pieces of DNA that sequencing machines are able to read, much like putting together pieces of a jigsaw puzzle.
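To make the jigsaw analogy concrete, here is a toy sketch of the simplest possible assembler: it repeatedly finds the two reads whose ends overlap the most and merges them. The function names `overlap` and `greedy_assemble` and the minimum overlap length are hypothetical choices for illustration. Real assemblers use far more sophisticated approaches and must cope with sequencing errors and repeated sequences, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    # Try the longest possible overlap first, so the first hit is the best one.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none remain."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining pieces fit together
        # Join the two pieces, writing the shared overlap only once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping "reads" drawn from the sequence ATTAGACCTGCCGGAATAC.
print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]))
# → ['ATTAGACCTGCCGGAATAC']
```

Each merge is like snapping two puzzle pieces together along their matching edge; when no pieces fit any more, the assembler stops.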
"In the current way of doing things, though, we're missing something," said Adam Phillippy, Ph.D., co-senior author on the study and head of the Genome Informatics Section at the National Human Genome Research Institute (NHGRI), part of NIH. "This is because we actually have two genomes in each one of our cells, one from our mom and one from our dad, which are known as haplotypes."
Human genomes have relatively few differences between their two haplotypes. This makes it difficult to tell the two parental haplotypes apart, so they are often mixed together into a single assembly. Some animal genomes have the opposite problem: their two haplotypes contain many differences, which complicates assembly. To work around this, scientists assembling animal reference genomes have used inbred animals, whose genomes are less diverse. Neither of these approaches is ideal, because both miss the natural variation that exists in most genomes.
The researchers' goal was to design and test a better way to reconstruct the haplotypes and, in doing so, produce more accurate genome assemblies overall. The "trio binning" method, developed by NHGRI researchers Sergey Koren, Ph.D., and Arang Rhie, Ph.D., was tested on a cross between two cattle breeds that not only looked very different but were also genetically distinct.
Working in close collaboration with researchers at the University of Adelaide, the team chose the Angus and Brahman cattle breeds to represent two subspecies of cattle that correspond to separate domestication events thousands of years ago. Throughout their history, the two subspecies were exposed to very different environmental pressures, resulting in distinct differences between the breeds that are reflected in their genomes. For example, the Angus breed evolved to produce very high-quality beef, while the Brahman breed, which originated in India, evolved to be tick and drought resistant and has a characteristic hump.
"In the old way of doing genome assembly, you wanted to use inbred animals," said Tim Smith, Ph.D., co-senior author and research chemist at the USDA's Agricultural Research Service. "Trio binning has completely turned that on its head, and for this method, it's better to use a cross of the most different genomes that you can find."
Dr. Phillippy explained this concept further using a photo "mash-up" of both parental cattle breeds to visually represent what happens at the genomic level. Most of the genome is shared by both parents. But in the parts of the offspring's genome where the two parents differ, previous methods blurred the two versions together.
"Because those sections are blended, you've now assembled an artificial genome that represents neither the offspring nor the parents," he said. "It's a very inaccurate representation of what's going on genetically."
Trio binning takes advantage of the newest generation of sequencing technology, which can "read" much longer stretches of the genome (20,000 bases or more at a time) compared with a few hundred bases for previous technology. The parents' genomes are first sequenced using high-accuracy short reads to identify short marker sequences that are unique to each parent. The offspring's genome is then sequenced using much longer, but more expensive, reads. Finally, these long reads are sorted by parent of origin according to which parent's markers they contain.
"For these cattle, about 92 percent of the marker sequences are shared by both parents," said Dr. Phillippy. "The remaining 8 percent are unique to one parent or the other, so anytime you see one of those markers you know which parent it's coming from. Knowing this, you can sort the offspring's reads by which parent they are from and then assemble both parental haplotypes separately."
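The sorting step described above can be sketched in a few lines. This is a simplified illustration, not the published implementation: it assumes markers are fixed-length substrings (k-mers) found in one parent's short reads but not the other's, and it assigns each offspring read to whichever parent's markers it contains more often. The function names, the k-mer length, and the tie rule are hypothetical choices; the published tool adds refinements, such as filtering out k-mers likely caused by sequencing errors, that this sketch omits.

```python
from typing import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings (k-mers) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_markers(mother_reads: Iterable[str], father_reads: Iterable[str],
                   k: int) -> tuple[set[str], set[str]]:
    """Return (maternal-only, paternal-only) k-mers from each parent's short reads."""
    mother = set().union(*(kmers(r, k) for r in mother_reads))
    father = set().union(*(kmers(r, k) for r in father_reads))
    # K-mers present in both parents say nothing about parent of origin.
    return mother - father, father - mother


def bin_read(read: str, maternal: set[str], paternal: set[str], k: int) -> str:
    """Assign an offspring long read to the parent whose markers it carries most."""
    windows = [read[i:i + k] for i in range(len(read) - k + 1)]
    m_hits = sum(w in maternal for w in windows)
    p_hits = sum(w in paternal for w in windows)
    if m_hits > p_hits:
        return "maternal"
    if p_hits > m_hits:
        return "paternal"
    return "unassigned"  # no markers, or an exact tie


k = 4
maternal, paternal = parent_markers(["ACGTACGTTT"], ["ACGTACGAAA"], k)
for read in ["TACGTTT", "TACGAAA", "ACGTAC"]:
    print(read, bin_read(read, maternal, paternal, k))
# → TACGTTT maternal
# → TACGAAA paternal
# → ACGTAC unassigned
```

Once every read has a label, the maternal and paternal bins can each be handed to an assembler on their own, which is what lets the two haplotypes be reconstructed separately instead of blended.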
This method doesn't just benefit the researchers working with cattle reference genomes, however. Trio binning can also contribute to the goal of using a person's unique DNA sequence in their clinical care, otherwise known as precision medicine. Typically, if a clinician is treating a patient with a suspected genetic disease, the clinician will order DNA sequencing for their patient to identify where in the genome a disease-causing variant may lie. However, current methods might miss the causative variant altogether if it exists on only one of the patient's haplotypes.
"If you're looking for a disease variant and the patient's genome has had their haplotypes blurred together, you might miss it," said Dr. Phillippy. "This new method will also help us build a more inclusive representation of human genome variation. By assembling more of these high-quality human haplotypes, we'll get a much clearer picture of what's missing in the reference databases. This will improve the accuracy of genetic tests."
Dr. Smith added that trio binning offers an added benefit for future studies. "Cattle are leading the way in terms of using genomes to better understand which agricultural traits, like higher milk production, are better passed down to new generations. It's pretty transformational work."
Last updated: October 22, 2018