The goal of sequencing a human genome for $1000 is well within reach, but that's just the beginning of the story. Once a genome is sequenced, researchers are left with the formidable challenge of analyzing and interpreting its embedded code — a complex task that requires sophisticated data analysis tools.
As DNA sequencing costs continue to fall, more and more researchers are generating large amounts of genome sequence data. With such data in hand, they need a range of data analysis tools to detect the genetic patterns underlying common diseases, to diagnose diseases and to individualize treatments.
As one component of its new Genome Sequencing Program announced in December 2011, the National Human Genome Research Institute (NHGRI) has awarded six researchers approximately $4 million in fiscal year (FY) 2012 to create robust, well-documented and well-supported computer software programs for analyzing genome sequence data that can be readily adopted outside of large genome sequencing centers. Many sequence analysis tools have been developed and are publicly available, but their use is often limited by the lack of experts who can install and use the tools.
"The forthcoming surge of genome sequence data will inevitably create an analysis bottleneck unless computational tools can be developed for easy access and use in finding the biological information encoded in our genomes," said NHGRI Director Eric D. Green, M.D., Ph.D. "The goal is to develop user-friendly informatics tools for accelerating the use of genome sequence information in basic and clinical research."
NHGRI plans to invest a total of nearly $20 million over the next four years to make existing computational tools more generally accessible and to speed up the ability of investigators to analyze genome sequence data. The teams of project researchers, funded through cooperative agreements, are part of a newly established iSeqTools network.
Boston College and University of Michigan, Ann Arbor
Principal Investigators: Gabor Marth, D.Sc. and Gonçalo Abecasis, Ph.D.
$1 million to produce robust software tools and workflows for variant identification and functional assessment
University of Southern California, Los Angeles
Principal Investigators: Ting Chen, Ph.D., Ewa Deelman, Ph.D., James Knowles, M.D., Ph.D.
$345,000 to produce robust and portable workflow-based tools for mRNA and genome resequencing
The Broad Institute, Cambridge, Mass.
Principal Investigator: Mark DePristo, Ph.D.
$1 million for GATK (Genome Analysis Toolkit) informatics toolkits for high-throughput sequence data analysis
Washington University in St. Louis
Principal Investigators: Li Ding, Ph.D., and David Dooling, Ph.D.
$805,000 to produce robust toolkits and the GeMS turnkey computational framework for high-throughput variant discovery and interpretation
Harvard Medical School, Boston
Principal Investigator: Steven McCarroll, Ph.D.
$448,000 to produce accurate genome structural variation analysis with Genome STRiP using large-scale sequence data
Scripps Translational Science Institute, La Jolla, Calif.
Principal Investigator: Ali Torkamani, Ph.D.
$382,000 to develop the Scripps Genome ADVISER: annotation and distributed variant interpretation server
"We are committed to getting reliable informatics tools into the hands of researchers who are doing biological experiments so that the analysis of large amounts of sequence data does not become a rate-limiting step," said Heidi Sofia, Ph.D., NHGRI program director for computational biology. "The funded investigators will work together to help overcome barriers that researchers face when working with next-generation genome sequence data."
All of the tools developed by this program will be made publicly and freely available.
Last Updated: May 31, 2012