Developments in DNA sequencing technologies over the past two decades have been a critically important driver of breathtaking advances in our understanding of broad areas of biology and biomedicine, ranging from human disease to microbial ecology to evolution. During this time, and especially in the past few years, sequencing costs have decreased faster than Moore's Law and sequencing capacity has increased at an ever-greater pace. These advances have enabled the landmark accomplishments of genomics: the determination of the genome sequences of the major model organisms used in biomedical research (bacteria, yeast, roundworms, fruit flies, and mouse) and, ultimately, that of the human. Building on those fundamental data sets, DNA sequencing has been applied on a large scale to learn more about human variation (e.g., HapMap and 1000 Genomes), the functional composition of genomes (e.g., ENCODE and modENCODE), and the genetic basis of many human diseases and other traits.
In the past few years, a new generation of DNA sequencing platforms based on fundamentally different methodologies has collectively become a 'disruptive technology' that will create a new set of research opportunities for the coming decade. The availability of these so-called 'next-gen' sequencing technologies (and the already emerging 'third-generation' technologies that will closely follow) raises many questions for the research community. NHGRI has identified a number of questions that are pertinent to the future of DNA sequencing and of genomics. These questions are listed below, and are open to the community for comment. We are particularly interested in input as to whether these are the right questions and what other questions are important to consider.
I. What will be the consequences for biomedical and biological research of the rapid increase in DNA sequencing capacity and the rapid decrease in the cost of DNA sequencing that the new technologies afford?
A. What will be the consequences of 'next-gen' and then 'third-gen' sequencing technologies for biomedical research and for medicine?
What are the most scientifically important questions that 'next-gen' sequencing technologies can be used to answer?
What will investigators use de novo genome sequencing approaches for?
What will investigators use resequencing approaches for?
What will investigators use metagenomic sequencing approaches for?
How will investigators use these technologies to address questions beyond those relating to the structure of genomes?
What community resources are needed that can be generated using 'next-gen' sequencing technologies?
What is the scale of sequencing that will be required to address these various questions?
B. What will be the consequences of using these new sequencing technologies in terms of data analysis and informatics, including infrastructure, software development, and data storage?
What new informatics (e.g., hardware and general infrastructure) and analysis (e.g., software) needs will be created by generating large amounts of data with these new technologies?
How should the informatics needs for dealing with sequence data be addressed?
Does the greater scientific community need access to primary sequence data, such as currently provided in the Trace Archive and Short-Read Archive at NCBI and comparable sites at other nucleotide sequence databases? Or is processed data sufficient to meet the needs of most users?
C. What will be the consequences of the new sequencing technologies in terms of the way sequencing efforts should be organized?
In designing research programs, how should funding agencies anticipate and take into account the rapidly changing sequencing technologies?
How will the new sequencing technologies affect the current approach that is centered on a few very large, 'production-oriented' centers, in terms of both the generation of sequence data and the associated informatics requirements?
Are there important projects that only such large centers can accomplish?
What are the merits of encouraging the 'de-centralization' of sequencing?
What are the consequences of the inevitable, widely dispersed use of new sequencing technologies?
What additional kinds of projects would be enabled if the new sequencing technologies were more widely disseminated to smaller-scale programs?
Would the biomedical research enterprise be better off overall if large-scale sequencing were carried out in a more distributed manner than it is now?
How important is immediate data release of large sequence data sets to the research community?
As sequencing becomes more distributed, is it likely that large data sets generated in individual laboratories will be released prior to publication? Will they be released at the time of (or after) publication?
What will be the consequences for informatics and computational biology of widespread dissemination of the new sequencing technologies?
What 'associated' issues (e.g., sample supply and data standards) need to be addressed in both the centralized and decentralized scenarios?
II. What is the value of continued development of sequencing technologies and further reduction in the cost and increase in the throughput of sequencing platforms?
What is the value for basic research, clinical research, and healthcare delivery?
What are the merits and consequences of having new sequencing technologies introduced directly to the research community, as compared to introducing them to large centers prior to general distribution?
III. How will the data generated by the new sequencing technologies, and the technologies themselves, be used to improve human health?
How will sequencing and sequence data be utilized to address human health issues?
How should funding agencies encourage the application of sequencing and sequence data to the study of human health issues?