Revised White Paper #4

The Future of Genome Sequencing

A white paper for the National Human Genome Research Institute

Mark Guyer and Adam Felsenfeld

Comments (below)

Workshop Report: The Future of DNA Sequencing at the National Human Genome Research Institute
March 23-24, 2009
This workshop addressed the questions raised in the planning white paper The Future of Genome Sequencing. The report is not the only set of conclusions that we expect from the planning process regarding the future of the large-scale sequencing program, and we welcome further comments on the white paper and/or responses to the report.

Developments in DNA sequencing technologies over the past two decades have been a critically important driver of breathtaking advances in our understanding of broad areas of biology and biomedicine, ranging from human disease to microbial ecology to evolution. During this time, and especially in the past few years, sequencing costs have decreased faster than Moore's Law and sequencing capacity has increased at an ever-greater pace. These advances have enabled the landmark accomplishments of genomics - the determination of the genome sequences of the major model organisms used in biomedical research (bacteria, yeast, roundworms, fruit flies, and mouse) and, ultimately, that of the human. Building on those fundamental data sets, DNA sequencing has been applied on a large scale to learn more about human variation (e.g., HapMap and 1000 Genomes), the functional composition of genomes (e.g., ENCODE and modENCODE), and the genetic basis of many human diseases and other traits.

In the past few years, a new generation of DNA sequencing platforms based on fundamentally different methodologies has collectively become a 'disruptive technology' that will create a new set of research opportunities for the coming decade. The availability of these so-called 'next-gen' sequencing technologies (and the already emerging 'third-generation' technologies that will closely follow) raises many questions for the research community. NHGRI has identified several questions that are pertinent to the future of DNA sequencing and of genomics. These are listed below, and the community is invited to offer comments and opinions on them, as well as to suggest other questions that would be important for NHGRI to consider.

I. What will be the consequences for biomedical and biological research of the rapid increase in DNA sequencing capacity and the rapid decrease in the cost of DNA sequencing that the new technologies afford?

A. What will be the consequences of 'next-gen' and then 'third-gen' sequencing technologies for biomedical research and for medicine?

What are the most scientifically important questions that 'next-gen' sequencing technologies can be used to answer?
What will investigators use de novo genome sequencing approaches for?
What will investigators use resequencing approaches for?
What will investigators use metagenomic sequencing approaches for?
How will investigators use these technologies to address questions beyond those relating to the structure of genomes?
What new community resources are needed that can be generated using 'next-gen' sequencing technologies?
What is the scale of sequencing that will be required to address these various questions?

B. What will be the consequences of using these new sequence technologies in terms of data analysis and informatics, including infrastructure, software development, data storage, etc.?

What new informatics needs (e.g., hardware and general infrastructure) and analysis needs (e.g., software) will be created by generating large amounts of data with these new technologies?
How should the informatics needs for dealing with sequence data be addressed?
Does the greater scientific community need access to primary sequence data, such as currently provided in the Trace Archive and Short-Read Archive at NCBI and comparable sites at other nucleotide sequence databases? Or is processed data sufficient to meet the needs of most users?
What is NHGRI's role in promoting data standards that allow interoperability and data integration in sequencing studies and between sequencing and other kinds of studies?
What security provisions for this data are needed to promote discovery while protecting the rights of study participants?

C. What will be the consequences of the new sequencing technologies in terms of the way sequencing efforts should be organized?

In designing research programs, how should funding agencies anticipate and take into account the rapidly changing sequencing technologies?
How will the new sequencing technologies affect the current NHGRI approach, which is centered on a few very large, 'production-oriented' centers, in terms of both the generation of sequence data and the associated informatics requirements?
Are there important projects that only such large centers can accomplish? Should NHGRI's role in the support of sequencing activities be limited to such projects?
What are the consequences of the inevitable, widely dispersed use of new sequencing technologies? What are the merits of encouraging the 'de-centralization' of sequencing? How would your response be affected if de-centralization of sequencing meant that the support of sequencing would have to come from sources other than NHGRI?
Are there important projects that could not be done if sequencing were completely de-centralized?
What additional kinds of projects would be enabled if the new sequencing technologies were more widely disseminated to smaller-scale programs?
Would the biomedical research enterprise be better off overall if large-scale sequencing were carried out in a more distributed manner than it is now?
How important is immediate data release of large sequence data sets to the research community?
As sequencing becomes more distributed, is it likely that large data sets generated in individual laboratories will be released prior to publication? Will they be released at the time of (or after) publication?
What will be the consequences for informatics and computational biology of widespread dissemination of the new sequencing technologies?
What 'associated' issues (e.g., sample supply and data standards) need to be addressed in both the centralized and decentralized scenarios?

II. What is the value of continued development of sequencing technologies and further reduction in the cost and increase in the throughput of sequencing platforms?

What is the value for basic research, clinical research, and healthcare delivery?
What are the merits and consequences of having new sequencing technologies introduced directly to the research community, as compared to introducing them first into large centers prior to general distribution?

III. How will the data generated by the new sequencing technologies, and the technologies themselves, be used to improve human health?

How will sequencing and sequence data be utilized to address human health issues?
How should funding agencies encourage the application of sequencing and sequence data to the study of human health issues?
What are the consequences for clinical data standards and security? How should policies across different parts of the government (e.g., NIH and FDA) be integrated?
How will the existence and use of these data be regulated at the national and state levels?
What will be the challenges in avoiding healthcare disparities?
How will the existence of very large-scale sequence data interact with intellectual property laws? What legal framework will maximize the clinical application of discoveries?
How will clinicians handle sequence-based information and communicate it effectively to their patients?

------- Comments -------

The biggest health problem in the United States and the entire developed world is now infertility. It has been decades since the developed world could produce enough babies to survive. A study done in Iceland (An Association Between the Kinship and Fertility of Human Couples. Agnar Helgason, Snaebjoern Palsson, Daniel F. Gudbjartsson, Thordur Kristjansson and Kari Stefansson, SCIENCE vol 319, 8 February 2008) has shown unequivocally that the major determinant of fertility is the kinship between members of a couple. A glance at their results shows that the primary problem right now is that in the developed world almost no couples are sufficiently related to permit normal fertility. There is copious supporting evidence at nobabies.net but the study needs no support.

The findings of the study are not generally applicable since no other population has the genealogical data Iceland has. But the same information could be obtained by comparing the genomes of the prospective couples. All that is needed is a test of sufficient power to determine consanguinity out to sixth cousin within two standard deviations. I suspect that the issue is how closely non-coding repeats match, but that is not critical. What is critical is comparing consanguinity as measured by the test against proven fertility. Once that relationship is clear, then it will be possible to match donor with recipient for assisted pregnancy and even to counsel prospective couples about their expected fertility. It sounds weird, but it makes more sense than astrological signs.

(262) Thursday, May 7, 2009 8:58 AM

Last updated: March 19, 2012