Comments on "White Paper #4: The Future of Genome Sequencing"
Several issues that should be considered in future genome sequencing plans:
1. What should be proposed/done is to a significant degree dependent on how fast and inexpensive sequencing becomes, and when such improvement would occur (admittedly something that can only be estimated).
Ideas that are of high merit but are currently impractical and/or too expensive may become feasible when sequencing technology improves further. A list of such projects could be put together so that they are ready to go once the technology is ready.
2. Extreme phenotypes: Assuming whole genome sequencing becomes much more practical (faster and cheaper), consideration should be given to looking at the genomes of extreme phenotypes. The idea of looking at "mutations" has been among the earliest strategies in genetics. The application of the strategy could be extended to humans as well as other organisms. By allowing the comparison of whole genomes (extreme vs. normal), the strategy holds the promise of identifying key genomic changes that underlie the extreme phenotypes.
3. Dynamic, complex, unfinished genomic regions:
These areas still lag behind in sequencing coverage/completeness, and unless new sequencing methods do a better job of accurately covering these regions, they will remain excluded from most genomic comparisons. This is particularly damaging in that recent work is finding that such regions are key players in human diseases and evolutionary adaptation, i.e. they are among the most interesting in the genome. How, when and why these regions change is a critical, yet poorly addressed question. New sequencing approaches should keep these regions in mind and some support should be given to pursue such questions.
Will newer sequencing methods accurately detect and score highly duplicated regions? These are among the most active and biologically interesting regions of the genome, yet they are often misassembled by current sequencing approaches. Will newer sequencing approaches be able to correctly identify and assemble such sequences, and if not, are there other approaches that will be needed?
(136) Thursday, January 29, 2009 11:15 AM
It is our recommendation that the following be considered in your long-range planning process.
Clinical Data Standards and Security
How should structured genetic data be incorporated into the electronic medical record? What are the security needs for this data? What is the appropriate balance of patient privacy, optimization of clinical care and utilization of the cost reducing potential of Personalized Medicine?
Research Data Standards and Security
Within research, what are the structured data requirements for genetic studies that promote interoperability and data integration? What are the security needs for this data, which promote discovery while protecting rights of study participants?
State versus National Laws Regulating Access to Genetic Test Results
Should access to a patient's genetic test results be regulated at the national or state level? If at the state level, should current state laws be reexamined in the context of protections offered by GINA? For states that restrict clinical genetic test results (for asymptomatic patients) to the ordering clinician, what is the impact on continuity of clinical care? Does this introduce clinical liability? What is the impact on healthcare disparity, as many patients may not be able to effectively convey this information to different clinicians involved in their care?
Visibility of FDA Pharmacogenomic Guidelines
Should FDA pharmacogenomic guidelines be available in standardized form, for potential integration into clinical decision support tools? If so, what are the data formats and processes for doing so?
Research, Clinical Adoption and Intellectual Property
What restrictions, if any, should the government place on patentability/intellectual property protection on the research it funds? What policies would maximize the clinical application of relevant discoveries?
Knowledge Management and Clinical Care
When new discoveries result in improved interpretations of genetic data, how are clinicians (and patients) informed? For example, how might a patient be informed that their previously reported "Mutation of Unknown Significance" is now known to be causal to a preventable disease? What updates to models currently used to manage other clinical laboratory data are required to enable management of dynamic interpretations of genetic data? What gaps exist in these models? How should these gaps be filled?
NHGRI and Stakeholders
Where NHGRI might not be the responsible party for ensuring follow-through with findings, what is NHGRI's plan for working with stakeholder groups? How might the stakeholders be engaged early in the process?
(145) Friday, January 30, 2009 1:08 PM
The elimination of one bottleneck in a process elicits another. So, in human genetics and epidemiology research, the elimination of the time and cost bottleneck in sequencing will elicit a bottleneck somewhere else. Where? And then what do we do? These are questions the community needs to consider.
Our evidence suggests that the new bottleneck is in "accrual". The full economic cost of obtaining and processing a human sample with rich phenotypic descriptions is high and rising. The economic and environmental cost of storage is ongoing and rising. Inevitably, the need for ever larger series or cohorts is growing as effect sizes diminish. These two tendencies will require innovation in and re-tooling and re-organisation of accrual.
Accrual is a social activity: researchers are obliged to interact with citizens. How much effort, in reality, do researchers make to get and keep citizens' trust? Do we have a clue how to do it in an age where the citizenry is connected and won't be spoken down to? It's true, implementing answers to these questions takes us right outside our comfort zone. But what kind of researcher wants to stay within their comfort zone?
One "fix" for the accrual problem is to aggregate existing dispersed sets of samples and data. This work is being coordinated globally by P3G (www.p3gconsortium.org), especially for prospective studies. Its working groups are outlining solutions to problems of harmonising and standardising samples, data, national/state laws, and study designs. These solutions - if widely implemented - will be invaluable to designing and initiating new globally comparative studies.
Two cost centres within accrual stand out: [a] the cost of accruing subject data and [b] the cost of long term storage. For [a], much basic data exists within the health record: the challenge is to make the data accessible to bona fide researchers for experimental study design. This challenge requires legal and societal change globally: the computing issues are trivial by comparison. For [b], we need lower unit costs, either through cheaper energy (?) or reductions in its use. Reductions can be achieved via engineering and/or biochemical solutions.
If we have a target of the $1000 or $10 genome, then we now need to set targets for the fully annotated sample. Let's start with DNA. How about $10 for DNA + data - and annual aliquot storage costs of $0.001? The costs need to include energy, labour (outside and in the lab), rent, depreciation, servicing, transport etc. This $10.001 target is well over an order of magnitude less than current costs.
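The proposed target can be worked through as simple arithmetic. The sketch below is illustrative only: it takes the $10 up-front figure and the $0.001 annual storage figure from the paragraph above, and the function name and the storage durations are assumptions chosen for the example, not part of the proposal.

```python
def sample_cost(years_stored, upfront=10.0, annual_storage=0.001):
    """Total cost in dollars of one annotated DNA sample kept for some years.

    upfront        -- proposed target for DNA + data ($10)
    annual_storage -- proposed target for one year of aliquot storage ($0.001)
    """
    if years_stored < 0:
        raise ValueError("years_stored must be non-negative")
    return upfront + annual_storage * years_stored


# Hypothetical storage durations, chosen only to show how the total grows.
for years in (1, 10, 50):
    print(f"{years:>2} years stored: ${sample_cost(years):.3f}")
```

With one year of storage this gives the $10.001 figure quoted above. At these targets storage adds only cents even over decades, so the up-front accrual cost dominates.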
(155) Wednesday, February 4, 2009 4:30 AM
1. My principal concern is about transcriptome analysis. Historically, ever since the genome sequencing ramp-up of the late 1990s, through the HGP era and until the present, there has been entirely too much emphasis on genomic DNA sequencing and small-RNA sequencing, at the expense of the transcriptome. What are we missing? Example: A comparison of human and chimpanzee transcriptomes might elucidate interspecies alternative splicing differences in genes, which cannot be predicted from genomic sequence or from aligning human cDNAs to the chimp genome alone. We are rapidly amassing inferences, unsupported by experimental data, based on aligning human and mouse (well-sequenced) transcriptomes to nonhuman primate and other mammalian genomes. Instead of these unsupported inferences, we should be aligning species transcriptomes to the same species' genomes in order to gauge interspecies diversity of gene expression and RNA processing, a potential window into the origin of species-specific phenotypes and human uniqueness.
2. We need to move beyond the dismal-quality status quo (the "let's sequence as many 2x genomes as we can" status quo) and to fund significantly deeper-coverage WGS, or even BAC/PAC-based targeted resequencing, to fill poorly assembled regions in genome assemblies.
3. It is imperative to advance functional genomics, not just sequencing and array projects, within NIH / NHGRI and/or through targeted extramural funding programs. One example of an unsolved problem: There is an abundance of endogenous sense-antisense and non-protein-coding RNA outside of the well-characterized transcript classes (rRNA, tRNA, microRNA, etc.). It is not enough just to say that these transcripts are there and that they are differentially expressed under condition X in cell type Y. We must have comprehensive genomewide information on cellular impacts of, for example, suppression and overexpression of all these transcripts. If we don't, we will never conclusively differentiate between the "all noncoding RNA is junk" hypothesis and its counterpart, the "noncoding RNA as a determinant of biological complexity" notion.
(157) Thursday, February 5, 2009 9:01 PM
I would like to respond to the following questions:
- What are the merits of encouraging the 'de-centralization' of sequencing?
- What are the consequences of the inevitable, widely dispersed use of new sequencing technologies?
I think it is extremely important that funding agencies promote the widespread use of next generation sequencing technology. New technology tends to be monopolized, at least initially, by large consortia or centers, which receive very large amounts of funding. Although I recognize that new developments may be advanced through collaboration networks, a lot of that work can be done just as effectively, or more effectively, by individual laboratories. Widely disseminated use of the technology will advance science by lowering the threshold for entering the field and expanding the knowledge base. This will increase creativity and competition.
The best ideas should be funded: Perhaps some of these will require large consortia or major infrastructure. However, most will be feasible for individual PIs with access to a Solexa machine and sufficient funding for consumables/machine time and bioinformatic support.
(158) Friday, February 6, 2009 3:37 AM
I would like to respond to the questions under point I. I think nextgen/thirdgen sequencing will allow traditional, non-genetic model organisms for which we have decades of cellular and developmental data to be enhanced as models through transcriptome and whole genome sequencing. I think these efforts will best be resolved via decentralization of sequencing.
(160) Monday, February 9, 2009 4:24 PM
My comments relate to agricultural animals, i.e. primarily non-human and non-experimental vertebrates. These are often used to aid human and model species genome annotation.
It seems to me that we will move to having a few large sequencing centres, but also many more "small" next gen sequencing centres. This proliferation of sequencing centres and the amount of data produced raise issues about data release, storage and access. It would be appropriate for questions in 1B to refer not only to what is available (full or summary data) but also to the range of species covered. My guess is that an explicit two-tiered system at NCBI may be required: summary data only, with fully available tools as at present, and a raw data archive that is searchable only for certain annotation fields describing the study (depositor, species...). What is required, though, IS a central database to host both.
As more and more genomes are sequenced within a species (and these numbers will rapidly increase in agricultural species), the "associated issues" of sample identification become very relevant. Often these animals will have existing identification systems, but submitters may be reluctant to supply these identifiers for commercial reasons. Minimal information to identify a unique individual is still required: species, age, sex, strain, approximate birth location, sequencing centre and their anonymous unique ID. It is a key future need to identify unique individuals and the current NCBI information is often lacking. I suggest a question that fully explores that issue for all species.
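As a rough illustration of the minimal identification record described above, the sketch below models the seven listed fields and uses the pair (sequencing centre, anonymous ID) to recognise the same individual submitted twice. The class name, field names, age unit and example values are all hypothetical; this is not an NCBI schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Minimal fields listed in the comment; names and units are assumptions."""
    species: str
    age_years: float
    sex: str
    strain: str
    approx_birth_location: str
    sequencing_centre: str
    anonymous_id: str  # centre-assigned ID, in place of a commercial identifier

    def key(self):
        # Assumed uniqueness rule: an anonymous ID is unique within its centre.
        return (self.sequencing_centre, self.anonymous_id)


def find_duplicates(records):
    """Return the keys that occur more than once, in order of first repeat."""
    seen = set()
    duplicates = []
    for record in records:
        k = record.key()
        if k in seen and k not in duplicates:
            duplicates.append(k)
        seen.add(k)
    return duplicates


# Invented example values, for illustration only.
a = SampleRecord("Bos taurus", 3.0, "F", "Holstein", "region-1", "Centre-A", "A-0001")
b = SampleRecord("Bos taurus", 3.0, "F", "Holstein", "region-1", "Centre-A", "A-0001")
c = SampleRecord("Sus scrofa", 1.5, "M", "Duroc", "region-2", "Centre-B", "A-0001")
print(find_duplicates([a, b, c]))
```

Keying on the centre together with its own anonymous ID lets submitters withhold commercial identifiers while still allowing an archive to detect repeated individuals.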
(163) Thursday, February 12, 2009 5:30 AM
An important issue that is not discussed is the interpretation of genome wide association tests. Technological advances have increased the yield of genetic tests by orders of magnitude. Our techniques for separating true positive from chance findings have not kept pace. Statistical and other techniques are needed to interpret the results of large scale tests. This could be added to Paper 4, Section I. B.
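The comment does not name a specific technique. As one hedged illustration of the problem, the sketch below applies two standard multiple-testing corrections, Bonferroni and Benjamini-Hochberg, to a list of p-values. The p-values are invented for the example and the function names are assumptions; genome-wide studies involve far more tests and usually additional safeguards such as replication.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / number_of_tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every test ranked at or below the cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Invented p-values from ten hypothetical association tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print("uncorrected:", sum(x <= 0.05 for x in p))
print("Bonferroni: ", sum(bonferroni(p)))
print("BH:         ", sum(benjamini_hochberg(p)))
```

In this toy set five tests pass an uncorrected 0.05 threshold, while one survives Bonferroni and two survive Benjamini-Hochberg, which shows how quickly nominal hits shrink once the number of tests is taken into account.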
(189) Tuesday, February 24, 2009 5:38 PM
Overall, the questions contained in Paper 4 adequately address the future of the science and technology of genome sequencing. The primary issue area not addressed in this overarching paper is the social implications of these advances, which play a significant role in its application. These must be integrated into a future of genome sequencing strategy, and not considered outside of how this field will evolve. Questions from papers 1 and 3 that explore the impact of genome sequencing on individuals and the healthcare system should be included in this strategy as well.
(195) Friday, February 27, 2009 6:04 PM
As full genome sequencing technology becomes more available and affordable for translation into healthcare delivery, we suggest that it is important to ask questions regarding sequence patenting, sequence-dependent proprietary applications, and the potential for a fractured and protected knowledge base that may impede the development and delivery of important and/or efficient healthcare applications. Do safeguards currently exist, or are they needed, to ensure that this does not become a major barrier?
(206) Monday, March 2, 2009 2:54 PM
A. Q1. Sequencing will be used to answer questions related to the molecular basis of understanding phenotypes from genotypes. Toward this goal, developing integrated approaches to transcription profiling, promoter mapping, DNA methylation, and genotyping will be essential.
Q2. Microbiology and agriculture are two significant areas - with respect to human health, understanding host-pathogen relationships will be extremely important. Sequencing will also be used heavily to understand viral epidemiology and how integrated viruses affect gene expression. Metagenomics will play a role, but will be most important in environmental studies.
Q3. Understanding the relationship between genotype, disease and health. Cancer genomics will be a significant focus.
Q5. Gene expression, DNA transcription, and epigenomics.
Q6. Good databases and reference datasets for comparisons - these need to be highly curated and capture the state-of-the-art knowledge, which will be accumulating at a rapid pace.
Q7. Whole genome will need the efforts of single molecule, long read length sequencing. In viral sequencing this will be essential for quasi-species identification where linkage between alleles is required.
B. Q1. 1. A sound body of open-source computational algorithms that are standardized and vetted by the scientific community is essential to make results comparable between groups. 2. NIH and research groups need to look beyond constantly building everything - this is an extremely costly endeavor and Internet-based infrastructures have advanced to the point where it is realistic to provide many services through "cloud" computing technology.
Q2. We are at a point where the cost of informatics is becoming greater than the cost of collecting data. It is time for a commercial ecosystem to develop to capitalize and lower infrastructure costs - much like instruments have done. Thus NIH should take a more active role in learning what is available and encouraging grantees to look to multiple solutions to solve informatics problems.
Q3. The trace and short read archives are useless from the perspective of researchers using these resources to do science. From my analysis this was initially a good idea that now only serves the purpose of a few genome centers. Finding and retrieving data are significant limitations. Furthermore, while individual case studies can be presented to demonstrate the utility of the data for "rescuing" information by reanalysis, I believe the long-term value, with projected accumulations, is not cost justified. An analysis of NCBI's projections for data storage vs service costs would be worthwhile.
C. Q1. There should be an understanding that technologies will be changing, but it is difficult to anticipate too far out. The current technologies are extremely effective for many kinds of quantitative assays that utilize DNA sequences as unit measures.
Q2. If not handled well, the large center approach will move from advancing research to stifling science. This is a standard scenario in any kind of wealth concentration that can lead to monopolies.
Q3. Genome centers are most effective for large resource generating initiatives where the value is their focus on large projects - HGP, 1000 genomes ...
Q4. In my experience de-centralized efforts should be encouraged. Institutes, like cancer centers, medical schools, and so on, have the advantage that experimental design and data collection are close to the experts who understand the samples and questions.
Q5. Innovation - when a large number of individuals have access, new ideas will emerge. This has been proven throughout history.
Q6. Rare alleles and linkage to phenotype is one example; for this, small cohorts of samples are needed, like the stocks that exist in many medical schools throughout the country.
Q8. Data release is essential - however, the challenge is not in the release of the data but how others will access it and be able to use it. In this respect, much more needs to be done to communicate the value of and how to make use of the data.
Q9. It's likely the data will mostly sit where it is collected. As part of publication it may get deposited, and will be mostly unused for a variety of reasons. Other groups won't know it exists, they won't have resources to work with it, and the data themselves will be incomplete in terms of describing experiments. This is an ongoing issue with microarray data.
Q10. The consequences will be that researchers need access to better software and computational infrastructures. Today we have a lot of disconnected technologies, but the transition to user-friendly complete systems with integrated components will require a stronger commercial ecosystem than exists today. The current state is inadequate for the clinic.
The human health impact of sequencing is significant. The above questions have a "genomics/discovery" focus, but the next waves of sequencing will focus increasingly on diagnostic applications.
(212) Saturday, March 7, 2009 7:07 PM
These are the right questions regarding the future of genome sequencing. A number of key convergence drivers affecting genomic sequencing, such as the future of personalized medicine, consumer genomic informatics, nano-biology and synthetic biology, should also be factored into this strategic planning process.
How and why we collect, manage and mine the data from sequencing is the real value of this effort. We need to explore new ways to leverage social networking and research collaborations that will encourage process innovations from beyond the large institutions. We need to inspire a highly innovative era of Sequencing 2.0.
We should consider what the future of the Post-Genomic Society looks like on the other side of the next ten years and then work backwards to consider strategic planning, funding and policy to shape a desired future. If not, we run the risk of genomic developments happening faster than we can anticipate, plan for and control. Too much control in the hands of the large institutions will be no worse than no control over a highly distributed open source model. We need the right balance of players--large and small, all focused on the same objectives.
There will be radical outcomes of an inexpensive highly pervasive distributed genomic sequencing information infrastructure that will and should transform health care, medicine and influence human evolution. This impact will be beyond the disruptive innovations we see today such as the development of a personal predictive health forecast. Society, medicine and policy makers in government are not ready for this extreme future that is coming fast.
Consumers will want to know their Predictive Health Future Outcome. The policy implications of this social, economic and scientific impact will be comprehensive--sequencing is just the first stage of this transformation in either consumer empowerment or confusion.
The future impact of high scale inexpensive sequencing will likely create a new consumer awareness of the value of personal genomics. How long will I live? What diseases may I be at risk for? What can I do?
Also, the practice of medicine and health care delivery, from early genomic detection to widely distributed, data-rich, real-time sequencing, should be planned for now. Policies and funding to create new medical education and treatment models that will change the paradigm of health care should be researched now. Sequencing slowly opens a door of transparency, but onto an imperfect future of predictive and preventive health care in the short term. Managing expectations will be important from a policy perspective.
Future scenarios may be problematic and defeat consumer expectations if the control of genomic info via sequencing is not distributed and controlled with a scientifically sound and rational plan, given this information today. It is likely that the link between genomic personal data and medicine, i.e. how disease can be prevented or treated, will continue to lag our capacity to turn genomic data into treatment.
At the same time, we should expect dramatic and consistent breakthroughs if we make online access available, via a social networking model, to a larger collaborative community that can pool data, share insights, create tools and conduct the large scale knowledge management of the outcomes of sequencing the large population.
The real value of sequencing in addressing human health issues, through forming a National Genomic Sequencing Data Infrastructure project (beyond current efforts such as HapMap) to warehouse, mine and conduct research linking drug discovery, prevention and diet to individual health outcomes, is to actually prevent disease and promote health with personalized information. This should be pursued as the ultimate end game. This would be accelerated if a private and public partnership were to incentivize academics, research centers and private sector companies to collaborate beyond the efforts currently available.
Sequencing projects might focus on smaller scale projects offering innovation grants to research the links between lifestyle, genomics and cardiovascular disease; early detection links with specific cancer populations; and evidence based medicine with nutrigenomics.
Funding agencies should encourage sequencing innovations towards enhancing human health, more innovative research into drug discovery and genomics, and the creation of centralized collection and distribution of genomic data, with the objective of better understanding the much needed transformation of medicine and health care that must be addressed if we are to enable healthy consumers.
In an era of the aging baby boomers, depopulation, reduced GDP yet a multi-trillion dollar health care system, funding agencies must be on the vanguard to drive innovation, collaboration and discovery to direct the future of sequencing to become a viable cost-effective strategy for enhancing the health of America.
(220) Tuesday, March 17, 2009 1:57 PM