Workshop on the Future of the Large-Scale Sequencing Program
National Human Genome Research Institute
National Institutes of Health U.S. Department of Health and Human Services
June 13, 2005
The National Human Genome Research Institute convened a workshop to obtain
opinions from the scientific community on the current status and potential future
directions of the NHGRI large-scale sequencing program. Participants were asked
to consider the scientific, technological, and strategic opportunities in evaluating
NHGRI's future investment in sequencing, and to specifically address several
general questions and challenges:
Given what has already been accomplished - very high quality assembled genome
sequences of the human and major model organisms, draft sequence assemblies
of genomes representing many of the nodes of the metazoan lineage, concerted
application of comparative sequencing to annotate mammalian genomes - what
are the best future opportunities for large-scale sequencing? What is the
proper balance of these types of projects going forward? Should other kinds
of large-scale sequencing projects be considered? What is the continuing priority
of large-scale sequencing as a source of genomic data compared with other
types of genomic data?
Disruptive technologies appear to be promising enough that a significant
reduction in the cost of DNA sequencing could occur within the next three
years. What are the realistic prospects for the introduction of such a disruptive
technology? How should it be anticipated and encouraged? How would it affect
sequencing costs and capacity? How would it affect the types of scientific
questions that can be addressed? How should the possibility of future significant
cost reductions affect the decisions about the types of sequencing projects
that should be initiated in the next two to three years?
How should NHGRI evaluate the ongoing value of its investment in a large-scale
sequencing program? How should it assess the contribution that continued sequencing
will make to scientific research overall and genomic research in particular?
How should it ensure that the genomic sequencing program will continue to
yield the greatest return for biomedical research?
Participants included members of user communities, sequencing center personnel,
sequencing advisors, members of the various working groups that select new sequencing
targets for the NHGRI program, developers of new sequencing technologies, scientific
advisors to the large-scale sequencing program, and members of the National
Advisory Council on Human Genome Research.
There was a strong consensus that the genomic sequence information that has
already been obtained is extraordinarily valuable, and indeed that the larger
scientific community had just begun to make use of it. Participants had a sense
that, with new technologies on the horizon making sequencing perhaps 10-fold
less expensive, more ambitious scientific challenges could be addressed, and
that the sequence information would continue to transform the way that biomedical
science is done. Indeed, participants thought that sequence information was
still inappropriately undervalued by much of the community, even by some of
those who depend on it. There was a broad consensus that the program should
continue at a level of investment not very different from what it is today.
However, there was also broad consensus that the next solicitation for sequencing
proposals should be significantly modified from the previous one for the program,
that the target selection process would have to be revised to be driven more
by important scientific problems, and that NHGRI needed to seek ways to ensure
that the broader community can better use the genomic information and indeed
become true stakeholders, rather than recipients of the data.
Most of the discussion and recommendations fell into a number of broad and
closely inter-related themes, within which were some specific recommendations.
Disseminate knowledge about how to use sequence and develop true stakeholders.
NHGRI and other agencies that fund sequence production have generally relied
on simply releasing the data into the public domain. As a consequence
of the current programmatic structure, in which all the effort has been centralized,
much of the expertise about how best to use sequence information actually
resides in the sequencing centers. This has been a reasonable beginning strategy,
but will not serve science as well in the future. Instead, NHGRI should look
for ways to actively increase the constituency for genome sequence to develop
a community of true stakeholders that will be able to find increased use for
genomic sequence. In the new solicitation, there should be an incentive for
centers to disseminate knowledge through collaborations, education, and other
means. Past performance in this regard could be a review criterion.
NHGRI should take advantage of the knowledge about how to use sequence
information that resides in the centers. The preceding theme underscores that the centers
themselves often have some of the most compelling ideas for the use of sequencing
capacity. Thus, they should be allowed a more active role in selecting projects,
especially in collaborations on specific problems. In the next solicitation,
NHGRI should consider allowing center-initiated projects in the mix of inputs
to the target selection process.
Technology will change rapidly, but unpredictably. New technologies
seem poised to rapidly decrease the cost of sequencing, perhaps by as much
as 10-fold. In all likelihood, new read types will first be used as an adjunct
to more traditional reads in an assembly or for resequencing. But as read
lengths improve and paired-end sequencing becomes possible with them, the
new technologies will replace Sanger sequencing over time for whole genome
shotgun assemblies. Sequencing centers will continue to have to spend considerable
effort adapting even commercially available instruments into their production
environments. The timing of all of this is uncertain, but it is beginning
now and could play a significant role in the next three years. NHGRI should
consider with care how much it should decrease its investment in sequencing: if
cuts to the program are too large or made too quickly, they will stifle the
transition to new technology.
Target selection should be based increasingly on big, compelling scientific
questions and less on review of individual organism targets. As capacity
increases, the current target selection process will not scale. In general,
NHGRI must continue to find the most compelling problems to address with its
large-scale sequencing capacity. The current program seeks proposals for sequencing
targets and evaluates them only one, or a few, at a time. Instead, NHGRI should
seek to address the most compelling questions. Several examples were mentioned:
the Human Cancer Genome Project; a similar effort aimed at another major system
or disease; identifying the basis for 50-100 Mendelian disorders; identifying
all the differences within the hominid lineage between apes and humans; and
several others. NHGRI only needs to have a few of these at a time; these would
not preclude doing other, still important but less ambitious projects at the
same time. It was suggested that NHGRI organize a scientific publication that
would invite several prominent scientists to imagine how they would use sequence
information if it became much more easily available.
Matching the throughput of the sequencing pipelines to their inputs
and outputs will require resources. As capacity increases, NHGRI will
need to pay attention to the community's ability to use the data. In addition
to the points mentioned in the other themes, this will require new
computational infrastructure. With more capacity, obtaining samples for new
sequencing projects will also require significant resources and coordination.
The program must remain flexible in several respects. Production
of whole genome shotgun sequence will still be a useful and important part
of the program. But increasingly, other sequencing products will be more relevant
for solving certain critical biomedical problems. Therefore, the next solicitation
should seek centers with a range of capabilities, including whole genome shotgun
production, production of directed sequencing reads, ESTs and cDNAs, and other
products. This will enable the program to respond to the most important challenges
as they are identified over the next several years. The program must also
maintain flexibility to adopt new technologies. One of the strengths of the
program is that it is composed of a portfolio of centers with different abilities.
Sequencing is not merely a commodity (even if Q20 base pairs may be).
Production of reads is an important core function of sequencing centers, but
is not sufficient for a number of reasons. For example, genomes are not all
identical; some are more challenging than others. Methods continue to evolve.
Centers must be able to adopt new technologies rapidly. Most significantly,
centers are repositories of knowledge about how to use sequence information,
and thus are intellectual resources for the scientific community. The centers
are evolving towards being true genome centers. There was some discussion
that the centers should be allowed to venture beyond sequencing-related activities.
One suggestion was to organize the centers as a component of a program along
the lines of the MIT Media Lab, in which one would construct a model for
a fully integrated set of projects to address a major problem (for example,
how rational health care could be delivered). In any event, the performance
of the future centers should be measured by a wider set of parameters than