Advanced Sequencing Technology Awards 2004

In 2004, the National Human Genome Research Institute (NHGRI) initiated a coordinated effort to support the development of technologies to dramatically reduce the cost of DNA sequencing, a move aimed at broadening the applications of genomic information in medical research and health care. The awards were announced on October 14, 2004 (NHGRI Seeks Next Generation of Sequencing Technologies).

Project summaries for advanced sequencing technology development projects in alphabetical order by grant recipient:

Polymerases for Sequencing by Synthesis

Benner, Steven A.
University of Florida, Gainesville
R21 HG003581

This project, as its R21 milestone, will deliver Taq DNA polymerases that catalyze the template-directed addition of nucleoside triphosphates carrying large fluorescent groups attached to their 3'-ends. The fluorescent groups therefore both transiently terminate the growth of the oligonucleotide chain and signal the nature of the nucleotide that was last added. These polymerase variants will form the core of a "cheap reagent" approach to the Sequencing by Synthesis (SbS) strategy. Gaining control over polymerase behavior is key for this approach to generate inexpensive genome-quality sequence data. The research will exploit a decade of experience in the Benner laboratory with nucleic acid analogs, polymerases that accept them, and practical application of the combination. The tactics assume that site-directed mutagenesis is generally site-directed damage, and therefore must be followed by directed evolution to obtain polymerase-substrate combinations that meet specifications. Here, directed evolution will be used to restore catalytic power and fidelity in polymerases that have been engineered to accept fluorescent tags. We shall: (a) synthesize nucleoside triphosphates that have fluorescent blocking groups; (b) use a directed evolution system in water-in-oil emulsions to select polymerases that accept the triphosphates efficiently and faithfully; (c) obtain polymerases that incorporate these to within 10% of the catalytic activity of native polymerases, and with specificity better than one part in 10,000. The next phase of the project will be to develop a working prototype for a multiplexed sequencing-by-synthesis device using these polymerases. The aims of that phase will be to: (d) optimize the fluorescent compound-cleavage chemistry-polymerase combination; (e) use an artificially expanded genetic information system (AEGIS), the artificial alphabet invented in the Benner group, to bin primer-template combinations for parallel sequencing; and (f) exploit 2D gels to develop an architecture for a prototype parallel sequencing instrument based on the technologies developed in Aims a-c.

DNA Sequencing Using Nanopores

Benner, Steven A.
University of Florida, Gainesville
R21 HG003579

This project, as its R21 milestone, will deliver a combination of conical nanopores having read length dimensions slightly less than 1 nm, and nucleobase-modified DNA oligonucleotides, where the passage of the DNA through the nanopore proceeds with a time constant of 10-100 microseconds per nucleotide, and where the ion current through the nanopore, during the time when the DNA is in transit, varies detectably depending on the nucleotide that is in the pore at the time that the current is measured. This nanopore-modified DNA combination will form the core of an extremely inexpensive technology to generate long reads of DNA sequence at the single molecule level. The research will exploit a decade of experience in the Martin laboratory preparing nanopores and engineering their chemical context, and an equal experience in the Benner laboratory working with nucleic acid analogs, polymerases that accept them, and practical applications of the combination. As specific aims, we shall: (a) prepare the nanotubes; (b) attach chemical functionality to the nanotubes; (c) prepare nucleoside triphosphates carrying different sized polyether dendrimers attached at the 5-position (for pyrimidines) and the 7-position (for 7-deazapurines); (d) use these triphosphates to synthesize modified DNA molecules. The nanopores will then be physically characterized to determine their ion transport dynamics, and in conjunction with the modified oligonucleotides, to find a combination that meets the R21 milestone specifications. If this milestone is passed, the next period will be used to develop sequence specific and randomly targeted primers that incorporate DNA, PNA, and tags that exploit an artificial genetic alphabet, and to develop improved processes for generating conical nanopores in a form suitable for large scale application. These will then be targeted against specific sequences extracted from mammalian genomes.
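To put the milestone's translocation rate in perspective, the short Python sketch below converts a per-nucleotide transit time into a read rate for a single pore. The function name, and the simplifying assumptions of a constant transit time and no idle time between molecules, are illustrative only and do not come from the project description.

```python
def nucleotides_per_second(dwell_time_us: float) -> float:
    """Read rate of one pore, assuming a constant transit time per nucleotide."""
    if dwell_time_us <= 0:
        raise ValueError("dwell time must be positive")
    # One second holds 1,000,000 microseconds.
    return 1_000_000 / dwell_time_us


# The two ends of the 10-100 microsecond range quoted in the milestone.
fastest = nucleotides_per_second(10)
slowest = nucleotides_per_second(100)
print(fastest, slowest)
```

Under these idealized assumptions a single pore would read between 10,000 and 100,000 nucleotides per second, which also means the current measurement has to resolve events that last only tens of microseconds.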

High-Speed Nanopore Gene Sequencing

Collins, Scott D.
University of Maine, Orono
R01 HG003565

Significant enhancements in gene sequencing may be achieved through implementation of analysis instruments at the same dimensional scale as DNA, i.e., nanometers. Nanotechnology has recently provided the necessary tools to create such nanoinstruments, and this proposal seeks to utilize these tools to fabricate a high-speed, low-cost gene sequencer. The gene sequencer is based on the nanopore approach and incorporates tunneling current electrodes to sense the individual nucleotides as they traverse the pore.

Specific aims for this proposal are:

Design and fabricate nanopore devices complete with tunneling current electrodes and integrated sense and control circuitry on chip.
Characterize the nanopore device using known ssDNA sequences. Initial characterization will be limited to DNA strands of approximately 1000 bases.

Bead-based Polony Sequencing

Costa, Gina L.
Agencourt Bioscience Corp., Beverly, Mass.
R01 HG003570-01

The goals of this project are to develop a robust sequencing by synthesis methodology for de novo and resequencing applications using the bead-based polony technology. Our overall R & D focus is to address key aspects of the technology that need to be refined to enable robust, high quality polony sequencing. Our experience in large-scale genome sequencing will serve well to ensure that the key issues involved in optimizing the technology against current industry standards, data processing, management, and analysis are effectively addressed in a time- and cost-efficient manner.
The specific aims are to:

Develop effective procedures for production of paired-end PCR libraries with virtual insert sizes (distance between read pairs) in the range of 2 to 50 kilobases.
Develop methods for effective solid-phase template amplification on derivatized microspheres and for enrichment of beads containing amplified templates.
Develop methods for robust array preparation.
Develop procedures for fluorescent in situ sequencing by synthesis.
Develop an integrated data acquisition system including fluorescence microscope, automated stage, flow cell, fluidics system and control software.
Develop data management and assembly software.
Develop functional reversible chain terminators.
Develop modified enzymes capable of efficiently incorporating reversible terminators.

Single Molecule Nucleic Acid Detection with Nanopipettes

Davis, Ronald W.
Stanford University, Stanford, Calif.
R21 HG003448

The long-term objective of this project is to develop a new technology that will enable rapid, single-molecule detection and identification of DNA sequences present in a biological sample. The current effort will focus on detecting nucleic acid molecules labeled with varying sizes of nanoparticles by recording changes in ionic current through a small, nanometer-scale channel in a "nanopipette." Once this detection technology has been demonstrated, the labeled oligonucleotides can be hybridized to a test sample, the unhybridized labeled molecules removed, and the remaining labeled DNA molecules can be rapidly detected on a single-molecule basis through the nanopipette. This will result in an ultra-sensitive, rapid genotyping technology that can be used for point-of-care diagnostics. The diagnostics can include the detection of pathogens or the determination of a human genotype in a clinical sample. This nanopipette DNA detection technology will also pave the way for second-generation devices, which allow higher resolution detection and could be used for rapid, single-molecule DNA sequencing, eventually realizing the possibility of sequencing an entire human genome in a matter of seconds. In this effort, Stanford will develop and demonstrate this nucleic acid detection technology with the following specific aims:

Nanopipette fabrication and characterization.
Labeling DNA with nanoparticles.
Measurement of labeled DNA.

Microbead INtegrated DNA Sequencer (MINDS)

Jovanovich, Stevan B.
Microchip Biotechnologies Inc., Fremont, Calif.
R01 HG003583

This collaborative project is aimed at the development of a "Microbead INtegrated DNA Sequencer" (MINDS) that efficiently integrates all of the major steps in DNA sequencing, from library construction to final sequence output, by exploiting low-cost microfluidic devices. The automated MINDS system will combine three fundamental steps: 1) library construction, amplification, and selection using microbead colony technologies; 2) nanoliter cycle sequencing sample preparation and purification; and 3) microfabricated capillary array electrophoresis (µCAE)-based separation of DNA sequencing fragments. The library construction and amplification process will take sheared, sized DNA fragments as input and construct an emulsion PCR amplified library of template on beads, with each bead representing a single DNA fragment. Single beads will then be processed in a 25 nL cycle sequencing reactor to produce fluorescently labeled sequencing fragments that are efficiently captured, concentrated and purified using on-chip affinity capture. The fragments are then separated and sized on a proven microfabricated µCAE sequencer.

This project will combine the efforts of Microchip Biotechnologies Inc. (MBI) with subcontracts to three collaborating academic institutions. MBI will develop a prototype microchip-based DNA sample preparation nanoscale thermal cycling module and a prototype µCAE sequencing system using conventional sequencing chemistries. These will then be integrated to produce a MINDS microchip with arrays of 25 nL cycle sequencing sample preparation, affinity purification, and µCAE sequencing. When this has been accomplished, by 30 months, MBI will further integrate microbead-based library technology being developed by the Mathies laboratory at UC Berkeley to create MINDS System prototypes ready for beta-testing. These developments will build upon novel methods and strategies developed in tandem by the academic collaborators, in particular the µCAE separation system and bead-based microfluidic "cloning" methods. A subcontract to the Mathies lab at U.C. Berkeley will support the development of new microtechnologies for the amplification and selection of clones, and the integration of these methods and processes with prototype microfabricated sequencing systems. In collaboration with the Mathies group, the Barron lab at Northwestern will develop and test novel DNA separation matrices that are easily loaded into and replaced from chip microchannels, and that provide rapid, high-resolution separations. The overall project goal is to develop and then beta test a fully integrated, prototype Sanger sequencing system at the Ju lab of the Columbia Genome Center to demonstrate the feasibility of performing genomic sequencing and resequencing at 100-fold lower cost with an anticipated throughput of about 7 million bases/day/machine.

The MINDS system will greatly reduce the cost of shotgun sequencing and resequencing, by exploiting the ability of well established µCAE devices to analyze sub-nanoliter volumes through preparation of samples in volumes more closely matched to the analytical requirements, reducing cycle sequencing reagent consumption by 100-fold. Library construction will be automated in the bead-based format, with amplification and selection performed at full scale in a single bulk reaction, again reducing reagent consumption and cost. A novel polymeric separation matrix designed for microchips already shows good performance and, along with microfluidic volume reductions, will minimize matrix expense. With these combined innovations, the MINDS system will drive CAE instrumentation close to the ultimate performance possible for four-color Sanger fluorescent DNA sequencing in an ultra-high-throughput implementation for genome centers. Future work will explore the development of lower-throughput versions appropriate for core and individual laboratories.

An Integrated System for DNA Sequencing by Synthesis

Ju, Jingyue
Columbia University, New York, N.Y.
R01 HG003582

The objective of the proposed research is to develop an integrated system for DNA sequencing by synthesis (SBS) using photocleavable fluorescent nucleotides. The SBS system includes the construction of a chip with immobilized single stranded DNA templates by site-specific coupling chemistry. These templates contain a self priming moiety to generate the complementary DNA strand in polymerase reaction using 4 photocleavable fluorescent nucleotides whose 3-prime-OH group is modulated to allow their efficient incorporation into the growing strand of DNA as temporary terminators in the polymerase reaction. A 4-color fluorescence imager is then used to identify the sequence of the incorporated nucleotide on each spot of the chip. Upon removal of the fluorophore photochemically and reactivation of the 3-prime-OH group, the polymerase reaction will proceed to incorporate the next nucleotide analogue and detect the next base. It is estimated that 10,000 bases will be identified after each cycle on one sequencing chip that contains 10,000 different DNA templates.
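The cycle described above (incorporate one blocked nucleotide, image it, remove the label, repeat) can be sketched as a toy simulation in Python. This is a minimal illustration under idealized assumptions, not the project's method: incorporation and cleavage are treated as perfect, each template string lists bases in the order the polymerase meets them, and the names `COMPLEMENT` and `run_sbs` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def run_sbs(templates, n_cycles):
    """Simulate idealized sequencing by synthesis with reversible terminators.

    In every cycle each spot incorporates exactly one nucleotide that is
    complementary to the next template base, the imager records it, and the
    block is removed so the next cycle can proceed.
    """
    reads = ["" for _ in templates]
    calls_per_cycle = []
    for _ in range(n_cycles):
        calls = 0
        for i, template in enumerate(templates):
            position = len(reads[i])
            if position >= len(template):
                continue  # this template has been copied completely
            reads[i] += COMPLEMENT[template[position]]
            calls += 1
        calls_per_cycle.append(calls)
    return reads, calls_per_cycle


reads, calls = run_sbs(["ACGT", "TTGA", "GC"], n_cycles=3)
print(reads)  # ['TGC', 'AAC', 'CG']
print(calls)  # [3, 3, 2]

# A chip with 10,000 templates yields 10,000 base calls per cycle.
_, chip_calls = run_sbs(["ACGT"] * 10_000, n_cycles=1)
print(chip_calls)  # [10000]
```

The last call mirrors the estimate in the summary: because every spot contributes one base per cycle, the number of bases identified per cycle equals the number of templates on the chip.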

Experimental R&D for Rapid Sequencing Nanotechnology

Lee, James W.
Oak Ridge National Laboratory, Oak Ridge, Tenn.
R21 HG003592

The long-term goal of this NIH research project is to demonstrate a novel nanotechnology concept that we developed at Oak Ridge National Laboratory for rapid nanoscale reading of nucleic acid sequences on an individual molecule. According to this concept, it is possible to obtain genetic sequence information by probing through a DNA molecule base by base at a nanometer scale, as if looking through a strip of movie film. The proposed nanotechnology has the potential capability of performing DNA sequencing at a speed at least 2800 times faster than that of the current technology: that is, a sequencing job that would take 2000 years to complete using the current machine could be accomplished within 1 year via this nanotechnology. This enhanced performance is made possible by a series of innovations, including novel applications of a fine-tuned nanometer gap for passage of a single DNA molecule, thin-layer microfluidics for sample loading and delivery, programmable electric fields for precise control of DNA movement, and detection of DNA nucleotide bases by nanoelectrode-gated tunneling conductance measurements. One of the most crucial components is the nanometer nucleotide detection gate, which comprises two sharp tips of nanoelectrodes pointing toward each other on a nonconductive (e.g., SiO2) plate. In the R21 pilot phase of this experimental project, we will fabricate this detection gate using electron-beam lithography and our patented programmable pulsed precision electrolytic nanofabrication technique. We will also perform a proof-of-principle demonstration for detection of nucleotide bases such as poly [A] or poly [C] in a 2- to 5-nm electrode gap by tunneling conductance spectroscopic measurements across the nanoelectrode gate. When we achieve these R21 milestones (fabrication of nanoelectrode detection gate and proof-of-principle demonstration for detection of nucleotide bases), this project will then move on to a development phase to fully develop and demonstrate this novel nanotechnology for rapid DNA sequencing by nanoscale direct reading on single DNA molecules. This project is expected to deliver a prototype of the envisioned rapid sequencing nanotechnology near the end of its next phase.
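The speed comparison quoted above can be checked with a line of arithmetic. The Python sketch below simply divides the baseline duration by the claimed speed-up; the function name is hypothetical and the calculation assumes the speed-up applies uniformly to the whole job.

```python
def accelerated_duration(baseline_years: float, speedup: float) -> float:
    """Time needed for a job after applying a uniform speed-up factor."""
    if speedup <= 0:
        raise ValueError("speed-up must be positive")
    return baseline_years / speedup


years = accelerated_duration(baseline_years=2000, speedup=2800)
print(round(years, 2))  # 0.71
```

A job that takes 2000 years on the current machine would therefore take roughly 0.71 years at a 2800-fold speed-up, consistent with the statement that it could be completed within 1 year.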

Computational R&D for Rapid Sequencing Nanotechnology

Lee, James W.
Oak Ridge National Laboratory, Oak Ridge, Tenn.
R21 HG003578

The goal of this R21 computational and software project proposal is to support the parallel R21 experimental project. The long-term goal of these two projects is to demonstrate a novel nanotechnology concept that we developed at Oak Ridge National Laboratory for rapid nanoscale reading of nucleic acid sequences directly on an individual molecule. According to this concept, the genetic sequence information can be obtained by scanning a DNA molecule base by base at a nanometer scale as if one were looking through a strip of movie film. The proposed nanotechnology has the potential capability of performing DNA sequencing at a speed that is at least about 2800 times faster than that of the current technology. This enhanced performance is made possible by a series of innovations, including novel applications of a fine-tuned nanometer gap for passage of a single DNA molecule, thin-layer microfluidics for sample loading and delivery, programmable electric fields for precise control of DNA movement, and detection of DNA nucleotide bases by nanoelectrode-gated tunneling conductance measurements. During the R21 pilot phase of this computational project, we will perform quantum-mechanical computations to provide better understanding of the nanoelectrode-gated electron-tunneling nucleotide detection process and apply molecular dynamics simulations to compute the needed electric fields to effectively drive and control the transport and conformational motion of a DNA chain through the detection gate. We will also develop key software that will be employed by the experimental project at the beginning of the next phase for the system assembly and control.

Molecular Reading Head for Single-Molecule DNA Sequencing

Lindsay, Stuart M.
Arizona State University, Tempe
R21 HG003061

The goal of this study is to evaluate a novel single-molecule DNA sequencing technology that has the potential to sequence a molecule of genomic dimension in hours. The DNA is attached to a rotaxane complex consisting of a molecular ring (cyclodextrin) that self-threads onto a propylene oligomer. The far end of the propylene oligomer is attached to a fixed surface, and the cyclodextrin ring is covalently attached to an AFM probe. As the AFM probe is pulled away from the fixed surface, the DNA passes through the cyclodextrin ring, one base at a time. Fluctuations in molecular friction as the ring passes each base are recorded as deflections of the AFM cantilever. If these data can be interpreted in terms of the base sequence of long DNA molecules, then single DNA molecules can be sequenced rapidly with this new technology. Preliminary studies appear to show that the DNA can be pulled through a cyclodextrin ring. They also indicate that the cantilever deflection during retraction depends on the DNA sequence. An unanticipated discovery is that double stranded DNA appears to pass the ring more easily than single stranded DNA, and does so with less random fluctuation than is the case for single stranded DNA. Our first goal is to put the 'ring sliding' model to further test. Does the ring really slide over one strand of double-stranded DNA, peeling the complementary strand off? Does sequence-specific adhesion between the DNA and the fixed surface contribute to the sequence-related signal? If these experiments unearth a problem with our system, we will modify the chemistry appropriately. Once the operation of the system is verified, we will carry out a program of theory and experiment aimed at understanding these initial observations and establishing the limitations of the technology as developed thus far. 
Guided by information from these studies, improved molecular 'reading heads' will be designed, sequencing parameters will be optimized and hardware will be improved with the goal of reliable sequencing of oligomers, a prerequisite for subsequent attempts at large-scale sequencing.

Massively Parallel High Throughput, Low Cost Sequencing

Lohman, Kenton L.
454 Life Sciences Corp., Branford, Conn.
P01 HG003022

Background: Large-scale genomic sequencing currently requires high-cost equipment and is labor intensive. The throughput of conventional sequencing has become inadequate for the escalating demand for genomic sequence. Understanding the intricacies of human genetic organization and how it relates to human health and inheritance requires genomic-level comparative analyses that cannot currently be performed due to the lack of sequence information. The 454 Corporation has developed a massively parallel, high-throughput sequencing instrument that combines simultaneous sequencing in hundreds of thousands of picoliter-scale reaction wells with high-powered bioinformatics. The method does not require an exponential scale up in effort or cost, despite exponential increases in genome size. The effort and cost of conventional sequencing scale up proportionately with the size of the genome. The 454 approach will be low cost, and make sequencing large genomes available to a wide variety of laboratories.

Specific Aims: In this program we will (i) construct a robust double-ended sequencing method that generates short sequences from both ends of each individual fragment; and (ii) develop a robust sequence assembly tool appropriate for double-ended sequencing. Our study design incorporates a multi-disciplinary effort across molecular biology, chemistry, engineering, software and bioinformatics groups at 454 Corporation. The molecular biology and sequencing efforts will be led by the PI, Dr. Kenton L. Lohman. The hardware, fluidics, optics, software and bioinformatics efforts will be led by co-PI, Dr. Marcel Margulies. We will be taking advantage of current 454 infrastructure and key personnel.

Relevance: There is a growing need across the research, pharmaceutical and clinical communities for low-cost, high throughput genomic sequencing. Comparative genomics, SNP and haplotype analyses have shown tremendous potential to rapidly characterize individual susceptibilities to many classes of chronic and acute diseases and disorders. Current costs of whole genome sequencing can only be borne by large institutions. The cost of automated sample preparation and sequencing scales up proportionately to the size of the genome being sequenced. The 454 Sequencing system simultaneously analyzes millions of fragments in massively parallel sequencing of mammalian organisms. The use of massively parallel sequencing and bioinformatics analysis creates a low cost, high throughput sequencing system.

454 Life Sciences Massively Parallel System DNA Sequencing

Margulies, Marcel
454 Life Sciences Corp., Branford, Conn.
R01 HG003562

Relevance: There is a growing need across the research, pharmaceutical and clinical communities for low-cost, high throughput, genomic sequencing. Comparative genomics, SNP and haplotype analyses have shown tremendous potential to characterize individual susceptibilities to many chronic and acute diseases or disorders. Additionally, genomic data will lead to advances in agriculture, environmental sciences and further our understanding of evolution and ecological systems. However, the cost of sequencing mammalian-sized genomes is currently too high and we remain too far away from being able to afford the use of comprehensive genomic sequence information on a routine basis, in part because such large-scale sequencing requires a great deal of equipment and is labor intensive. Of equal importance with a significant decrease in cost, is the need to develop a complete platform that brings to any research laboratory the capability to perform sequencing of sizable organisms without a large and expensive infrastructure.

Background: 454 Life Sciences has developed a massively parallel, high-throughput sequencing system, designed to simplify, parallelize and speed up all aspects of sequencing viral and bacterial genomes, from sample preparation, through amplification and sequencing, to data processing and assembly. There is one sample preparation and one amplification process for a whole genome, done without need for robotics, cloning or colony picking, by one individual, in one laboratory. That same individual can do whole genome sequencing on a single high throughput instrument that simultaneously sequences all fragments in hundreds of thousands to millions of picoliter-scale reaction wells, and performs base calling and scaffolding in real time, with consensus accuracies of > 99.99%. Currently, sequencing of viruses and bacteria is performed at a throughput of 5 Mbp/hour. Under a separately funded NIH grant, 454 will be scaling up this system to perform paired-end sequencing of whole genomes up to the size of small fungi.

Projects: We will further expand the existing platform to handle resequencing and de novo sequencing of mammalian genomes at very low cost and high accuracy in 3 projects: (i) Scaling the 454 hardware to achieve two orders of magnitude reduction in the cost per base, at a throughput of up to 50 Mb/hour, and at an accuracy of > 99.99%; (ii) Extending the 454 molecular biology to very small beads and to combined read lengths of 400 basepairs with paired-end sequencing of very long fragments; (iii) Extending the modular assembler algorithms to allow the use of large-span paired-end reads, leading to resequencing and de novo assembly of mammalian-sized genomes.

454 relies on a very talented, multi-disciplinary team that encompasses engineering, molecular biology, chemistry, software and bioinformatics groups. The hardware, fluidics, optics, software and bioinformatics efforts will be led by PI Dr. Marcel Margulies. The molecular biology, chemistry and sequencing efforts will be led by co-PI, Dr. Michael Egholm. During year 3 of the program, 454 will build at its own expense, and make available on a contract basis, a small, high throughput mammalian genome sequencing facility that can perform de novo sequencing of mammalian genomes at a price of less than $300,000 and in less than 5 days. This facility will cover less than 3,000 sq. ft and be staffed by fewer than 10 personnel. We will be ready to deploy such a facility commercially at other sites at the end of year 3.
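As a rough check on how the throughput target relates to the 5-day turnaround, the Python sketch below multiplies the 50 Mb/hour figure from project (i) by the length of a 5-day run. The genome size of about 3 billion bases and the assumption of one instrument running continuously are illustrative assumptions, not figures from the summary.

```python
THROUGHPUT_BASES_PER_HOUR = 50_000_000  # target stated in project (i)
RUN_DAYS = 5                            # turnaround target for the facility
ASSUMED_GENOME_SIZE = 3_000_000_000     # assumption: roughly human-sized genome

# Raw bases produced by one instrument running without interruption.
raw_bases = THROUGHPUT_BASES_PER_HOUR * 24 * RUN_DAYS
coverage_per_instrument = raw_bases / ASSUMED_GENOME_SIZE
print(raw_bases, coverage_per_instrument)  # 6000000000 2.0
```

Under these assumptions one instrument would produce about 6 billion raw bases in 5 days, or roughly twofold coverage of a genome of that size. De novo assembly generally needs more redundancy than that, which suggests the facility would rely on several instruments in parallel, although the summary does not state a number.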

Nanopores for Trans-membrane Bio-molecule Detection

Marziali, Andre
University of British Columbia, Vancouver, Canada
R01 HG003248

Single-molecule approaches to the collection of biological data can reveal temporal dynamics of processes that would otherwise be unavailable through measurements of ensembles of molecules or cells. The complete elucidation of regulatory networks in cells will require time-resolved gene expression data obtained from a single cell to determine the time constants of the network feedback loops. It has been shown that there is a strong analogy between networks in cell biology and electronic circuits - present tools available to cell biologists are the equivalent of a voltmeter in electronics, yielding information only on slowly varying averages. Cell biologists will eventually need the biological equivalent of an oscilloscope to perform minimally invasive measurements of biomolecule levels in live cells in real time. Single molecule techniques are the most promising candidate at this time for such a tool. Furthermore, single molecule approaches may lead to highly sensitive assays with broad applications including genotyping, gene expression studies, and protein detection. It is conceivable that arrays of single-molecule nanosensors would provide data similar to microarrays for gene expression or SNP determination, but with increased data quality and higher sensitivity. In preliminary work, we have developed an organic nanosensor capable of detecting and distinguishing between similar nucleic acid strands across a lipid membrane. The sensor is based on a 2 nm wide protein channel that self-assembles into a lipid membrane, with an engineered nucleic acid and protein construct inserted into the pore under an applied electric field. This nanosensor assembly results in a nucleic acid tail protruding through the lipid bilayer the pore is inserted in. 
This tail is engineered to bind to specific analytes, such that when an analyte is bound and an attempt is made to withdraw the tail from the pore, resistance is encountered - the whole operation resulting in something analogous to ice-fishing. We have successfully used this nanosensor to detect and characterize binding of single DNA strands. In this application, we propose an expansion of this work to determine the operating limitations of this prototype nanosensor, and to develop additional nanosensor prototypes for improved detection of both nucleic acids and other bio-molecules. Though beyond the scope of this initial application, this research is intended to eventually provide a powerful tool for in vivo sensing of biomolecules for the study of cellular function and complex cellular diseases (such as cancer), as well as novel synthetic nanosensor arrays for highly accurate quantitation of gene expression and improved, low cost genotyping.

Ultra Fast Nanopore Readout Platform for Designed DNAs

Meller, Amit
Rowland Institute at Harvard, Harvard University, Cambridge, Mass.
R21 HG003574

We describe a novel methodology for rapid and massively parallel DNA sequencing that promises to considerably reduce the time and cost of genome sequencing. The method includes two main steps: 1. Conversion of the target DNA molecules into easily readable code units (Designed DNAs, invented by LingVitae AS); and 2. rapid readout of the designed DNAs using our nanopore based approach. The first step has been recently demonstrated by LingVitae AS. Here we present a novel readout platform based on the simultaneous optical probing of multiple nanopores.

The unique combination of designed DNAs with the nanopore optical readout eliminates the uncertainties associated with the development of new chemical compounds, required in other approaches. In addition, since the nanopore readout does not rely on the relatively slow enzymatic incorporation of nucleotides, and because it can be applied to read the sequence of single molecules, an extremely high throughput is expected, resulting in a cheaper and faster approach. In this proposal, we lay down a straightforward experimental strategy for testing our approach, based on our expertise in nanopores and in the optical probing of single biomolecules.

The specific aims of our proposal are: fabrication of an instrument for concurrent electrical and optical probing of single DNA molecules inside the nanopore; testing the DNA readout of 5, 10 and 20 nucleotide DNAs using our nanopore setup; and implementing simultaneous multi pore DNA readout.

Ultrafast SBS Method for Large-Scale Human Resequencing

Metzker, Michael
Baylor College of Medicine, Houston, Texas
R01 HG003573

Identifying and understanding roles of single nucleotide polymorphisms (SNPs) will lead to accurate diagnosis of inherited disease states, determination of risk factors, and characterization of patients' metabolic profiles. Such technology promises to lead to prophylactic treatments to delay the onset or progression of disease, and prescriptions of the safest and most efficacious medications. Current DNA sequencing technology, however, is too slow and expensive for these tasks.

Here, we propose to develop an ultrafast DNA sequencing strategy featuring sequencing-by-synthesis (SBS). The collaborative team involved in this project was responsible for some of the earliest published work on SBS, and recognizes the fundamental challenge that any method based on this approach must address before tangible progress to a practical system can be made: namely, to identify combinations of appropriately modified nucleoside triphosphates that will be accepted, efficiently and with high fidelity, by suitably mutated DNA replicating enzymes. Consequently, this proposal features a strong synthetic chemistry component involving two laboratories focused on preparing nucleoside triphosphates with fluorescent, labile 3'-protecting groups. It also describes molecular biology to produce relatively large libraries of mutated polymerases. Even though the number of modified enzymes generated is high, the mutations will focus on key structural regions to maximize the chances of finding suitable systems. This molecular biology component is coupled with a combinatorial screen to rapidly identify suitable enzyme/modified dNTP pairs. Once suitable combinations are identified, the SBS methodology will be implemented on solid support surfaces for DNA sequencing applicability. It is envisioned that successful demonstration of the SBS technology would then fit into a broader, comprehensive research plan encompassing microfluidics for sample manipulation and delivery of the DNA to the SBS system, fluorescent imaging via our proprietary Pulse-Multiline Excitation (PME) system, computational methods for identifying an optimal tiling path and thermodynamic properties of oligonucleotides across whole chromosomes, and informatics to process and store the data generated.

High Throughput Single Molecule DNA Sequencing

Quake, Stephen R.
Stanford University, Stanford, Calif.
R01 HG003594

The Human Genome Project took several years to complete, yet it is only the beginning of a period in which large amounts of DNA and RNA sequence information will be required for medical diagnostics, forensics, and developmental biology. Conventional sequencing technology has limitations in cost, speed, and sensitivity, and the demand for sequence information far outstrips the current ability to obtain it. We recently demonstrated the first proof-of-principle experiments for a new technology that will provide a fast, low-cost, and highly parallel technique for DNA and RNA sequencing. This technology uses single-molecule detection of fluorescently labeled nucleotides after DNA polymerase incorporates labeled dNTPs into immobilized individual DNA molecules. A major advantage of this technique over current sequencing methods lies in its ability to obtain sequence information from millions of independent molecules in parallel. Here we propose to develop reagents and methods for single-molecule sequencing runs with longer read lengths and higher accuracy, ultimately reaching the NIH gold standard of 99.99%, while reducing the cost of sequencing a mammalian genome to below $100,000.
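
The two numerical targets in this summary can be restated per base. The sketch below is illustrative and not part of the proposal: it converts the 99.99% accuracy target into a Phred-scaled quality score (Q = -10 log10 of the error probability, a standard convention that the abstract does not mention) and spreads the $100,000 figure over an assumed genome size of 3 billion bases, ignoring coverage redundancy. The constant and function names are hypothetical.

```python
import math

# Assumption for illustration only: a mammalian genome of about 3 billion bases.
ASSUMED_GENOME_BASES = 3_000_000_000


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy (0 < accuracy < 1) to a Phred-scaled score."""
    error_probability = 1.0 - accuracy
    return -10.0 * math.log10(error_probability)


def budget_per_base_cents(genome_cost_dollars: float,
                          genome_bases: int = ASSUMED_GENOME_BASES) -> float:
    """Cost allowed per genome base, in cents, ignoring coverage redundancy."""
    return genome_cost_dollars * 100.0 / genome_bases


if __name__ == "__main__":
    print(f"99.99% accuracy is about Q{phred_quality(0.9999):.0f}")
    print(f"$100,000 per genome is about "
          f"{budget_per_base_cents(100_000):.4f} cents per base")
```

Under these assumptions the accuracy target corresponds to roughly Q40, and the cost target to roughly 0.003 cents per genome base.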

Nanotechnology for the Structural Interrogation of DNA

Ramsey, J. Michael
University of North Carolina, Chapel Hill
R01 HG002647

We propose a research program to achieve the goal of sequencing single molecules of polynucleotides using conductance probes within a molecular-scale aperture and to determine the technical feasibility of this promising approach. There have recently been intriguing suggestions about how one might rapidly determine the sequence of a single DNA molecule contained in a buffer solution by transporting it through a voltage-biased nanoscale aperture while monitoring the ionic current through that aperture [Kasianowicz, 1996; Deamer, 2000]. Some suggestive proof-of-principle experiments have been demonstrated using lipid bilayer supported protein pores and observing variations in pore axial conductance. We contend that for this strategy to become a realizable technology, robust nanometer-scale apertures must be fabricated using a combination of top-down and bottom-up approaches. In addition, interesting variants of this approach, such as incorporating laterally opposed nanoelectrodes in a nanochannel for probing monomeric variations in the electrical properties of polynucleotides, can only be achieved through nanofabrication. Our specific aims are to: (1) develop fabrication capabilities that combine top-down and bottom-up strategies for forming fluidic channels and electrical probes with length scales approaching 1 nm; (2) investigate the dependence of the length scale probed on nanopore axial and lateral dimensions; (3) compare the signal-to-noise ratio for axial and lateral conductance probes of single DNA strands; (4) determine the variation of measurement signal-to-noise ratios as a function of chemical and physical parameters such as aperture size, buffer conditions, interfacial hydrophobicity, and electrode size; and (5) determine the impact of polymer dynamics on the fundamental limits of DNA structural determinations.

Pyrosequencing Array for Genome Sequencing

Ronaghi, Mostafa
Stanford University, Stanford, Calif.
R01 HG003571

We propose the development of the Pyrosequencing array for genome sequencing. Pyrosequencing has been widely used by other laboratories for de novo sequencing and has great potential for miniaturization. The aim of this proposal is to develop an exportable, inexpensive device that is able to produce sequence data from millions of features on a single chip. As a multidisciplinary team at Stanford University, we have already worked toward the development of such a platform. The team proposes a plan to develop this methodology to reduce the cost below $100,000 for mammalian genome sequencing. We will discuss a step-by-step development plan to achieve this goal in three years. Briefly, the proposal covers clonal amplification, miniaturized Pyrosequencing, integrated PCR, a Pyrosequencing platform, an integrated fluidic and CMOS imaging platform including a signal processing unit, automation of an inexpensive platform and methodology for short read assembly to assemble a mammalian genome.

Single-Molecule DNA Sequencing Using Charge-Switch dNTPs

Williams, John
LI-COR Inc., Lincoln, Neb.
P01 HG003015

This Program Project is related to an effort at LI-COR begun in 1998 to develop a system for de novo sequencing of single DNA molecules with very long reads. In particular, the Program Project will further the development of reagents and microfluidic flowcells for the system. A successful system would be revolutionary with respect to speed, read length, cost and minimized laboratory infrastructure. An entire genome would be sequenced from a single genomic DNA sample without cloning or amplification, and the long reads would not only enable de novo genome sequencing but also automatically provide haplotype information. We are targeting a per-instrument throughput of 500 raw base calls per second with low error rates (1 error per 10,000 finished bases with 5x coverage). Manufacturing cost for reagents and flowcells is initially targeted to be about 0.001 cents per finished (5x) base, with the potential of laying the technological framework to enable significant additional cost reductions in the future. Concurrent instrumentation and image analysis developments are funded independently of this Program Project. The proposal has three Program Goals: 1) Design, fabricate, and evaluate multichannel flowcells that enable bead-docking of DNA templates; fluidic control of reagents (including polymerases and modified nucleotide substrates); and charge-switched partitioning of released labeled pyrophosphates from intact gamma-phosphate-labeled nucleotides. 2) Design, synthesize, and evaluate four modified nucleotide types (A, C, G, T), in which a fluorescent dye with photophysics suitable for single-molecule detection is attached via various linker arm configurations to the gamma-phosphate of the nucleotide, and a charge moiety (e.g., +2 charge) is attached to the nucleotide base. 3) Prepare, express, purify, and screen mutant polymerase libraries to evolve a polymerase that incorporates the charge-switch nucleotide substrates with a nucleotide incorporation rate and fidelity appropriate for meeting the throughput goals in conjunction with the multichannel flowcells.
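
As a rough check on what the stated throughput and cost targets would imply for a whole genome, the following sketch works through the arithmetic. It is not taken from the proposal: the genome size of 3 billion bases is an assumption made here for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumption for illustration only: a genome of about 3 billion bases.
ASSUMED_GENOME_BASES = 3_000_000_000
SECONDS_PER_DAY = 86_400


def days_per_genome(raw_calls_per_second: int, coverage: int,
                    genome_bases: int = ASSUMED_GENOME_BASES) -> Fraction:
    """Instrument time, in days, to collect `coverage`-fold raw base calls."""
    raw_calls_needed = genome_bases * coverage
    seconds = Fraction(raw_calls_needed, raw_calls_per_second)
    return seconds / SECONDS_PER_DAY


def reagent_cost_dollars(cents_per_finished_base: str,
                         genome_bases: int = ASSUMED_GENOME_BASES) -> Fraction:
    """Reagent and flowcell cost in dollars for one finished genome."""
    # The price is passed as a string so Fraction parses it exactly.
    return Fraction(cents_per_finished_base) * genome_bases / 100


if __name__ == "__main__":
    days = days_per_genome(raw_calls_per_second=500, coverage=5)
    cost = reagent_cost_dollars("0.001")
    print(f"About {float(days):.0f} instrument-days per genome")
    print(f"About ${float(cost):,.0f} in reagents and flowcells per genome")
```

With the assumed genome size, 500 raw base calls per second at 5x coverage works out to about 347 days on a single instrument, and 0.001 cents per finished base to about $30,000 per genome.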

Multiplexed Reactive Sequencing of DNA

Williams, Peter
Arizona State University, Tempe
R01 HG003567

In a new approach to DNA sequencing, DNA primers synthesized to be complementary to specific sequences on targeted genes are covalently tethered in known locations in an array on the surface of a glass slide and allowed to hybridize with and capture complementary target gene fragments. The primer/template duplexes are serially interrogated by single species of fluorescently labeled deoxyribonucleotide triphosphate (dNTP) in the presence of an exonuclease-deficient DNA polymerase enzyme. The polymerase interrogates the template sequence beyond the 3' end of the primer strand and incorporates a deoxynucleotide monophosphate if it is complementary to the next template base. Quantitative fluorescent imaging of the array identifies the extended primers and determines the number of nucleotides incorporated, thus reading a short length of sequence (one to several bases) at a known location on the target gene. The fluorescent label is then destroyed in a selective photochemical reaction and the cycle is continually repeated with all four types of dNTP. Studies will be directed towards increasing read length and sequence accuracy by optimizing attachment chemistries and enzyme performance, reducing nucleotide impurities to negligible levels, calibrating any context dependence in the fluorescence response, and correcting for signals arising from extension failure. The initial read length target is at least 50 bases per spot; primers will be tiled at short intervals (~20-50 bases apart) along the target gene sequences so that long sequences can rapidly be read out in short parallel bytes. Array densities > 10,000 spots are anticipated, which, with reaction cycle times of ~2 min/dNTP in an automated system, should allow data rates approaching 3,000 bases per minute on a single slide. Initial sequencing studies will address genes known to be associated with an elevated risk of cancer. As read length is increased, the technology will be applied to de novo sequencing by spotting cloned templates annealed to a universal primer corresponding to the vector sequence.
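
The data-rate figure above can be approximated with a simple idealized model, sketched below. The model is an assumption made here, not the authors' calculation: it supposes a random template with four equally likely bases, a fixed cyclic order of single-dNTP additions, and perfect extension at every spot. Under those assumptions a homopolymer run averages 4/3 bases and the next run is reached after 1, 2, or 3 additions with equal probability, so each addition extends a spot by 2/3 of a base on average. The function names are hypothetical.

```python
from fractions import Fraction


def expected_bases_per_flow(n_bases: int = 4) -> Fraction:
    """Mean bases incorporated per single-dNTP addition on a random template.

    Idealized model (an assumption): equiprobable bases, cyclic dNTP order,
    and complete extension through each homopolymer run.
    """
    p_same = Fraction(1, n_bases)
    # Homopolymer run length is geometric with mean 1 / (1 - p_same).
    mean_run_length = 1 / (1 - p_same)
    # The next run starts with one of the other n-1 bases, reached after
    # 1 .. n-1 additions with equal probability.
    mean_flows_per_run = Fraction(sum(range(1, n_bases)), n_bases - 1)
    return mean_run_length / mean_flows_per_run


def array_data_rate(spots: int, minutes_per_flow: int) -> Fraction:
    """Idealized bases read per minute across the whole array."""
    return spots * expected_bases_per_flow() / minutes_per_flow


if __name__ == "__main__":
    rate = array_data_rate(spots=10_000, minutes_per_flow=2)
    print(f"Bases per dNTP addition per spot: {float(expected_bases_per_flow()):.3f}")
    print(f"Idealized array data rate: {float(rate):,.0f} bases per minute")
```

For 10,000 spots and 2 minutes per dNTP addition, this idealized estimate is about 3,333 bases per minute, the same order as the figure quoted in the summary; real extension failures and imaging overhead would lower it.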

Last updated: October 03, 2011
