Francis S. Collins on behalf of the International Human DNA Sequencing Consortium:
Until less than a year ago, the Human Genome Project (HGP) projected that completion of the 3 Gb human DNA sequence would not happen until 2005. Several recent developments have led to a dramatic acceleration of that timetable. These include both technical advances, such as capillary sequencers and the automation of several steps in sequencing, and advances in the organization of large-scale data production efforts. The success of several large-scale pilot human sequencing projects and the completion of the DNA sequences of important model organisms such as C. elegans add to the growing confidence in the ability to carry out large-scale sequencing in a cost-effective manner.
Fundamental changes in the scientific approach to the public sequencing effort have occurred in 1999, many in the past three months. Consequently, the accrual of human sequence information has dramatically increased. By September 24, 1999, 821 Mb of human sequence had been deposited in GenBank, either in finished form (428.6 Mb) or as "working draft", and the total should reach a billion base pairs in a matter of weeks. The first complete sequence of a human chromosome is anticipated before the end of the year. By next spring, 90 percent of the human DNA sequence should be freely available as a working draft. The working draft of the human sequence will be derived from BACs of known map position, sequenced to 4 to 5X coverage, with an average expected contig size of 10 to 20 kb. As useful as the working draft will be, the unswerving commitment of the international sequencing consortium remains the completion of the finished sequence, at an accuracy of at least 99.99 percent (error rate equal to or less than 1 in 10,000 bp), and with no gaps other than the few areas that are unclonable in current vectors.
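The figures in the paragraph above can be checked with a little arithmetic. The following is only a rough sketch in Python: the function and constant names are illustrative, and it assumes a genome size of exactly 3 Gb (3,000 Mb), the round number quoted earlier in this abstract.

```python
# Figures quoted in the text (Mb = megabases).
GENOME_MB = 3000.0        # assumed: the "3 Gb" genome size, taken as a round number
DEPOSITED_MB = 821.0      # total deposited in GenBank by September 24, 1999
FINISHED_MB = 428.6       # finished portion of that total
BASES_PER_ERROR = 10_000  # finished-sequence target: at most 1 error per 10,000 bp


def deposit_summary(genome_mb, deposited_mb, finished_mb):
    """Split the deposited total into finished and working-draft parts."""
    draft_mb = deposited_mb - finished_mb
    return {
        "draft_mb": draft_mb,
        "deposited_fraction": deposited_mb / genome_mb,
        "finished_fraction": finished_mb / genome_mb,
    }


def max_errors(bases, bases_per_error):
    """Upper bound on errors allowed at a rate of 1 per `bases_per_error` bases."""
    return bases // bases_per_error


summary = deposit_summary(GENOME_MB, DEPOSITED_MB, FINISHED_MB)
print(f"working draft deposited: {summary['draft_mb']:.1f} Mb")
print(f"share of genome deposited: {summary['deposited_fraction']:.1%}")
print(f"share of genome finished: {summary['finished_fraction']:.1%}")
print(f"max errors in a finished 3 Gb sequence: {max_errors(3_000_000_000, BASES_PER_ERROR):,}")
```

Under these assumptions, roughly a quarter of the genome had been deposited by late September 1999, and the 99.99 percent accuracy target corresponds to no more than about 300,000 errors across the full finished sequence.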
To achieve this accelerated timetable, the participants in the effort have organized themselves into an integrated team, and several additional resources of broad general utility have been generated. The genome has been completely allocated on a regional basis to the 16 international genome centers that have the capacity to rapidly carry out high-throughput sequencing. Bacterial Artificial Chromosomes (BACs), seeded at roughly 0.4 Mb spacing, have been identified across the genome and are the first to be sequenced. As of September 1, 1999, restriction enzyme fingerprints of more than 267,000 clones from a deep human BAC library (RPCI-11) have been generated; more than 169,000 have been fully analyzed and collapsed into a total of 10,470 contigs (containing 2 or more BACs), of which 8,287 have been localized using radiation hybrids or other methods. In addition, sequencing of all the ends of this BAC library and another (Cal Tech D) has almost been completed. The BAC end sequences and the fingerprint data provide a resource for rapidly walking out from the seed BACs. A central server, established by NCBI, keeps track of the status of individual BACs, in order to assist in project coordination and avoid duplication of effort.
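The mapping figures above can likewise be put in proportion. This is a back-of-the-envelope sketch, not the consortium's own accounting: the names are illustrative, the 3 Gb genome size is assumed as a round number, and the clone counts are lower bounds because the text says "more than".

```python
# Figures quoted in the text; clone counts are lower bounds ("more than").
GENOME_MB = 3000.0        # assumed: 3 Gb genome, taken as a round number
SEED_SPACING_MB = 0.4     # approximate spacing between seed BACs
FINGERPRINTED = 267_000   # RPCI-11 clones with restriction fingerprints
ANALYZED = 169_000        # clones fully analyzed and collapsed into contigs
CONTIGS = 10_470          # contigs containing 2 or more BACs
LOCALIZED = 8_287         # contigs placed by radiation hybrids or other methods


def seed_bac_count(genome_mb, spacing_mb):
    """Approximate number of seed BACs needed at a given spacing."""
    return round(genome_mb / spacing_mb)


def fraction(part, whole):
    """Simple ratio, kept as a helper for readability."""
    return part / whole


print(f"approximate seed BACs: {seed_bac_count(GENOME_MB, SEED_SPACING_MB):,}")
print(f"fingerprinted clones analyzed: {fraction(ANALYZED, FINGERPRINTED):.1%}")
print(f"contigs localized: {fraction(LOCALIZED, CONTIGS):.1%}")
```

On these numbers, seeding at 0.4 Mb implies on the order of 7,500 seed BACs, and about four in five fingerprint contigs had already been placed on the map.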
Since 1996, the international sequencing consortium has been committed to the public release of all sequence information within 24 hours of assembly into 1 to 2 kb contigs. We continue to believe that the scientific community in both the public and private sectors, as well as the general public, is best served by free and unimpeded access for all scientists to the fundamental information contained within the human genetic instruction book, and we will continue to adhere to these principles.
Many researchers in human genetics spend a large fraction of their laboratory efforts mapping, cloning and sequencing human DNA. With the incipient availability of this unique resource, along with a catalog of common human variants (SNPs), the sequence of the mouse genome, and improved technologies for large-scale gene expression analysis, the face of human genetics research will change profoundly in the near future.
Last Reviewed: March 17, 2012