Public-Private Consortium to Accelerate Sequencing of Mouse Genome

Results will expedite discovery of human genes

October 2000

The National Institutes of Health (NIH), the Wellcome Trust and three private companies today announced they have formed a consortium to speed up the determination of the DNA sequence of the mouse genome. The Mouse Sequencing Consortium will provide $58 million over the next six months to decipher the mouse genetic code.

Members of the Mouse Sequencing Consortium (MSC) and their contributions to the effort are SmithKline Beecham ($6.5 million), the Merck Genome Research Institute ($6.5 million), Affymetrix, Inc. ($3.5 million), the Wellcome Trust ($7.75 million), and six of the National Institutes ($34 million*), including the National Cancer Institute (NCI), the National Human Genome Research Institute (NHGRI), the National Institute on Deafness and Other Communication Disorders (NIDCD), the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), the National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute of Mental Health (NIMH).

MSC funds will support mouse genome sequencing at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass.; Washington University School of Medicine in St. Louis; and the Sanger Centre in the United Kingdom.

The MSC is another example of an emerging model for supporting large-scale genomics research in which public and private sector entities join forces to produce publicly available data sets that are crucial for basic biomedical research.

Like the efforts of The SNP Consortium (a group of pharmaceutical and technology companies that together with the Wellcome Trust are constructing a map of genetic variations that occur throughout human DNA) and the Merck-funded effort to generate a database of expressed sequence tags (DNA known to match regions of the genome that code for proteins), the MSC is a public-private partnership to generate data that will be freely available for the unrestricted use of biomedical researchers worldwide. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit, charitable organization founded to support the NIH in its mission.

The desire to accelerate mouse genome sequencing builds on the completion in June 2000 of the working draft version of the human DNA sequence. With the working draft of the human genome sequence in hand, scientists in both industry and academia now seek to interpret its meaning. The DNA sequence of the mouse genome will provide an essential tool to identify and study the function of human genes.

Sequencing the mouse genome is now the next major goal of large-scale genomics, and the Mouse Sequencing Consortium's effort will expand and accelerate the program to analyze the mouse genome that NHGRI began in September 1999. That program has already generated most of the data for a "fingerprint" map of the mouse genome, including a set of sequences from the ends of cloned genomic DNA fragments, and is carrying out targeted sequencing of regions of the mouse genome that are of particularly high biological interest. The NHGRI effort has also begun to sequence the mouse genome in its entirety.

Mammals share many basic biological functions such as immune response, regulation of cell division, and development of major organ systems. The gene sequences in mouse and human that encode the proteins to carry out these functions also are shared to a high degree (85 percent sequence identity). The DNA sequences in the vast regions between genes are much less similar (50 percent sequence identity or less).

Since only about 5 percent of the human genome contains genes, sifting through the 3.1 billion DNA letters to find genes is an extremely challenging task. But when human and mouse genome sequences are compared, the regions of high similarity are readily apparent and immediately identify protein coding regions and regulatory sequences. Thus, the mouse genome sequence will provide a powerful tool to interpret the newly available human genome sequence.
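The comparison described above can be illustrated with a minimal sketch in Python. It assumes the two sequences have already been aligned to equal length with no gaps (real human-mouse comparisons require an alignment step that is not shown here), and it uses short made-up sequences rather than real genomic data. The function names and the window size are hypothetical choices for illustration; the 85 percent threshold is taken from the sequence identity figure quoted earlier for shared genes.

```python
def window_identity(seq_a, seq_b, window):
    """Return (start, fraction_identical) for each non-overlapping window.

    Assumes seq_a and seq_b are pre-aligned, ungapped, and equal in length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, window):
        chunk_a = seq_a[start:start + window]
        chunk_b = seq_b[start:start + window]
        matches = sum(x == y for x, y in zip(chunk_a, chunk_b))
        results.append((start, matches / window))
    return results


def conserved_windows(seq_a, seq_b, window, threshold):
    """Return start positions of windows whose identity meets the threshold."""
    return [start
            for start, identity in window_identity(seq_a, seq_b, window)
            if identity >= threshold]


# Made-up toy sequences: the first 10 bases are nearly identical,
# the last 10 bases are mostly different.
human = "ATGGCCAAGT" + "TTTTTTTTTT"
mouse = "ATGGCTAAGT" + "TACGGACTAG"

print(window_identity(human, mouse, window=10))
print(conserved_windows(human, mouse, window=10, threshold=0.85))
```

In this toy case only the first window clears the threshold, which mirrors the idea that highly similar stretches stand out against a background of less conserved sequence.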

In addition to aiding the interpretation of the human genome, the mouse genome sequence will also increase the ability of scientists to use the mouse as a model system to study and understand human disease, and to develop and test new treatments in ways that cannot easily be done with humans.

The genome of the mouse is the same size as that of the human, about 3.1 billion base pairs. As recommended by scientists studying the mouse, the genome sequencing effort will use a strain of mouse known as C57BL/6J, commonly called "Black 6." The sequencing strategy that will be used takes advantage of the best features of the map-based shotgun strategy used by the public sequencing consortium to produce the human sequence and the whole genome shotgun strategy used by the private sector effort that also produced a version of the human genome sequence in the past year. The melding of these two strategies promises to produce a high quality genome sequence more quickly than either strategy could alone.

The MSC's program will, by the end of February 2001, bring the overall depth of coverage of the mouse genome to 2.5X to 3X. This is the level of coverage at which shotgun genomic sequence first becomes useful to the typical scientist, with about 93 to 95 percent of the sequence of the mouse genome being available, albeit in small, unordered fragments. Subsequently, the mouse genome sequencing effort will generate the complete sequence coverage and assemble the entire sequence into a "finished," highly accurate form.
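The relationship between depth of coverage and the fraction of the genome available can be approximated with a simple Poisson model of random shotgun reads (often called the Lander-Waterman approximation). The sketch below is an idealized estimate, not the MSC's own calculation: it assumes reads land uniformly at random, and it uses the genome size of about 3.1 billion base pairs and the read length of about 500 bases quoted elsewhere in this announcement. The function names are hypothetical.

```python
import math


def expected_fraction_covered(depth):
    """Poisson approximation: chance a given base is hit by at least one read."""
    return 1.0 - math.exp(-depth)


def reads_needed(genome_size_bp, depth, read_length_bp):
    """Number of reads required to reach the given average depth of coverage."""
    return math.ceil(genome_size_bp * depth / read_length_bp)


GENOME_SIZE_BP = 3_100_000_000  # approximate figure quoted in this announcement
READ_LENGTH_BP = 500            # approximate trace length quoted in this announcement

for depth in (2.5, 3.0):
    fraction = expected_fraction_covered(depth)
    reads = reads_needed(GENOME_SIZE_BP, depth, READ_LENGTH_BP)
    print(f"{depth}X coverage: about {fraction:.1%} of bases covered, "
          f"about {reads:,} reads")
```

Under these assumptions the model gives roughly 92 percent at 2.5X and 95 percent at 3X, broadly in line with the 93 to 95 percent figure above. Real projects deviate from the idealized model because of cloning bias, repeats, and variable read quality.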

The data release practices of the MSC will continue the objective of the international Human Genome Project's sequencing program: making sequence data available to the research community as soon as possible for free, unfettered use. In fact, the incorporation of the whole genome shotgun sequencing component has led to adoption of a new, even more rapid data release policy whereby the actual raw data (that is, individual DNA sequence traces, about 500 bases long, taken directly from the automated instruments) will be deposited regularly in a newly established public database operated by the National Center for Biotechnology Information (NCBI, www.ncbi.nlm.nih.gov) and a sister database operated by the European Bioinformatics Institute (EBI, www.ebi.ac.uk). These individual DNA sequences will be combined into larger assemblies as soon as sufficient coverage is attained, which will be at about the point where working draft quality coverage of the genome is reached.
