Mouse Sequencing Consortium Completes Program To Accelerate Availability of Mouse Genome Data
Publicly Available Sequence Covers 95 Percent of Genome; Mouse Data Invaluable for Study of Human Genes and Disease
May 8, 2001
BETHESDA, Md. - The Mouse Sequencing Consortium (MSC), an international public-private effort to accelerate the sequencing of the mouse genome, announced today it has achieved its goal to generate three-fold coverage of the mouse DNA sequence. These data - produced over a six-month period - represent at least 95 percent of the full complement of mouse DNA, and are freely available for the unrestricted use of researchers worldwide.
"The ability of the Mouse Sequencing Consortium to deliver on time and on budget against pre-specified targets underscores the value of this new model of public-private collaboration," says Arthur Holden, co-chairman of the MSC. "The success of the MSC and other public-private research consortia no doubt will lead to future cooperative efforts to solve big problems quickly, especially when the resulting data belong in the public domain."
The MSC - comprising three private companies, six institutes of the National Institutes of Health and the Wellcome Trust - was formed in October 2000 to work collaboratively to produce a draft sequence of the mouse genome. The availability of these data is considered essential to the further understanding of the human genome. Not only is the genome of the mouse about the same size as that of the human (approximately 3.1 billion base pairs), but mice and humans also share virtually the same set of genes. Thus, the DNA sequence of the mouse genome is an essential tool to identify and study the function of human genes.
All mammals - including humans and mice - share a common ancestor that lived some 80 to 100 million years ago, and as a result, their genomes are similar. The most important functional regions of the genome - the genes that contain the information to make proteins - have changed relatively little over the millennia, since any change that would interfere with essential functioning would not be passed on to the next generation. Some genes shared by mice and men have changed so little that they remain 90 percent or more similar, whereas others have changed more and are now only 60 percent similar. The non-coding regions of the mouse and human genomes are much less similar. Comparing the mouse genome sequence to the human genome sequence readily reveals the regions of similarity. Using computers to do the comparisons, scientists can rapidly find these regions, many of which contain previously unrecognized genes.
In addition to highlighting the coding sequences of genes, comparing mouse DNA and human DNA will help identify other functionally important genetic features of the human genome that have been conserved, such as regulatory regions of DNA that turn gene expression on and off.
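As a rough illustration of what a figure such as "90 percent similar" means, the sketch below computes percent identity between two stretches of sequence that are assumed to be already aligned and of equal length. The function name and the two short sequences are hypothetical, made up for the example; they are not MSC data, and real genome comparisons rely on dedicated alignment software rather than a position-by-position count.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two pre-aligned sequences carry the same base.

    Assumes the sequences are already aligned and have equal length;
    no gaps or rearrangements are handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the bases agree, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical ten-base fragments that differ at a single position.
human_fragment = "ATGGCCAAGT"
mouse_fragment = "ATGGCTAAGT"
print(percent_identity(human_fragment, mouse_fragment))  # 90.0
```

Stretches that score high in such a comparison are candidates for conserved genes or regulatory regions; stretches that score low are more likely to be non-coding sequence that has drifted apart.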
"This is a great day for finding genes in the human," says Francis S. Collins, M.D., Ph.D., director of the National Human Genome Research Institute (NHGRI). "Comparing mouse sequence to human sequence will help identify previously unknown human genes, in essence using evolution's 'lab notebook' to understand how the genome works. Now we need to finish the work so the mouse sequence is as accurate and complete as the human sequence."
The Mouse Sequencing Consortium used a whole genome shotgun sequencing approach to generate the initial coverage quickly. Since the shotgun approach produces random bits of sequence, scientists sequence the genome several times over to ensure that nearly all of its bases are sampled. The MSC program sampled each base an average of three times, bringing the amount of mouse sequence available to about 95 percent of the total - albeit in small, unordered fragments.
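The link between three-fold coverage and the 95 percent figure can be sketched with a standard back-of-the-envelope model. The sketch assumes reads land uniformly at random across the genome, so the number of reads covering any one base is approximately Poisson-distributed with a mean equal to the average coverage. That model and the function name are assumptions made for illustration; the release does not say how the consortium computed its estimate.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases sampled at least once at a given average coverage.

    Under the Poisson assumption, the chance that a base is missed
    entirely is exp(-coverage), so the covered fraction is 1 - exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


# Three-fold coverage, the MSC target.
print(f"{expected_fraction_covered(3):.1%}")  # 95.0%
```

Under the same assumption, each additional fold of coverage shrinks the unsampled remainder by a constant factor, which is why greater depth is needed to approach a complete sequence.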
The NHGRI, the Wellcome Trust, and the participating sequencing centers will continue work to provide greater depth of coverage. The next phase will utilize larger stretches of DNA of known map position, and will assemble the fragmentary pieces of sequence into the finished, highly accurate sequence of the mouse genome.
Rapid Access To Raw Data
To speed access to the data, the mouse sequence results have been continually released, as is the practice of the Human Genome Project's human sequencing effort. New mouse data have been posted on a weekly basis in a novel type of database that was established to make the individual (raw) sequence reads publicly available. To date, the MSC has deposited more than 15 million individual unique sequence traces. The mouse sequence data can be found in either of two public databases:
- The Trace Archive, a newly established public database operated by the U.S. National Center for Biotechnology Information, which can be found at www.ncbi.nlm.nih.gov/Traces/trace.cgi.
- The Ensembl Trace Server, which can be found at trace.ensembl.org. Ensembl is a joint project between The European Bioinformatics Institute (EBI), an arm of the European Molecular Biology Laboratory, and the Sanger Centre to develop a software system that automatically annotates genomes.
Solving Research Problems
The mouse data are already finding multiple uses in research. For example, Merck & Co., Inc., of Whitehouse Station, NJ, has used the newly available MSC data to find the mouse equivalent of a human gene that may be related to schizophrenia. Previous work by the company had identified a human gene, located at a chromosomal break point, but Merck scientists had been unable to find the mouse equivalent. As the mouse sequence became available, Merck researchers found a match that helped them locate the mouse gene. In turn, the discovery will help the company develop a mouse model to study further the gene's association with this devastating mental disorder.
This is but one example of the potential power of using the mouse to advance the understanding of human biology and disease. Identifying disease-related genes in the mouse should make it simpler to develop and test new treatments in ways that cannot easily be done in people.
About the Mouse Sequencing Consortium
Co-chaired by Holden and NHGRI's Collins, the Mouse Sequencing Consortium was a six-month, $58 million program to produce a draft sequence of the mouse genome.
Members of the Mouse Sequencing Consortium are GlaxoSmithKline, the Merck Genome Research Institute, Affymetrix, Inc., the Wellcome Trust, and six institutes of the National Institutes of Health: the National Cancer Institute, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit, charitable organization founded to support the NIH in its mission.
The MSC funding principally supported work at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass., Washington University School of Medicine in St. Louis, and the Sanger Centre in the U.K.
For more information:
Contact for the Consortium:
Phone: (312) 397-6604
Last Reviewed: May 16, 2010