Mouse Sequencing Consortium Completes Program To Accelerate Availability of Mouse Genome Data

Publicly Available Sequence Covers 95 Percent of Genome; Mouse Data Invaluable for Study of Human Genes and Disease

May 8, 2001

BETHESDA, Md. - The Mouse Sequencing Consortium (MSC), an international public-private effort to accelerate the sequencing of the mouse genome, announced today it has achieved its goal to generate three-fold coverage of the mouse DNA sequence. These data - produced over a six-month period - represent at least 95 percent of the full complement of mouse DNA, and are freely available for the unrestricted use of researchers worldwide.

"The ability of the Mouse Sequencing Consortium to deliver on time and on budget against pre-specified targets underscores the value of this new model of public-private collaboration" says Arthur Holden, co-chairman of the MSC. "The success of the MSC and other public-private research consortia no doubt will lead to future cooperative efforts to solve big problems quickly, especially when the resulting data belong in the public domain."

The MSC - comprising three private companies, six institutes of the National Institutes of Health and the Wellcome Trust - was formed in October 2000 to work collaboratively to produce a draft sequence of the mouse genome. The availability of these data is considered essential to the further understanding of the human genome. Not only is the genome of the mouse about the same size as that of the human (approximately 3.1 billion base pairs), but mice and humans also share virtually the same set of genes. Thus, the DNA sequence of the mouse genome is an essential tool to identify and study the function of human genes.

Comparing Genomes

All mammals - including humans and mice - share a common ancestor that lived some 80 to 100 million years ago, and as a result, their genomes are similar. The most important functional regions of the genome - the genes that contain the information to make proteins - have changed relatively little over the millennia, since any change that would interfere with essential functioning would not be passed on to the next generation. Some genes shared by mice and men have changed so little that they remain 90 percent or more similar, whereas others have changed more and are now only 60 percent similar. The non-coding regions of the genome for mice and humans are much less similar. By comparing the mouse genome sequence to the human genome sequence, the regions of similarity can be recognized readily. Using computers to do the comparisons, scientists can rapidly find these regions of similarity, many of which contain previously unrecognized genes.
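To make figures such as "90 percent similar" concrete, the following is a minimal sketch of how percent identity between two sequences might be computed. It assumes the two sequences have already been aligned to the same length with no gaps. The function name and the short example sequences are hypothetical illustrations, not MSC data or the consortium's comparison software.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of positions at which two pre-aligned,
    equal-length sequences carry the same base.

    Assumption: the inputs are already aligned and contain no gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the bases match, ignoring letter case.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-base fragments that differ at a single position.
human_fragment = "ATGGCCAAGT"
mouse_fragment = "ATGGCTAAGT"
print(f"{percent_identity(human_fragment, mouse_fragment):.1f} percent identical")
```

Real genome comparisons must first align the sequences and handle insertions and deletions, which this sketch deliberately leaves out.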

In addition to highlighting the coding sequences of genes, comparing mouse DNA and human DNA will help identify other functionally important genetic features of the human genome that have been conserved, such as regulatory regions of DNA that turn gene expression on and off.

"This is a great day for finding genes in the human," says Francis S. Collins, M.D., Ph.D., director of the National Human Genome Research Institute (NHGRI). "Comparing mouse sequence to human sequence will help identify previously unknown human genes, in essence using evolution's "lab notebook" to understand how the genome works. Now we need to finish the work so the mouse sequence is as accurate and complete as the human sequence."

The Mouse Sequencing Consortium used a whole genome shotgun sequencing approach to generate the initial coverage quickly. Since the shotgun approach produces random bits of sequence, scientists analyze each base several times over to ensure that nearly all of the bases in the genome are sampled. The MSC program sampled each base an average of three times, bringing the amount of mouse sequence available to about 95 percent of the total - albeit in small, unordered fragments.
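The link between three-fold sampling and roughly 95 percent of the genome can be illustrated with a simple statistical model. The sketch below assumes that reads land uniformly at random, so the number of times a given base is sampled follows a Poisson distribution with mean equal to the average coverage. This is an idealized textbook model offered for illustration, not the consortium's stated calculation, and the function name is hypothetical.

```python
import math


def expected_fraction_sampled(average_coverage: float) -> float:
    """Estimate the fraction of bases sampled at least once.

    Assumption: reads fall uniformly at random, so the probability that
    a base is never sampled is exp(-average_coverage).
    """
    if average_coverage < 0:
        raise ValueError("average coverage must be non-negative")
    return 1.0 - math.exp(-average_coverage)


# Compare several coverage depths, including the MSC's three-fold target.
for depth in (1, 2, 3, 5):
    fraction = expected_fraction_sampled(depth)
    print(f"{depth}-fold coverage: about {fraction:.1%} of bases sampled")
```

Under these assumptions, three-fold coverage leaves about 5 percent of bases unsampled, which is consistent with the 95 percent figure quoted above. Actual coverage also depends on factors the model ignores, such as cloning bias and repeated sequence.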

The NHGRI, the Wellcome Trust, and the participating sequencing centers will continue work to provide greater depth of coverage. The next phase will utilize larger stretches of DNA of known map position, and will assemble the fragmentary pieces of sequence into the finished, highly accurate sequence of the mouse genome.

Rapid Access To Raw Data

To speed access to the data, the mouse sequence results have been continually released, as is the practice of the Human Genome Project's human sequencing effort. New mouse data have been posted on a weekly basis in a novel type of database that was established to make the individual (raw) sequence reads publicly available. To date, the MSC has deposited more than 15 million individual unique sequence traces. The mouse sequence data can be found in either of two public databases:

  • The Trace Archive, a newly-established public database operated by the U.S. National Center for Biotechnology Information, which can be found at www.ncbi.nlm.nih.gov/Traces/trace.cgi.

  • The Ensembl Trace Server, which can be found at trace.ensembl.org. Ensembl is a joint project between The European Bioinformatics Institute (EBI), an arm of the European Molecular Biology Laboratory, and the Sanger Centre to develop a software system that automatically annotates genomes.

Solving Research Problems

The mouse data are already finding multiple uses in research. For example, Merck & Co., Inc., of Whitehouse Station, NJ, has used the newly available MSC data to find the mouse equivalent of a human gene that may be related to schizophrenia. Previous work by the company had identified a human gene, located at a chromosomal break point, but Merck scientists had been unable to find the mouse equivalent. As the mouse sequence became available, Merck researchers found a match that helped them locate the mouse gene. In turn, the discovery will help the company develop a mouse model to study further the gene's association with this devastating mental disorder.

This is but one example of the potential power of using the mouse to advance the understanding of human biology and disease. Identifying disease-related genes in the mouse should make it simpler to develop and test new treatments in ways that cannot easily be done in people.

About the Mouse Sequencing Consortium

Co-chaired by Holden and NHGRI's Collins, the Mouse Sequencing Consortium was a six-month, $58 million program to produce a draft sequence of the mouse genome.

Members of the Mouse Sequencing Consortium are GlaxoSmithKline, the Merck Genome Research Institute, Affymetrix, Inc., the Wellcome Trust, and six of the National Institutes of Health: the National Cancer Institute, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit, charitable organization founded to support the NIH in its mission.

The MSC funding principally supported work at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass., Washington University School of Medicine in St. Louis, and the Sanger Centre in the U.K.

For more information:

Mouse Sequencing Consortium Members Media Contacts

  • GlaxoSmithKline: Graeme P. Holland, Phone: 44-12-7964-4269; Rick Koenig, Phone: (610) 270-5546
  • Merck Genome Research Institute: Andrea F. Kollath, DVM, Phone: 908-423-6492
  • Affymetrix, Inc.: Anne Bowdidge, Phone: (408) 731-5925
  • National Cancer Institute: NCI Press Office, Phone: (301) 496-6641
  • National Human Genome Research Institute: Larry Thompson, Phone: (301) 402-0911
  • National Institute on Deafness and Other Communication Disorders: Marin Allen, Phone: (301) 496-7243
  • National Institute of Diabetes and Digestive and Kidney Diseases: Joan Chamberlain, Phone: (301) 496-3583
  • National Institute of Mental Health: Marilyn Weeks, Phone: (301) 443-4536
  • National Institute of Neurological Disorders and Stroke: Margo Warren, Phone: (301) 496-5751
  • Wellcome Trust: Noorece Ahmed, Phone: 44-20-7611-8540

Genome Sequencing Centers Media Contacts

  • Whitehead Institute for Biomedical Research: Seema Kumar, Phone: (617) 258-6153
  • Washington University School of Medicine: Joni Westerhouse, Phone: (314) 286-0120
  • Sanger Centre: Don Powell, Phone: 44-12-2349-4956

  • Foundation for the National Institutes of Health, Inc.: Constance U. Battle, MD, Phone: (301) 402-5311
  • Other Contacts: Arthur Holden, Phone: (847) 317-9230

Contact for the Consortium:

Mary Prescott
Phone: (312) 397-6604
E-mail: mprescott@bsmg.com

Last updated: May 16, 2010