Mouse Genome Data Available in Public Databases
Coverage Now Exceeds Two-Thirds of Genome; Sequence Freely Available; Mouse Data Aids Study of Human Genes
BETHESDA, Md. - A public-private effort to accelerate the sequencing of the mouse genome has exceeded its own goal of achieving 66 percent coverage of the genome just three months into the six-month project. At its current pace, the Mouse Sequencing Consortium (MSC) expects to reach its target of three-fold coverage by April of this year.
At the same time, collaborators in the MSC have extended the practice of making sequence data available for the free and unrestricted use of researchers worldwide. A new repository has been established to make the information rapidly and freely available to the scientific community. It contains not only the letters of the DNA sequence, as has been customary for previous large-scale sequencing projects, but also raw data, including the actual "traces" produced by sequencing machines.
The Mouse Sequencing Consortium (MSC) - comprising three private companies, six institutes of the National Institutes of Health and the Wellcome Trust - was formed in October 2000 to work collaboratively to produce a draft sequence of the mouse genome in six months. The availability of these data is considered essential to the further understanding of the human genome.
"Unrestricted access to the mouse sequence should enhance efforts to identify causative genes in mouse models of diseases as well as identify human genes responsible for various disorders," says Arthur Holden, chairman of the MSC. "The rapid progress toward making these data widely available will in turn speed the search for new ways to treat or even prevent disease."
The MSC approach to sequencing the mouse genome takes advantage of the best features of the map-based shotgun and whole-genome shotgun strategies. Sequence data generated by the MSC consist of short fragments (500 to 700 base pairs), and these so-called "raw reads" are now deposited weekly in the new data repositories. The quality of the deposited data has been checked and found to be very good.
Sequences, quality scores, and traces from sequencing machines are accessible in databases maintained by the National Center for Biotechnology Information (NCBI) and by the European Bioinformatics Institute (EBI), the latter through Ensembl [ensembl.org], a joint project with the Sanger Centre. At present, approximately 6.4 million traces from the MSC's whole-genome shotgun sequencing effort have been deposited into the archives.
Researchers can now, for example, compare a sequence of interest against the mouse traces in the archives using software programs such as "megaBLAST" [ncbi.nlm.nih.gov] and "SSAHA" [sanger.ac.uk]. Matching mouse traces can then be downloaded for further analysis.
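Tools such as megaBLAST and SSAHA avoid aligning a query against every trace by first looking up short exact words ("seeds") in an index. The sketch below illustrates only that seeding idea, in simplified form: it indexes every k-mer of each trace in a dictionary and reports the traces that share enough k-mers with the query. It is not the megaBLAST or SSAHA implementation. The seed length `k`, the `min_hits` cutoff and the function names are hypothetical choices for illustration, and the sketch ignores reverse complements and performs no alignment extension.

```python
from collections import defaultdict

# Hypothetical default seed length for this sketch; real tools use their own settings.
K = 12

def build_index(traces, k=K):
    """Map every k-mer in each trace to the (trace_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for trace_id, seq in traces.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((trace_id, i))
    return index

def find_matching_traces(query, index, k=K, min_hits=2):
    """Count k-mer seed hits per trace; return sorted ids of traces with >= min_hits hits."""
    hits = defaultdict(int)
    for j in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict for unseen k-mers
        for trace_id, _offset in index.get(query[j:j + k], []):
            hits[trace_id] += 1
    return sorted(t for t, n in hits.items() if n >= min_hits)

# Toy traces (made-up sequences) with a short seed length so the example stays small.
traces = {"trace_A": "ACGTACGTTTGACCA", "trace_B": "GGGGCCCCAAAATTTT"}
index = build_index(traces, k=4)
print(find_matching_traces("TACGTTTGAC", index, k=4))  # prints ['trace_A']
```

In a real search, the traces returned by a seeding step like this would then be aligned to the query in full before being downloaded for analysis.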
Additionally, the EBI-Sanger Ensembl database displays matches between the mouse traces and the human genome directly (for example, mouse sequences matching the human cystic fibrosis gene), which should help researchers interpret the human sequence.
When the draft sequence is completed in April, about 93 to 95 percent of the mouse genome sequence will be available, albeit in small, unordered fragments. The National Human Genome Research Institute (NHGRI) will go on to complete the highly accurate, "finished" sequence of the mouse genome.
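The link between "three-fold coverage" and "93 to 95 percent" of the genome can be checked with the standard Lander-Waterman model, which assumes reads land at random positions. Under that model, average depth is c = N × L / G (number of reads × read length / genome size), and the expected fraction of bases covered by at least one read is 1 − e^(−c). At c = 3 this gives about 95 percent. The calculation below also applies the model to the 6.4 million deposited traces. That part rests on illustrative assumptions not stated by the consortium: every trace is usable, reads average 600 base pairs (the midpoint of 500 to 700), and the genome is 3.1 billion base pairs.

```python
import math

def coverage_depth(num_reads, mean_read_length, genome_size):
    """Average sequencing depth c = N * L / G (bases sequenced / genome size)."""
    return num_reads * mean_read_length / genome_size

def expected_fraction_covered(depth):
    """Lander-Waterman (Poisson) estimate of the fraction of bases covered by >= 1 read."""
    return 1.0 - math.exp(-depth)

# Illustrative assumptions, not consortium figures: all traces usable,
# 600 bp average read length, 3.1 billion bp genome.
GENOME_SIZE = 3.1e9
depth_now = coverage_depth(6.4e6, 600, GENOME_SIZE)

print(f"current depth ~ {depth_now:.2f}x, covered ~ {expected_fraction_covered(depth_now):.2f}")
# prints "current depth ~ 1.24x, covered ~ 0.71"
print(f"3x target: covered ~ {expected_fraction_covered(3):.3f}")
# prints "3x target: covered ~ 0.950"
```

Under these assumptions, the roughly 71 percent estimate agrees with coverage having passed two-thirds. The model ignores repeats, failed reads and uneven sampling, so real coverage may fall somewhat short of its predictions.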
Why sequence the mouse genome?
With the working draft of the human genome sequence in hand, scientists in both industry and academia now seek to interpret its meaning.
Not only is the mouse genome about the same size as the human genome (approximately 3.1 billion base pairs), but mice and humans also share virtually the same set of genes. Thus, the DNA sequence of the mouse genome is an essential tool to identify and study the function of human genes.
For example, mouse and human gene sequences that encode proteins carrying out important biological functions, such as regulating cell division and directing the development of major organ systems, are highly conserved (85 percent sequence identity). Thus, when human and mouse genome sequences are compared, regions of high similarity stand out and point directly to protein-coding regions and regulatory sequences.
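A toy version of the comparison described above: slide a fixed window along two sequences that are assumed to be already aligned without gaps, and flag the windows whose percent identity reaches a chosen threshold. The function names, the 10-base window and the use of 85 percent as the cutoff are illustrative assumptions. Real human-mouse comparisons rely on alignment programs that handle insertions and deletions.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def conserved_windows(human, mouse, window=10, threshold=85.0):
    """Return start offsets of windows whose percent identity is at least `threshold`."""
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        i
        for i in range(len(human) - window + 1)
        if percent_identity(human[i:i + window], mouse[i:i + window]) >= threshold
    ]

# Made-up aligned fragments: the first 10 bases match, the rest differ.
human = "ACGTACGTACGGGGGGGGGG"
mouse = "ACGTACGTACTTTTTTTTTT"
print(conserved_windows(human, mouse))  # prints [0, 1]
```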
In addition to aiding the interpretation of the human genome, the mouse genome sequence will increase the ability of scientists to use the mouse as a model system to study and understand human disease, and to develop and test new treatments in ways that cannot easily be done with humans.
As recommended by scientists studying the mouse, the MSC effort is using a strain of mouse known as C57BL/6J, commonly called "Black 6."
About the Mouse Sequencing Consortium
The MSC is another example of an emerging model for supporting large-scale genomics research in which public and private sector entities join forces to produce publicly available data sets that are crucial for basic biomedical research.
The National Institutes of Health, the Wellcome Trust and three private companies formed the consortium to accelerate sequencing of the mouse genome. The MSC is co-chaired by Arthur Holden [Chairman and CEO, The SNP Consortium Ltd.] and Francis Collins, MD, PhD [Director, NHGRI]. The members of the Mouse Sequencing Consortium are GlaxoSmithKline, the Merck Genome Research Institute, Affymetrix, Inc., the Wellcome Trust, and six NIH institutes: the National Cancer Institute, the National Human Genome Research Institute, the National Institute on Deafness and Other Communication Disorders, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health. Private sector participation in the MSC has been facilitated by the Foundation for the National Institutes of Health, Inc., a non-profit charitable organization founded to support the NIH in its mission.
MSC funds are supporting mouse genome sequencing at three DNA sequencing laboratories: the Whitehead Institute for Biomedical Research in Cambridge, Mass., Washington University School of Medicine in St. Louis, and the Sanger Centre in the U.K.
Contact for the Consortium:
Phone: (312) 397-6604
Mouse Sequencing Consortium Members Media Contacts:
Graeme P. Holland
Phone: (610) 270-5546
Merck Genome Research Institute
Andrea F. Kollath, DVM
Phone: (908) 423-6492
Phone: (408) 731-5925
National Cancer Institute
NCI Press Office
Phone: (301) 496-6641
National Human Genome Research Institute
Kathy Hudson, Ph.D.
Phone: (301) 402-0955
National Institute on Deafness and Other Communication Disorders
Phone: (301) 496-7243
National Institute of Diabetes and Digestive and Kidney Diseases
Phone: (301) 496-3583
National Institute of Mental Health
Phone: (301) 443-4536
National Institute of Neurological Disorders and Stroke
Phone: (301) 496-5751
Genome Sequencing Center Media Contacts:
Whitehead Institute for Biomedical Research
Phone: (617) 258-6153
Washington University School of Medicine
Phone: (314) 286-0120
Foundation for the National Institutes of Health, Inc.
Constance U. Battle, MD
Phone: (301) 402-5311
Phone: (847) 317-9230