Science Papers on the Genome Sequence of the Fruitfly (Drosophila melanogaster)

March 2000

BETHESDA, Md. - A consortium of public scientists working together with a private company has released a substantially complete genome sequence of the fruitfly, Drosophila melanogaster.

"This is a remarkable achievement," said Dr. Francis Collins, director of the National Institutes of Health's (NIH) National Human Genome Research Institute (NHGRI), about the sequencing, or decoding, of the genetic makeup of the fruitfly. "It's a phenomenal milestone because of the fruitfly's pivotal role in research, ranging from aging and cancer to learning and memory."

Deciphering the genetic makeup, or genome, of the fruitfly was a combined effort by Dr. Gerry Rubin of the University of California at Berkeley and Lawrence Berkeley National Laboratory, Dr. Craig Venter and colleagues at Celera Genomics, and Dr. Richard Gibbs and colleagues at Baylor College of Medicine in Houston, as well as laboratories in other countries. The two U.S. university labs were funded by grants from NHGRI.

"By working together, these scientists were able to reach this stage of the sequencing effort earlier and at lower cost than had been anticipated," Dr. Collins added. "For that the whole scientific community is grateful."

"The genome sequence will greatly accelerate the progress of the 5,000 scientists for whom Drosophila is already a major research tool," said Dr. Rubin. "It will also provide an easy access point to Drosophila for those who now work with other systems. Powerful tools for determining gene function, not available in more complex animals such as mouse and human, have been developed in model organisms like Drosophila. The fact that the majority of genes known to cause human disease have well-conserved counterparts in the fly argues that the information uncovered in Drosophila will have direct relevance to human health."

Adhering to a core principle of the Human Genome Project (HGP), the research collaborators sequencing the fruitfly genome have deposited their data into the public database (www.ncbi.nlm.nih.gov/Genbank) so that all scientists, in industry as well as academia, can use it at no cost and without restrictions.

The Science papers represent the outcome of a carefully designed collaboration that was announced in January 1999. Celera's willingness to deposit the sequence data in GenBank, which was agreed to in the original Memorandum of Understanding between Drs. Rubin and Venter (www.fruitfly.org/), was critical to the ability of the public and private groups to collaborate.

Already in GenBank are the essentially complete genome sequences of three other major research model organisms, the E. coli bacterium, yeast and the nematode worm.

The collaborators on the fruitfly genome project used different but complementary strategies for sequencing the insect's genetic code, which consists of 165 million base pairs, or chemical units. The Berkeley and Baylor groups contributed about 25 percent of the complete sequence in addition to very detailed genetic maps and a mapped scaffold of partial sequence. Celera contributed about 3 million random "whole genome shotgun" sequences and the computational expertise to assemble the results. The random sequences were assembled by computer both on their own and in combination with the Berkeley and Baylor groups' data. The results were verified by referring to the 25 percent of the sequence that already was completed. The combined approach yielded the sequence of about 120 million base pairs of the fly genome, including the portion of the genome containing the vast majority of genes. The public and private groups also collaborated on the initial interpretation, or "annotation," of the fly sequence, identifying the location of the genes within the sequence.
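
As a rough illustration of what assembling random shotgun sequences by computer involves, the toy Python sketch below merges short reads wherever the end of one matches the start of another. It is not the software used by Celera or the Berkeley and Baylor groups. The function names (`overlap`, `greedy_assemble`), the `min_len` threshold and the sample reads are hypothetical, and the sketch ignores sequencing errors, repeated regions and the genetic maps that a real assembly relies on.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that is also a prefix
    of `b`, or 0 if that overlap is shorter than `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the list of contigs left when no pair overlaps by at least
    `min_len` characters. This is a toy greedy strategy (an assumption of
    this sketch), not a production assembly algorithm.
    """
    # Remove duplicate reads and reads fully contained in another read.
    contigs = sorted(set(reads))
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_len)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j) and c not in merged]
        contigs.append(merged)


# Three made-up reads that tile a 12-letter stretch of sequence.
reads = ["CGTACC", "ATGCGT", "ACCTGA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGA']
```

Reads that share no overlap stay as separate contigs, which loosely mirrors the gaps that remain in a draft sequence until finishing work closes them.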

Celera's work on the fly genome shows that the "whole genome shotgun" approach can be successfully applied to such an organism, said Dr. Collins, who added that Celera scientist Dr. Gene Myers "deserves a lot of credit for developing software to deal with the massive amount of data" resulting from the whole genome shotgun sequencing strategy.

The Berkeley and Baylor labs will spend the next year refining and finishing the fly sequence. That work will primarily involve closing the roughly 1,600 gaps remaining in the sequence.

Model organisms such as the fruitfly are crucial in identifying the function of human genes. For example, scientists previously have shown that a group of genes that act together to direct the formation of the early fly embryo are closely related to the same genes that can contribute to skin and colon cancer in humans. Three Nobel Prizes in medicine/physiology have been awarded for research using the fruitfly in studies of human development and the influence of genes on diseases. Although humans have 25 times more DNA than does the fruitfly and decoding the human genome is therefore much more complex than sequencing the fruitfly's, the majority of the insect's genes and their protein products are "conserved" - that is, represented by homologous genes - in the human genome. Thus, the genes in fruitflies are responsible for many of the same basic processes, like laying out a complex body plan, complete with a central nervous system. As a result, fruitflies enable scientists to conduct genetic studies impractical or too complicated or expensive to be performed with humans.

During the last century, fruitflies have yielded a wealth of information about how genes work. They have been used to discover the rules of inheritance and to study how a single cell, the fertilized egg, becomes a whole animal. In behavioral sciences research, they have advanced our knowledge of learning and memory and revealed much of what we know about circadian rhythms in humans.

NHGRI last fall launched its program to sequence the genome of the mouse, the first non-human mammal whose genetic instructions will be deciphered. In sequencing the mouse genome, scientists again will use the two complementary approaches. The mouse, like the human, has about 3 billion base pairs.

Last updated: September 01, 2006