DNA Sequences: Assembly Required

Genome Assembly Challenge Taps Wisdom of the Crowd

Leaders of the Genome 10K project, an effort to collect and sequence the DNA of 10,000 vertebrate species from fish to primates, have turned to the relatively recent practice of "crowdsourcing" to improve how all of those genomes, once sequenced, are assembled for analysis. Selective crowdsourcing (or "smartsourcing") in this case means giving a task to a group of qualified scientists instead of having a single person or organization try to solve it alone.

Last week, the initial results of "The Assemblathon," a crowdsourced research genome assembly challenge associated with the Genome 10K project, were presented at the Biology of Genomes meeting at Cold Spring Harbor Laboratory in New York. Seventeen teams from seven countries used their own computer software programs — called "genome assemblers" — to assemble the same genome. The challenge was organized by the University of California at Davis' Genome Center, in collaboration with the laboratory of David Haussler, Ph.D., at the University of California at Santa Cruz.

To determine the DNA sequence of a genome, scientists start with a laboratory technique called shotgun sequencing, which randomly breaks up a DNA molecule into numerous overlapping smaller segments that can be sequenced individually by next-generation DNA sequencing machines. The output of this process is millions of sequenced DNA fragments that must be reassembled in the correct order to represent the genome accurately for researchers to analyze.

But, unlike Humpty Dumpty, the nursery rhyme character who couldn't be put back together again, genomes can be reassembled by scientists using innovative computational methods.
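As a rough illustration of the idea, the sketch below samples overlapping reads from a short toy sequence and stitches them back together by repeatedly merging the pair of fragments with the largest overlap. This greedy approach is a textbook simplification chosen for brevity; it is not a description of how SOAPdenovo, ALLPATHS-LG, or any other Assemblathon entry works. The function names, the toy genome, and the parameters are all assumptions made for the example, and the reads are assumed to be error-free and from a single strand, which real sequencing data are not.

```python
import random


def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample n_reads random substrings of length read_len (error-free, one strand)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and any sequence that lies wholly inside another."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair with the largest overlap until no overlap remains."""
    contigs = drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


if __name__ == "__main__":
    # Toy genome (hypothetical); every 4-letter window in it is unique,
    # so overlaps of 4 or more letters are never misleading.
    genome = "GATTACAGCCTGAACGTTCG"
    reads = shotgun_reads(genome, read_len=8, n_reads=30, seed=1)
    print(greedy_assemble(reads, min_len=4))
```

The toy sequence was picked so that no stretch of four letters repeats, which is why a simple greedy merge is safe here. Real genomes contain long repeats, and randomly sampled reads may leave gaps, so an assembler typically returns several contiguous pieces rather than one finished sequence.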

While there were no official winners of the Assemblathon, the organizers did name three genome assemblers, in no particular order, as top of the class: SOAPdenovo, from BGI (formerly the Beijing Genomics Institute); ALLPATHS-LG, from The Broad Institute; and the Sanger Institute's "string graph assembler." Organizers plan to publish the results in a peer-reviewed journal in the coming months. According to Ian Korf, Ph.D., one of the referees for the Assemblathon and associate director of bioinformatics at the University of California, Davis Genome Center, there could have been 11 different top assemblers, depending on which metrics were used to evaluate each assembly.

"We didn't state clearly how we would judge performance, mostly because this is a research [effort]," said Dr. Korf. "In the future, we will definitely give some kind of official recognition and, hopefully, prizes." He suggested that a DNA sequencing machine vendor might donate a prize, such as an iPad 2.

The genome for the first Assemblathon was made up of simulated sequence reads from a "virtual" genome. Because the complete genome was generated by a computer, the organizers knew the correct final assembly with certainty.
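The value of a known answer can be shown with a small sketch: generate a random "virtual" genome, then check a set of assembled pieces against it. Everything here is an assumption made for illustration, including the function names, the uniform random sequence, and the two simple scores; the article does not say how the organizers built their virtual genome or which metrics they used.

```python
import random


def make_virtual_genome(length, seed=0):
    """Build a reproducible random DNA string to serve as the known truth."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def score_contigs(reference, contigs):
    """Score assembled pieces against a known reference.

    Returns the fraction of reference positions covered by contigs that match
    the reference exactly, and the number of contigs that match nowhere.
    Only the first exact match of each contig is counted.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    covered = [False] * len(reference)
    unplaced = 0
    for contig in contigs:
        start = reference.find(contig)
        if start == -1:
            unplaced += 1  # contig does not occur in the true genome
            continue
        for pos in range(start, start + len(contig)):
            covered[pos] = True
    return {
        "covered_fraction": sum(covered) / len(reference),
        "unplaced_contigs": unplaced,
    }


if __name__ == "__main__":
    truth = make_virtual_genome(60, seed=42)
    # Hypothetical assembly output: two correct pieces and one wrong one.
    contigs = [truth[0:25], truth[30:60], "T" * 12 + "ACGTACGTACGT"]
    print(score_contigs(truth, contigs))
```

With a real genome there is no such reference to compare against, which is part of what makes the second challenge harder to judge.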

However, the participating teams wanted to work with real genomes for Assemblathon 2, which will run this year from June 1, when teams can download the data, to September 1, when the data must be submitted for evaluation. They want the additional challenge of assembling real genomes "to make a real contribution to genomic biology."

The three genomes selected for assembly are from the initial 101 species that Genome 10K plans to sequence in the next 2 years. They include a cichlid species of fish, sequenced using Illumina technology; the red-tailed boa snake, sequenced using Illumina technology; and the colorful parrot, a bird sequenced using the 454 and Illumina technology platforms. The results of Assemblathon 2 will be presented in November at the genome informatics meeting at Cold Spring Harbor Laboratory.

"I think one of the most useful aspects of the Assemblathon was getting together such a large group of genome assemblers," said Dr. Korf. "They're all extremely clever and ... I think they are learning a lot from each other. Going forward, friendly competitions like Assemblathon are very useful for improving the state of the art."


Last Reviewed: November 14, 2012

On Other Sites:

Genome 10K Project

The Assemblathon