Applied math and physics, signal and image processing, computer science and statistics: these are a few areas of Dr. Mullikin's educational background that have enabled him to tackle a diverse range of projects, from measuring cardiac output from saline dilution curves (Voorhees et al., Med Instrum 1985) to measuring the surface area of objects detected in 3-D images (Mullikin and Verbeek, Bioimaging 1993), and from automated tracking of sample lanes on ABI377 gel images to assembly of whole mammalian genomes from Sanger sequence data. His primary interests are in developing novel and efficient algorithms using high-performance compute clusters to reduce large data sets into meaningful results.
He first started working in the field of genomics at the Sanger Centre in 1997. Over five years there, Dr. Mullikin's research group improved the restriction digest fragment analysis package called "Image", custom-modified 60 ABI377 sequencing machines to scan 96 lanes, developed a blazingly fast sequence aligner called SSAHA and its genomic sequence variation detection version called SSAHA-SNP (Ning et al., Genome Res 2001), and developed a whole-genome assembly algorithm called PHUSION (Mullikin and Ning, Genome Res 2003). Dr. Mullikin was also the head of the production software group, which developed the laboratory information management system for tracking DNA sequence samples, barcode-driven sample loading, automated data transfer off the sequencing machines to the central computer cluster, and automated assembly of the sequence data for the BAC clone-based sequencing of the C. elegans, human, mouse and zebrafish genomes. Since joining NHGRI in 2003, he has continued his involvement in large-scale collaborative projects (The International HapMap Consortium, Nature 2005 and 2007; ENCODE Project Consortium, Nature 2005).
At NHGRI, Dr. Mullikin's research group developed a medical sequencing analysis pipeline, which at its peak in 2008 processed over three million Sanger-based sequence reads across tens of projects (Lagresle-Peyrou et al., Nat Genet 2009; Biesecker et al., Genome Res 2009; Kang et al., N Engl J Med 2010; McLaughlin et al., Am J Hum Genet 2010; Davis et al., Nat Genet 2011; Bell et al., PLoS Genet 2011). Now, with Illumina sequencing machines, his research group has developed secondary analysis software for variant detection, variant annotation (e.g., coding sequence changes resulting in synonymous or non-synonymous amino-acid changes, splice-site changes, etc.), and prediction of the effects of those changes. They have also developed a specialized Java user interface, called VarSifter, for reviewing, sorting and filtering these annotated variants (Teer et al., Bioinformatics 2011). This secondary analysis pipeline is now integrated into NISC production operations for high-volume processing of whole-exome sequence data sets.
Other research in his group includes development of targeted capture methods (Teer et al., Genome Res 2010), RNA sequence analysis, and whole-genome sequence assembly (Pontius et al., Genome Res 2007; Mullikin et al., BMC Genomics 2010; Young et al., Genome Res 2010; Ryan et al., Science 2013). He has also collaborated with investigators outside of NIH on projects such as the Neanderthal Genome Project with Svante Pääbo, where his contribution to the analyses indicated that out-of-Africa modern human populations show ~3% admixture with Neanderthals (Green et al., Science 2010; Prüfer et al., Nature 2014). In collaboration with Stephan Schuster, he de novo assembled the genome of a Kalahari Bushman from 454 sequence data (Schuster et al., Nature 2010). Dr. Mullikin's research group is also involved at the advisory stage as projects come to NISC from numerous collaborators, and they actively participate in other large projects, like ClinSeq, the Undiagnosed Diseases Program, and the autism sequencing project.
Since he became acting director of NISC in December 2009, and then director in September 2011, the work of his Comparative Genomics Analysis Unit and that of NISC have complemented each other. As NISC's acting director and then director, he has managed this large center during a particularly challenging time of rapid technology change, which has seen next-generation sequencing throughput increase 20-fold. This growth has complemented the Comparative Genomics Analysis Unit's research directions, which continue to focus on the algorithmic reduction of large datasets into meaningful results.
Presently, Dr. Mullikin's research group is developing analysis methods for a number of collaborative projects. Together with Julie Segre, Ph.D., NHGRI, the group is developing whole-genome assembly methods for bacterial genomes. Unique genome peculiarities and a continually changing backdrop of sequencing technologies and methodologies present new assembly challenges with bacterial genomes.
In collaboration with Paul Liu, Ph.D., NHGRI, his group is taking a whole-genome view of DNA sequence variations in human induced pluripotent stem cells generated with non-integrating plasmid vectors. Here they were able to detect somatic mutations locked into the stem cell lines at a rate of about one mutation per two million bases, or around 1,300 mutations genome-wide.
With Yardena Samuels, Ph.D., formerly with NHGRI and currently at the Weizmann Institute of Science, Rehovot, Israel, and Daphne Bell, Ph.D., NHGRI, his group is developing statistically robust methods for detecting mutation patterns in tumors when compared to their matched normal samples. In collaboration with Patrick Duffy, M.D., at the National Institute of Allergy and Infectious Diseases, his group assembled the genome of Plasmodium coatneyi from a single lane of HiSeq2000 data, which provided more than 500-fold coverage.
This assembly will enable future genomics research into this important non-human primate malaria species, which is an excellent model for the biology, immunology and pathology of humans infected with P. falciparum. In collaboration with Paul Wade at the National Institute of Environmental Health Sciences, his group is working on detecting whole-genome methylation patterns using Illumina sequencing of bisulfite-treated DNA from mouse liver tissue.
This work will add an important new analytical capability for future research on methylation patterns in the study of organismal development, cellular development and cancer progression. RNA sequence analysis is another area of rapid analytical methods development, and the group has been applying these methods to discover genes involved in coronary artery disease by examining the expression profiles and splice variant differences between RNA extracted from lymphoblastoid cell lines derived from ClinSeq subjects with high and low levels of coronary artery calcification.
The list of projects like these continues to grow as advances in genomic sequence technology coupled with an expanding array of analytical methods bring researchers to NISC and to his research group to find new ways to unlock the genomic mysteries currently hidden in their valuable samples. Dr. Mullikin looks forward to continuing these collaborative efforts and entering into many exciting new efforts as the genomics field rapidly evolves.
Last Updated: April 23, 2015