Dr. Mullikin develops and uses computer programs to analyze large data sets generated by systematic DNA sequencing projects. A highly skilled computational geneticist, he collaborates extensively with biomedical researchers, analyzing data that others have produced or that are available in public databases.
His main research focus involves the development of algorithms for performing complex computations. One such program, called Sequence Search and Alignment by Hashing Algorithm (SSAHA), dramatically accelerates searches of gigabases of DNA sequence for single-nucleotide polymorphisms (SNPs). Even though this program was first developed several years ago, Dr. Mullikin continually refines SSAHA in response to the changing needs of genomic scientists, and SSAHA remains the key tool that he and others use to detect sequence variants. He also developed a program called Phusion (pronounced "fusion"), which is used to assemble genome sequences from whole-genome shotgun data. Both the mouse and nematode genome sequences were assembled using Phusion.
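The general idea of hash-based sequence search can be illustrated with a short Python sketch. This is not SSAHA's actual implementation; it is a simplified illustration that assumes exact k-mer seeding, an ungapped placement of the read, and a tiny in-memory reference. The function names (`build_kmer_index`, `best_offset`, `candidate_variants`) and the parameter `k` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def best_offset(index, read, k):
    """Return the reference offset supported by the most read k-mers, or None."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        # Each hit at reference position `pos` implies the read starts at pos - i.
        for pos in index.get(read[i:i + k], ()):
            votes[pos - i] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def candidate_variants(reference, read, k=4):
    """Place the read without gaps and list single-base mismatches.

    Returns a list of (reference_position, reference_base, read_base) tuples.
    """
    index = build_kmer_index(reference, k)
    offset = best_offset(index, read, k)
    if offset is None:
        return []
    variants = []
    for i, base in enumerate(read):
        ref_pos = offset + i
        if 0 <= ref_pos < len(reference) and reference[ref_pos] != base:
            variants.append((ref_pos, reference[ref_pos], base))
    return variants


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    reference = "ACGTTGCAAGCTTAGGCTAACG"
    # A read copied from positions 4-15 with one substituted base at position 10.
    read = reference[4:10] + "A" + reference[11:16]
    print(candidate_variants(reference, read))
```

The hash table lets each read k-mer be looked up directly instead of scanning the whole reference, which is why this family of methods scales to very large sequence collections. A production tool would also need to handle insertions and deletions, repeats, sequencing errors, and base-quality scores, none of which this sketch attempts.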
Dr. Mullikin's group provides computational support for major NHGRI efforts such as the International Haplotype Map (HapMap) Project, which is primarily focused on determining genes and genetic variants that affect health and disease susceptibility. During the initial phase of this project, investigators produced a working haplotype map, consisting of ~600,000 polymorphic sites spaced an average of ~5 kilobase pairs apart. With the second phase of the project completed, investigators can now access a map of human variation in three populations, which contains over three million polymorphic sites across the human genome. Indeed, this landmark project has provided the foundation for the rapid completion of a large number of genome-wide association studies (GWAS; a list of published GWAS is available at www.genome.gov/gwastudies).
Dr. Mullikin's group also provides critical computational support and guidance for a large-scale medical sequencing (LSMS) program based at the NIH Intramural Sequencing Center (NISC). Dr. Mullikin works with collaborating investigators to generate preliminary feasibility assessments for their projects by evaluating the genomic regions that they wish to target, whether these are a specific list of genes or entire genomic intervals. He then develops an initial design of PCR assays across the regions of interest. If a project is deemed feasible, it is entered into the NISC sequencing pipeline, which ultimately produces a large number of DNA sequence reads. The reads are then automatically analyzed for the presence of genetic variants.
Dr. Mullikin's group is currently preparing for a flood of new data that will be produced by "next-generation" DNA sequencers. These new sequencing machines, which use novel approaches significantly different from traditional "Sanger-based" instruments, are capable of far higher throughput than was previously possible. Some medical sequencing projects will be adapted to capitalize on the strengths of these new sequencing platforms, and many more projects will become feasible as sequencing costs decrease. To use these new instruments most effectively, Dr. Mullikin's group is testing methods for genomic enrichment that can be used to purify specific regions of the genome prior to sequencing. His group is also developing new analytical methods to accurately detect genetic variants using data generated by these next-generation instruments.
Last Updated: May 18, 2014