Senior Investigator

Computational and Statistical Genomics Branch

Head

Genome Informatics Section

Education

B.S. Loyola University Maryland, 2002

M.S. University of Maryland, College Park, 2009

Ph.D. University of Maryland, College Park, 2010

Biography

Dr. Phillippy is head of the Genome Informatics Section and a senior investigator in the Computational and Statistical Genomics Branch at NHGRI. He is a bioinformatician who bridges the fields of computer science and genomics, and his lab has developed numerous widely used tools for the problems of genome assembly, alignment, clustering, forensics and metagenomics.

Early in his career, Dr. Phillippy developed some of the first sequence alignment and variant calling methods able to compare whole genomes. These methods were integral to the FBI’s investigation of the 2001 anthrax attacks and demonstrated the potential of whole-genome sequencing for outbreak tracing and forensics. After completing his Ph.D., he pioneered the use of single-molecule sequencing for the assembly of complete genomes and min-hashing for the comparison of large genomic datasets. He joined NHGRI in 2015 and co-founded the Telomere-to-Telomere (T2T) consortium with the goal of finishing the human reference genome. Under his leadership, the consortium successfully completed this project in 2021, revealing approximately 200 million bases of newly mapped human genomic sequence.

Dr. Phillippy received a B.S. in computer science from Loyola University Maryland in 2002, where he was advised by Dr. Arthur Delcher. He first worked as a bioinformatics engineer at The Institute for Genomic Research (TIGR) with Dr. Mihai Pop, and later received a Ph.D. in computer science from the University of Maryland in 2010 with Dr. Steven Salzberg. After graduate school, he led a bioinformatics group at the National Bioforensics Analysis Center before joining NHGRI in 2015. In 2019, he was awarded tenure by the NIH and received the U.S. Presidential Early Career Award for Scientists and Engineers.

Scientific Summary

The Genome Informatics Section develops and applies computational methods for the analysis of massive genomics datasets, focusing on the challenges of genome sequencing and comparative genomics.

As one example, high-quality reference genomes form a fundamental basis of all genomics research, but the construction of these references, known as genome sequencing and assembly, is a difficult process that can leave numerous gaps and errors that affect the accuracy of all downstream analyses. The section aims to improve such foundational processes and translate emerging genomic technologies into practice.

Bioinformatics has a long history of bridging fields, and the incredible advances in genomics have been inextricably linked to similar advances in algorithms and computing. Recognizing this, the section specializes in the development of methods for new genomic technologies and seeks to foster open and interdisciplinary collaboration between the computational, biological and medical sciences for the advancement of global health. Members of the section are at the forefront of these fields and have made important contributions to the problems of genome assembly, whole-genome alignment, read mapping, variant detection, information visualization, microbial forensics and metagenomics.

Another major focus of the section is the development of genomic resources such as the human reference genome. From its initial release in 2000 to further updates through 2020, the human reference genome covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing this remaining 8% of the genome, the section assembled the first truly complete sequence of a human genome in 2021, unlocking all regions of the genome to variational and functional studies for the first time. However, this complete genome represents only a single haplotype and does not capture the full diversity of the human genome. This limitation can affect the accuracy of all genomic analyses, especially for underrepresented variants and populations. For the benefits of personalized genomics to be inclusive of everyone, the section is working towards extending the human reference to a pangenome that is more representative of human genomic variation.

Lastly, the section’s continued development of sequencing and assembly technologies has resulted in a flood of newly complete genomes from across the tree of life. The section is now using these whole genomes to study the functional, comparative and population genomics of multiple species at unprecedented resolution. Even for a well-studied species, the de novo assembly of multiple individuals can reveal complex structural variation missed by prior re-sequencing approaches. Characterizing this variation is important for fully understanding the function and evolutionary history of life’s genomic code.

Publications

Rhie A, McCarthy SA, Fedrigo O, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021 Apr;592(7856):737-746. doi: 10.1038/s41586-021-03451-0. Epub 2021 Apr 28.

Miga KH, Koren S, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020 Sep;585(7823):79-84. doi: 10.1038/s41586-020-2547-7. Epub 2020 Jul 14.

Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017 May;27(5):722-736. doi: 10.1101/gr.215087.116. Epub 2017 Mar 15.

Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016 Jun 20;17(1):132.

Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12.

Genome Informatics Section Staff

Sergey Koren, Ph.D.
  • Associate Investigator
  • Computational and Statistical Genomics Branch
Arang Rhie, Ph.D.
  • Staff Scientist
  • Genome Informatics Section
Brian P. Walenz, M.S.
  • Bioinformatics Engineer
  • Genome Informatics Section
Sergey Nurk, Ph.D.
  • Postdoctoral Fellow
  • Genome Informatics Section
Mikko Rautiainen, Ph.D.
  • Postdoctoral Fellow
  • Genome Informatics Section
Ann McCartney, Ph.D.
  • Postdoctoral Fellow
  • Genome Informatics Section
Brandon Pickett, Ph.D.
  • Postdoctoral Fellow
  • Genome Informatics Section
Alex Sweeten, M.S.
  • Graduate Student
  • Genome Informatics Section

Last updated: April 19, 2022