Associate Investigator

Genome Informatics Section

Staff Scientist

Genome Informatics Section

Education

Ph.D., University of Maryland

Biography

Dr. Koren is an associate investigator in the Genome Informatics Section, Computational and Statistical Genomics Branch at the National Human Genome Research Institute. After completing his M.S., Dr. Koren joined the J. Craig Venter Institute (JCVI) as a bioinformatics engineer under the supervision of Dr. Granger Sutton. During his three years at JCVI, he contributed to the development of the Celera Assembler, which has been used to assemble both the Drosophila melanogaster and human genomes. In parallel, Dr. Koren worked under the supervision of Dr. Mihai Pop at the University of Maryland, College Park, where he developed several tools for metagenome assembly and analysis. In 2010, Dr. Koren joined the National Biodefense Analysis and Countermeasures Center (NBACC), where he led genome assembly development and pioneered the use of single-molecule sequencing for the reconstruction of complete genomes. In 2015, Dr. Koren joined NHGRI as a founding member of the Genome Informatics Section.

Scientific Summary

Dr. Koren researches algorithms for the efficient analysis of large-scale genomic datasets with a focus on genome assembly. 

Genome assembly is the process of reconstructing a complete sequence from the relatively short-range data generated by sequencing instruments, like solving a giant jigsaw puzzle with billions of pieces. It is sometimes considered routine due to the wide availability of short-read sequencing. However, no complex genome is truly complete. The main challenge is repetitive regions within genomes, which cannot be properly localized. Another limitation of current genome assembly software is that it represents every genome as one sequence, even though sequenced genomes are usually diploid, meaning they contain two copies of each chromosome, one from each parent. Trying to represent both copies as a single sequence results in a mosaic, as if two images were superimposed on top of each other, and leads to both data loss and errors in analysis. To mitigate these issues, the first step of an assembly project is often a laborious, year-long effort of inbreeding to make the two copies as similar as possible. This is error-prone and often infeasible because of long generation times (e.g., in cattle and other agricultural species).

Dr. Koren pioneered the use of newer noisy long-read data (such as those generated by Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) instruments) for high-quality assembly. This work has revolutionized bacterial genome assembly, giving complete and accurate sequences, something which previously required years of manual effort, and has helped optimize algorithms at all steps of the assembly process. To assemble diploid genomes, Dr. Koren led the development of an approach that uses parental genomic information to identify sequences in the child belonging to either the maternally or the paternally inherited genome, and so generates two complete sequences for a single individual. This approach is the current state of the art for accurately reconstructing a complete diploid genome. Most recently, Dr. Koren supervised a project to update the popular Canu assembler to optimally utilize newly available highly accurate (over 99% accurate) long reads. His research showed that, through several algorithms, data accuracy can be increased even further, essentially to perfection. These breakthroughs allowed the study of previously invisible human genomic regions, corrected errors in the current reference, and led to the first truly complete human genomic sequence. Dr. Koren’s work continues to build on this success so that complete and accurate genomes become routine.
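The idea of using parental information to separate a child's reads can be illustrated with a small sketch. This is a simplified toy version, not the published trio binning implementation: it assumes short error-free sequences, a tiny k-mer size, and a plain majority vote, and the function names (kmers, bin_read, trio_bin) are hypothetical. Each read is assigned to the parent whose unique k-mers (substrings of length k found in one parent but not the other) it contains more of.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_read(read, maternal_only, paternal_only, k):
    """Assign one read to a haplotype by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & maternal_only)
    p = len(read_kmers & paternal_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


def trio_bin(reads, mother_seqs, father_seqs, k=5):
    """Partition the child's reads using k-mers unique to each parent.

    Toy assumption: parental data are error-free, so every k-mer seen
    in a parent is trusted. Real data would need error filtering.
    """
    mother = set().union(*(kmers(s, k) for s in mother_seqs))
    father = set().union(*(kmers(s, k) for s in father_seqs))
    maternal_only = mother - father
    paternal_only = father - mother
    bins = {"maternal": [], "paternal": [], "unassigned": []}
    for read in reads:
        bins[bin_read(read, maternal_only, paternal_only, k)].append(read)
    return bins


if __name__ == "__main__":
    # Hypothetical parents differing by a single base (G vs C).
    mother = ["ACGTACGTTAGC"]
    father = ["ACGTACCTTAGC"]
    child_reads = ["GTACGTTAG", "GTACCTTAG", "ACGTAC"]
    for haplotype, reads in trio_bin(child_reads, mother, father).items():
        print(haplotype, reads)
```

In this toy example the third read covers only sequence shared by both parents, so it stays unassigned. Once reads are partitioned this way, each bin can be assembled separately, which is how two sequences can be produced for one individual.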

Publications

Nurk S*, Koren S*, Rhie A*, Rautiainen M*, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PGS, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook J, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987. Epub 2022 Mar 31.

Nurk S*, Walenz BP*, Rhie A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020 Sep;30(9):1291-1305.

Miga KH*, Koren S*, Rhie A, Vollger MR, Gershman A, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020 Sep;585(7823):79-84.

Koren S*, Rhie A*, Walenz BP, Dilthey AT, Bickhart DM, et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat Biotechnol. 2018 Oct 22.

Jain M*, Koren S*, Miga KH*, Quick J*, Rand AC*, Sasani TA*, Tyson JR*, Beggs AD, Dilthey AT, Fiddes IT, Malla S, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018 Apr;36(4):338-345.

Koren S*, Walenz BP*, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017 May;27(5):722-736. doi: 10.1101/gr.215087.116.

Berlin K*, Koren S*, Chin CS, Drake JP, Landolin JM, et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol. 2015 Jun;33(6):623-30.

Koren S, Harhay GP, Smith TP, Bono JL, Harhay DM, et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol. 2013;14(9):R101.

* denotes co-first or co-corresponding author

Last updated: June 21, 2023