Program Director

Division of Genomic Medicine


M.P.H. University of California, Berkeley, 1987

Ph.D. University of Wisconsin, Madison, 1993


Dr. Heidi Sofia joined NHGRI in 2010 as part of the team responsible for The Cancer Genome Atlas (TCGA), a cancer genomics initiative jointly managed by NHGRI and the National Cancer Institute (NCI). TCGA was instrumental in laying the foundation for understanding cancer as a genomic disease and in driving innovation in large-scale genomics, analysis tools, and data sharing as it built a community resource.

Dr. Sofia manages a genomics, data science, and informatics portfolio, which includes research grants and small business awards, and participates in NIH data science initiatives such as BD2K and BISTI. Her interests include enabling technologies for challenging problems in genomic biology and health, such as secure data sharing with privacy, cloud computing at scale, semantic models for data integration, and graph and network models and representations. She engages in community efforts to develop next-generation standards, such as the Global Alliance for Genomics and Health (GA4GH).

Prior to joining NHGRI, Dr. Sofia worked at the Pacific Northwest National Laboratory (PNNL), one of the Department of Energy (DOE) science labs, on projects in genomic analysis, information visualization, and novel toolkit development. She contributed to research in cyberinfrastructure, advanced analytics, information management, and data-intensive and high-performance computing, working in teams motivated by the novel features of driving biological problems.

Dr. Sofia received a B.A. in Biochemistry and a Master of Public Health (M.P.H.) from the University of California, Berkeley, and a Ph.D. from the University of Wisconsin, Madison. She was also a National Center for Biotechnology Information (NCBI) GenBank Fellow.


Ellrott K, Bailey MH, Saksena G, Covington KR, Kandoth C, Stewart C, Hess J, Ma S, Chiotti KE, McLellan M, Sofia HJ, Hutter C, Getz G, Wheeler D, Ding L, MC3 Working Group, Cancer Genome Atlas Research Network. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Systems, 6(3): 271-281. 2018 [PubMed]

Cabili MN, Carey K, Dyke SOM, Brookes AJ, Fiume M, Jeanson F, Kerry G, Lash A, Sofia H, Spalding D, Tasse AM, Varma S, Pandya R. Simplifying research access to genomics and health data with Library Cards. Scientific Data, 5. 2018 [PubMed]

Wang S, Jiang X, Tang H, Wang X, Bu D, Carey K, Dyke SOM, Fox D, Jiang C, Lauter K, Malin B, Sofia H, Telenti A, Wang L, Wang W, Ohno-Machado L. A community effort to protect genomic data sharing, collaboration and outsourcing. NPJ Genomic Medicine, 2(1): 33. 2017 [PubMed]

Raisaro JL, Tramèr F, Ji Z, Bu D, Zhao Y, Carey K, Lloyd D, Sofia H, Baker D, Flicek P, Shringarpure S, Bustamante C, Wang S, Jiang X, Ohno-Machado L, Tang H, Wang X, Hubaux JP. Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks. JAMIA, 24(4): 799-805. 2017 [PubMed]

Tang H, Jiang X, Wang X, Wang S, Sofia H, Fox D, Lauter K, Malin B, Telenti A, Xiong L, Ohno-Machado L. Protecting genomic data analytics in the cloud: state of the art and opportunities. BMC Medical Genomics, 9(1): 63. 2016. [PubMed]

TCGA and Pancancer Atlas (Cancer Genome Atlas Research Network): https://cancergenome.nih.gov/publications

Staron, A., Sofia, H.J., Dietrich, S., Ulrich, L.E., Liesegang, H., Mascher, T. The third pillar of bacterial signal transduction: classification of the extracytoplasmic function (ECF) sigma factor protein family. Molecular Microbiology, 74: 557-581. 2009. [PubMed]

Oehmen, C.S., Sofia, H.J., Baxter, D., Szeto, E., Hugenholtz, P., Kyrpides, N., Markowitz, V., Straatsma, T.P. Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond. Lawrence Berkeley National Laboratory. LBNL Paper, LBNL-62882. 2009. [Full Text]

Chin, G., Jr., Chavarria, D.G., Nakamura, G.C., Sofia, H.J. BioGraphE: High-performance bionetwork analysis using the Biological Graph Environment. BMC Bioinformatics, 9:S6. 2008. [PubMed]

Chavarria-Miranda, D., Gracio, D., Marquez, A., Nieplocha, J., Scherrer, C., Sofia, H., Bader, D.A., Madduri, K., Berry, J., Hendrickson, B. Cray XMT brings new energy to high-performance computing. SciDAC Review, 9: 36-41. 2008. [Full Text]

Wong, P.C., Foote, H., Mackey, P., Chin, G., Jr., Sofia, H., Thomas, J. A dynamic multiscale magnifying tool for exploring large sparse graphs. Information Visualization, 7: 105-117. 2008. [Full Text]

Campbell, E.A., Greenwell, R., Anthony, J.R., Wang, S., Lim, L., Sofia, H.J., Donohue, T.J., Darst, S.A. A conserved structural module regulates transcriptional responses to diverse stress signals in bacteria. Molecular Cell, 27: 793-805. 2007. [PubMed]

Chin, G., Jr., Stephan, E.G., Klicker, K.R., Corrigan, A.L., Sofia, H.J. Supporting computational visual theories in biology. Proceedings of the IEEE 2004 Symposium on Visual Languages and Human-Centric Computing (VLHCC'04), September 26-29, Rome, Italy, pp 69-71. 2004.

Korner, H., Sofia, H.J., and Zumft, W.G. Phylogeny of the bacterial superfamily of Crp-Fnr transcription regulators: exploiting the metabolic spectrum by controlling alternative gene programs. FEMS Microbiol Rev, 27: 559-592. 2003. [PubMed]

Sofia, H.J., Chen, G., Hetzler, B.G., Reyes-Spindola, J.F., and Miller, N.E. Radical SAM: A novel protein superfamily that links unresolved steps in familiar biosynthetic pathways with radical mechanisms: Discovery and functional characterization using new analysis and information visualization methods. Nucleic Acids Res, 29: 1097-1106. 2001. [PubMed]

Kajkowski, E.M., Lo, C.F., Ning, X., Walker, S., Sofia, H.J., Wang, W., Edris, W., Chandra, P., Wagner, E., Vile, S., Ryan, K., McHendry-Rinde, B., Smith, S.C., Wood, A., Rhodes, K.J., Kennedy, J.D., Bard, J., Jacobsen, J.S., and Ozenberger, B.A. Beta-amyloid peptide-induced apoptosis regulated by a novel protein containing a G protein activation module. J Biol Chem, 276: 18748-18756. 2001. [PubMed]

Rudd, K.E., Sofia, H.J., Koonin, E.V., Plunkett, G. III, Lazar, S., and Rouviere, P.E. A new family of peptidyl-prolyl isomerases. Trends Biochem Sci, 20: 12-14. 1995. [PubMed]

Sofia, H.J., Burland, V., Daniels, D.L., Plunkett, G., III, and Blattner, F.R. Analysis of the Escherichia coli genome V. DNA sequence of the region from 76.0 to 81.5 minutes. Nucleic Acids Res, 22: 2576-2586. 1994. [PubMed]

Last updated: September 23, 2018