
Computational Genomics and Data Science Program

Extracting knowledge from data is a defining challenge of science.


Computational genomics has been an important area of focus for NHGRI since the beginning of the Human Genome Project. Today, however, advances in tools and techniques for data generation are rapidly increasing the amount of data available to researchers, particularly in genomics. This increase requires researchers to rely ever more heavily on computational and data science tools for the storage, management, analysis, and visualization of data. NHGRI’s commitment to computational genomics and data science is in alignment with the NIH Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem.

NHGRI Support

The NHGRI 2011 strategic plan identifies bioinformatics and computational biology as a cross-cutting area “broadly relevant and fundamental across the entire spectrum of genomics and genomic medicine.” Projects involving a substantial element of computational genomics or data science account for almost a quarter of NHGRI’s FY2018 budget; these areas are key components of many NHGRI grants and programs.

NHGRI’s support for computational genomics and data science follows the general principles and priorities identified in the NHGRI Funding Policy. Particular priority is placed on “approaches generalizable across diseases and biological systems of higher order organisms.” Projects focusing on a single disease are less likely to be relevant to NHGRI than those generalizable across multiple diseases.

Program Breadth

Grants supported under this program span many scientific topics. These grants can be categorized usefully, though neither exhaustively nor perfectly, into "Genome Analysis Tools and Software Resources" and "Data Management Resources." This structure is further explained in the text below and illustrated in Figure 1. The program structure described below should be considered as a general and not exclusive framework for organizing grants into broad scientific categories of interest to NHGRI.



Tools and Resources

The links below lead to NIH RePORTER, a database containing information about NIH-funded grants. Each category link displays the relevant portfolio of grants funded by the NHGRI Computational Genomics and Data Science Program.

Genome Analysis
  • Genetic variation, clinical and phenotype analyses.
    • Variation and association analyses: These projects seek to develop new and improved methods for interpreting genetic variation, associating variation with phenotypes, and analyzing population data. Key types of genetic variation include single nucleotide polymorphisms (SNPs), insertions and deletions (indels), short tandem repeats (STRs), copy-number variants (CNVs), and structural variants. Associating genetic variation with diseases and traits may require diverse analytical approaches, including analysis of population data, gene-by-gene (GxG) interactions, and gene-by-environment (GxE) interactions.
    • Clinical and phenotype analyses: Projects in this area develop new and improved methods for the management and analysis of clinical phenotype and electronic health record (EHR) data.
  • Genomic data processing and analysis tools
    • Sequencing informatics: These projects develop new and improved methods for processing, aligning, and formatting sequence reads, performing genome assembly, and extracting sequence features.
    • Function analyses: Gene regulation, gene expression, epigenetic modifications, and methylation all shape the relationships between genes and phenotypes. Projects in this category seek to facilitate the use of these and similar datatypes in genomics. This could involve anything from development of new or improved methods for handling diverse datatypes to the development and refinement of mathematical models of networks and pathways to aid in predicting functional effects of variants.
    • General genome data analysis tools: This category includes grants performing genome data analysis not covered in the other categories. Topics in this area include, among others, statistical methods for pattern recognition, applications to make genomics analysis more secure and efficient, and other tools to improve the usability and impact of genomics data.
  • Informatics platforms for genome analyses: These projects develop informatics systems and integrated computational environments. These software suites and web-based platforms enable the management, analysis, and visualization of genomic data using advanced statistical and informatics approaches.
Data Management Resources

NIH Strategic Plan for Data Science

Storing, managing, standardizing and publishing the vast amounts of data produced by biomedical research is a critical mission for the National Institutes of Health. In support of this effort, on June 4, 2018, NIH released its first Strategic Plan for Data Science, which provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem. Over the course of the next year, NIH will begin implementing its strategy, with some elements of the plan already underway. NIH will continue to seek community input during the implementation phase.

Further information on this NIH strategic plan for data science can be found at: https://datascience.nih.gov/strategicplan.

CGDS Program Workshop Report

NHGRI held an Informatics and Data Science-focused workshop on Sept 29-30, 2016, in Bethesda, MD. The goal of the workshop was to identify and prioritize opportunities of significance to the NHGRI Computational Genomics and Data Science Program over the next 3-5 years. A report outlining the opportunities identified during the workshop was presented to the NHGRI council in May 2017.

Genomic Data Science Working Group

The NHGRI Genomic Data Science Working Group is a subcommittee of the National Advisory Council for Human Genome Research (NACHGR). The working group was created in 2017 to facilitate a deeper engagement of the NACHGR in the numerous and increasingly complex issues at the interface between genomics and data science.

Funding Opportunities

Investigators interested in submitting applications to NHGRI are encouraged to contact NHGRI program staff before submission to discuss their specific aims and their choice of Funding Opportunity Announcement (FOA). Contact information for NHGRI program staff is at the bottom of this page. 

Investigator Initiated Research in Computational Genomics and Data Science (R01, R21, and R43/R44): PAR-18-844, PAR-18-843, and PAR-19-061 invite applications for a broad range of research efforts in computational genomics, data science, statistics, and bioinformatics relevant to basic or clinical genomic science, or both, and broadly applicable to human health and disease.

Genomic Resource Grants for Community Resource Projects (U24): PAR-17-273 is tightly focused on supporting major genomic resources, including those in informatics. Potential applicants are strongly encouraged to contact NHGRI Program Staff before developing an application.

Parent NIH Solicitations: Many applications are received through the NIH Parent R01 (PA-18-484 and PA-18-345) and Parent R21 (PA-18-489 and PA-18-344) solicitations. These investigator-initiated grants allow researchers to target their specific area of science relevant to NHGRI's mission (per the NHGRI Funding Policy). Additionally, NIH funding opportunities for Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) grants can be found at https://sbir.nih.gov/funding.

Other Relevant NIH Funding Opportunities 

NHGRI's Funding Opportunities page links to various NHGRI funding opportunities and provides instructions for signing up for NHGRI's funding opportunities email list.

The webpage of the Biomedical Information Science and Technology Initiative (BISTI) provides links to various informatics-related funding opportunities across NIH and other Federal agencies.

Genomic Analysis, Visualization, and Informatics Lab-space (AnVIL)

AnVIL (RFA-HG-17-011) aims to create an interoperable cloud-based resource for the research community by co-locating data, storage and computing infrastructure with commonly used services and tools for analyzing and sharing data.

Learn More
Data Nodes

The Alliance of Genome Resources, Model Organism Databases, and the Gene Ontology Consortium

The primary mission of the Alliance of Genome Resources (the Alliance) is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health and disease. This understanding is fundamental for advancing genome biology research and for translating human genome data into clinical utility.

Learn More
Alliance of Genome Resources

Program Staff

Program Directors

Valentina Di Francesco, M.S.
  • Lead Program Director, Computational Genomics and Data Science
  • Division of Genome Sciences
Daniel A. Gilchrist, Ph.D.
  • Program Director, Computational Genomics and Data Science
  • Division of Genome Sciences
Shurjo K. Sen, Ph.D.
  • Program Director
  • Division of Genome Sciences
Chris Wellington, B.S.
  • Program Director, Computational Genomics and Data Science
  • Division of Genome Sciences

Scientific Program Analysts

Joanna C. Chau
  • Scientific Program Analyst
  • Division of Genomic Medicine
Natalie Kucher
  • Scientific Program Analyst
  • Division of Genome Sciences

Last updated: July 16, 2019