How does NHGRI define metadata and phenotypic data?

NHGRI uses the NIH definition of metadata provided in the Final NIH Policy for Data Management and Sharing.

NIH defines metadata as “data that provide additional information intended to make scientific data interpretable and reusable (e.g., date, independent sample and variable construction and description, methodology, data provenance, data transformations, any intermediate or descriptive observational variables).”
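
As an illustration only, the sketch below turns the kinds of elements named in the NIH definition into a structured record and flags any that are left empty before submission. The class `DatasetMetadata`, its field names, the helper `missing_fields`, and the example values are all hypothetical. They do not correspond to any repository's submission schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetMetadata:
    """Hypothetical record loosely mirroring the examples in the NIH definition."""
    collection_date: str = ""      # date of sample collection (ISO 8601)
    sample_description: str = ""   # how independent samples were selected and described
    variable_descriptions: dict = field(default_factory=dict)  # variable name -> definition/units
    methodology: str = ""          # assay, protocol, or instrument used
    provenance: str = ""           # origin of the data and who generated them
    transformations: list = field(default_factory=list)        # processing steps, in order


def missing_fields(record: DatasetMetadata) -> list:
    """Return the names of fields left empty, so gaps can be caught before submission."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Illustrative values only.
record = DatasetMetadata(
    collection_date="2020-06-15",
    methodology="whole-genome sequencing",
    variable_descriptions={"age": "age at enrollment, in years"},
)
print(missing_fields(record))
```

A real submission would follow the fields required by the chosen repository. The point of the sketch is that treating metadata as structured data, rather than free text, makes gaps easy to detect.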

Phenotypic data are the observable characteristics or traits of an organism (i.e., the physical manifestation of a genotype).

NOT-HG-21-022 articulates expectations for capturing and sharing both metadata and phenotypic data.

What is the scientific value of sharing metadata?

Providing sufficient and well-structured metadata is a key component of adhering to the FAIR Principles for scientific data management and sharing. Datasets that are Findable, Accessible, Interoperable, and Reusable maximize the return on investments in biomedical research.

To ensure that data generated with NHGRI funding are maximally useful to the broader research community, the Institute is emphasizing the sharing of comprehensive metadata and phenotypic data alongside the genomic datasets that must be shared, in accordance with the FAIR Principles.

How do the expectations of NOT-HG-21-022 relate to the requirements of the NIH Genomic Data Sharing (GDS) Policy?

The NIH GDS Policy states that NIH-funded researchers generating genomic data are expected to deposit “relevant associated data (e.g., phenotype and exposure data).” However, researchers often share only the minimum metadata and phenotypic data required for submission to publicly accessible data repositories, rather than the comprehensive set of information that would make the shared data more useful to secondary users. NOT-HG-21-022 builds on the expectation outlined in the NIH GDS Policy and emphasizes the importance of sharing comprehensive metadata and phenotypic data associated with the dataset. Importantly, this Notice applies to all NIH data sharing policies, not just the NIH GDS Policy. The Notice states that NHGRI-funded and -supported researchers will be expected to:

  1. Share the metadata and phenotypic data associated with the study.
  2. Use standardized data collection protocols and survey instruments for capturing data, as appropriate.
  3. Use standardized notation for metadata (e.g., controlled vocabularies or ontologies) to enable the harmonization of datasets for secondary research analyses.

NHGRI is working with NHGRI-funded data resources and data coordination centers to ensure that NHGRI-funded researchers deposit adequate metadata and phenotypic data.
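
To see why standardized notation matters, consider two studies that record the same trait with different free-text labels. The sketch below maps each study's local labels to shared term identifiers so that the records can be pooled. The `EX:` identifiers, the label maps, the `harmonize` helper, and the sample records are hypothetical placeholders. In practice the identifiers would come from a community-endorsed ontology, such as one found through BioPortal.

```python
from collections import Counter

# Hypothetical mappings from each study's local labels to shared term IDs.
# "EX:" IDs are placeholders standing in for real ontology terms.
STUDY_A_MAP = {"high blood pressure": "EX:0001", "fits": "EX:0002"}
STUDY_B_MAP = {"hypertension": "EX:0001", "seizures": "EX:0002"}


def harmonize(records, label_map):
    """Replace each free-text phenotype label with its shared term ID.

    Records whose label has no mapping are returned separately so they
    can be reviewed rather than silently dropped.
    """
    harmonized, unmapped = [], []
    for rec in records:
        label = rec["phenotype"].strip().lower()  # normalize case and whitespace
        term = label_map.get(label)
        if term is None:
            unmapped.append(rec)
        else:
            harmonized.append({"sample_id": rec["sample_id"], "phenotype_term": term})
    return harmonized, unmapped


study_a = [{"sample_id": "A1", "phenotype": "High blood pressure"},
           {"sample_id": "A2", "phenotype": "Fits"}]
study_b = [{"sample_id": "B1", "phenotype": "Hypertension"},
           {"sample_id": "B2", "phenotype": "migraine"}]

a, a_unmapped = harmonize(study_a, STUDY_A_MAP)
b, b_unmapped = harmonize(study_b, STUDY_B_MAP)

# Once both studies use the same term IDs, samples can be counted together.
counts = Counter(r["phenotype_term"] for r in a + b)
print(counts)
print(b_unmapped)
```

If both studies had recorded the ontology terms at collection time, no study-specific mapping would be needed. This is the practical benefit of using controlled vocabularies from the start.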

What types of research does NOT-HG-21-022 apply to?

NOT-HG-21-022 applies to all NHGRI-funded research, including investigator-initiated research projects. Studies that do not result in a publication are also expected to share data and associated metadata.

Where should I deposit metadata?

Metadata should be submitted along with the dataset to a publicly accessible data repository. The Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) maintains a useful list of NIH-supported data repositories, including NHGRI-supported repositories such as AnVIL and various model organism databases.

How much metadata and phenotypic data do I need to share?

As stated by Wilkinson et al., data and metadata should be “richly described with a plurality of accurate and relevant attributes.” Statisticians have also published guidelines that data generators may find helpful when submitting a dataset and its associated metadata to a repository (e.g., How to share data for collaboration, Ellis & Leek). At a minimum, the submitted metadata and phenotypic data should be sufficient for a secondary user to fully replicate the analyses and findings of the original study.

Does NHGRI endorse specific data standards and ontologies?

NHGRI strongly encourages the use of existing data standards and ontologies that are generally endorsed by the community of your research area, although it does not require the use of any particular one. Investigators should use data standard(s) and ontologies that facilitate comparison across similar studies within their research community.

Where should I start?

If you have questions about what data standard(s) or collection protocols to use, or which metadata and phenotypic data to share, contact your NHGRI Program Director.

The following resources provide additional information that may help you get started:

  • BioPortal is a repository of biomedical ontologies.
     
  • The PhenX Toolkit is an online catalog of well-established and vetted phenotypic measurement protocols, to promote the collection of comparable data across studies.
     
  • The NIH Common Data Elements (CDE) Repository provides access to structured, human- and machine-readable definitions of data elements that NIH Institutes and Centers have recommended or required for use in research.

Last updated: February 9, 2021