NHGRI Explainer

Use of Population Descriptors in Genomics

The Big Picture

Appropriate use of population descriptors in research is a critical scientific issue for advancing genomic science and improving healthcare across human populations. Because of the ethical, legal and social implications of how these descriptors have been used, both historically and today, researchers and other stakeholders need to use them thoughtfully.

 

  • This explainer discusses and differentiates three common population descriptors — race, ethnicity and genetic ancestry — which are often used to distinguish groups of people participating in research and to inform some healthcare decisions.
     
  • The inaccurate belief that human populations are biologically distinct has contributed to harms, such as justifying eugenics, promoting scientific racism, and marginalizing groups. In turn, misapplication of concepts of population groups has contributed to health disparities, alienated marginalized groups from research participation, and led to harmful stereotypes that have reinforced inequities.
     
  • More work is needed to educate researchers, clinicians, policymakers and the public on the distinctions between race, ethnicity and genetic ancestry, and to advance the use of population descriptors in genomics and biomedical research.
     
  • The National Academies of Sciences, Engineering and Medicine (NASEM) reviewed the use of population descriptors in genomics research, assessing the methods, benefits and challenges involved. The resulting NASEM Report includes 13 recommendations designed to transform how population descriptors are used in human genetics and genomics research.
     

What are population descriptors?

Types of population descriptors

Population descriptors are ways of describing or distinguishing people from each other based on perceived or actual differences. They capture the various ways in which people can differ from one another.


A wide variety of population descriptors are used to describe groups of people in research, healthcare or society. Examples include race, ethnicity, genealogical ancestry, genetic ancestry, Indigenous identity, primary language spoken, nationality, geographic origin, sex at birth, gender identity, disability status and age. Each population descriptor captures a different aspect of a group or individual, and no single descriptor is enough to fully describe or distinguish any individual or group. Depending on the situation, some population descriptors may be more relevant than others.


People commonly use population descriptors and their corresponding categories or numerical scales to describe themselves and others. For example, we use categories like female, male or intersex when referring to biological sex assigned at birth. When referring to age, we use numerical values like months and years, as well as categories like newborn, adolescent or older adult.


Researchers in genomics and healthcare also use population descriptors and corresponding categories to describe who is participating in a research study, which groups are being compared in the study and to whom the study findings may apply. Researchers can obtain information about population descriptors in many ways: for instance, by asking participants how they identify, looking in an electronic medical record, using shared data from a prior research study, or searching public records. Researchers may also assign a population descriptor to an individual or group using a specific analytical approach, such as using statistics to examine the frequency of DNA variants across the genome.


The definitions, measurement, uses and interpretations of population descriptors have varied over time, across users and around the world. Human rights movements or social and political action can bring about such changes. In addition, new scientific discoveries or knowledge, such as in the fields of genomics, archaeology or social science, can lead to changes. New scientific discoveries along with well-established facts present an opportunity to improve our understanding of human genetic variation, and our understanding of what types of differences between or across groups may be important for health. For example, the first modern humans lived somewhere in Africa approximately 300,000 years ago,2 and physical barriers to the migration of humans, such as oceans and mountains, led to geographical differences in the frequency of genetic variants we see within and between populations.3


Although research often focuses on differences, it is important to remember that human beings are far more similar than they are different. When researchers identify groups that differ genetically, they find that most genetic variation occurs within groups of people rather than between them. In other words, nearly all genetic differences are not specific to any one group. Instead, they are shared across groups, sometimes at different frequencies.
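The idea that most variation lies within groups can be made concrete with a single-variant calculation. The sketch below is a minimal illustration, not a method described in this explainer: it uses a textbook F_ST-style ratio (the share of total genetic diversity found between groups), assumes two equally sized groups and Hardy-Weinberg proportions, and uses hypothetical allele frequencies of 40% and 50%. Real studies average over many variants and use more careful estimators.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a variant with allele frequency p."""
    return 2 * p * (1 - p)


def fst(freqs: list[float]) -> float:
    """F_ST-style ratio for one two-allele variant across equally sized groups.

    Returns the share of total genetic diversity that lies between groups;
    1 minus this value is the share found within groups.
    """
    p_bar = sum(freqs) / len(freqs)  # allele frequency if all groups were pooled
    h_total = heterozygosity(p_bar)  # diversity in the pooled population
    if h_total == 0:
        raise ValueError("variant must have both alleles present in the pooled data")
    # Average diversity inside each group.
    h_within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# Hypothetical frequencies of one variant in two groups (not real data).
between = fst([0.40, 0.50])
print(f"between groups: {between:.1%}, within groups: {1 - between:.1%}")
# prints: between groups: 1.0%, within groups: 99.0%
```

Even though the hypothetical variant is ten percentage points more common in one group, about 99% of the diversity at this site lies within groups under these assumptions.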


Understanding genetic ancestry, race and ethnicity

There is no single agreed-upon definition for any of these terms. The descriptions below highlight key differences among them.

 

Genetic ancestry

Genetic ancestry refers to the biological relationships between individuals that result from inheriting DNA from common ancestors. These common ancestors can be traced to geographical origins from many centuries ago, when long-distance travel was extremely difficult. Because parents do not pass down all of their DNA to their children, a person’s genealogical ancestry (the ancestors in their family tree) and genetic ancestry can differ. Genetic ancestry is based on a statistical measure of genetic similarity across individuals.
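Why genealogical and genetic ancestry can differ follows from simple counting: the number of ancestors doubles with each generation back, but a person inherits only a limited number of DNA segments. The toy simulation below is a hypothetical illustration, not a genetic model from this explainer. It assumes the genome is split into a fixed number of independently inherited segments (an arbitrary 100 here) and ignores recombination details and ancestors who appear more than once in a family tree.

```python
import random


def genetic_ancestor_count(num_segments: int, generations: int, seed: int = 0) -> int:
    """Toy model (hypothetical): count how many of the 2**generations
    genealogical ancestors in one generation pass at least one DNA
    segment down to a present-day person."""
    rng = random.Random(seed)
    num_ancestors = 2 ** generations
    # Each segment is traced back by picking one of two parents at random in
    # every generation, which amounts to picking one ancestor uniformly overall.
    sources = {rng.randrange(num_ancestors) for _ in range(num_segments)}
    return len(sources)


for g in (1, 5, 10):
    n = genetic_ancestor_count(num_segments=100, generations=g)
    print(f"{g} generation(s) back: {2 ** g} genealogical ancestors, {n} contribute DNA")
```

With 100 segments, at most 100 of the 1,024 ancestors ten generations back can contribute any DNA in this model, so many genealogical ancestors leave no genetic trace.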

 

Race

People created the concept of race. Race is typically used to divide human populations into groups based on perceived physical appearance (such as skin color), social factors and cultural backgrounds.4 Race has been used to inappropriately group people into a hierarchical system to "establish and justify systems of power, privilege, disenfranchisement and oppression." 5

 

Ethnicity

Ethnicity tends to refer to a group of people with shared language, religion, customs, beliefs, heritage and history, even though such attributes are not always confined to a single ethnic group. Ethnicity may also refer to groups that are considered indigenous to an area. Ethnicity is not a biological characteristic.


How well can researchers determine genetic ancestry?

Methods for estimating genetic ancestry are evolving. To determine an individual’s genetic ancestry, researchers compare the DNA variants in that individual to the frequency of those variants in groups of people from around the world who have provided DNA samples. These groups are referred to as "reference populations." Genetic ancestry is estimated using statistical techniques and is typically based on some measure of genetic similarity. An individual whose collection of genetic variants appears at the highest frequency within a particular reference population is estimated to have ancestors from that reference population. Individuals may have a collection of genetic variants that appear in more than one reference population, which indicates they likely have ancestors from more than one group. Some have argued that, instead of thinking about genetic ancestry as broad groups or categories, genetic ancestry should be considered a continuum.
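A minimal sketch of the comparison described above, under stated assumptions: the reference populations ("Pop A", "Pop B", "Pop C") and their allele frequencies are hypothetical, the variants are assumed to be inherited independently, and genotypes are assumed to follow Hardy-Weinberg proportions. The hypothetical function `best_reference_match` scores how likely a person's genotypes are under each reference population's frequencies and reports the best match. This is one simple likelihood approach; widely used tools such as ADMIXTURE instead estimate a person's proportions of ancestry from several groups, which fits the idea that individuals can have ancestors from more than one group.

```python
import math

# Hypothetical reference panel: frequency of the alternate allele at four
# variants in three made-up reference populations (not real data).
REFERENCE = {
    "Pop A": [0.10, 0.80, 0.30, 0.60],
    "Pop B": [0.50, 0.40, 0.70, 0.20],
    "Pop C": [0.90, 0.20, 0.60, 0.10],
}


def log_likelihood(genotype: list[int], freqs: list[float]) -> float:
    """Log-probability of a genotype (0, 1 or 2 copies of the alternate allele
    at each variant) given a population's allele frequencies, assuming
    Hardy-Weinberg proportions and independent variants."""
    total = 0.0
    for copies, p in zip(genotype, freqs):
        genotype_probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
        total += math.log(genotype_probs[copies])
    return total


def best_reference_match(genotype: list[int], reference: dict[str, list[float]]) -> str:
    """Name of the reference population under which the genotype is most likely."""
    scores = {name: log_likelihood(genotype, freqs) for name, freqs in reference.items()}
    return max(scores, key=scores.get)


person = [2, 1, 2, 0]  # hypothetical genotypes at the four variants
print(best_reference_match(person, REFERENCE))  # Pop C
# Remove Pop C from the panel: the same person now best matches Pop B.
partial_panel = {name: f for name, f in REFERENCE.items() if name != "Pop C"}
print(best_reference_match(person, partial_panel))  # Pop B
```

The second call shows why estimates depend on which reference populations are available: the same DNA is assigned differently when the reference data change. Frequencies of exactly 0 or 1 would make some genotypes impossible (the log of zero), which a fuller implementation would need to handle.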


Currently, genomic researchers do not have DNA samples from many groups of people around the world, which means genetic ancestries for some geographical locations cannot be estimated accurately. In addition, as mentioned above, scientists calculate ancestries using reference datasets, so genetic ancestry estimates can differ from one analysis to another because the datasets used contain different frequencies of genetic variants. Furthermore, when someone is estimated to have ancestors from more than one group, researchers sometimes lump individuals together into a single group to simplify analyses. Genetic ancestry is therefore a statistical estimate based on available data, and it is not always consistent across studies.


More recently, companies offering ancestry-related services directly to consumers have combined genetic ancestry information with family history information.


Regardless of the outcome of a genetic ancestry test, people will choose how they want to be identified by others. These choices may be informed by DNA, social factors, personal or familial preferences, or lived experiences.


A closer look: Genetic ancestry and identification

How do I identify?

Imagine that your friend received an ancestry test for the holidays and was surprised by some of their results. After taking the test, your friend had a primary care appointment with a new doctor. When completing the required forms, they answered questions about their race and ethnicity differently than in the past, based on the new information from their ancestry test. While their physical body and health status did not change, their social identity did. People vary in their response to ancestry testing. For some, the outcome may lead to an identity change; others may maintain their original identity.6


But how meaningful is this change for health care decision making? For some conditions, an estimate of genetic ancestry (not race) can be informative. For example, some heritable cancers are more common in certain groups than others. Should self-identified race or ethnicity change a doctor’s decision about medical treatment? The answer can depend on a variety of factors.


How am I identified?


As another example, the U.S. government has changed how race and ethnicity are reported over time, with categories being renamed, merged, removed or expanded.7 Were these changes due to new information about how people differ genetically or biologically? No. The changes were made to better reflect perceptions of growing diversity across groups in the country, to better reflect how people identify themselves and to improve the quality of available demographic data. The way people self-identify can change over their lifetime or across generations, as can the questions and forms intended to capture this information.

 

 

For more on how people understand and think about their identity, see NHGRI’s Virtual Event: Why am I Irish yesterday and Italian today?


Are population descriptors social constructs?

A social construct is an idea or collection of ideas that has been created, agreed upon, accepted, or acknowledged by groups of people in a society. Social constructs offer ways to organize, explain or make sense of the world. Many population descriptors are social constructs.  A socially constructed population descriptor can change over time and be used and defined differently in different parts of the world.8


Race and ethnicity are social constructs. There is no clear or consistent way to place people into racial or ethnic groups using biology or innate characteristics. For example, people with similar skin color or hair texture have been defined as different races. Skin color variation arose over time as people adapted to varying levels of sun exposure,9 and people of similar skin color may have very little in common genetically.


For centuries, definitions of race and ethnicity were overly simplistic, unscientific, unethical and regularly used to support colonialism, slavery, imperialism, scientific racism and eugenics. Race has been used to group people into a hierarchical system that identifies, distinguishes and marginalizes some groups across nations, regions and the world. Race also has been used to “establish and justify systems of power, privilege, disenfranchisement and oppression.”5


The U.S. federal government notes that the race and ethnicity categories established by the Office of Management and Budget (OMB) are sociopolitical constructs and are not an attempt to define race and ethnicity biologically or genetically. These categories reflect a social definition of race and ethnicity recognized in the U.S., and they do not conform to any biological, anthropological or genetic criteria.10


While genetic ancestry involves analyzing DNA variants and is tied to biology, as described above, the availability of reference populations can influence the ancestral group(s) into which a person is categorized. Scientists must decide what level of resolution to use when grouping people and what terms to use to label these genetic ancestry groups. The categories that result from calculating genetic ancestry are sometimes aligned with social constructs of race and ethnicity. Broad genetic ancestry categories are sometimes labeled using the continents where people are believed to have origins or roots, such as African, European and Asian. In this way, descriptors for race, ethnicity and genetic ancestry are often intertwined and misused. Because of these factors, some have described genetic ancestry as socially constructed, too.11


How are population descriptors used?

The categories used to describe racial and ethnic groups vary around the world. Different ways of categorizing race and ethnicity arise from different historical and modern-day experiences, including colonization, forced and voluntary migration, racial or ethnic stratification and systems of governance. For example, Australia’s 2021 census form12 asked residents to indicate whether they are “of Aboriginal or Torres Strait Islander origin” and to indicate their “ancestry” using no more than two categories. Ancestry categories included “English,” “Italian,” “Chinese,” “Maori” and “Australian South Sea Islander.” The 2021 form did not mention “race,” although the term appeared in prior years.13 In some countries, language spoken, origins or religion matter more than race or ethnicity. France and Italy, for example, include neither race nor ethnicity in their censuses. Some have argued that including race and ethnicity on censuses is essential for understanding and addressing inequities or racism.


timeline graphic showing changing terms used for describing race

 


When the U.S. government established racial categories around 1790, they were tied to colonialism and flawed science. They were used in population surveys for purposes of taxation, government representation, counting enslaved persons and maintaining power.14 The names and number of categories changed over time due to shifts in scientific, political and social thinking about race and ethnicity.7


The major categories used in the U.S. 2020 Census15 included Hispanic, Latino or Spanish origin for ethnicity, and White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or Other Pacific Islander for race.


In addition to their use in the census, race and ethnicity have been used to measure racial and ethnic health disparities and to track progress in reducing them. Race and ethnicity are also commonly used as proxies for other factors that are not directly measured.16 These uses may be helpful for research and public health, especially when other data are not available.

Why should researchers be intentional about how population descriptors are used in genomics research and health?

Advances in genomic medicine greatly amplify the urgency of ensuring that the field exemplifies scientific and social accuracy in all of the work that we do. Simply stated, the design of some genomic research studies has exacerbated scientific flaws in how data are analyzed, interpreted, reported and aligned across datasets. In no small part, this is because of how we misuse population descriptors.


Race and ethnicity are not valid or reliable proxies for genetic ancestry. In addition, genetic ancestry is a poor proxy for the geographic area where someone is from, where they currently live or things that may be part of their surrounding environment. Relying on race, ethnicity or genetic ancestry as a proxy for something that is not measured in research often hides underlying biological, environmental or social factors that may contribute to health and disease. In healthcare, race and ethnicity have been improperly treated as biological or innate characteristics.


In the United States, one’s racial or ethnic identity, whether self-identified or assigned by someone else, has real and measurable impacts on health, wellness and status. Thus, race and ethnicity may be useful for examining social or political issues, documenting racial and ethnic health disparities, examining the impact of racial bias in health service delivery17 and monitoring diversity, equity and inclusion efforts within the biomedical workforce. Directly measuring and analyzing social determinants of health (SDOH), such as racism, violence, access to nutritious food or safe water, or exposure to trees and nature, would improve the rigor and usefulness of research. A growing collection of SDOH measures is available in a toolkit for researchers.18


In all types of research, when using population descriptors, researchers should be clear and transparent about which population descriptor(s) they are using, how they are measured and why they were chosen. Researchers should have a reasonable hypothesis for why specific descriptors may or may not be important to their research questions, and they should use labels and categories that accurately reflect what is being measured. Researchers should carefully consider whether race, ethnicity or genetic ancestry is the direct cause of the health differences observed across individuals or groups. If proxies are used because the data of interest are not available or cannot be collected, the challenges and limitations of doing so should be acknowledged.

A closer look: Measuring “race” in heart disease research

Imagine three different studies that look at severity of heart disease among people living in three different regions of the United States. “Race” is one of several variables analyzed in each study:
 

  • The first study measures race by asking participants to select a category that best describes their race by checking a box on a form. 
     
  • The second study measures race by asking each participant for a sample of their saliva and uses genetic analysis to group study participants into different races.
     
  • The third study uses the birth certificates of participants and their parents to assign a race to each participant.


All three studies use similar labels — Black, White, Native American, Hispanic, Asian and Other — when reporting their findings of heart disease across groups. After completing their analysis, all three studies conclude that “race” is a key factor in severity of heart disease.

 

Image of a Study Survey, Saliva DNA Test, and Birth Certificates

 


In this scenario, each study uses the same population descriptor and group labels, but the measurements differ, ranging from self-report to DNA analysis to vital records. In the second study, race and genetic ancestry appear to be merged as if they were similar or equal. The scenario also does not tell us why each study included race as a variable, and the reasons may vary.


If studies are unclear or inconsistent in the labels, definitions, measurements or justifications used for population descriptors, our ability to advance science and improve health outcomes is compromised. For example, when research approaches are not clearly specified, it is hard to repeat a study to confirm its accuracy or to see whether the same outcome occurs in a different population or part of the world. Furthermore, broad categories for genetic ancestry can obscure DNA variation that may be relevant to understanding certain health conditions.19


Poor use of population descriptors can also harm communities. Findings from such studies are more likely to be misinterpreted and misused. For example, when a study uses DNA analysis to examine “race” differences, readers may conclude that race has a biological basis. The population descriptors used in genomic research studies have also varied widely over the last seven decades.20

Why does NHGRI care about this issue?

The use of population descriptors in genomic and biomedical research is a critical scientific issue with varied ethical, legal and social implications (ELSI). NHGRI will continue its focus on this issue to promote the ethical, responsible and scientifically rigorous advancement of genomic science, genomic medicine and ELSI research. NHGRI is also focusing on this issue to:


  • Recognize that people have been and continue to be harmed by the misuse of race in genomic research and the misinterpretation of research findings.
     
  • Avoid repeating mistakes of the past, which have caused immediate and long-lasting harm to minoritized and disenfranchised groups, here in the U.S. and around the world.
     
  • Earn the public’s trust by ensuring that researchers thoughtfully consider whether, when and how to use population descriptors, and that these descriptors are used ethically.
     
  • Build and maintain trust in science among those we hope will participate in genomic research.
     
  • Ensure a more complete understanding of the diversity that exists across people who participate in research.
     
  • Ensure that all populations benefit from advances in genomic and biomedical research.
     
  • Improve health equity and eliminate disparities in genomic medicine.
     

NHGRI strongly encourages researchers to move beyond population descriptors based on historical social constructs such as race and includes this shift as part of its “Bold Predictions for Human Genomics by 2030.” To help achieve these objectives, NHGRI supported the National Academies of Sciences, Engineering, and Medicine (NASEM) in its review and assessment of existing methods, benefits and challenges in the use of population descriptors in genomics research. The NASEM Report includes 13 recommendations designed to transform how population descriptors are used in human genetics and genomics research. Continued efforts are needed to implement and test the practices identified by the report. Ultimately, NHGRI’s goal is to strengthen the rigor and reproducibility of genetics and genomics research and to produce discoveries that are broadly applicable and will benefit all.

Looking forward

Understanding the true role that genomics plays in health and wellness will require careful attention to the full spectrum of potential contributing factors, including genomic, biological or clinical traits; components of the natural, built or social environment in which people live; and larger systemic or structural issues. Clarity and specificity around population descriptors used in genomic research can improve the scientific integrity of research while also showing respect for the people represented in genomic research.

Bibliography


  1. Protocol - Biological Sex Assigned at Birth. PhenX.
     
  2. Hublin et al. New fossils from Jebel Irhoud, Morocco and the pan-African origin of Homo sapiens. Nature, 2017.
     
  3. Rosenberg, Noah. A Population-Genetic Perspective on the Similarities and Differences among Worldwide Human Populations. Human Biology, 2021.
     
  4. Race: Talking Glossary of Genetic and Genomic Terms. NHGRI, 2022.
     
  5. Race and Racial Identity. National Museum of African American History and Culture. 
     
  6. W. D. Roth and B. Ivemark. Genetic options: The impact of genetic ancestry testing on consumers’ racial and ethnic identities. American Journal of Sociology, 124(1):150–184. doi: 10.1086/697487.
     
  7. Measuring Race and Ethnicity Across the Decades: 1790–2010, U.S. Census.
     
  8. Historical Foundations of Race. National Museum of African American History and Culture.
     
  9. Nina G. Jablonski, George Chaplin. Human skin pigmentation as an adaptation to UV radiation. PNAS, 2010.
     
  10. Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity. 1997.
     
  11. Bege Dauda, Santiago J. Molina, Danielle S. Allen, Agustin Fuentes, Nayanika Ghosh, et al. Ancestry: How researchers use it and what they mean by it. Frontiers in Genetics, 2023.
     
  12. Sample copies of the 2021 Census paper forms. Australia Census.
     
  13. Reflecting a Nation: Stories from the 2011 Census, July 2011. Australia Census.
     
  14. Anna Diamond. The Enumerated Story of the Census. Smithsonian Magazine.
     
  15. 2020 Census Informational Questionnaire. U.S. Census.
     
  16. Proxy. Oxford Reference.
     
  17. Brian D. Smedley, Adrienne Y. Stith, and Alan R. Nelson. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. 2003, The National Academies Press. doi: 10.17226/12875.
     
  18. Social Determinants of Health Collections. PhenX.
     
  19. Charles N. Rotimi and Lynn B. Jorde. Ancestry and Disease in the Age of Genomic Medicine. New England Journal of Medicine, 2010. 363:1551–1558. doi: 10.1056/NEJMra0911564.
     
  20. Language used by researchers to describe human populations has evolved over the last 70 years. NHGRI, 2021.

Last updated: September 6, 2023