Diversity in Genomic Research
Diversity among genomics research participants is essential for improving the health of everyone.
The Big Picture
Human DNA sequences (that is, our genomes) are more than 99.9% identical among people.
That 0.1% difference comes from variations among the roughly 3 billion bases (or “letters”) in our DNA; sometimes these variations can influence our chances of developing a disease.
So far, most people who have given permission for their DNA to be used for research are of European ancestry, leaving many populations across the globe underrepresented in genomics research.
The National Human Genome Research Institute (NHGRI) is working to enhance the diversity of people who participate in genomics research, thereby improving our knowledge of human genomic variation and genomic information for all populations.
How it affects you
The code embedded within the human genome is complex, and genomics research has only scratched the surface of determining everything there is to know about what makes us all different at the DNA level.
Historically, the people who have provided their DNA for genomics research have been overwhelmingly of European ancestry, which creates gaps in knowledge about the genomes from people in the rest of the world. Scientists are now expanding their data collection to better understand how genomics can be used to improve the health and wellbeing of all people.
What makes human genomes diverse?
While humans are similar in most ways, our biological processes make each of us unique. Many aspects of those processes are encoded in our DNA, which is based on the sequence of the four letters of life — A, C, G, and T. The dissimilarities among human genomes, referred to as variants, come from differences in our DNA sequences.
The human genome is more than 3 billion letters long, which means that a variant could reside at any place among those letters. These variants occur at different frequencies across different human populations; some are rare and unique to specific families, while others are common and found across populations.
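As a rough back-of-the-envelope illustration of the scale involved, the following Python sketch multiplies the genome length by the fraction that differs between two people. It assumes the rounded figures used in this fact sheet (about 3 billion letters and about 0.1% difference); the function name is illustrative only, and real comparisons between two genomes vary.

```python
# Rounded figures taken from this fact sheet; actual values vary between any two people.
GENOME_LENGTH_BASES = 3_000_000_000   # about 3 billion letters
FRACTION_DIFFERENT = 0.001            # about 0.1% differs between two genomes


def estimate_differing_bases(genome_length: int, fraction_different: float) -> int:
    """Estimate how many letters differ between two genomes (illustrative helper)."""
    return round(genome_length * fraction_different)


if __name__ == "__main__":
    differing = estimate_differing_bases(GENOME_LENGTH_BASES, FRACTION_DIFFERENT)
    print(f"Roughly {differing:,} letters differ between two genomes")
```

Under these assumptions the estimate comes to about 3 million letters, so even a 0.1% difference leaves a great deal of room for variants that may matter for health.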
Genomes from distinct populations differ due to multiple factors, including whom people choose to reproduce with and human migration patterns. Also, in specific populations, certain variants became widespread because they provided an advantage that helped people adapt to environmental changes.
Factors displayed in the graphic include: reproduction, migration and random fluctuations.
Why should researchers collect genomic data from diverse populations?
Based on work completed before and after the Human Genome Project, researchers found that the genome sequences of human populations have changed significantly over 250,000 years of our species’ expansion and migration across the Earth.
Even with the high degree of similarity between any two human genomes, enough differences exist that it is not appropriate to use a single genome, or even a few genomes, to represent the world’s populations. The original human genome reference sequence, produced by the Human Genome Project and based on just a handful of research participants, was therefore only the starting point for human genomics.
To address this limitation, efforts are underway to create human reference genome sequences that better represent diverse populations. NHGRI funds the Human Pangenome Reference Program, which is generating a collection of reference genome sequences that better represent human diversity.
The following graphic displays ancestry populations included in large-scale genomic studies by percentages (from highest to lowest): 78% European; 10% Asian; 8.5% Unreported; 2% African; 1% Hispanic; and 0.5% Other Countries.
The percentage of ancestry populations included in large-scale genomic studies is overwhelmingly European.
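To make the imbalance concrete, the short Python sketch below tabulates the percentages listed in the graphic description and compares the European share with all other categories combined. The category labels and numbers come from the text above; the helper name is illustrative, and the "Unreported" category is simply counted with the non-European categories, which is an assumption made for this comparison.

```python
# Percentages as listed in the graphic description above.
ANCESTRY_SHARE_PERCENT = {
    "European": 78.0,
    "Asian": 10.0,
    "Unreported": 8.5,
    "African": 2.0,
    "Hispanic": 1.0,
    "Other Countries": 0.5,
}


def non_european_share(shares: dict) -> float:
    """Sum every category except European (illustrative helper)."""
    return sum(pct for group, pct in shares.items() if group != "European")


if __name__ == "__main__":
    total = sum(ANCESTRY_SHARE_PERCENT.values())
    others = non_european_share(ANCESTRY_SHARE_PERCENT)
    ratio = ANCESTRY_SHARE_PERCENT["European"] / others
    print(f"Total of all categories: {total}%")
    print(f"All other categories combined: {others}%")
    print(f"European share is about {ratio:.1f} times all other categories combined")
```

With these figures, the categories sum to 100%, and the European category is about 3.5 times as large as every other category put together.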
How does studying diverse human genomes improve health outcomes?
Every human has some baseline genetic risk of developing a given disease. Extensive research has been performed to both understand and learn how to respond to these risks. In some cases, the same variant consistently causes a disease (e.g., Huntington's disease and cystic fibrosis), but this might not be the case for more complex diseases (e.g., coronary artery disease, obesity, cancer and Alzheimer’s disease).
By including participants who reflect the full diversity of human populations in genomic studies, researchers can identify genomic variants associated with various health outcomes at the individual and population levels. This way, researchers can better define a person's risk of developing a specific disease and design a clinical management strategy that is tailored to the individual. In addition, they can pursue genomic medicine strategies that benefit specific populations.
Why has enhancing diversity in genomics research been a difficult task?
Increasing the representation of diverse participants in genomics research requires an investment of both resources and time to intentionally establish trusting and respectful long-term relationships between communities and researchers. To ensure that genomics research is both equitable and inclusive, it is crucial for the genomics research workforce to reflect diversity similar to that of the communities the research is intended to serve.
In the past, both inaccessible and insufficient communication left some research participants unclear about the benefits of their participation and how their data would be used after the studies concluded. To overcome this, researchers must seek to understand people’s reasons for not participating in genomic studies and to communicate with participants in a more accessible manner. This can take additional time, effort and resources, which may discourage some researchers from including these important, diverse populations in their studies. However, such exclusion can lead to notable gaps in scientific understanding and potentially reinforce existing disparities in genomics research.
What are some genomics research projects that are enhancing diversity?
Genomics researchers have initiated dozens of research projects to enhance the diversity of participants in genomics research. These studies address a variety of research topics, including the effects of genomic diversity on disease risk, how to tailor genomic medicine for underrepresented populations, the impact of genomics research on diverse populations and the history of the human population.
NIH's All of Us Research Program is working to build a diverse health resource by collecting genome-related data and other information from about 1 million people. The Global Alliance for Genomics and Health (GA4GH) is developing a framework for storing, analyzing and sharing genomic data among international researchers. The Human Cell Atlas aims to be a resource that includes in-depth information about all cell types found in people across the world.
How is NHGRI helping to improve diversity in genomics research?
NHGRI is dedicated to increasing the diversity of the genomics workforce. In addition, NHGRI supports projects that work to increase the diversity of people participating in genomics research, including:
- The 1000 Genomes Project (2008 - 2015)
The most extensive public catalog of human variation and genomic data, with over 2,000 genomic samples from 26 populations across North and South America, Africa, Asia and Europe.
- Human Heredity and Health in Africa (H3Africa) (2012 - 2022)
The largest pan-African genomic research consortium that investigates the genomics of disease in Africa. The project also aims to build a sustainable African genomics research enterprise. This project is a collaborative effort that also involves the NIH Common Fund, the Wellcome Trust and the African Academy of Sciences.
- Polygenic Risk Score (PRS) Diversity Consortium (2021 - 2027)
The consortium uses insights from genomic diversity to predict health and disease risk across diverse populations using a PRS approach.
- Implementing Genomics in Practice (IGNITE) Network (2018 - 2022)
This network assesses approaches for real-world applications of genomic medicine in diverse clinical settings.
- Electronic Medical Records and Genomics (eMERGE) Network (2020 - 2025)
This network establishes protocols and methodologies for improving genomic risk assessments for diverse populations and for integrating their use into clinical care.
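The PRS approach mentioned in the list above is commonly described as a weighted sum: for each variant, the number of risk alleles a person carries (0, 1 or 2) is multiplied by an effect-size weight estimated in an earlier study, and the products are added together. The Python sketch below illustrates only that general idea. It is not the consortium's method, and the variant names, weights and genotype are made-up placeholders rather than values from any study.

```python
from typing import Dict

# Hypothetical effect-size weights per variant (made-up values, not from any study).
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "variant_A": 0.25,
    "variant_B": -0.125,
    "variant_C": 0.5,
}

# Hypothetical genotype: copies of the risk allele carried at each variant (0, 1 or 2).
EXAMPLE_GENOTYPE: Dict[str, int] = {
    "variant_A": 2,
    "variant_B": 1,
    "variant_C": 0,
}


def polygenic_risk_score(weights: Dict[str, float], allele_counts: Dict[str, int]) -> float:
    """Weighted sum of risk-allele counts.

    Simplifying assumption: a variant missing from the genotype counts as 0 copies.
    """
    score = 0.0
    for variant, weight in weights.items():
        count = allele_counts.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {variant} must be 0, 1 or 2")
        score += weight * count
    return score


if __name__ == "__main__":
    print(polygenic_risk_score(EXAMPLE_WEIGHTS, EXAMPLE_GENOTYPE))
```

Because the weights in such a score are estimated from existing studies, a score built mostly on data from participants of European ancestry may predict risk less accurately in other populations, which is one reason the diversity of study participants matters for this approach.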
Last updated: December 7, 2021