Genomic databases weakened by lack of non-European populations

Precision medicine will largely be built on vast troves of genomic information, but diverse populations are still underrepresented in public genomic databases, according to a new study by researchers from the National Institutes of Health and Partners Healthcare/Harvard Medical School.*

Diverse Groups Underrepresented in Genomic Databases

They found significantly fewer studies of African, Latin American and Asian ancestral populations than of European populations in two public databases, the Genome-Wide Association Study (GWAS) Catalog and the database of Genotypes and Phenotypes (dbGaP). Findings were published in the May 7 issue of Health Affairs.

The underrepresentation of non-European populations in genomic databases means that researchers may miss gene-disease relationships, particularly when a gene variant is rare in Europeans. It also limits the applicability of the databases to genomic research and to the clinical care of diverse populations.

"Decisive action is needed now to implement changes necessary for realizing the promise of precision medicine for all," said Vence L. Bonham, Jr., J.D., associate investigator in the Social and Behavioral Research Branch at NIH's National Human Genome Research Institute (NHGRI). Mr. Bonham is also the Senior Advisor to the NHGRI Director on Genomics and Health Disparities. "Because genomic information is so important to the translation of precision medicine research into clinical care, research must include individuals from diverse ancestral populations."

In their study, the researchers downloaded the GWAS Catalog as of May 8, 2017, and dbGaP as of Feb. 27, 2017. These databases are designed to archive genomic research data and make it available to qualified researchers while maintaining research participants' privacy. The researchers categorized studies by disease focus and by ancestral group: European, Asian and underrepresented minorities (including Africans, African-Americans, Native Americans and Hispanics). By identifying disparities in genomic information by disease area, they highlighted which patient populations and disease areas were least represented.

Of 2,817 GWA studies, researchers first analyzed 413 studies on cancer. Of those, only 4 percent included underrepresented minorities, 29 percent included Asian populations and 67 percent included European populations. The remaining 2,404 GWA studies were not related to cancer. Of those, only 8 percent were of underrepresented minorities, 20 percent of Asian populations and 71 percent of European populations.
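To see how the two sets of GWA figures relate to the catalog as a whole, the reported percentages can be weighted by the number of studies in each category. The short Python sketch below is only an illustration built from the rounded numbers quoted above; it is not the researchers' method. The names `reported` and `combined_shares` are hypothetical, and the calculation assumes each study counts toward a single ancestral group, which may not hold in the underlying data.

```python
# Study counts and rounded percentage shares as quoted in the article.
# Assumption: each study is counted under exactly one ancestral group.
GROUPS = ("underrepresented minorities", "Asian", "European")

reported = {
    "cancer": {"studies": 413, "shares": (4, 29, 67)},
    "non-cancer": {"studies": 2404, "shares": (8, 20, 71)},
}


def combined_shares(data):
    """Weight each category's rounded percentages by its study count."""
    total = sum(cat["studies"] for cat in data.values())
    result = {}
    for i, group in enumerate(GROUPS):
        weighted = sum(cat["studies"] * cat["shares"][i] for cat in data.values())
        result[group] = weighted / total
    return total, result


total, shares = combined_shares(reported)
print(f"total GWA studies: {total}")
for group, pct in shares.items():
    print(f"{group}: about {pct:.1f} percent")
```

Because the inputs are rounded percentages, the combined shares are approximate and do not sum to exactly 100.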

The dbGaP database also had weaknesses. Researchers excluded 394 studies because they contained no ancestral information. Of the 113 genome sequencing studies analyzed, 23 were focused on cancer (23 percent in underrepresented minorities, 15 percent in Asian populations and 62 percent in Europeans). Of the non-cancer studies, 11 were on cardiovascular disease (27 percent of underrepresented minorities, 18 percent of Asians and 55 percent of Europeans).

"We're seeing parallels between the underrepresentation of diverse populations and the underuse of genetic services in diverse populations, including genetic testing and counseling," said Dr. Latrice Landry, the study's first author and a genetics fellow at Harvard Medical School.

The researchers noted that policies and guidance have begun to emerge. For example, NIH has a policy on including women and members of minority groups in research, and the FDA has issued guidance on collecting and reporting race and ethnicity information in clinical trials. But more can and should be done. The researchers recommended that:

  • Scientific journals' editorial boards specify inclusion standards or justification for lack of diversity as a requirement for publication.
  • Innovative and culturally respectful strategies be implemented for recruiting underrepresented groups in research.
  • Patients, providers and researchers be educated about the value of participating in research, including contributing biological samples to biobanks.
  • Genomic information, such as test results, be accessible, approachable and useful for diverse patient populations.
  • Databases include information on disease prevalence, disparities in disease morbidity and death and pertinent genetic factors in underrepresented populations.
  • Submissions to databases include ancestral information so that differences and progress across populations can be tracked.

"We must make our genomic databases more diverse and inclusive so that researchers can contribute new knowledge of the role of genomics - with lifestyle and environment - on human health," said Dr. Heidi L. Rehm, co-author and chief genomics officer in the Center for Genomic Medicine and Department of Medicine at Massachusetts General Hospital. She is also medical director of the Broad Institute Clinical Research Sequencing Platform.

Read the study: Lack Of Diversity In Genomic Databases Is A Barrier To Translating Precision Medicine Research Into Practice

*Dr. David R. Williams, Florence Sprague Norman and Laura Smart Norman Professor of Public Health at the Harvard T.H. Chan School of Public Health, also contributed to the study.

Last updated: May 8, 2018