How a small group of scientists and educators are enhancing the diversity of the genomics workforce

The NIH-funded Genomic Data Science Community Network unites institutions from across the country to diversify the pool of young researchers working in genomic data science.

In sunny central California, Rosa Alcazar, Ph.D., spends her days teaching biology at Clovis Community College. For the past few years, she has been bringing an unusual addition to her students: genomic data science research. 

Genomic data scientists use computational and statistical methods to study genomics and biology in hopes of understanding living systems, biological processes, and the roles that genomics plays in health and disease. The last two decades have brought breathtaking advances in genomics technologies, such as high-throughput DNA sequencing, which have led to massive increases in the availability of genomic data.

What the field still lacks is enough students and early career researchers from diverse institutions and backgrounds who can analyze and interpret the data. For Dr. Alcazar, teaching genomic data science and research skills to community college students was an obvious solution.

“At many community colleges, the professors typically don’t have research laboratories, so students may not realize that science is more than just preparing for a test,” she says.

Genomic data science is unique since it does not require specialized and expensive laboratory equipment — just a laptop, wi-fi and a curiosity for unraveling the complexities of genomics. 

Widening the network of genomic data scientists

Dr. Alcazar is not the only professor bringing genomic data science to community colleges and underserved institutions. 

She is part of the Genomic Data Science Community Network, a group of faculty members and educators from various U.S. institutions that aims to broaden access to genomic data science. Funded by the National Human Genome Research Institute (NHGRI), the Community Network supports programs in education and research at diverse institutions, including those that traditionally have limited access to infrastructure, large datasets and research funding. Often, these sites include community colleges, historically Black colleges and universities, Hispanic-serving institutions and tribal colleges and universities.

According to a report from the National Science Foundation, only around 17% of bachelor's degree recipients in science and engineering fields were Hispanic, Latinx, Black, African American, multiracial or Indigenous Americans even though these groups represent around 30% of the American population. In the data science field, the disparities between different racial and ethnic groups are even greater, according to a study.

The Genomic Data Science Community Network, which started in 2020, consists of 20 educators and researchers from various institutions across the country. Through a series of symposia and meetings, organizers from NHGRI, Johns Hopkins University, Fred Hutchinson Cancer Research Center and the Carnegie Institution for Science collectively work to provide a platform for faculty members at underserved institutions to engage their students in data science research.

“We built the Genomic Data Science Community Network because we envisioned a diverse scientific community engaged in genomic data science in which researchers, educators and students are not limited by the institution’s resources, location, infrastructure, or reputation,” says Shurjo Sen, Ph.D., a program director within NHGRI’s Office of Genomic Data Science and an organizer of the Genomic Data Science Community Network.


Students in the lab at Clovis Community College

Caption: Dr. Rosa Alcazar (right) incorporates genomic data science research into her courses at Clovis Community College. Credit: Clovis Community College.

Removing barriers to scientific research

Dr. Alcazar’s passion for bringing more scientific research to community colleges came from her own experiences and challenges gaining research experience.

After graduating high school, she went straight to work at an office. “I was the first person in my family to graduate high school, so there was no expectation that I would do anything more than that. It was already a huge accomplishment that I was working at an office,” she says.

She started taking classes at a local community college, Pasadena City College, in the evenings after work. Her curiosity in biology classes led her to become a biology major. But working at a nine-to-five job prevented her from taking laboratory classes since they were not offered in the evenings.

“I had to quit my job to be able to take laboratory classes. But to do that, I had to save money for a year and apply for financial aid,” she said. “It was really the support and inspiration from my mentor Jose Macias, a math instructor, that gave me the courage to do this. Many people would not be willing to take that risk because of even greater responsibilities.”

Dr. Alcazar transferred to the University of California, Riverside, where she had the opportunity to be part of the McNair Scholars program over a summer. “That really opened a lot of opportunities for me,” she says. “I had mentors and other faculty who could see my potential and recommended me for graduate school.”

After completing her Ph.D. in biology at Johns Hopkins University and a postdoctoral fellowship at Stanford University, she intended to stay in research but was then pulled towards education, with an aim to remove barriers for community college students to enter the scientific field.

Since 2017, Dr. Alcazar has been an instructor with tenure at Clovis Community College, where she teaches biology to science and non-science majors and brings genomic data science to her students. In addition to being part of the Genomic Data Science Community Network, she co-founded C-MOOR at Clovis Community College, which provides genomics modules embedded in required courses. The modules allow students to use genomics data in a small research project and present their own scientific poster at an annual symposium.

“One significant challenge I encountered upon transferring to University of California, Riverside was my lack of experience and knowledge about basic research,” she said. “Recognizing the potential barrier this poses for people like me, especially those from community colleges, prompted my interest in offering early research opportunities and mentorship within institutions serving students who may have limited resources or are first-generation college students. I firmly believe that by doing so, we can enhance retention rates in the sciences and increase resilience among this population of students.”

Building resources and connecting educators

Genomic Data Science Community Network educators and researchers at Fred Hutch Cancer Center

Caption: Educators and researchers in the Genomic Data Science Community Network. Credit: Genomic Data Science Community Network.


Members of the Genomic Data Science Community Network are building modules of coursework in genomic data science specifically targeted towards students in underserved institutions. Through NHGRI, students can use cloud computing platforms, such as the Genomic Data Science Analysis, Visualization and Informatics Lab-space (AnVIL), to access and analyze large genomic datasets collaboratively.

The network is also developing coursework in collaboration with the Open Case Studies project, an educational resource for teaching data science.

Miguel Méndez-González, Ph.D., an assistant professor at the University of Puerto Rico at Aguadilla and a member of the Genomic Data Science Community Network, has been working towards improving the biology program and bringing bioinformatics and genomic data science into his institution.

“A few years ago, we didn’t have the tools or knowledge to offer a minor in bioinformatics at the university,” he said. “I have used modules developed by the Genomic Data Science Community Network in my classes to teach students genomics, data science and bioinformatics skills. Through these resources, I hope that the students can understand the importance of genomic data science and see it as a possible career for them.”

Many more faculty members have shared success stories of how the network’s resources have enriched their teaching, such as helping them develop course-based undergraduate research experiences to immerse the students in research. The network has also helped develop and launch a bioinformatics specialization at the University of Texas at El Paso and is helping researchers at the Northwest Indian College and Salish Sea Research Center develop free and public online genetics courses for tribal colleges that emphasize Indigenous genetics research.

Importantly, the Genomic Data Science Community Network connects educators from various institutions across the country with the same vision for genomic data science. 

“I have had the privilege of collaborating with exceptional leaders and scientists committed to enriching the field of genomics through diversity,” Dr. Alcazar said. “The network organizers, notably Mike Schatz, Jeff Leek and Shurjo Sen, have been pivotal in providing a platform that has allowed me to share my teaching philosophy and equip me with the necessary resources and speaking opportunities to disseminate my materials.”

“Their support has enabled me to connect with individuals across the network—people that I might never have met otherwise—who share a common goal of advancing genomic data science and promoting diversity in science. Being part of this community has been incredibly enriching,” she said.

Creating stepping stones towards a more diverse genomic data science workforce

The network hopes to expand its reach and include more institutions, forming a larger community of faculty engaging students in genomic data science.

Many faculty in the network are working with their students on a project known as Biodiversity and Informatics for Genomic Scholars (BioDIGS), which uses genomics tools to document biodiversity in the soil and to investigate connections between the environment and human health. By analyzing the genomes of microbes in the soil all across the country, they are already discovering many species, genes, and environmental relationships.


Students at Johns Hopkins

Caption: Students are using genomic approaches to study biodiversity in the soil and to find connections between the environment and human health. Credit: Michael Schatz, Johns Hopkins University, for BioDIGS.


The Genomic Data Science Community Network recently published a paper outlining the ways in which the research and educational community can help broaden access to genomic data science for students from traditionally underrepresented groups. These include supporting the professional development of faculty members at underserved institutions and incentivizing research collaborations to broaden access to research equipment.

“The Genomic Data Science Community Network is one of many key efforts led by NHGRI to enhance the diversity of the genomics workforce,” says Dr. Sen. “A more diverse and inclusive genomics community ensures that all populations can benefit from scientific research and innovations.”


Last updated: June 6, 2024