Free Online Tutorials Teach Anyone How to Use Genome Databases

Sequence data from numerous genomic projects are pouring out of sequencing centers and into public databases at an unprecedented rate. The projects feeding free, online databases such as the Entrez Genome Browser [ncbi.nlm.nih.gov] include cancer tumor sequencing efforts from the National Institutes of Health (NIH), such as The Cancer Genome Atlas; a host of model organism sequencing projects; The 1000 Genomes Project; and related projects that generate genetic information, such as the genome-wide association studies (GWAS) that search for genetic risk factors for conditions ranging from tooth decay to heart disease.

For researchers not trained in bioinformatics, a relatively new field that applies information technology and computer programming to molecular biology, the complexity of the information flooding the public databases can be overwhelming. Moreover, bioinformatics experts are in short supply, which makes it difficult for some researchers even to form the collaborations that would let them use freely available sequence data in their studies.

This has led a number of federal research centers and institutes to expand their Web-based training so that any molecular biologist can use the public databases to advance genomic research. The latest tutorials, funded by the National Human Genome Research Institute (NHGRI), one of the 27 institutes and centers that make up the NIH, provide essential training on the use of model organism genome databases. The tutorials were developed by OpenHelix, LLC, which has already published several free training sessions [openhelix.eu] on its Web site, with more coming in the next year.

"While computers are widely used in society today, there is a gap in understanding how to use bioinformatic tools associated with genome databases," says Peter Good, Ph.D., a program director for NHGRI's Genome Informatics and Computational Biology Program, which is funding the tutorials. "We're really pleased to be able to fund these tutorials that enable scientists without prior training in bioinformatics to ask both simple and complex scientific questions that can be answered by these resources."

Publicly available genome data from an array of model organisms, such as yeast, worms and mice, is widely used to gain critical insights into human biology. NHGRI funded the tutorials to empower the average researcher and encourage even greater exploration of these publicly available genome resources.

The first tutorials focus on GBrowse [gmod.org], a customizable genome browser tool; the Rat Genome Database [rgd.mcw.edu]; the Mouse Genome Informatics [informatics.jax.org] resource; and WormBase [wormbase.org]. Tutorials on the Zebrafish Information Network [zfin.org], FlyBase [flybase.org], and the Saccharomyces (Yeast) Genome Database [yeastgenome.org] will also be freely available in the coming weeks.

Each narrated tutorial, which can be viewed online or downloaded to a user's computer, introduces the resource and shows researchers how to use its features and functions. Supporting materials for each tutorial, including PowerPoint slides, handouts, and user exercises, are also available. The model organism tutorials are available at OpenHelix [openhelix.com].

Although the human genome sequence is not the focus of the newly funded tutorials, numerous publicly available databases provide the sequence itself or data from genome-wide association studies, along with online tutorials. One such database is the Genome Browser [genome.ucsc.edu] developed by the University of California, Santa Cruz (UCSC). The UCSC Bioinformatics group is also funding a free tutorial, available through OpenHelix, on how to navigate its genome browser, which includes data from many model organisms that can be compared with the human genome.

Among the most widely used databases in biomedical research are those from the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine at NIH. NCBI's Human Genome Resources [ncbi.nlm.nih.gov] page serves as a one-stop shop for accessing more than a dozen human genome sequence resources.

"NCBI databases are used by approximately two million people each day," said NCBI Director David Lipman, M.D. "We are constantly striving to make our resources more intuitive for researchers and scientists, as well as the general public, so they can make the best use of them without extensive training."

In addition to its user-friendly genome resources, NCBI offers The NCBI Handbook [ncbi.nlm.nih.gov], a comprehensive guide that summarizes each database and includes basic tutorials. NCBI also hosts Coffee Break [ncbi.nlm.nih.gov], a set of interactive tutorials showing how its databases were used in recent discoveries.

With more than 300 genetic risk factors for common diseases identified through genome-wide association studies over the last two years, demand is also rising for data from NCBI's dbGaP (database of Genotypes and Phenotypes). NCBI offers a tutorial on dbGaP as well.

Additionally, NHGRI hosts multimedia materials, including video and slides from past short courses, to help researchers use GWAS data. The Web-based programs include Epidemiology for Researchers Performing Genetic/Genomic Studies and Genetics for Epidemiologists: Application of Human Genomics to Population Sciences.

So, while you may not have a bioinformatician at your disposal, Dr. Good emphasized that "any scientist from any discipline or anyone else who wants to pursue their ideas can easily take advantage of these important databases. Our hope is that anyone can use these resources to make the breakthroughs that could dramatically change our understanding of biology or lead to better treatments for disease."

Last Reviewed: February 26, 2012

Last updated: February 26, 2012