Whole Genome Association Studies

Whole genome association studies can identify specific points of variation in human DNA that underlie particular diseases or effects of medicines. Identifying the genetic factors that influence health, disease and response to treatment is central to discovering and developing next-generation medicines that target diseases with increased precision and reduced risks.

Virtually all diseases have a hereditary component, transmitted from parent to child through the 3 billion pairs of DNA letters that make up the human genome. Expanding the knowledge of the hereditary components of health and illness could expedite the development of new therapeutics.

When researchers completed the final analysis of the Human Genome Project in April 2003, they confirmed that the 3 billion base pairs of genetic letters in humans are 99.9 percent identical in every person. This also means that individuals are, on average, 0.1 percent different genetically from every other person on the planet. In that 0.1 percent lies the mystery of why some people are more susceptible to a particular illness, or more likely to be healthy, than a neighbor or even another family member.
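To put the 0.1 percent figure in concrete terms, the short calculation below applies it to the 3 billion base pairs quoted above. It is only a rough illustration of the scale implied by those two numbers, not a measured count of differences between any two people; the variable names are ours.

```python
# Figures quoted in the text above.
GENOME_BASE_PAIRS = 3_000_000_000

# 0.1 percent is one part in a thousand, so use integer division
# to avoid floating-point rounding.
differing_base_pairs = GENOME_BASE_PAIRS // 1000

print(f"{differing_base_pairs:,} base pairs differ on average")
```

On these figures, two people differ at roughly 3 million positions out of 3 billion.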

Genetic variation explains the physical differences among people, such as eye color and blood group. Genetic variation also explains why some people inherit relatively rare disorders, such as cystic fibrosis and muscular dystrophy, or inherit an increased risk of common illnesses such as cancer, heart disease and asthma. Understanding how that 0.1 percent of human genetic variation influences health and disease is one of medical science's highest priorities.

To study the scope of human variation, the National Institutes of Health led the International HapMap Project, a collaboration of researchers in many countries that began work in October 2002 to map common human variation in four population groups: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and Utah residents with ancestry from northern and western Europe. Thus, this project focused on the 0.1 percent of differences in the human genome, whereas the Human Genome Project focused on the 99.9 percent of similarities.

Genetic information is physically inscribed in a linear molecule called deoxyribonucleic acid (DNA). DNA is composed of four chemicals, called bases, which are represented by the four letters of the genetic code: A, T, C and G. The Human Genome Project determined the order, or sequence, of the 3 billion A's, T's, C's and G's that make up the human genome. The order of genetic letters is as important to the proper functioning of the body as the order of letters in a word is to understanding its meaning.

Researchers have known that the exact order of the letters in the human genome can change, commonly at a single location. An A, for example, may become a C or a G. This kind of variation is called a single nucleotide polymorphism, or a SNP. Switching a letter in a gene can be compared to how a misspelling can change a word. Most of the time, a SNP is biologically unimportant, just as the meaning of a word can sometimes be unaffected by a single letter change, such as "advisor" and "adviser." Sometimes, a genetic misspelling can slightly change the function of a gene, just as a single letter change in a word can slightly alter the meaning of the word, such as "ensure" and "insure." And, sometimes, letter changes can completely change the meaning, such as "reveal" and "repeal."
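The idea of a single-letter change can be sketched in a few lines of code. The example below compares two short, made-up DNA strings of equal length and reports each position where one letter differs. The sequences and the function name `find_single_letter_differences` are hypothetical. Real SNP discovery also works differently in practice: it compares many individuals across a population rather than one pair of sequences.

```python
def find_single_letter_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are zero-based. Both sequences are assumed to be already
    aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up sequences for illustration only.
reference = "GATTACAGGCT"
sample = "GATTCCAGGCT"

print(find_single_letter_differences(reference, sample))  # [(4, 'A', 'C')]
```

Here the only difference is at position 4, where an A in the first sequence appears as a C in the second.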

Researchers believe there are some 10 million common SNPs in the human genome. Scanning the genomes of large numbers of patients for such a large number of variants would be prohibitively expensive. Fortunately, a major shortcut has been discovered that reduces the workload about 30-fold. When the International HapMap Project was completed in October 2005, the researchers demonstrated that the 10 million variants cluster into local neighborhoods, called haplotypes, and that they can be accurately sampled by as few as 300,000 carefully chosen SNPs. New technological systems allow these SNPs to be systematically studied in high-throughput facilities that dramatically lower the cost.
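The size of the shortcut follows directly from the two figures above, and the principle behind it can be shown with a toy example. In the sketch below, the five-letter haplotype block and the names `BLOCK_HAPLOTYPES` and `infer_block` are hypothetical and chosen only for illustration. Real haplotype blocks are inferred statistically from population data, and the inference is not always exact.

```python
# Figures quoted in the text above.
COMMON_SNPS = 10_000_000
TAG_SNPS = 300_000

fold_reduction = COMMON_SNPS / TAG_SNPS
print(f"Workload reduction: about {fold_reduction:.0f}-fold")

# Hypothetical block of five neighboring SNPs in which only two
# combinations (haplotypes) are common. The key is the letter observed
# at the first SNP, which acts as the "tag" for the whole block.
BLOCK_HAPLOTYPES = {
    "A": "ACGTA",
    "G": "GTGCA",
}


def infer_block(tag_allele: str) -> str:
    """Look up the full five-letter haplotype from the tag SNP alone."""
    return BLOCK_HAPLOTYPES[tag_allele]


print(infer_block("G"))  # GTGCA
```

Because the letters in a neighborhood tend to travel together, reading one well-chosen SNP is enough in this toy case to predict the other four.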

In genome-wide association studies, researchers compare the genomes of people with an illness, who are referred to as cases, to unaffected people, who are referred to as controls. Through this comparison, it becomes possible to identify the genetic differences between sick and healthy people, even when the genetic differences are subtle. It is likely that for common diseases, the genetic differences will individually have mild impacts on a person's risk. However, the combination of many slightly altered genes and an environmental trigger may add up to a major risk for an individual. By identifying these genetic risks, researchers should be able to identify clues to new targets for the development of therapies that treat or even prevent illness.
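The case-control comparison can be illustrated for a single SNP. The sketch below uses one common textbook approach, an odds ratio and a chi-square test on allele counts, which the text above does not itself specify. The counts are invented, and the function name `allele_association` is ours. A real study repeats a test like this across hundreds of thousands of SNPs and must adjust for that.

```python
import math


def allele_association(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float, float]:
    """Compare allele counts between cases and controls at one SNP.

    Returns (odds_ratio, chi_square, p_value) for a 2x2 table, using the
    Pearson chi-square statistic with 1 degree of freedom and no
    continuity correction.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability equals
    # erfc(sqrt(chi_square / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Invented counts: 1,000 cases and 1,000 controls, two alleles per person.
odds_ratio, chi_square, p_value = allele_association(
    case_alt=600, case_ref=1400, control_alt=500, control_ref=1500
)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

In this invented example the variant is only modestly more common in cases (an odds ratio of about 1.29), which matches the point above that individual genetic differences are expected to have mild effects.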

Until recently, progress in the field of genomic association has been slow and difficult. Researchers had to depend on having a hunch about what genes were involved in disease, and most hunches turned out to be wrong. In a few instances where they were right, however, the consequences of discovery have been exciting. For example, a new class of medicines under development for the treatment of human immunodeficiency virus (HIV) infection is based on the observation that people who lack functioning copies of the gene that encodes the CCR5 protein have an inherited protection from HIV.

Now, with the results of the HapMap and other technological advances, science has powerful research tools for identifying variants that contribute to common diseases. An early example of success is the recent discovery of a variant in the complement factor H gene that represents a major risk factor for age-related macular degeneration, which is a common cause of blindness in the elderly. This finding, which was made possible by a genome-wide association study, has raised the possibility of a whole new approach to preventing this devastating disease. This new initiative, utilizing a unique public-private partnership, will employ these recent scientific advances to begin identifying the genetic roots of common diseases. This landmark collaborative effort will speed the day when safe and effective therapies can be identified and made available to people who need them.

Last updated: July 15, 2011