National Institutes of Health U.S. Department of Health and Human Services
A Haplotype Map of the Human Genome
Harvard Medical School, Massachusetts General Hospital, Whitehead Institute
Whitehead Institute and MIT
The next key step of the Human Genome Project (HGP) (following the creation of the genetic, physical, sequence and SNP maps) is the generation of a "haplotype" map of the human genome. Such a "haplotype" map consists of a high density of SNPs defining the small number of ancestral haplotypes (blocks of tightly correlated genetic variants) in each region of the human genome. Knowledge of these haplotypes will allow comprehensive and efficient testing of the association of human genes with human diseases. The haplotype map can and should be generated rapidly and should be made freely available to researchers worldwide.
A haplotype map of the human genome has become both justified and practical due to significant advances over the last two years.
Specifically, these advances include:
Genomic Sequence: The development of a complete genome sequence, integrated with human genes and annotations, provides a reference framework on which to layer knowledge about allelic variation.
Genetic Variants: The development of a dense (and rapidly growing) map of 1.4 million human SNPs provides a genome-wide resource of genetic variation adequate to uniquely tag the vast majority of human haplotypes.
Genotyping Technology: The development of high-throughput methods allows a rapid, efficient and cost-effective experimental approach to a project of the required scale.
Long-range LD: The discovery that human SNPs display strong linkage disequilibrium (LD, or allelic association) over large distances. LD is detectable over distances in the range of 100 kb and is extremely strong over regions spanning several tens of kb (the size of typical genes). For such regions, the vast majority of chromosomes in the population carry one of a handful of highly conserved haplotypes. As a result, genetic diversity in the region can be represented by a small number of well-chosen SNPs.
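To make the notion of LD between two SNPs concrete, the following is a minimal sketch of how the standard pairwise measures D, D' and r^2 can be computed from phased chromosomes. The function name `ld_stats` and the input format (one two-character string per chromosome) are illustrative assumptions, not part of the proposal.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    haplotypes: list of 2-character strings, one per phased chromosome;
    "AG" means allele A at the first SNP and allele G at the second.
    (Hypothetical input format chosen for illustration.)
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    alleles1 = sorted({h[0] for h in haplotypes})
    alleles2 = sorted({h[1] for h in haplotypes})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic and polymorphic")

    # Pick one reference allele at each SNP.
    a, b = alleles1[0], alleles2[0]
    p_a = sum(1 for h in haplotypes if h[0] == a) / n
    p_b = sum(1 for h in haplotypes if h[1] == b) / n
    p_ab = counts[a + b] / n

    # D: observed haplotype frequency minus the frequency expected
    # if the two SNPs were independent.
    d = p_ab - p_a * p_b

    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between the alleles at the two SNPs.
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared
```

When only two haplotypes are present (for example, half of the chromosomes "AG" and half "CT"), both D' and r^2 equal 1, and either SNP fully predicts the other. When all four combinations occur at equal frequency, both measures are 0.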
Impact on biomedical research
The availability of a haplotype map of the human genome will have a substantial impact on human genetic studies.
Specifically, these studies include:
Comprehensive association studies of individual genes. The association of genes with disease has traditionally been probed by testing individual SNPs one at a time. The drawback to this approach is that the task is never-ending: one can exclude particular SNPs as playing a role, but one cannot exclude a gene. Once the haplotype structure of the genome is defined, one can (1) comprehensively test all significant haplotypes in the gene, and (2) decrease the number of SNPs needed by selecting a subset that defines the population variability. This will allow haplotype studies of individual genomic loci in an unbiased manner, without assumptions about the locations of causal mutations in coding regions, promoters or regulatory sites a significant distance away. It will also greatly decrease the technical and financial barriers faced by laboratories in undertaking such work.
Genome-wide association studies. A genome-wide haplotype map will make possible whole-genome scans for association in the population. Rather than focusing only on 'candidate' genes, it will become possible to search the genome in an unbiased manner for genes whose common variation contributes to disease in the population. Routine use of genome-wide association studies will also require further decreases in genotyping costs, but such decreases are likely to be driven by the development of the haplotype map.
Human population structure and history. Knowledge of haplotypes will transform our understanding of human population structure and history. The LD pattern turns out to be an extremely sensitive indicator of population history, because the multi-allelic nature of haplotypes provides rich detail and because the breakdown of haplotypes follows a predictable clock set by recombination rates. In particular, LD patterns are more powerful than traditional studies of allele frequencies per se. Information about human population history is interesting in its own right, but is also very valuable in the design of medical studies (such as admixture mapping).
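The idea of selecting a subset of SNPs that defines the population variability, mentioned above under comprehensive association studies, can be illustrated with a small sketch. It assumes the common haplotypes in a region are already known and greedily picks SNP positions until every haplotype is distinguishable. The function name `select_tag_snps` and the string encoding are hypothetical, and a greedy search is only one possible strategy; it is not guaranteed to find the smallest possible subset.

```python
def select_tag_snps(haplotypes):
    """Greedily choose SNP positions that distinguish all given haplotypes.

    haplotypes: list of equal-length strings, one character per SNP.
    Returns a sorted list of 0-based SNP positions.
    (Illustrative sketch; not the method specified by the proposal.)
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    if any(len(h) != n_snps for h in distinct):
        raise ValueError("all haplotypes must cover the same SNPs")

    def n_groups(cols):
        # Number of haplotype groups that can be told apart using only `cols`.
        return len({tuple(h[c] for c in cols) for h in distinct})

    chosen = []
    while n_groups(chosen) < len(distinct):
        # Add the SNP that separates the haplotypes into the most groups.
        best = max(
            (c for c in range(n_snps) if c not in chosen),
            key=lambda c: n_groups(chosen + [c]),
        )
        chosen.append(best)
    return sorted(chosen)


# Four haplotypes over six SNPs; SNPs 0-2 are perfectly correlated with
# one another, as are SNPs 4-5, and SNP 3 does not vary.
common = ["ACGTAC", "ACGTGT", "GTATAC", "GTATGT"]
tags = select_tag_snps(common)  # [0, 4]
```

In this toy region, two of the six SNPs are enough to identify each of the four haplotypes, which is the sense in which a well-chosen subset reduces the genotyping needed.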
Generating a haplotype map would involve the following components:
Population Samples. Development of appropriate population samples, consisting of parent-offspring trios (to allow inference of haplotypes). We estimate that a total of about 300 samples will be needed, representing major ethnic groups in a manner appropriate for generating a map that can be used for medical studies in all populations. The population samples should be a renewable resource (i.e., immortalized cell lines).
Sample and Data Availability. The samples should be made freely available so that any interested scientific group can contribute data (in the manner of the CEPH panel and the DNA Polymorphism Discovery Resource). Conversely, all data generated by the project should be immediately released into the public domain without restrictions of any kind.
Numbers of SNPs to be genotyped. It is estimated that generating the haplotype map will require successful genotyping of 450,000 SNPs, which will in turn require initial testing of some 800,000 to 900,000 SNPs. The required scale is now well within reach: the Whitehead Institute and the Sanger Centre are each currently engaged in pilot projects involving 25,000 SNPs using an automated genotyping setup and MALDI-TOF-based detection. Given the required scale and efficiencies, it is likely that the bulk of the work should be performed by a few large groups, but all groups should be encouraged to participate in the project by analyzing genes and regions of interest.
Analytical Tools. The project will require various analytical tools to readily define haplotype blocks from genotype data, software systems to aid in the hierarchical selection of SNPs to fill in blocks, and databases to make the information maximally useful to the community. Prototype systems have been developed, but focused effort will be needed to develop mature systems.
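As an illustration of why parent-offspring trios allow inference of haplotypes (see Population Samples above), the sketch below assigns each of a child's alleles to the parent it came from, SNP by SNP. The function name `phase_child` and the genotype encoding (a two-character string of alleles per SNP) are assumptions made for this example. It resolves phase only where the parental genotypes make the transmission unambiguous, and does not attempt the statistical inference a mature tool would need for the remaining sites.

```python
def phase_child(child, father, mother):
    """Phase a child's genotypes using both parents' genotypes.

    Each argument is a list with one genotype per SNP, written as a
    2-character string of alleles such as "AG" (order is irrelevant).
    Returns (paternal, maternal): two lists of alleles forming the child's
    two haplotypes, with None where the phase cannot be resolved.
    (Hypothetical helper for illustration only.)
    """
    paternal, maternal = [], []
    for c, f, m in zip(child, father, mother):
        x, y = c[0], c[1]
        # Keep each (from father, from mother) assignment that both
        # parents could actually have transmitted.
        options = {(p, q) for p, q in ((x, y), (y, x)) if p in f and q in m}
        if not options:
            raise ValueError(
                f"Mendelian inconsistency: child {c}, father {f}, mother {m}"
            )
        if len(options) == 1:
            p, q = next(iter(options))
        else:
            # Child and both parents share the same heterozygous genotype:
            # the trio alone cannot resolve the phase at this SNP.
            p, q = None, None
        paternal.append(p)
        maternal.append(q)
    return paternal, maternal


child = ["AG", "CC", "AG", "CT"]
father = ["AA", "CT", "AG", "CC"]
mother = ["AG", "CC", "AG", "TT"]
pat, mat = phase_child(child, father, mother)
# pat == ["A", "C", None, "C"], mat == ["G", "C", None, "T"]
```

Only the third SNP, where all three individuals are heterozygous for the same alleles, is left unresolved; every other site is phased directly from the trio.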