Last updated: April 02, 2012
Genome Advance of the Month
Discovering the Mutants Among Us
By Joy Yang
A loss-of-function (LoF) variant is a genetic variant that is predicted to cause a loss of function in a protein-coding gene; in other words, a change in the genome sequence that prevents the production of a normal protein. Because LoF variants can result in debilitating diseases such as cystic fibrosis and Duchenne muscular dystrophy, they are usually thought to be rare; however, many scientists suspect that LoF variants may actually be quite common, even among healthy people. In fact, the first few whole genome sequences of apparently normal individuals produced following the Human Genome Project each contained several hundred LoF variants.
Using data from the pilot phase of the 1000 Genomes Project, a team of researchers led by Daniel MacArthur, Ph.D., then a postdoctoral researcher at the Wellcome Trust Sanger Institute in Hinxton, UK, set out to take a closer look at this class of variants and measure how rare they truly are.
In the decade since the end of the Human Genome Project, collaborations such as the International HapMap Project have tried to measure how much genomes vary among populations worldwide. The current international collaboration, called the 1000 Genomes Project, aims to sequence the genomes of more than 2,500 individuals to find all variants occurring with a frequency of at least 1 percent in the populations studied. The pilot phase of the project is based on 185 individuals selected from three main populations: the Yoruba from Ibadan, Nigeria; the Utahans of Northern or Western European descent; and the Chinese from Beijing, as well as the Japanese from Tokyo.
MacArthur and his colleagues identified 2,951 candidate LoF variants in the dataset, and used bioinformatics tools to remove variants that may be the result of sequencing errors or annotation artifacts. They then experimentally validated and manually re-annotated the remaining candidates to obtain a set of "high-confidence LoF alleles" — LoF variants that are almost certainly real.
The result from this filtered dataset is surprising: the researchers estimate that the genomes of healthy individuals each contain about 100 LoF variants, and that approximately 20 genes are completely inactivated as a result. How can a person carry so many LoF variants and still be healthy?
The details for one particular individual of European ancestry, identified only as NA12878, offer some clues. This genome carries 97 LoF variants, but only 18 are in a homozygous state (meaning no functional protein would be expected). In addition, 28.7 percent of LoF variants affect only a subset of the known transcripts from their respective genes, so some forms of the protein may still be produced.
In the larger picture, genes containing high-confidence LoF alleles appear to be less evolutionarily conserved than genes without them. This makes sense: the more important a gene's function, the less likely that evolution will tolerate broken versions of the gene, because an individual carrying them would be less likely to survive and reproduce. LoF-containing genes also have more paralogs (genes in the same family that are likely the result of duplication events) and higher sequence similarity to those paralogs, suggesting redundancy. In other words, if a gene is inactive, its function may be partly compensated for by that of a paralogous gene.
LoF-containing genes also have fewer interactions with other genes, and the proteins they encode have fewer interactions with other proteins, which suggests that they are part of fewer essential pathways within cells.
LoF variants are more common in genes that code for olfactory receptors (the receptors in the nose that produce the sense of smell) and less common in genes that are involved in critical functions such as protein binding, gene regulation, and anatomical development.
Speaking about the study, Dr. MacArthur said: "It certainly came as a surprise to see so many loss-of-function variants remaining in a 'typical' genome even after applying such stringent filters. It's a useful reminder that the human genome isn't static; there are many genes that are present in some people and knocked out in others, including some that will eventually disappear from the human population altogether."
In one respect, this study of loss-of-function variants is quite humbling; it shows just how many gaps remain in our understanding of genome biology. Seventy percent of the high-confidence LoF variants found in the 1000 Genomes Project pilot data were not in dbSNP (a database of genetic variants that has been accumulating since the start of large-scale sequencing in the late 1990s). The study also demonstrates how much can be learned by systematically combing through large datasets. Its analytical strategies are likely to be highly valuable as the number of whole genome sequencing studies grows. And since highly deleterious LoF variants exist at low frequencies (which is why rare diseases are rare), data from additional studies and larger-scale sequencing efforts will be instrumental in studying these alleles and their effects on human development and biology in greater detail.
Read the study:
MacArthur, D. G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J., Walter, K., Jostins, L., et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science, 335(6070), 823-828. 2012. [PubMed]
Posted: April 2, 2012