Annotating the Human Genome

Summaries of Working Group Proposals

The following descriptions of the scientific rationales for the working groups' proposals are based primarily on the original working group reports submitted in April, and are provided to give detailed scientific justification for the choice of organisms. Choosing additional sequencing targets for the NHGRI sequencing program is an ongoing activity. The working groups will continue to discuss priorities for sequencing as the data from these new sequencing targets accumulate and are analyzed. The working groups will periodically submit updated proposals to the NHGRI. When new proposals are approved, they will be posted on this Web page.

I. Working Group on Annotating the Human Genome (AHG) (Bob Waterston, Chair)

Links to roster and full text of the approved proposal:

The proposal identified several important scientific problems that require additional sequence data to address thoroughly:
  • Accurate description of all exons in the human genome.
  • Identification of conserved sequences involved in gene regulation and other genomic functions.
  • Identification of human-specific functional sequences (i.e., those that have been substituted or modified during primate evolution and which have undergone recent selection).
  • Elucidation of sequence variation in the human population.
The working group proposed a tripartite approach that separately addresses each of those issues:

Component 1: To identify sequences that are broadly conserved across mammalian genomes, a low-redundancy (2-fold coverage) whole genome shotgun sequence of the genomes of eight mammals should be produced. A set of specific organisms was proposed to sample the major branch points of the mammalian tree and to maximize branch length. The underlying analytic foundation for this component (manuscript in preparation) was prepared by a sub-committee of the working group that concluded that low sequence coverage of more mammalian genomes would be more effective in the identification of conserved sequences than deeper sequence coverage of fewer genomes. The working group proposed that 45% of total sequencing capacity for annotating the human genome be directed toward this goal.
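As a rough illustration of what "low-redundancy (2-fold coverage)" implies, the sketch below uses the idealized Lander-Waterman (Poisson) model, in which the expected fraction of bases covered by at least one read at c-fold coverage is 1 - e^(-c). This model is an assumption introduced here for illustration and is not the working group's own analysis (which the source describes only as a manuscript in preparation); the function name is hypothetical. Under this idealization, 2-fold coverage would be expected to sample about 86% of a genome.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random (idealized Poisson /
    Lander-Waterman model); real shotgun data deviate from this.
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Compare the proposed 2-fold depth with a shallower and a deeper depth.
    for fold in (1, 2, 6):
        print(f"{fold}-fold coverage: {expected_fraction_covered(fold):.1%} of bases")
```

The model ignores repeats, cloning bias, and assembly, so it should be read only as a back-of-the-envelope guide to how much of each genome a low-coverage draft might sample.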

Component 2: To identify the important differences between the human and other primate genomes, and thereby provide insight into the uniquely human features of the human genome, the working group proposed that an initial effort be made to obtain high-quality sequence of the genomes of two great apes, the orangutan and gorilla. The working group proposed that 35% of total sequencing capacity be directed toward this goal.

Component 3: To obtain a broader and more complete assessment of the extent of genetic variation in the human population, the working group recommended that additional coverage of the human sequence be obtained, and suggested that 100-fold additional sequence coverage would be appropriate. The working group proposed that 20% of total sequencing capacity be directed toward this goal.
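The three proposed shares (45%, 35%, and 20%) account for the whole of the capacity devoted to annotating the human genome. A minimal sketch of that arithmetic follows; the total capacity figure and its units are hypothetical placeholders, since the source gives only percentages, and the component labels are shorthand introduced here.

```python
# Percent shares as stated in the proposal summary; labels are shorthand.
SHARES = {
    "low-coverage mammals": 45,
    "great ape genomes": 35,
    "human variation": 20,
}


def allocate(total_capacity: float, shares: dict = SHARES) -> dict:
    """Split a total capacity across components by percent share."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {name: total_capacity * pct / 100 for name, pct in shares.items()}


if __name__ == "__main__":
    # 200.0 is an arbitrary illustrative total, not a figure from the proposal.
    for name, amount in allocate(200.0).items():
        print(f"{name}: {amount}")
```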

II. Comparative Genome Evolution (CGE) (Laura Landweber and John Gerhart, Working Group Co-Chairs)

Links to roster and full text of the approved proposal:

The working group's purview included:

  • Identification of organisms in critical phylogenetic positions from which genomic sequence data would illuminate the advent of major morphogenetic or physiological innovations in human evolution.
  • Identification of organisms from which genomic sequence data would allow the identification of conserved functional regions in the existing genome sequences of important non-mammalian model systems.
  • Provision of information that would allow investigators to address basic questions about genome evolution, e.g., evolutionary rates; speciation; genome reorganization and origins of variation.
The proposal put forward by this working group also had three components:

Component 1: The metazoan origins of the human genome. The proposal presented a largely phylogenetic approach focused on eukaryotic clades that are currently unrepresented in the genomic sequence data sets. These data would illuminate the genomic basis for major innovations in the evolutionary lineage leading to humans, beginning with jawed fish and moving down to protists. Nine organisms were proposed for genomic sequencing. The proposal discussed a number of additional organisms whose genomes are not currently ready for sequencing for one reason or another, and proposed that preliminary data be obtained about them to allow subsequent decision making on a more informed basis.

Component 2: Linking genomic change to life history and behavior. The working group was interested in analyzing and comparing the gene and genome structures of organisms with large and small genomes. However, the initial discussion did not proceed into enough depth to produce any candidate organisms for the initial proposal; it will be considered in future working group discussions.

Component 3: Protist origins of the human genome. Although the evolutionary history of protist genomes indicates a very wide range of genetic diversity, based on experimental rRNA sequence data, it appears that some protist clades are ancestral to all other eukaryotes. Understanding their genomes may be pivotal in illuminating evolutionary activity at the base of the tree of life. However, there are concerns about complexity and size of many protist genomes. The working group's proposal included representatives of three well-characterized genera for genomic sequencing and several others for preliminary sequence investigation.

Last Reviewed: February 28, 2012