
National Human Genome Research Institute

National Institutes of Health
U.S. Department of Health and Human Services

Conclusions, Agenda and Participants for the Annotation Meeting

January 7, 2000
Bethesda, Md.


Needs identified:
  1. Clone annotation: to be done by NCBI and genome centers.
  2. International gene index: to be developed by NCBI and EBI.
  3. Definitive sequence in a computable form: to be provided by NCBI/EBI/DDBJ and genome centers.
  4. Biological curated annotation: requires further study. NCBI proposed a sequence-linked reviews program to be funded by NHGRI.
  5. Controlled vocabularies for function: this work has begun but needs further development and extension to human genes.

Resources for assembling the sequence: We need to annotate clones and contigs as the sequence is being assembled. Much of this need will disappear when the genome sequence is finished, although researchers will still need information about individual sequenced clones so they can select particular ones to study.

International Gene Index: A major immediate need is an authoritative list of human genes. This should be a collaborative effort of NCBI and EBI, producing a single central list of canonical gene models. It is essential to have standard names for genes and other genomic objects. Aliases are acceptable, but everyone must be able to use the standard name. (These names are for tracking the genes; functionally informative names can be provided later.) NCBI and EBI can set up a process to compare gene predictions contig by contig and produce a correspondence table:

IGI name, NCBI ID, EBI ID, Evidence (cDNA, EST, gene prediction program, etc.).

NCBI and EBI should announce this project soon. They expect to have the process for reconciling gene models in place by April 1, with full production beginning a month or so later. At the May CSHL Genome meeting they will describe the process and present the first genome-wide version of the IGI. The draft sequence can be used, although we do not want a draft IGI. There should be two-way links to the gene lists maintained by other groups.

Computational gene-finding methods are quite error-prone. Some exons of a given gene can typically be identified, but the likelihood of finding all of the exons and correctly joining them into the right predicted coding sequence or mRNA is low. Thus, the gene models will change over time, and a robust nomenclature and versioning system needs to be in place so that users can move between old and new datasets. For example, the genes used on a chip should still be trackable five years later. As the list is updated, we need to be able to carry forward the biological information that has accumulated.

NCBI will work with its advisory committee. The IGI needs its own advisory group, with academic, industry, and international representatives.

Definitive sequence: We need one definitive sequence, with annotation. The international sequence databases and the genome sequencing centers need to work together to produce it. The sequence needs to be computable: researchers must be able to access it and query it in many ways. Some types of information can be precomputed, and there should be views that answer popular questions. The sequence and annotation data must be freely available and easy to download.

Automatic annotation: The international sequence databases should provide one authoritative view for users who want a single view. Many other users will be interested in multiple views, in order to validate models, to understand differences among models, and to choose the views most useful to them. Different methods of viewing the data will have different advantages. It will be important for the international sequence databases to link to and from views provided by other databases, to make them easily accessible. Annotation needs to be done uniformly across the genome, and the annotation must be updatable.

Function annotation: The hard part will be keeping track of gene function. Some function information is predicted computationally; some is verified experimentally. Some researchers will be interested in any information on a gene; others will want only verified information. There is much interest in proteins, gene structure, and regulatory regions.

Biological curatorial annotation: We need to determine what the community needs and the best ways of meeting those needs. Models in use or proposed include:

Database editor: An editor of a database summarizes what is known about a gene. This model is used by OMIM (Online Mendelian Inheritance in Man), FlyBase, and the yeast Saccharomyces Genome Database (SGD). The editors need to be Ph.D.-level biologists who can synthesize the literature. In yeast, the community contacts the editors about problems in these entries. In humans, the community does not do a good job of pointing out mistakes.

Annotation meeting: The Celera annotation jamboree on the completed Drosophila sequence is an interesting model for involving the community in annotation. The software and analytical tools for doing the annotation need to be provided. Some of the information that came out of the meeting was entered directly into the database; some involved more complex biology that will be published.

Sequence-linked reviews program: NHGRI could have a small competitive grants program where PIs write book-chapter-type reviews on genomic topics. Such reviews would summarize the biology, and would be closely linked to the database. This approach allows flexibility, PI initiation of topics, PI credit, diversity of views, and scalability.

Controlled vocabularies for function: FlyBase, SGD, and the Mouse Genome Database (MGD) are working out a common controlled vocabulary for cellular location, biochemical function, and biological process. The vocabularies should be extended to include information on humans and other organisms.



Annotation Meeting

Ramada Inn
January 7, 2000
8:30 a.m. Introduction: Collins
Chair: Gelbart

What have we learned?
8:40 a.m. Experience with Drosophila annotation: Gelbart
9:00 a.m. Discussion
9:10 a.m. Experience with yeast annotation: Botstein
9:30 a.m. Discussion
9:40 a.m. What annotation do we want, on what time-scales?
Discussion, led by Lander
11:00 a.m. Lunch

What are we doing already?
12:00 noon NCBI plans for human annotation: Lipman
12:20 p.m. EBI plans for human annotation: Ashburner
12:40 p.m. Discussion
1:20 p.m. What still needs to be done, and how?
Discussion, led by Gelbart
2:10 p.m. What can NHGRI contribute to the effort?
Discussion, led by Collins
3:00 p.m. Adjourn


Annotation Meeting Participants

Visiting Scientists
Michael Ashburner
Wellcome Trust Genome Campus
Hinxton, Cambridge
United Kingdom
Bus: 011-44 1223 494648
Fax: 011-44 1223 494470
Eric Lander
Whitehead Inst. for Biomedical Research
One Kendall Square
Bldg. 300
Cambridge, MA 02139-1433
Bus: (617) 252-1906
Fax: (617) 258-0903
David Botstein
Department of Genetics
Stanford University School of Medicine
Room L-321
Stanford, CA 94305-5120
Bus: (650) 723-3488
Fax: (650) 723-7016
Suzanna Lewis
Department of Molecular and Cell Biology
University of California, Berkeley
Room 545 LSA Bldg. 3200
Berkeley, CA 94720
Bus: (510) 643-0514
Fax: (510) 643-9947
John Bouck
Human Genome Sequencing Center
Baylor College of Medicine
Room N1521
1 Baylor Plaza
Mail Stop BCM226
Houston, TX 77030
Bus: (713) 798-7206
Fax: (713) 798-5741
G. Christian Overton
Center for Bioinformatics
University of Pennsylvania
1312 Blockley Hall (6021)
418 Guardian Drive
Philadelphia, PA 19104-6145
Bus: (215) 573-3105
Fax: (215) 573-3111
William Gelbart
Department of Molecular and Cellular Biology
Harvard University
16 Divinity Avenue
Cambridge, MA 02138
Bus: (617) 495-2906
Fax: (617) 496-1354
David Valle
Johns Hopkins University
School of Medicine PCTB, Room 802
725 N. Wolfe Street
Baltimore, MD 21205
Bus: (410) 955-4260
Fax: (410) 955-7397
Steven Henikoff
HHMI-Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N.
Room A1-162
P.O. Box 19024
Seattle, WA 98109-1024
Bus: (206) 667-4515
Fax: (206) 667-5889

Owen White
The Institute for Genomic Research
9712 Medical Center Drive
Rockville, MD 20850
Bus: (301) 838-0200
Fax: (301) 838-0209

NCBI Staff
David Lipman (Director)
Bldg. 38A, 8N803
Bethesda, MD 20894
Bus: (301) 496-2475
Fax: (301) 480-9241
James Ostell
Bldg. 38A, 8N813
Bethesda, MD 20894
Bus: (301) 296-2475
Fax: (301) 480-9241
Mark Boguski
Bldg. 38A, 5N503
Bethesda, MD 20894
Bus: (301) 435-6015
Fax: (301) 480-9241
Gregory Schuler
Bldg. 38A, 5N509
Bethesda, MD 20894
Bus: (301) 594-4931
Fax: (301) 480-9241
Eugene Koonin
Bldg. 38A, 5N505A
Bethesda, MD 20894
Bus: (301) 435-5913
Fax: (301) 480-9241

Stephen Sherry
Bldg. 38A, B2N21
Bethesda, MD 20894
Bus: (301) 435-7799
Fax: (301) 480-9241

Office of the Director
Francis Collins (Director)
Elke Jordan (Deputy Director)
Bldg. 31, 4B09
Bethesda, MD 20892
Bus: (301) 496-0844
Fax: (301) 403-0837
Office of Scientific Review
Ken Nakamura
Rudy Pozzatti
Bldg. 31, B2B37
Bethesda, MD 20892
Bus: (301) 402-0838
Fax: (301) 435-1580
Program Staff
Lisa Brooks
Yasmin Cypel
Elise Feingold
Adam Felsenfeld
Bettie Graham
Mark Guyer
Jane Peterson
Jeffery Schloss
Kris Wetterstrand
Bldg. 31, B2B07
Bethesda, MD 20892
Bus: (301) 496-7531
Fax: (301) 480-2770
Center for Inherited Disease Research
Jerry Roberts
Bldg. 31, B2B37
Bethesda, MD 20892
Bus: (301) 402-0838
Fax: (301) 435-1580


Last Reviewed: May 2006