Conclusions, Agenda and Participants for the Annotation Meeting

January 7, 2000
Bethesda, Md.

	Conclusions
	Agenda
	Participants

Conclusions

Needs identified:

Clone annotation: to be done by NCBI and genome centers.
International gene index: to be developed by NCBI and EBI.
Definitive sequence in a computable form: to be provided by NCBI/EBI/DDBJ and genome centers.
Biological curated annotation: requires further study. NCBI proposed a sequence-linked reviews program to be funded by NHGRI.
Controlled vocabularies for function: this work has started but needs to be developed more and extended to humans.

Resources for assembling the sequence: We need to annotate clones and contigs as the sequence is being assembled. Much of this need will disappear when the genome sequence is finished, although researchers will still need information about individual sequenced clones so they can select particular ones to study.

International Gene Index: A major immediate need is an authoritative list of human genes. This should be a collaborative effort of NCBI and EBI, which should produce the one central list providing canonical gene models. It is essential to have standard names for genes and other genomic objects. There can be aliases, but everybody must be able to use the standard name. (These names are for tracking the genes. Functionally informative names can be provided later.) NCBI and EBI can set up a process to compare gene predictions by contig, and produce a correspondence table:

IGI name, NCBI id, EBI id, Evidence (cDNA, EST, gene-predicting program, etc.).
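As an illustration only, one row of such a correspondence table could be represented as a small record and written out in a plain delimited form. This is a minimal sketch, not a format agreed at the meeting: the field names, the `Correspondence` class, the `to_csv` helper, and all identifiers shown are hypothetical placeholders.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class Correspondence:
    """One row of the table: IGI name, NCBI id, EBI id, evidence."""
    igi_name: str
    ncbi_id: str
    ebi_id: str
    evidence: str  # e.g. "cDNA", "EST", "gene-predicting program"


def to_csv(rows):
    """Serialize rows to CSV text with a header line (hypothetical layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(Correspondence)])
    for row in rows:
        writer.writerow(astuple(row))
    return buf.getvalue()


# Placeholder identifiers, invented purely for the example.
rows = [
    Correspondence("IGI-EXAMPLE-1", "NCBI-EXAMPLE-A", "EBI-EXAMPLE-X", "cDNA"),
    Correspondence("IGI-EXAMPLE-2", "NCBI-EXAMPLE-B", "EBI-EXAMPLE-Y", "EST"),
]
table = to_csv(rows)
```

A flat, delimited layout of this kind would keep the table easy to download and to join against the lists of other groups, which the two-way links mentioned in the conclusions would require.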

NCBI and EBI should announce this project soon. They expect to have the process for reconciling gene models set up by April 1 and to begin full production a month or so later. At the May CSHL Genome meeting they will describe the process and the first version across the genome. The draft sequence can be used, although we do not want a draft IGI. There should be two-way links to the lists of other groups.

Computational methods of gene-finding are quite error-prone. Some exons of a given gene can typically be identified, but the likelihood of finding all exons and of stitching them together into the correct predicted coding sequence or mRNA is low. Thus, the gene models will change over time, and a robust nomenclature and versioning system needs to be in place so that users can move between old and new datasets. For example, the genes used on a chip should be trackable five years later. As the list is updated, we need to be able to transfer the biological information that has accumulated.
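One way to picture such a versioning system is an index in which each gene keeps a stable identifier, every revision of its model is retained as a numbered version, and a retired identifier forwards to its successor. The sketch below is an assumption made for illustration, not a design proposed at the meeting: `GeneRecord`, `GeneIndex`, the method names, and the identifiers `G1` and `G2` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class GeneRecord:
    stable_id: str
    versions: List[str] = field(default_factory=list)  # successive gene models
    replaced_by: Optional[str] = None  # set when this entry is merged away


class GeneIndex:
    """Hypothetical index that keeps old identifiers resolvable."""

    def __init__(self) -> None:
        self._records: Dict[str, GeneRecord] = {}

    def add(self, stable_id: str, model: str) -> None:
        self._records[stable_id] = GeneRecord(stable_id, [model])

    def revise(self, stable_id: str, model: str) -> int:
        """Store a new model for an existing gene; return its version number."""
        record = self._records[stable_id]
        record.versions.append(model)
        return len(record.versions)

    def merge(self, old_id: str, into_id: str) -> None:
        """Retire old_id and forward it to into_id (old models are kept)."""
        self._records[old_id].replaced_by = into_id

    def resolve(self, stable_id: str) -> Tuple[str, int]:
        """Follow forwarding links; return (current id, latest version)."""
        seen = set()
        record = self._records[stable_id]
        while record.replaced_by is not None:
            if record.stable_id in seen:
                raise ValueError("cycle in forwarding links")
            seen.add(record.stable_id)
            record = self._records[record.replaced_by]
        return record.stable_id, len(record.versions)


index = GeneIndex()
index.add("G1", "exons 1-3")
index.add("G2", "exons 4-5")
index.revise("G1", "exons 1-5")  # the two predictions turn out to be one gene
index.merge("G2", "G1")
current = index.resolve("G2")    # an identifier recorded years ago still resolves
```

Because nothing is deleted, a gene named on an old chip can still be looked up, and annotation attached to a retired identifier can be carried over to the record that replaced it.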

NCBI will work with its advisory committee. The IGI needs its own advisory group, with academic, industry, and international representatives.

Definitive sequence: We need one definitive sequence, with annotation. The international sequence databases and the genome sequencing centers need to work together to produce it. The sequence needs to be computable; researchers need to be able to access it so they can query the sequence in many ways. Some types of information can be pre-computed, and there should be views to answer popular questions. The sequence and annotation data have to be freely available in an easily downloadable form.

Automatic annotation: We need to have one authoritative view, provided by the international sequence databases, for users who want just one view. Many other users will be interested in various views, in order to validate models, to understand differences among models, and to choose particular useful views. Various methods of viewing the data will have different advantages. It will be important that the international sequence databases link to and from views provided by other databases, to make them easily accessible. Annotation needs to be done uniformly across the genome, and the annotation must be updatable.

Function annotation: The hard part will be keeping track of gene function. Some function information is predicted computationally; some is verified experimentally. Some researchers will be interested in any information on a gene; others will want only verified information. There is much interest in proteins, gene structure, and regulatory regions.

Biological curatorial annotation: We need to determine what the community needs and the best ways of meeting those needs. Models in use or proposed include:

Database editor: An editor of a database summarizes what is known about a gene. This model is used by OMIM (Online Mendelian Inheritance in Man), FlyBase, and the yeast database Saccharomyces Genome Database (SGD). The editors need to be Ph.D.-level biologists, to synthesize the literature. For yeast, the community contacts the editors about problems in these entries. For human genes, the community does not do a good job of pointing out mistakes.

Annotation meeting: The Celera annotation jamboree of the completed Drosophila sequence is an interesting model of how to get the community involved in annotation. The software and analytical tools for doing the annotation need to be provided. Some of the information coming out of the meeting was put directly into the database; some of the information was more complex biology that will be published.

Sequence-linked reviews program: NHGRI could have a small competitive grants program where PIs write book-chapter-type reviews on genomic topics. Such reviews would summarize the biology, and would be closely linked to the database. This approach allows flexibility, PI initiation of topics, PI credit, diversity of views, and scalability.

Controlled vocabularies for function: FlyBase, SGD, and the Mouse Genome Database (MGD) are working out a common controlled vocabulary for cellular location, biochemical function, and biological process. The vocabularies should be extended to include information on humans and other organisms.

Agenda

Annotation Meeting

Ramada Inn
Bethesda

January 7, 2000
8:30 a.m.	Introduction: Collins
Chair: Gelbart

What have we learned?
8:40 a.m.	Experience with Drosophila annotation: Gelbart
9:00 a.m.	Discussion
9:10 a.m.	Experience with yeast annotation: Botstein
9:30 a.m.	Discussion
9:40 a.m.	What annotation do we want, on what time-scales? Discussion, led by Lander
11:00 a.m.	Lunch

What are we doing already?
12:00 noon	NCBI plans for human annotation: Lipman
12:20 p.m.	EBI plans for human annotation: Ashburner
12:40 p.m.	Discussion
1:20 p.m.	What still needs to be done, and how? Discussion, led by Gelbart
2:10 p.m.	What can NHGRI contribute to the effort? Discussion, led by Collins
3:00 p.m.	Adjourn

Annotation Meeting Participants

Visiting Scientists
Michael Ashburner, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom, Bus: 011-44 1223 494648, Fax: 011-44 1223 494470, ashburner@ebi.ac.uk
David Botstein, Department of Genetics, Stanford University School of Medicine, Room L-321, Stanford, CA 94305-5120, Bus: (650) 723-3488, Fax: (650) 723-7016, botstein@genome.stanford.edu
John Bouck, Human Genome Sequencing Center, Baylor College of Medicine, Room N1521, 1 Baylor Plaza, Mail Stop BCM226, Houston, TX 77030, Bus: (713) 798-7206, Fax: (713) 798-5741, bouck@bcm.tmc.edu
William Gelbart, Department of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, Bus: (617) 495-2906, Fax: (617) 496-1354, gelbart@morgan.harvard.edu
Steven Henikoff, HHMI-Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., Room A1-162, P.O. Box 19024, Seattle, WA 98109-1024, Bus: (206) 667-4515, Fax: (206) 667-5889, steveh@howard.fhcrc.org
Eric Lander, Whitehead Inst. for Biomedical Research, One Kendall Square, Bldg. 300, Cambridge, MA 02139-1433, Bus: (617) 252-1906, Fax: (617) 258-0903, lander@genome.wi.mit.edu
Suzanna Lewis, Department of Molecular and Cell Biology, University of California, Berkeley, Room 545 LSA Bldg. 3200, Berkeley, CA 94720, Bus: (510) 643-0514, Fax: (510) 643-9947, suzi@fruitfly.bdgp.berkeley.edu
G. Christian Overton, Center for Bioinformatics, University of Pennsylvania, 1312 Blockley Hall (6021), 418 Guardian Drive, Philadelphia, PA 19104-6145, Bus: (215) 573-3105, Fax: (215) 573-3111, coverton@cbil.humgen.upenn.edu
David Valle, Johns Hopkins University School of Medicine, PCTB, Room 802, 725 N. Wolfe Street, Baltimore, MD 21205, Bus: (410) 955-4260, Fax: (410) 955-7397, dvalle@jhmi.edu
Owen White, The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, Bus: (301) 838-0200, Fax: (301) 838-0209, owhite@tigr.org

NCBI Staff
David Lipman (Director), Bldg. 38A, 8N803, Bethesda, MD 20894, Bus: (301) 496-2475, Fax: (301) 480-9241, Lipman@ncbi.nlm.nih.gov
Mark Boguski, Bldg. 38A, 5N503, Bethesda, MD 20894, Bus: (301) 435-6015, Fax: (301) 480-9241, boguski@ncbi.nlm.nih.gov
Eugene Koonin, Bldg. 38A, 5N505A, Bethesda, MD 20894, Bus: (301) 435-5913, Fax: (301) 480-9241, ek.34s@nih.gov
James Ostell, Bldg. 38A, 8N813, Bethesda, MD 20894, Bus: (301) 296-2475, Fax: (301) 480-9241, ostell@ncbi.nlm.nih.gov
Gregory Schuler, Bldg. 38A, 5N509, Bethesda, MD 20894, Bus: (301) 594-4931, Fax: (301) 480-9241, schuler@ncbi.nlm.nih.gov
Stephen Sherry, Bldg. 38A, B2N21, Bethesda, MD 20894, Bus: (301) 435-7799, Fax: (301) 480-9241, ss483o@nih.gov

NHGRI Staff
Office of the Director: Francis Collins (Director), Elke Jordan (Deputy Director), Bldg. 31, 4B09, Bethesda, MD 20892, Bus: (301) 496-0844, Fax: (301) 403-0837
Program Staff: Lisa Brooks, Yasmin Cypel, Elise Feingold, Adam Felsenfeld, Bettie Graham, Mark Guyer, Jane Peterson, Jeffery Schloss, Kris Wetterstrand, Bldg. 31, B2B07, Bethesda, MD 20892, Bus: (301) 496-7531, Fax: (301) 480-2770
Office of Scientific Review: Ken Nakamura, Rudy Pozzatti, Bldg. 31, B2B37, Bethesda, MD 20892, Bus: (301) 402-0838, Fax: (301) 435-1580
Center for Inherited Disease Research: Jerry Roberts, Bldg. 31, B2B37, Bethesda, MD 20892, Bus: (301) 402-0838, Fax: (301) 435-1580

Last updated: May 01, 2006