National Institutes of Health U.S. Department of Health and Human Services
The NIH/NIAID/Wellcome Trust
Workshop on Model Organism Databases
National Human Genome Research Institute
April 29-30, 2002
Objective: To provide a set of prioritized recommendations for developing a long-term strategy for model organism databases, focusing on interoperability and standardization.
Background: Model organism databases (MODs) were developed to store and display genetic data from several well-studied organisms. Recently these databases have expanded to include the genome sequence and related information. Currently a draft human sequence is available along with the genome sequences of several animals, several plants, and at least 30 microbes. The number of completed genomes will increase dramatically in the next few years. New types of genetic and genomic information are becoming available, such as expression data, metabolic pathways, genetic variation and systematic phenotypes. The MODs obtain some information from automated data deposition and analysis and some from curation of the literature. Integrating this information opens up new opportunities for research in biological and disease processes. Thus there is an accelerating need for integrating new types of data into existing MODs and for creating new MODs for additional organisms.
For the most part, MODs evolved independently, without common standards for developing, curating and maintaining these databases. This creates the potential for incompatibilities and inconsistencies in the ways genome data are annotated, stored and analyzed. A pressing issue is to develop methods that make MODs interoperable and to establish standards for their development and operation.
Goals: The goals of the workshop are to:
Review the current status of different types of MODs.
Discuss needs for creating new databases and developing new features for established databases.
Discuss scientific issues related to genome databases, including interoperability, curation, long-term sustainability, a central database versus multiple databases, and the possible compilation of standard operating procedures for developing and maintaining genome databases.
Discuss the development of a generic model organism database that may be useful for scientists developing databases for additional organisms.
Determine a set of prioritized recommendations for a long-term strategy for genome databases, focusing on interoperability and standardization.
Monday, April 29, 2002
Registration and breakfast
Helen Fisher, Wellcome Trust
Maria Giovanni, NIAID
Peter Good, NHGRI
A large curated database: FlyBase - Bill Gelbart
A large automated database: Ensembl - Ewan Birney
A small database: ViruloGenome - Arshad Khan
Tools for data exchange: The OmniGene System - Brian Gilman
A newly-forming database: DictyDB - Rex Chisholm
Tools for interoperability: GO Consortium - Suzanna Lewis
Strategies for reducing costs for new MOD projects: Lincoln Stein and Steve Oliver, Session Leaders
"Give it to us; we'll do it for you".
SAS-like integrated application.
Horizontally-integrated (federated) toolkit.
Useful toolkit; some assembly required.
Needs analysis: What does a new database need? (session chair: Michael Ashburner)
Interoperability: How important are interoperable architectures? What other methods work? (session chair: Ewan Birney)