National Institutes of Health U.S. Department of Health and Human Services
Proteomics Planning Workshop
National Human Genome Research Institute
Natcher Conference Center
National Institutes of Health
April 25-26, 2002
A workshop was convened by three National Institutes of Health institutes - the National Cancer Institute, the National Institute of General Medical Sciences, and the National Human Genome Research Institute - to review the current status of proteomics and to discuss how these three institutes could most effectively invest in proteomics research. Participants were asked to consider what kinds of proteomic techniques and information would be most useful to address biological and medical questions.
As explicitly defined at the outset of the workshop, proteomics is the study of proteomes, the collections of proteins encoded by genomes. The term 'proteomics' carries with it the connection to genomics and the implication of completeness. It is completeness, and the global view afforded by completeness, that distinguishes proteomics from rapid biochemistry. In fact, proteomics achieves its greatest power in the comparison and analysis of multiple, complete proteomic datasets.
The general goal of proteomics is to monitor the properties of the entire complement of proteins from a given cell or organism, and to determine how these properties change in response to physiological conditions, such as exposure to signaling ligands, cell-cycle stage, and disease. Much of the discussion in the workshop was on 'focused proteomics,' studies of portions of proteomes, which are undertaken as explorations of new methods or as pilots, often looking at specific tissues, cell types, biological pathways or diseases. Focused proteomics projects may represent a useful way to do limited studies on the way to full proteomics, and they can also be informative explorations of specific problems in their own right.
Both short- and long-term goals for proteomics evolved from the discussions of the workshop. A major goal for the short term is to profile, or to take the census of, proteins present in particular cell types. Profiling is approaching reality for cells from some model systems, including numerous microorganisms. Challenges include profiling low-abundance proteins, membrane proteins, and the proteins of higher cells. The next step is to learn the absolute abundance of each protein, including its splice variants and all of its modified forms. With over 200 known types of covalent modifications of proteins, profiling modified proteins presents a formidable challenge. In the short term, examining the protein profiles of diseased cells and comparing them to normal profiles can offer diagnostic and prognostic tools in medicine. Yet another goal of proteomics is to map the protein interactions in each cell type.
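The comparison of diseased and normal protein profiles mentioned above can be illustrated with a minimal sketch. It is not a method described at the workshop: the function names, the pseudocount, the threshold, and the abundance values are all hypothetical, and a log2 fold change is only one of many possible ways to compare two profiles.

```python
import math

def log2_fold_changes(normal, diseased, pseudocount=1.0):
    """Return log2(diseased / normal) for every protein seen in either profile.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a protein is absent from one of the two profiles.
    """
    proteins = sorted(set(normal) | set(diseased))
    return {
        p: math.log2(
            (diseased.get(p, 0.0) + pseudocount)
            / (normal.get(p, 0.0) + pseudocount)
        )
        for p in proteins
    }

def changed_proteins(normal, diseased, threshold=1.0):
    """Keep proteins whose abundance changed at least 2**threshold-fold."""
    changes = log2_fold_changes(normal, diseased)
    return {p: c for p, c in changes.items() if abs(c) >= threshold}

# Made-up abundances in arbitrary units, for illustration only.
normal_profile = {"protein_A": 99.0, "protein_B": 49.0, "protein_C": 7.0}
diseased_profile = {"protein_A": 99.0, "protein_B": 199.0, "protein_D": 3.0}

candidates = changed_proteins(normal_profile, diseased_profile)
```

Here `candidates` holds the proteins whose abundance differs at least twofold between the two made-up profiles. A real study would also need replicate measurements and a statistical test before treating any such difference as a diagnostic marker.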
In the longer term, a complete interaction map of proteins in human cells will serve as an atlas for the biological and medical explorations of cellular metabolism. Along with this goal, scientists want to know the cellular location of each protein, and how that location changes over the protein's lifetime and through the cell cycle. The dependence of protein populations on the cell cycle, and on perturbations by small molecules (nutrients and drugs) and by protein ligands, needs to be catalogued. In the long run, such studies will provide the informational infrastructure for biology and medicine. It is conceivable that in the midterm, proteomic data from both humans and pathogens will become crucial for routine medical diagnosis and treatment.
A wider view of proteomics must include the cellular census of lipids, carbohydrates and other metabolites, which is the domain of metabolomics. Changes in these populations of small molecules in response to perturbations of various sorts, including those caused by drugs, will provide information crucial to enhanced pharmacology and nutrition.
Classical biomedical research, in which one system is studied thoroughly, can be greatly enhanced by proteomics. Proteomics aids in the discovery of protein function; it helps in understanding specific biological processes such as apoptosis and the cell cycle, and it helps in developing new tools for diagnosis and therapy. The broad view and the single intense view achieve their greatest clarity when each illuminates the other. Focused proteomics offers a middle ground between the intensive, classical view taken by biomedicine and the global proteomic view. This middle ground allows biological systems and clinical protocols to be explored with existing, albeit limited, technologies as new technologies are tested.
The workshop developed broad consensus about the following major issues in proteomics:
Overall: Proteomics promises to play a major role in the future of clinical medicine. A strong case was made for a rapid 'focused proteomics' project to study at least one tissue, e.g., serum, in an attempt to test the conceptual basis for such an approach. Over a longer timeframe, similar studies might be conducted with tissue samples from patients with well-defined and highly validated clinical phenotypes. It was emphasized that proteomics is unlike genome sequencing in that proteomes need to be sampled numerous times to account for tissue type, age, cell-cycle stage, drug response and many other factors. Of all the prospects for beginning a large-scale proteomics project, participants felt that a profiling project was nearest to being feasible.
Central role of mass spectrometry: Mass spectrometry is the analytic tool presently best suited for profiling cellular proteins, for de novo protein identification, and for related proteomic tasks. A need was seen for both large and small centers.
Large centers could be pilot programs for protein profiling for particular organisms, or for particular functional components of cells, such as cell signaling pathways. Large centers should advance the state of the art and/or be large enough to nucleate the establishment of community standards and tools.
Smaller mass spectrometry centers should be available at every institution with significant proteomic research programs.
Clinical mass spectrometry will require its own hardware and specialized software.
Technology Development: Mass spectrometry currently has inadequate dynamic range to analyze low-abundance proteins that are likely to be critical for understanding biological processes. However, participants believed that this problem was solvable in the relatively near term. At the same time, various microarray and other technologies that offer alternative paths to proteomic information must also be developed further to deal with the complexity of protein biochemistry, the large dynamic range of protein concentrations, temporal and spatial variability, and the analysis of membrane proteins and proteins expressed at low levels. Participants recommended that technologies with demonstrated feasibility next be applied to model organisms to test whether they can scale up to high throughput. Pilot studies in mammalian systems should also be supported. Participants identified the following specific aspects of proteomics for technology development:
Perturbations in Protein Function
Databases and computational biology: Computational biology and database development occupy crucial positions in proteomics. Proteomic data must be accessible for the field to advance.
The organization and distribution of data must be improved, including standardized formats and ways of expressing uncertainties.
New kinds of proteomics databases must be developed and nurtured; several pilot databases should be supported.
Further algorithmic development is needed in many areas, especially for data analysis, integration of various data types, modeling, and simulation of cellular and organismal behavior.
Interfacing with other forms of biological and clinical informatics is crucial.
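As one illustration of what a standardized format that expresses uncertainty might look like, the following is a minimal sketch of a measurement record that carries its own error estimates and round-trips through JSON. It does not represent any existing community standard: the class, its field names, and the choice of JSON are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ProteinMeasurement:
    """Hypothetical record: one protein quantified in one sample."""
    protein_id: str                   # identifier in some agreed-upon database
    sample: str                       # tissue, cell type, or condition label
    abundance: float                  # measured abundance, arbitrary units
    abundance_stderr: float           # uncertainty of the abundance estimate
    identification_confidence: float  # probability the identification is correct

    def __post_init__(self):
        # Reject records whose uncertainty fields are not meaningful.
        if self.abundance_stderr < 0.0:
            raise ValueError("abundance_stderr must be non-negative")
        if not 0.0 <= self.identification_confidence <= 1.0:
            raise ValueError("identification_confidence must be in [0, 1]")

def to_json(records):
    """Serialize records to a JSON string with a stable key order."""
    return json.dumps([asdict(r) for r in records], sort_keys=True)

def from_json(text):
    """Rebuild records from JSON; validation runs again on load."""
    return [ProteinMeasurement(**item) for item in json.loads(text)]

# Made-up values, for illustration only.
records = [
    ProteinMeasurement("protein_A", "serum", 120.5, 8.2, 0.99),
    ProteinMeasurement("protein_B", "serum", 3.1, 1.4, 0.72),
]
restored = from_json(to_json(records))
```

Keeping the uncertainty in the same record as the measurement means that any downstream tool that reads the data also receives the information needed to judge its quality.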
Resources and reagents: Various biological and chemical reagents for proteomics need to be made accessible to scientists. For each of these, to the extent possible, reagents and resources should be standardized, quality controlled, validated and, above all, accessible. Specific resources/reagents needed are:
Accurate protein sequence annotations on the genome.
A complete unigene set of full-length or full open reading frame cDNAs in flexible vectors, as well as a similar 'unidomain' set. Because splice variants account for a significant amount of proteomic complexity, they also need to be considered.
Antibodies and other affinity reagents.
Small molecules, as affinity reagents or as modifiers of protein function.
A comprehensive set of RNAi reagents.
Standard cell and tissue banks.
Training and access: Advanced training and access to technologies need to be greatly enhanced in the proteomic community. Specific recommendations made were to:
Expand training opportunities in proteomics and in the techniques required for proteomics research, including the establishment of training programs in proteomics, in mass spectrometry as applied to proteomics, and in proteomic bioinformatics.
Develop standards for protein identification and quantification.
Develop exportable and affordable technologies that could be used in a large number of research laboratories to apply proteomics to a wide range of biomedical research problems.
Establish community-accepted standards for all aspects of proteomics, especially data formats, reagent standards and data quality measures to allow for sharing and common analysis of data.
Adopt community standards for publication of proteomics data.
Standardize formats and software for automatic deposition and preliminary processing, including addressing statistical standards for data quality.