NHGRI 100 kb Map Workshop

August 27-28, 1995

The National Human Genome Research Institute (NHGRI) convened a workshop on August 27-28, 1995, to discuss the completion of the 100 kb resolution, STS-based physical maps of the human genome. The conclusions reached at this workshop were presented to the National Advisory Council for Human Genome Research (NACHGR) at its meeting on September 11, 1995.

Rationale for Convening the Workshop

NHGRI expected that projects already funded as of August 1995 would bring construction of the 100 kb resolution STS map to completion. However, several issues needed to be discussed:

Is 100 kb resolution still the correct goal in view of the successful sustained operation of an automated system for cost-effective, large-scale STS-content mapping?
How will the developing genome-wide RH map be integrated with the YAC-based maps?
How can data produced by genome-wide mapping groups be efficiently integrated into efforts to map individual chromosomes?

Integrated 100 kb Resolution Maps

The participants concluded that the goal of building a map with an STS on average every 100 kb remains appropriate. The biochemical and molecular biological techniques that are currently available allow a variety of uses of such a map. While specific strategies for building bacterial clone maps (for gene-finding, DNA sequencing, etc.) might require an STS map of higher resolution, or permit one of lower resolution, the information now available is not sufficient to justify departing from the stated goal. For example, if the 100 kb map were used to anchor and orient bacterial clone contigs, each contig would need to contain two STSs spaced a reasonable distance apart. For this to happen frequently, contigs would have to be at least 200 kb long on average, which is a reasonable size to expect from current approaches to building such contigs. Thus the 100 kb map appears to be a reasonable goal.
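The arithmetic behind the 200 kb figure can be illustrated with a short Python sketch. This is not a calculation from the workshop: it assumes, as a simplification, that STSs fall along the genome as a Poisson process with a mean spacing of 100 kb, and it ignores the further requirement that the two STSs be reasonably far apart. The function names are hypothetical.

```python
import math

def expected_sts_count(contig_kb: float, mean_spacing_kb: float = 100.0) -> float:
    """Expected number of STSs in a contig, given the mean STS spacing."""
    return contig_kb / mean_spacing_kb

def prob_at_least_two_sts(contig_kb: float, mean_spacing_kb: float = 100.0) -> float:
    """P(contig contains >= 2 STSs), assuming Poisson-distributed STS positions."""
    lam = expected_sts_count(contig_kb, mean_spacing_kb)
    # P(N >= 2) = 1 - P(N = 0) - P(N = 1) = 1 - e^(-lam) * (1 + lam)
    return 1.0 - math.exp(-lam) * (1.0 + lam)

# Compare a few contig lengths against a 100 kb average STS spacing.
for length in (100, 200, 300, 400):
    print(f"{length} kb contig: expected STSs = {expected_sts_count(length):.1f}, "
          f"P(>=2 STSs) = {prob_at_least_two_sts(length):.2f}")
```

Under this simplified model a 200 kb contig contains two or more STSs only about 59% of the time, rising to about 80% at 300 kb, which illustrates why 200 kb is framed as a minimum average contig length.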

At the workshop, the director of the Stanford University Genome Center described that group's plans to map 24,000 STSs on a high resolution (100 kb) radiation hybrid (RH) panel ("TNG"). Similarly, a representative of the Whitehead Institute Genome Center described his group's plan to complete and refine maps consisting of 12,000 STSs mapped on the CEPH megaYACs. The Whitehead group also plans to map a large number of STSs on the Genethon-Cambridge ("Genebridge4") RH panel, and agreed to map those markers on the Stanford "TNG" panel as well. The meeting participants concluded that this extensive sharing of markers, together with ongoing whole-genome mapping progress at Whitehead, Stanford and several European laboratories, would result in well-integrated STS maps, and that complete integration (every marker mapped on every map) was not essential.

The efforts at the Whitehead and Stanford centers are expected to produce an RH map in which markers ordered at very high confidence (1000:1 odds) would provide better than 200 kb resolution. In addition, a larger number of markers will be mapped, albeit at lower confidence levels, bringing overall map resolution to better than 100 kb. Noting the primary importance of a user being able to select markers from multiple maps and to be confident that those markers fall either in the same bin or in an adjacent bin, the meeting attendees concluded that this requirement will be met when the maps described at this workshop are realized.
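The two quantities in this paragraph can be made concrete with a short Python sketch. Odds of 1000:1 correspond to a LOD score of log10(1000) = 3. The bin test is a hypothetical illustration only: it assumes markers from different maps have already been placed on a single common coordinate (in kb) and that bins are fixed 100 kb intervals, which simplifies how real integrated maps define bins.

```python
import math

def odds_to_lod(odds: float) -> float:
    """LOD score for odds of N:1 in favor of a marker order (LOD = log10 of the odds)."""
    return math.log10(odds)

def bin_index(position_kb: float, bin_size_kb: float = 100.0) -> int:
    """Index of the fixed-width bin containing a position on a hypothetical common coordinate."""
    return int(position_kb // bin_size_kb)

def same_or_adjacent_bin(pos_a_kb: float, pos_b_kb: float, bin_size_kb: float = 100.0) -> bool:
    """True if two marker positions fall in the same bin or in neighboring bins."""
    return abs(bin_index(pos_a_kb, bin_size_kb) - bin_index(pos_b_kb, bin_size_kb)) <= 1

print(odds_to_lod(1000))               # LOD for 1000:1 odds
print(same_or_adjacent_bin(150, 260))  # positions in bins 1 and 2
print(same_or_adjacent_bin(150, 320))  # positions in bins 1 and 3
```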

The participants also recommended that the best way to integrate the results of genome-wide and single-chromosome mapping efforts would be for the groups working on single chromosomes to incorporate markers from the genome-wide efforts into their maps; most are already doing so. In part, this recommendation reflected the recognition that considerable additional effort from all parties would be needed for the genome-wide groups to incorporate STSs generated by the single-chromosome labs into their maps while maintaining their data-generation efficiency. For example, the single-chromosome labs would need to be willing to supply, at designated DNA concentrations, primers that had been validated under the reaction conditions routinely used by the genome-wide mapping efforts.

The utility of having both single-chromosome and genome-wide mapping groups is complicated by the fact that there is not a chromosome-specific group assigned to every chromosome. As a result, concern has been expressed that a 100 kb map may not be achieved for some regions of the genome. The meeting participants concluded that we will not know if there are truly any "orphan" regions of the genome until after the 100 kb maps being generated by the genome-wide mapping efforts are complete and their quality is assessed. Furthermore, even if some regions are not mapped to 100 kb resolution, it was recommended that any decision to invest in them further should be delayed until after it has been demonstrated that attempts to build sequence-ready templates in those regions have failed.

Most of the investigators present felt that, at present, there was no pressing need to build higher resolution maps on a genome-wide basis. Instead, it was considered to be more important to move forward to sequencing and find out what types of high resolution maps will be needed. The model in which sequencers get high resolution maps "just in time," i.e., as they are needed, was appealing to many of those present, even though this approach remains untested. It was also suggested that strategies for integrating map construction with sequencing may prove to be more efficient than independent mapping strategies and that, regardless of the exact strategies that may be chosen for map assembly, the cost per base pair of map building should be much less than state-of-the-art sequencing costs. Those map-building costs, including map validation, should now be considered to be part of the cost of sequencing.

Expressing confidence of marker order: RH mapping methods currently in use inherently provide confidence measures. The participants recommended that when results of RH mapping experiments are presented, the assumptions made and the limitations of the particular statistical method employed for calculating the RH map should be clearly stated. In all cases, the raw data should be readily available so that other researchers can recompute a map without retyping all the markers.

While LOD scores are used to express confidence of RH mapping orders, no formal confidence measure exists for STS content mapping with YACs. A common-sense criterion for reliability is that markers be double-linked (linked by at least two independent YAC clones). Additional information, such as support for the order from additional depth of YAC coverage, or from another method such as RH mapping, is also useful and should also be described. (Marker orders determined by genetic linkage mapping are useful, but are at a much lower level of resolution than the physical map.) Contradictory information should also be included, as the unconscious (or deliberate) suppression of data is a disservice. Software that can graphically present all such data is available; SEGMAP is an example.
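The double-linkage criterion can be expressed as a simple check over STS content data. In this minimal Python sketch the YAC and STS names are hypothetical, the input is assumed to be a mapping from each YAC clone to the set of STSs it was scored as containing, and every distinctly named clone is treated as independent (the sketch does not detect chimeric or related clones).

```python
from typing import Dict, List, Set, Tuple

def linking_yacs(yac_content: Dict[str, Set[str]], marker_a: str, marker_b: str) -> Set[str]:
    """Return the YAC clones scored as containing both markers."""
    return {yac for yac, stss in yac_content.items() if marker_a in stss and marker_b in stss}

def double_linked(yac_content: Dict[str, Set[str]], marker_order: List[str],
                  min_clones: int = 2) -> Dict[Tuple[str, str], bool]:
    """For each adjacent pair in a proposed order, report whether at least
    min_clones distinct YACs contain both markers (distinct clones assumed independent)."""
    return {
        (a, b): len(linking_yacs(yac_content, a, b)) >= min_clones
        for a, b in zip(marker_order, marker_order[1:])
    }

# Hypothetical STS content data: YAC clone -> STSs it was scored positive for.
example_yacs = {
    "yacA": {"sts1", "sts2"},
    "yacB": {"sts1", "sts2", "sts3"},
    "yacC": {"sts3", "sts4"},
}
example_order = ["sts1", "sts2", "sts3", "sts4"]

result = double_linked(example_yacs, example_order)
for pair, ok in result.items():
    print(pair, "double-linked" if ok else "single-linked or unlinked")
```

Here only the sts1-sts2 interval is double-linked (by yacA and yacB); the other two adjacent pairs each rest on a single clone, so their order would need support from deeper YAC coverage or from RH data.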

Resources for the human genome sequencing community and the rest of the biomedical research community: The maps under construction are democratic; most of the marker and map information is available on the Internet and the majority of STS primers can be purchased from companies, avoiding the need for investigators who want to use the maps or markers to form collaborations with the map builders. Similarly, the breakpoint resources (YAC and RH panel DNAs) are commercially available, so that anyone who wants to do additional mapping can have access to the same resources that were used to build the reference maps.

The participants discussed evidence showing that YACs from some regions of the genome have the potential to provide high-fidelity sequencing substrate. However, it is not clear how extensive such regions are. This topic is worth additional exploration, for several reasons. First, if one could sequence larger inserts, the number of subclone libraries that must be generated could be reduced. Second, sequencing of larger inserts would reduce the amount of redundant sequencing necessary to build sequence overlaps between small-insert clones. And third, some genomic regions may be clonable in YACs but not in bacterial clones, and the YAC material may therefore be needed to fill gaps. Nevertheless, obtaining sequencing templates still relies, at present, on bacterial clones.

Until recently, most sequencing schemes have been based on the assumption that cosmids would provide the template. The development of other bacterial vectors, such as PACs and BACs, has raised the possibility that these larger-insert clones may have advantages. However, deep, high-quality P1, PAC and BAC libraries that are publicly available are needed. The several libraries that exist, or are under construction, might satisfy this need. While the individual libraries are currently of insufficient depth to support some sequencing strategies, participants reported that there are plans to prepare additional clones for some of these libraries.

Additionally, use of the combined libraries could provide both sufficient depth and flexibility to fill gaps that might occur in any one library, although use of multiple clone types might be suboptimal for some sequencing strategies. More information is needed on the quality of the clones, not only from these particular libraries but from these vector systems in general, before a decision can be made regarding the advisability of constructing additional libraries in these vectors, or in cosmids. The small amount of information that is currently available is quite encouraging. Additional information should be available in about a year, at which time we anticipate having more data from ongoing projects and initial data from the grants anticipated under the NHGRI Pilot DNA Sequencing Projects RFA. Other things being equal, it is advantageous to have clones whose insert length is not less than the spacing between STSs. If additional libraries need to be made, there are potential advantages to having this done by, or in collaboration with, the groups that will sequence the clones. Just as current libraries should be readily available, any such new clones should also be made readily available to other researchers. The status of libraries should be monitored because the best possible clone resources are an essential ingredient for efficient mapping and sequencing. If the libraries cited above do not have the necessary qualities, NHGRI will need to move quickly to ensure that good libraries are prepared.

Clone library storage and management, and access to those libraries by the wider biomedical research community, are important, related issues. Several companies have set up library screening and clone and library distribution services. Many of the genome centers and researchers have made arrangements for their materials to be distributed by these companies. This appears to be a high quality, cost-effective access method for at least the casual users of these materials. Larger scale users may also be able to utilize these services, or may prefer to include in their grant requests the resources needed to store, screen and retrieve the clones required for their specific research strategies.

The biomedical community also needs much more convenient and up-to-date access to map information than is currently available. The participants suggested that NHGRI should explore ways to collect maps generated by various groups and disseminate basic map information via the Internet in a stable, user-friendly manner, from a single site.

Participants in NHGRI 100 kb Map Workshop
August 27-28, 1995

David Botstein, Ph.D.
Stanford University School of Medicine

David R. Cox, M.D., Ph.D.
Stanford University School of Medicine

Glen A. Evans, M.D., Ph.D.
University of Texas Southwestern Medical School

Eric D. Green, M.D., Ph.D.
National Human Genome Research Institute

Philip Green, Ph.D.
University of Washington

Thomas J. Hudson, Ph.D.
Whitehead Institute/MIT

Raju S. Kucherlapati, Ph.D.
Albert Einstein College of Medicine

Richard M. Myers, Ph.D.
Stanford University School of Medicine

Maynard V. Olson, Ph.D.
University of Washington

David C. Page, M.D.
Whitehead Institute for Biomedical Research

Michael J. Palazzolo, M.D., Ph.D.
Lawrence Berkeley National Laboratory

David Schlessinger, Ph.D.
Washington University School of Medicine

Robert H. Waterston, M.D.
Washington University School of Medicine

James L. Weber, Ph.D.
Marshfield Medical Research Foundation


Last Reviewed: February 2006

Last updated: February 01, 2006
