Executive Summary of the SNP Meeting

Pooks Hill Marriott
Bethesda, Md.

June 7-8, 1999

The single nucleotide polymorphism (SNP) meeting brought together all the principal investigators (PIs) from the SNP RFA as well as the principals from the SNP Consortium (TSC) to discuss issues related to coordination, SNP quality, resources, and databases. The conclusion of the meeting was that for SNPs to be most useful, several questions need to be addressed and several additional resources need to be provided.

Major scientific questions related to genetic variation that need to be addressed
  • Patterns of variation: How much variation and linkage disequilibrium (LD) exist, and how do they vary across the genome and by population? How do other factors affect patterns of variation?
     
  • The number and frequency of SNPs needed: How many SNPs are needed to address various questions? What allele frequencies should they have?
     
  • Comparative analyses: How can comparing patterns of variation within and among species, including other primates and mammals, be used to make inferences about function and selection?
     
  • Function: How can variable sites be related to functional differences, particularly when there is LD across many sites? How important is it to focus on SNPs in functionally important regions?

Assessment of currently available information and resources
  • Linkage disequilibrium: In about six months it would be useful to have a meeting to summarize what is known about linkage disequilibrium and to identify gaps in our knowledge.
     
  • Population samples: It would be useful to find out what population samples being collected by the NIH are available, to see whether they would be informative for general population studies.

New resources needed
  • Technology for genotyping: Using SNPs to relate genotypes to phenotypes will require much better technologies than currently exist for cheap and large-scale genotyping. More research on novel polymorphism genotyping technologies would be useful.
     
  • Somatic cell hybrids of the DNA Polymorphism Discovery Resource lines, and immortalized complete hydatidiform moles: These lines are useful standards for detecting duplicate genes that are incorrectly assayed as SNPs, for defining haplotypes, and for technology development.
     
  • Primate samples: Standard samples of some species and subspecies would be useful for figuring out which human SNP alleles are ancestral, and comparing variation within and among species.
     
  • Human samples: Much more discussion is needed of the purposes and types of human samples. More research addressing ELSI issues in defined groups would be useful.
     
  • Analytical tools for SNP data: Methods are not yet adequate to analyze the large amount of data soon to be produced. More research on complex trait and SNP analysis would be useful.

SNP quality
  • Standards: A working group was set up to make recommendations about standard sets of samples, gene regions and methods to assess SNP quality.

Summary of the SNP Meeting

The SNP meeting brought together all the PIs from the SNP RFA as well as the principals from the SNP Consortium (TSC), to discuss issues related to coordination, SNP quality, resources and databases. The conclusion of the meeting was that for SNPs to be most useful, several questions need to be addressed and several additional resources need to be provided.

Major scientific questions related to genetic variation that need to be addressed
  • Patterns of variation: How much variation and linkage disequilibrium (LD) exist, and how do they vary across the genome and by population? How do other factors such as recombination, gene duplication, mutation, gene conversion, selection, population structure, and migration history affect the amount and pattern of variation and LD? How does the age of an allele affect the LD around it? Can all common human haplotypes be discovered?
     
  • The number and frequency of SNPs needed: How many SNPs are needed to address various questions? What allele frequencies should they have? Most SNPs are in all populations, and the common haplotypes are in many populations. One could choose SNPs of similar frequencies across populations to have markers that work for all populations, and see whether this set works for finding genes. On the other hand, population-specific SNPs are rare, so choosing markers based on uniform frequency may not be needed, and some variants more common in some populations may be important for diseases more common in those populations. cSNPs in candidate genes, of any frequency, are of particular interest. Are there special strategies for finding rare alleles in genes?
     
  • Comparative analyses: How can comparing patterns of variation within and among species, including other primates and mammals, be used to make inferences about function, selection, and mutation? What sorts of questions can be addressed by species of differing phylogenetic relatedness?
     
  • Function: How can variable sites be related to functional differences, particularly when there is LD across many sites? For studies relating genotype to phenotype, such as linkage analysis in families and association analysis in populations, what experimental designs under which conditions will provide the most statistical power? How important is it to focus on SNPs in functionally important regions?
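As a concrete reference for the LD questions above, the following is a minimal sketch of how pairwise LD between two biallelic SNPs is commonly summarized from phased haplotype counts. The function name, the argument order, and the choice of D, D' and r² as summary statistics are illustrative assumptions; the meeting did not specify a particular measure.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Summarize LD between two biallelic SNPs from haplotype counts.

    Hypothetical helper: n_AB is the number of sampled chromosomes that
    carry allele A at the first SNP and allele B at the second, and so on.
    Returns a tuple (D, D_prime, r_squared).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total      # frequency of allele B at SNP 2

    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("LD is undefined when a site is monomorphic")

    # D: departure of the haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # D': D scaled by the largest magnitude possible at these allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two sites.
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Two SNPs whose alleles always travel together: complete LD.
    print(pairwise_ld(50, 0, 0, 50))
    # All four haplotypes equally common: no LD.
    print(pairwise_ld(25, 25, 25, 25))
```

The sketch takes phased haplotype counts as input. With unphased genotypes, haplotype frequencies would first have to be estimated, which is one reason haplotype information is so informative for LD.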
The SNPs Needed for Various Questions

Question                              Number Needed     Least Frequency of Minor Allele
Linkage                               2-3K              10-20%
Loss of heterozygosity                2-10K ??          30-40%
Whole-genome association
  Large number of founders            300-500K ??
  Small number of founders            60-200K ??
Finding disease-associated alleles    Focus on genes    All frequencies
Population studies

Current SNP discovery

  • Coordination of SNP discovery: Some duplication among groups is useful for validating the various methods. The PIs looking for SNPs in known genes (Lander, Chakravarti, Olson, Oefner) feel that informal discussion among themselves will suffice for preventing much overlap. The groups looking for SNPs in random genomic DNA (Cox, Oefner, TSC) are using different methods that do not lend themselves to useful coordination. Since some duplication is useful, and there are many SNPs, coordinating these groups does not seem needed now. It might be useful for SNP producers to have a non-public listing of what genome regions are being worked on.
     
  • SNPs from large-scale sequencing: Many potential mapped SNPs are found when overlapping regions are sequenced from the same or different libraries. Some of the sequencing groups are mining potential SNPs this way. The information on overlapping sequences is not generally deposited in sequence databases and thus is not readily available to everybody. The meeting agreed that Pui-Yan Kwok should coordinate the discovery of SNPs from all large-scale sequence production. He is being provided with additional funding to develop the software to automate this process, coordinate with the various sequencing centers, examine the sequence data, and deposit the putative SNPs in the NCBI SNP database, dbSNP. Most of these SNP alleles will be common ones; most rare alleles will not be found by this method.

Issues that need to be addressed to use SNPs and understand their patterns
  • Population studies: How do we design sets of samples to assess the amount of variation and LD, and how they vary by population? We could examine 20 regions of 30-50 kb in various populations, including sites with common and with rare minor alleles. Common alleles provide large heterozygosities, but the amount of LD around them may be small because they are generally old alleles. Rarer alleles provide less heterozygosity but may be associated with longer blocks of LD. We need to study well-defined small isolated populations and we need to deal with the complexity of large open populations such as that of the US. However the studies are designed, the same regions should be studied in the same populations, using the same individual samples. Standard samples will allow methods to be compared without having additional differences due to different sets of samples. Standard samples should be widely available, so cell lines will be needed.
     
  • Quality standards for SNPs: Groups use various methods to confirm SNPs and check whether they arise spuriously from gene duplications. These methods include sequencing both DNA strands, detecting SNPs by another method, and genotyping SNPs in multiple individuals. Most SNPs are mapped, either by RH mapping or by looking for SNPs in mapped DNA regions. Many methods for SNP discovery can provide estimates of the probability that a putative SNP is not a true SNP. It would be useful to choose standard genomic regions in standard samples to compare methods of SNP discovery, and to have groups cross-check to confirm SNPs discovered by different methods and by different groups. A working group was set up to consider SNP quality assessment: Eric Lander (chair), Lisa Brooks, Aravinda Chakravarti, David Cox, Deborah Nickerson, Peter Oefner, Steve Sherry, and David Wang.
     
  • Technology for genotyping: Current initiatives will discover hundreds of thousands of SNPs. Using these SNPs to relate genotype to phenotype will require much better technologies than currently exist for cheap and large-scale SNP genotyping. Different technologies may be most efficient for scoring many SNPs in a few individuals, a few SNPs in many individuals, or many SNPs in many individuals. More research on novel genotyping technologies for SNPs and other forms of polymorphism would be useful.
     
  • SNP information in the database: The first information people want about variation is whether a particular gene has SNPs. The additional information from genotypes is useful for validating SNPs, for assessing allele frequency and Hardy-Weinberg fit, and for inferring haplotypes and linkage disequilibrium. Haplotype information is extremely informative for LD. As genotypes or haplotypes are generated for individuals they should be placed in the database. It is also useful to report regions that have been examined and found to be monomorphic. The database will have filters to allow researchers to choose SNPs of particular frequencies or degrees of validation, so all SNPs should be deposited in the database with information about validation.
     
  • Analytical tools for SNP data: Soon there will be data on hundreds of thousands of SNPs in thousands of individuals typed for hundreds of phenotypes. The analytical tools do not yet exist to deal with this amount of data in a statistically rigorous way. Tools are needed to find associations among alleles at different SNPs and between phenotypes and genotypes. More research on complex trait and SNP analysis would be useful.
     
  • ELSI issues related to group definitions: Some scientific questions have social and cultural implications that need to be considered when designing the research. A current ELSI RFA addresses issues related to genetic variation and populations; more research may be needed to help define how to deal with these issues. Discussions of genetic variation research will be needed with various communities.
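The use of genotypes for assessing allele frequency and Hardy-Weinberg fit, mentioned under SNP information in the database, can be illustrated with a short sketch. The function names and the two-character genotype strings are assumptions made for this example; they do not reflect the dbSNP submission format.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies from unphased diploid genotypes.

    `genotypes` is a list of two-character strings such as "AG"
    (a hypothetical encoding chosen for this sketch).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg fit at a biallelic SNP.

    Arguments are the observed counts of the two homozygotes and the
    heterozygote. For a biallelic site the statistic has one degree of
    freedom, so values above about 3.84 suggest departure at the 5% level.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic site).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(allele_frequencies(["AA", "AG", "GG", "AG"]))
    print(hardy_weinberg_chi2(25, 50, 25))   # genotype counts match expectation
    print(hardy_weinberg_chi2(50, 0, 50))    # no heterozygotes observed
```

A strong excess of heterozygotes at a putative SNP is one pattern that can arise when a duplicated locus is assayed as a single site, which is why Hardy-Weinberg fit is relevant to validation as well as to population inference.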
Assessment of currently available information and resources
  • Linkage disequilibrium: When more data are generated, in about six months, it would be useful to have a meeting to summarize what is known about linkage disequilibrium, including the questions that LD addresses and the advantages of various methods of analysis. The meeting would identify gaps in understanding patterns of variation and LD and help to design a framework for population sampling.
     
  • Population samples: It would be useful to find out what population samples being collected by the NIH are available, under what consent conditions, to see whether they would be informative for general population studies. It may be possible to use controls from disease studies. Many samples for population studies are chosen for convenience rather than with strong scientific justification.

New resources needed
  • Immortalized complete hydatidiform moles: Moles are useful for detecting duplicate genes that are incorrectly assayed as SNPs, for defining haplotypes, and for technology development. There have been some unsuccessful attempts to immortalize mole lines, but nobody has focused on doing it.
     
  • Somatic cell hybrids: Duplicate loci can also be detected using somatic cell hybrids. David Cox plans to produce somatic cell hybrids of the first 24 samples of the DNA Polymorphism Discovery Resource, since so much SNP discovery uses these samples. He discussed creating about 50 hybrid cell lines for each of the first 24 DNA Polymorphism Discovery Resource lines.
     
  • Samples of trios: Parent and offspring sets allow haplotypes to be obtained fairly directly, without much statistical inference, but require some redundant typing.
     
  • Primate samples: Standard samples of some species and subspecies would be useful for figuring out which human SNP variants are ancestral and comparing the amount and pattern of variation within species to the divergence among species. These comparisons allow detection of functional constraints, selective sweeps of alleles to fixation, and maintenance of variation within species. A standard set of primate samples has the same advantages as standard human samples: different researchers can compare methods on the same samples, and information can accumulate on defined samples.
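To illustrate why parent and offspring trios yield haplotypes fairly directly, here is a minimal sketch that assigns each of a child's alleles to the parent it came from using simple Mendelian rules. The function names and the two-character genotype encoding are hypothetical, and the sketch ignores genotyping error and recombination within the region.

```python
def phase_site(father, mother, child):
    """Phase one site of a child from a parent-offspring trio.

    Each genotype is a two-character string such as "AG" (an encoding
    assumed for this sketch). Returns (paternal_allele, maternal_allele),
    or None when the site cannot be resolved from the trio alone.
    """
    first, second = child
    options = set()
    # Try both assignments of the child's alleles to the two parents.
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            options.add((paternal, maternal))
    if not options:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    if len(options) == 1:
        return options.pop()
    return None  # e.g. all three individuals are heterozygous


def phase_child(father_sites, mother_sites, child_sites):
    """Build the child's paternal and maternal haplotypes across sites.

    Unresolved sites are reported as None in both haplotypes.
    """
    paternal_hap, maternal_hap = [], []
    for father, mother, child in zip(father_sites, mother_sites, child_sites):
        phased = phase_site(father, mother, child)
        paternal_hap.append(phased[0] if phased else None)
        maternal_hap.append(phased[1] if phased else None)
    return paternal_hap, maternal_hap


if __name__ == "__main__":
    father = ["AA", "CT", "GG"]
    mother = ["AG", "CC", "GT"]
    child = ["AG", "CT", "GT"]
    print(phase_child(father, mother, child))
```

The redundant typing noted above shows up here as the two parental genotypes that must be collected at every site in order to resolve the child's two haplotypes.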

PARTICIPANTS

(Listed in alphabetical order from left to right)

Aravinda Chakravarti
Case Western Reserve University
10900 Euclid Avenue
Cleveland, OH 44106-4955

David Cox
Department of Genetics
School of Medicine
Stanford University
Stanford, CA 94305

Daniel Geraghty
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N. D2-100
PO Box 19024
Seattle, WA 98109-1024

Pui-Yan Kwok
Washington University School of Medicine
660 S. Euclid Avenue
Campus Box 8123
St. Louis, MO 63110

Charles Langley
Center for Population Biology and Section of Evolution and Ecology
University of California, Davis
One Shields Avenue
2320 Storer Hall
Davis, CA 95616

Jean McEwen
Boston College Law School
885 Centre Street
Newton, MA 02459

Richard Myers
Department of Genetics
Stanford University School of Medicine
300 Pasteur Drive
Stanford, CA 94305-5120

John Nolan
Los Alamos National Laboratory
University of California
PO Box 1663
Los Alamos, NM 87545

Maynard Olson
Genome Center
University of Washington
225 Fluke Hall on Mason Rd
Seattle, WA 98195

Barbara Skene
The Wellcome Trust
183 Euston Rd.
London NW1 2BE, United Kingdom
01716118690

Carl Ton
University of Washington School of Medicine
225 Fluke Hall on Mason Rd.
Seattle, WA 98195

James Weber
Marshfield Medical Research Foundation
1000 North Oak Avenue
Marshfield, WI 54449

Mark Chee
Illumina, Inc.
9390 Towne Centre Drive, Suite 200
San Diego, CA 92121

Evan Eichler
Case Western Reserve University
10900 Euclid Avenue
Cleveland, OH 44106-3029

Arthur Holden
TSC
8770 West Bryn Mawr Avenue
Suite 1300
Chicago, IL 60631

Eric Lander
Whitehead Institute
Center for Genome Research
One Kendall Square - Bldg. 300
Cambridge, MA 02139

Robert Lipshutz
Affymetrix, Inc.
3380 Central Expressway
Santa Clara, CA 95051

John McPherson
Washington University School of Medicine
4444 Forest Park Blvd.
St. Louis, MO 63108

Deborah Nickerson
Department of Molecular Biotechnology
University of Washington
Box 357730
Seattle, WA 98195-7730

Peter Oefner
Stanford Genome Center
855 California Avenue
Palo Alto, CA 94304

Michael Silber
Pfizer, Inc.
Central Research Division
Eastern Point Rd.
Groton, CT 06340

Lincoln Stein
Cold Spring Harbor Laboratory
One Bungtown Road
Cold Spring Harbor, NY 11724

David Wang
Bristol-Myers Squibb
Pharmaceutical Research Institute
PO Box 5400
Princeton, NJ 08543-5400

Robert Weiss
Department of Human Genetics
University of Utah
20 S. 2030 E, Room 308
Salt Lake City, UT 84112-9454

NIH Attendees:

Douglas Bell
NIH/NIEHS
111 Alexander Drive
Box 12233
Research Triangle Park, NC 27709

Lisa Brooks
NIH/NHGRI
Bldg. 38A, Room 614
38 Library Drive
Bethesda, MD 20892

Jean Cahill
NIH/NHGRI
Bldg. 38A, Room 613
38 Library Drive
Bethesda, MD 20892

Francis Collins
Director
NIH/NHGRI
Bldg. 31, Room 4B09
31 Center Drive
Bethesda, MD 20892

Camilla Day
NIH/CSR
6701 Rockledge Drive
Bethesda, MD 20892

Elise Feingold
NIH/NHGRI
Bldg. 38A, Room 614
38 Library Drive
Bethesda, MD 20892

Maria Giovanni
NIH/NEI
Executive Plaza South
Suite 350
6120 Executive Blvd.
Bethesda, MD 20892

Mark Guyer
NIH/NHGRI
Bldg. 38A, Room 604
38 Library Drive
Bethesda, MD 20892

Kathy Hudson
NIH/NHGRI
Bldg. 31, Room 4B09
31 Center Drive
Bethesda, MD 20892

Elke Jordan
NIH/NHGRI
Bldg. 31, Room 4B09
31 Center Drive
Bethesda, MD 20892

Rochelle Long
NIH/NIGMS
Natcher Bldg., Room 4AS49
45 Center Drive
Bethesda, MD 20892

Karen Mohlke
NIH/NHGRI
Bldg. 9, Room 1W108
9000 Rockville Pike
Bethesda, MD 20892

Susan Old
NIH/NHLBI
Two Rockledge Centre
Suite 9150
6701 Rockledge Drive
Bethesda, MD 20892

Jane Peterson
NIH/NHGRI
Bldg. 38A, Room 610
38 Library Drive
Bethesda, MD 20892

Jerry Roberts
NIH/NHGRI
Bldg. 38A, Room 609
38A Library Drive
Bethesda, MD 20892

James Selkirk
NIH/NIEHS
111 Alexander Drive
PO Box 12233
Research Triangle Park, NC 27709

Grace Shen
NIH/NCI
EPN, Room 501
6130 Executive Blvd.
Bethesda, MD 20892

Kaisa Silander
NIH/NHGRI
Bldg. 9, Room 1W108
Bethesda, MD 20892

Judy Small
NIH/NIDCR
Natcher Bldg., Room 4AN-24J
45 Center Drive
Bethesda, MD 20892

Elizabeth Thomson
NIH/NHGRI
Bldg. 38A, Room 617
38A Library Drive
Bethesda, MD 20892

Jose Velazquez
NIH/NIEHS
PO Box 12233
Research Triangle Park, NC 27709

Sally York
NIH/NHGRI
Bldg. 38A, Room 613
38 Library Drive
Bethesda, MD 20892

Joy Boyer
NIH/NHGRI
Bldg. 38A, Room 617
38 Library Drive
Bethesda, MD 20892

Ken Buetow
NIH/NCI
Bldg.9, 1N105
Bethesda, MD 20892

Peter Chines
NIH/NHGRI
Room 3As.43
45 Center Drive
Bethesda, MD 20892

Yasmin Cypel
NIH/NHGRI
Bldg. 38A, Room 614
38 Library Drive
Bethesda, MD 20892

Mike Erdos
NIH/NHGRI
49 Convent Drive
Bethesda, MD 20892

Adam Felsenfeld
NIH/NHGRI
Bldg. 38A, Room 614
38 Library Drive
Bethesda, MD 20892

Bettie Graham
NIH/NHGRI
Bldg. 38A, Room 614
38 Library Drive
Bethesda, MD 20892

Linda Hall
NIH/NHGRI
Bldg. 38A, Room 613
38 Library Drive
Bethesda, MD 20892

Karin Jegalian
NIH/NHGRI
31 Center Drive
Bldg. 31, Room 4B09
Bethesda, MD 20892

Robert Karp
NIH/NIAAA
Willco Bldg.
6000 Executive Blvd.
Suite 402
Bethesda, MD 20892

Stephen Mockrin
NIH/NHLBI
Two Rockledge Centre
6701 Rockledge Drive
Bethesda, MD 20892

Ken Nakamura
NIH/NHGRI
Bldg. 38A, Room 609
38 Library Drive
Bethesda, MD 20892

Diane Patterson
NIH/NHGRI
Bldg. 38A, Room 613
38 Library Drive
Bethesda, MD 20892

Rudy Pozzatti
NIH/NHGRI
Bldg. 38A, Room 609
38 Library Drive
Bethesda, MD 20892

Jeffery Schloss
NIH/NHGRI
Bldg. 38A, Room 610
38 Library Drive
Bethesda, MD 20892

Vicki Seyfert
NIH/NIAID
Solar Bldg., 4A21
Bethesda, MD 20892

Steve Sherry
NIH/NCBI
Bldg. 38A, Room 8N805
38 Library Drive
Bethesda, MD 20892

Karl Sirotkin
NIH/NCBI
Bldg. 38A, Room 8S810
38 Library Drive
Bethesda, MD 20892

Rochelle Small
NIH/NIDCD
Executive Plaza South Bldg.
6120 Executive Blvd.
Suite 400-C
Bethesda, MD 20892

Marjorie Tingle
NIH/NCRR
One Rockledge Centre
Room 6154
6705 Rockledge Drive
Bethesda, MD 20892

Cathy Yarbrough
NIH/NHGRI
Bldg. 31, Room 4B09
31 Center Drive
Bethesda, MD 20892

 

Last updated: May 01, 2006