Last updated: November 29, 2010
A Catalog of Genome-Wide Association Studies
Full Description of Methods
Weekly PubMed searches are run using the terms "genome-wide" OR "genome AND identification" OR "genome AND association", limited to publications from the current year and to studies in humans.
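As an illustration only, the weekly search could be assembled as a single query string along the following lines. The function name `build_query` is hypothetical, and the use of the PubMed field tags `[mh]` (MeSH Terms) and `[dp]` (Date of Publication) to express the human and current-year limits is an assumption; the catalog does not state how the limits were entered.

```python
def build_query(year: int) -> str:
    """Build a PubMed query string for one calendar year (illustrative sketch)."""
    # The three search terms named in the methods text.
    terms = [
        '"genome-wide"',
        "(genome AND identification)",
        "(genome AND association)",
    ]
    term_clause = " OR ".join(terms)
    # Assumption: the limits are expressed with the [mh] and [dp] field tags.
    return f"({term_clause}) AND humans[mh] AND {year}[dp]"


if __name__ == "__main__":
    print(build_query(2010))
```

The sketch only builds the string; submitting it to PubMed and reviewing the hits is a separate, manual step.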
Studies and associations are eligible for inclusion in the NHGRI GWAS catalog if they meet the following criteria:
- Inclusion of at least 100,000 SNPs in the initial stage, before quality control filters are applied.
- Statistical significance (SNP-trait p-value
a. If a study does not report a combined p-value, the p-value and effect size from the largest sample size will be reported as long as the initial and replication samples each show an association of p
b. If a study does not include a replication stage, significant SNPs from the discovery stage will be reported.
c. SNP-trait associations that are described as previously known at the time of publication and are statistically significant in the GWAS sample, but are not attempted for replication, are reported.
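The inclusion rules above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions: the `Association` class and `is_reportable` function are hypothetical names, the significance threshold is passed in by the caller rather than hard-coded, and the strict "less than" comparison is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Association:
    """P-values reported for one SNP-trait association (hypothetical record)."""
    combined_p: Optional[float] = None     # combined-analysis p-value, if reported
    initial_p: Optional[float] = None      # discovery (initial) stage p-value
    replication_p: Optional[float] = None  # replication stage p-value, if any


def is_reportable(assoc: Association, snps_initial_stage: int,
                  p_threshold: float) -> bool:
    """Apply the inclusion rules; p_threshold is supplied by the caller."""
    # At least 100,000 SNPs in the initial stage, before quality control.
    if snps_initial_stage < 100_000:
        return False
    # A combined p-value, when reported, decides the matter.
    if assoc.combined_p is not None:
        return assoc.combined_p < p_threshold
    if assoc.initial_p is None:
        return False
    # Rules (b) and (c): no replication attempted, so the discovery stage decides.
    if assoc.replication_p is None:
        return assoc.initial_p < p_threshold
    # Rule (a): no combined p-value, so both stages must pass on their own.
    return assoc.initial_p < p_threshold and assoc.replication_p < p_threshold
```

Keeping the threshold as a required argument avoids baking a particular cutoff into the sketch.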
Studies and associations are excluded if:
- The study was published in a language other than English.
- SNPs assayed were limited to those in candidate genes.
- Samples were assayed to measure somatic variation (e.g., in tumor samples).
- The study does not include any new GWAS data.
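The exclusion criteria lend themselves to a simple checklist. The sketch below uses a hypothetical `StudyFlags` record and `exclusion_reasons` function; it returns every reason that applies, so an empty list means no exclusion criterion was triggered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudyFlags:
    """Screening answers for one publication (hypothetical record)."""
    language: str
    candidate_genes_only: bool
    somatic_variation: bool
    has_new_gwas_data: bool


def exclusion_reasons(study: StudyFlags) -> List[str]:
    """Return the list of exclusion criteria that the study triggers."""
    reasons = []
    if study.language.strip().lower() != "english":
        reasons.append("published in a language other than English")
    if study.candidate_genes_only:
        reasons.append("SNPs limited to candidate genes")
    if study.somatic_variation:
        reasons.append("samples assayed for somatic variation")
    if not study.has_new_gwas_data:
        reasons.append("no new GWAS data")
    return reasons
```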
Information on the following study-level fields is extracted:
- Author (last name of first author)
- Study date (online publication date, if available)
- PubMed URL
- Publication title
- Disease/trait information
- Initial sample size (summing across multiple Stage 1 populations, if applicable)
- Replication sample size (summing across multiple populations, if applicable)
- Platform (manufacturer)
- Number of SNPs passing quality control metrics (using "up to [maximum number of SNPs]" if multiple platforms are used without imputation, the total number of imputed SNPs, or "pooled" to denote studies of pooled DNA, as applicable)
- Whether the study examined copy number variants (initially excluded; additional studies to be added)
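Two of the study-level fields involve a small amount of derivation: sample sizes are summed across populations, and the SNP count takes one of several forms. A minimal sketch follows; the function names are hypothetical, and the order in which the "pooled" and imputed cases are checked is an assumption, since the text does not say which takes precedence.

```python
from typing import Optional, Sequence


def total_sample_size(population_sizes: Sequence[int]) -> int:
    """Sum sample sizes across the populations of one stage."""
    return sum(population_sizes)


def snp_count_field(platform_counts: Sequence[int],
                    imputed_total: Optional[int] = None,
                    pooled: bool = False) -> str:
    """Format the 'number of SNPs passing quality control' field."""
    if pooled:
        # Studies of pooled DNA are marked with the literal word "pooled".
        return "pooled"
    if imputed_total is not None:
        # With imputation, the total number of imputed SNPs is recorded.
        return str(imputed_total)
    if not platform_counts:
        raise ValueError("at least one platform SNP count is required")
    if len(platform_counts) > 1:
        # Multiple platforms without imputation: report the maximum.
        return f"up to {max(platform_counts)}"
    return str(platform_counts[0])
```

For example, a study genotyped on two platforms with 317000 and 550000 passing SNPs and no imputation would be recorded as "up to 550000" under this sketch.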
For each identified SNP, we extract:
- Chromosomal region (from UCSC Genome Browser)
- Gene (as reported)
- rs number and risk allele (as reported)
- Risk allele frequency in controls (if not available among all controls, among the control group with the largest sample size)
- p-value and any relevant text (e.g., subgroups where applicable)
- OR (or % variance explained, SD increment, or unit difference for quantitative traits), 95% CI and any relevant text (e.g., subgroups)
If the p-value, OR, and 95% CI fields are not available for the combined population, we extract estimates from the population group with the largest sample size.
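The fallback to the largest population group can be sketched as follows. The dictionary keys and the function name `pick_estimates` are hypothetical, and the sketch assumes that a combined estimate is used only when the p-value, OR, and 95% CI are all present; the text does not say how partially reported combined results are handled.

```python
from typing import Dict, List, Optional

REQUIRED_FIELDS = ("p_value", "odds_ratio", "ci_95")


def pick_estimates(groups: List[Dict]) -> Optional[Dict]:
    """Choose which population group's estimates to extract for one SNP."""
    combined = next((g for g in groups if g["name"] == "combined"), None)
    # Assumption: the combined group is usable only if every field is present.
    if combined is not None and all(
        combined.get(field) is not None for field in REQUIRED_FIELDS
    ):
        return combined
    # Otherwise fall back to the non-combined group with the largest sample size.
    others = [g for g in groups if g["name"] != "combined"]
    return max(others, key=lambda g: g["sample_size"], default=combined)
```

If no other group exists, the sketch returns whatever combined record is available (or `None`), leaving the decision to the curator.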
In extracting information, we follow these additional guidelines: Missing or not applicable fields are denoted as follows: ?, allele not reported; NS, not significant (no associations at p