More than 800 published genetic association studies have implicated dozens of potential risk loci in Parkinson's disease (PD). To facilitate the interpretation of these findings, we have created a dedicated online resource, PDGene, that comprehensively collects and meta-analyzes all published studies in the field. A systematic literature screen of ~27,000 articles yielded 828 eligible articles from which relevant data were extracted. In addition, individual-level data from three publicly available genome-wide association studies (GWAS) were obtained and subjected to genotype imputation and analysis. Overall, we performed meta-analyses on more than seven million polymorphisms originating from GWAS datasets and/or smaller-scale PD association studies. Meta-analyses on 147 SNPs were supplemented by unpublished GWAS data from up to 16,452 PD cases and 48,810 controls. Eleven loci showed genome-wide significant (P<5×10−8) association with disease risk: BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25. In addition, we identified novel evidence for genome-wide significant association with a polymorphism in ITGA8 (rs7077361, OR 0.88, P = 1.3×10−8). All meta-analysis results are freely available on a dedicated online database (www.pdgene.org), which is cross-linked with a customized track on the UCSC Genome Browser. Our study provides an exhaustive and up-to-date summary of the status of PD genetics research that can be readily scaled to include the results of future large-scale genetics projects, including next-generation sequencing studies.
The genetic basis of Parkinson's disease is complex, i.e., it is determined by a number of different disease-causing and disease-predisposing genes. The latter, in particular, have proven difficult to find, as evidenced by more than 800 published genetic association studies that typically report discrepant results. To facilitate the interpretation of this large and continuously growing body of data, we have created a freely available online database (“PDGene”: http://www.pdgene.org) which provides an exhaustive account of all published genetic association studies in PD. One particularly useful feature is the calculation and display of up-to-date summary statistics of published data for overlapping DNA sequence variants (polymorphisms). These meta-analyses revealed eleven gene loci that showed a highly significant association (P<5×10−8, also known as genome-wide significance) with risk for PD: BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25. In addition, purely by data mining, we identified one novel PD susceptibility locus in a gene called ITGA8 (rs7077361, P = 1.3×10−8). We note that our continuously updated database represents the most comprehensive research synopsis of genetic association studies in PD to date. In addition to vastly facilitating the work of other PD geneticists, our approach may serve as a valuable example for other complex diseases.
Citation: Lill CM, Roehr JT, McQueen MB, Kavvoura FK, Bagade S, et al. (2012) Comprehensive Research Synopsis and Systematic Meta-Analyses in Parkinson's Disease Genetics: The PDGene Database. PLoS Genet 8(3): e1002548. doi:10.1371/journal.pgen.1002548
Editor: Amanda J. Myers, University of Miami, United States of America
Received: October 16, 2011; Accepted: January 5, 2012; Published: March 15, 2012
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Funding: The main funding for this study was provided by the Michael J. Fox Foundation for Parkinson's Disease (MJFF) with additional financial support by the Cure Alzheimer's Fund (CAF), the National Alliance for Research on Schizophrenia and Depression (NARSAD), Prize4Life, and EMD Serono (all to L Bertram). CM Lill was supported by a fellowship from the Deutscher Akademischer Austauschdienst (DAAD) and Fidelity Biosciences Research Initiative (FBRI). L Bertram is also supported by the German Ministry for Education and Research (BMBF). JPA Ioannidis was supported through the Tufts Clinical and Translational Science Institute (Tufts CTSI) under funding from the National Institutes of Health/National Center for Research Resources (UL1 RR025752). Points of view or opinions in this paper are those of the authors and do not necessarily represent the official position or policies of the Tufts CTSI. M Sharma was supported by the Michael J. Fox Foundation. The NeuroGenetics Research Consortium GWAS was funded by the Edmond J. Safra Michael J. Fox Foundation Global Genetics Consortium Initiative and NIH R01 NS 036960. The work of the International Parkinson's Disease Genomics Consortium (IPDGC) was supported in part by the Intramural Research Programs of the National Institute on Aging, National Institute of Neurological Disorders and Stroke, National Institute of Environmental Health Sciences, National Human Genome Research Institute, National Institutes of Health, Department of Health and Human Services: project numbers Z01 AG000949-02 and Z01-ES101986. In addition, the work of the IPDGC was supported by the U.S. Department of Defense, award number W81XWH-09-2-0128. Portions of the work of the IPDGC utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Md. (http://biowulf.nih.gov). T Foroud received funds from the National Institutes of Health (R01CA141668 and R01NS37167).
C Klein is the recipient of a career development award from the Volkswagen Foundation and from the Hermann and Lilly Schilling Foundation. DM Maraganore acknowledges active funding support from the National Institutes of Health (2R01 ES10751), Alnylam Pharmaceuticals, Medtronic, and NorthShore University Health System. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: CB Do, N Eriksson, and JY Tung are employed by 23andMe and own stock options in the company. MJ Farrer and Mayo Foundation received royalties from H. Lundbeck A/S and Isis Pharmaceuticals. In addition, MJ Farrer has received an honorarium for a seminar at Genzyme. T Gasser has received consultancy fees from Cephalon and Merck-Serono, grants from Novartis, payments for lectures including service on speakers' bureaus from Boehringer Ingelheim, Merck-Serono, UCB, and Valean, and holds patents NGFN2 and KASPP. JA Hardy has received consulting fees or honoraria from Eisai and his institute has received consulting fees or honoraria from Merck-Serono. DM Maraganore has received extramural research funding support from the National Institutes of Health (2R01 ES10751), the Michael J. Fox Foundation (Linked Efforts to Accelerate Parkinson Solutions Award, Edmond J. Safra Global Genetics Consortia Award), and from Alnylam Pharmaceuticals and Medtronic (observational studies of Parkinson's disease). DM Maraganore has also received intramural research funding support from the Mayo Clinic and from NorthShore University Health System. DM Maraganore filed a provisional patent for a method to predict Parkinson's disease. This provisional patent is unlicensed. He also filed a provisional patent for a method to treat neurodegenerative disorders. That provisional patent has been licensed to Alnylam Pharmaceuticals, and DM Maraganore has received royalty payments totaling less than $20,000. K Stefansson has received grants from deCODE.
Memberships of the consortia are provided in Text S1.
Parkinson's disease (PD) is the second most common neurodegenerative disease, with a prevalence of ~1% above 60 years of age. Approximately 5–10% of patients show an autosomal dominant or recessive mode of inheritance, and several causative genes have been identified, e.g. SNCA, LRRK2, PARK2, and PINK1 (for review see ref. ). Recently, two additional autosomal dominant PD genes, VPS35 and EIF4G1, have been identified, the former via next-generation sequencing. Causal mutations in additional genes can be expected to emerge in the coming years. However, the vast majority of patients suffer from non-Mendelian forms of PD, which are likely caused by the combined effects of genetic and environmental factors. To decipher the genetic architecture underlying PD susceptibility, more than 800 genetic association studies have been performed over the past 20 years. While early candidate gene studies and subsequent meta-analyses provided conclusive evidence that polymorphisms in SNCA (encoding alpha-synuclein), LRRK2 (leucine-rich repeat kinase 2), MAPT (microtubule-associated protein tau), and GBA (acid beta-glucosidase) significantly impact PD susceptibility, most association studies in the field provided inconclusive or even conflicting results.
During the last few years, genome-wide association studies (GWAS) have proposed additional PD loci. While the early GWAS and a GWAS meta-analysis were of limited sample size and yielded mostly inconsistent results, more recent studies have identified a number of loci that were independently confirmed in follow-up studies (e.g. GAK, BST1, and PARK16; see Table 1 for all proposed GWAS findings across GWAS publications). Very recently, a GWAS meta-analysis implicated several other new putative PD loci, which currently await further validation. Despite this progress, probably 40% or more of the population-attributable risk remains unexplained by today's most promising PD loci. Thus, genetic association studies remain one of the mainstays of PD genetics research. However, GWAS and other large-scale association studies typically highlight only the most promising results and often do not provide data on variants showing suggestive evidence for association, or on previously implicated variants that could not be confirmed in the GWAS setting. As a result, the cumulative genetic evidence for or against association with specific variants in the PD field is becoming increasingly difficult to follow, evaluate, and interpret. To address this problem, we have comprehensively collected, catalogued, and systematically meta-analyzed the data from all genetic association studies published in the field of non-Mendelian PD, including GWAS, and made all results publicly available on a regularly updated online database, “PDGene” (http://www.pdgene.org).
The results of this research synopsis are based on a freeze of the PDGene database content on March 31st 2011 (available upon request from the authors). At that time, PDGene included details on 828 individual studies from more than 50 different countries on six continents, reporting on 3,382 polymorphisms in 890 genetic loci. Data for more than 2,000 SNPs were supplemented by results derived from up to three publicly available GWAS datasets following extensive quality control and imputation. Ultimately, this procedure yielded a total of 867 polymorphisms across ~300 genetic loci that met our criteria for meta-analysis (see Methods). Additional independent GWAS data for the 147 SNPs yielding P values of ≤0.1 in these initial meta-analyses were provided by researchers of all remaining currently published Caucasian GWAS datasets. Following the identification of genome-wide significant association with an intronic SNP (rs7077361) in ITGA8 after addition of these data, we obtained additional data from the same GWAS datasets on ~1,400 SNPs in the chromosomal region encompassing ITGA8 (chr10:15346353–15801533, hg18). Finally, independent replication data in Caucasian and Asian populations generated by the GEO-PD consortium for ten recently described PD loci were made available for inclusion. As a result, we were able to substantially increase the sample size (up to 16,452 PD cases and 48,810 controls) for many of the most promising PD loci. For instance, we were able to add data from up to 48,861 previously unanalyzed cases and controls (combined) to meta-analyses of some of the recently proposed PD loci (median sample size 14,896; see Table 2 and Table S1 for details). In addition to these focused analyses, PDGene displays meta-analysis results for more than seven million additional SNPs originating from up to three publicly available GWAS datasets. The results are available online (e.g. as summarized in http://www.pdgene.org/largescalemeta.asp), where they are cross-linked to a customized and fully browsable track on the UCSC Genome Browser.
Table 2. Genome-wide significant summary meta-analysis results of the PDGene database in populations of Caucasian and Asian descent. doi:10.1371/journal.pgen.1002548.t002
PDGene meta-analysis results
The PDGene meta-analyses of the 867 core polymorphisms were based on a median of 7,680 subjects (interquartile range 4,612–16,726). Additional meta-analyses were performed after stratification for Caucasian and Asian ancestry (for details on sample size and included ethnicities for individual meta-analyses, see Table S1). In addition, we performed random-effects meta-analyses across all three publicly available GWAS datasets following genotype imputation using data from the International HapMap Consortium and the 1000 Genomes Project. Ultimately, this yielded 7,123,920 SNPs that could be meta-analyzed across at least two GWAS datasets (see Figure S1 for a quantile-quantile plot of the GWAS-only meta-analyses). All 867 core meta-analysis results are available online on PDGene as forest plots, summarizing the relative contribution of each dataset to the most current summary effect estimate, and as cumulative plots, illustrating how summary ORs evolve over time. All core meta-analysis results are plotted in Figure 1 (green dots) alongside the GWAS-only meta-analysis results (black and grey dots).
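The cumulative plots mentioned above re-estimate the summary OR each time a dataset is added in order of publication. A minimal Python sketch of this idea follows. It assumes each dataset has already been reduced to a publication year, an allelic log OR, and its standard error (the `Dataset` fields are hypothetical names), and, for brevity, it pools with the fixed-effect inverse-variance estimator rather than the random-effects model used for the PDGene results; a random-effects estimator could be substituted by re-pooling all datasets available at each step.

```python
import math
from dataclasses import dataclass


@dataclass
class Dataset:
    year: int        # publication year (hypothetical field name)
    log_or: float    # allelic log odds ratio of the dataset
    se: float        # standard error of log_or


def cumulative_fixed_effect(datasets):
    """Summary OR and 95% CI after each dataset is added in publication order.

    Returns a list of (year, OR, lower CI, upper CI) tuples. For brevity this
    pools with the inverse-variance fixed-effect estimator.
    """
    sum_w = 0.0
    sum_wb = 0.0
    results = []
    for d in sorted(datasets, key=lambda d: d.year):
        w = 1.0 / d.se ** 2          # inverse-variance weight
        sum_w += w
        sum_wb += w * d.log_or
        pooled = sum_wb / sum_w      # running weighted mean of log ORs
        se = math.sqrt(1.0 / sum_w)  # standard error of the running mean
        results.append((d.year, math.exp(pooled),
                        math.exp(pooled - 1.96 * se),
                        math.exp(pooled + 1.96 * se)))
    return results
```

Plotting each tuple in order reproduces the shape of a cumulative forest plot, with confidence intervals that typically narrow as datasets accumulate.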
Figure 1. Manhattan plot of all meta-analysis results performed in PDGene.
This summary combines association results from 7,123,986 random-effects meta-analyses based on the March 31st 2011 datafreeze of the PDGene database. Results are plotted as −log10 P-values (y-axis) against physical chromosomal location (x-axis). Black and grey dots indicate results originating exclusively from the three fully publicly available GWAS datasets (see Methods), while green dots are based on a combination of smaller-scale studies, supplemented by GWAS datasets (where applicable). Gene annotations are provided for genes highlighted in the main text. doi:10.1371/journal.pgen.1002548.g001
One hundred three meta-analyses across 12 genetic loci (BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, ITGA8, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, SYT11/RAB25) yielded summary ORs indicating a genome-wide significant (P≤5×10−8) increase or decrease in PD risk across all ethnicities and/or after stratification for ethnic ancestry (Table 2, Table S1, and Figure S2 [forest plots]). None of these loci contained more than one SNP independently associated at genome-wide significance, as judged by pair-wise linkage disequilibrium (LD) assessments using SNAP (http://www.broadinstitute.org/mpg/snap/) with an r2 cutoff of 0.2. The majority of polymorphisms tested in the genome-wide significant loci showed no evidence of publication bias (Table S1). Finally, all genome-wide significant signals were robust against potential undetected sample overlap using a recently proposed procedure (see Table S2 for more details). Combined sample sizes for all 12 loci were substantially larger here than in any previously published meta-analysis (Table S1), providing unequivocal evidence for an involvement of these loci in PD susceptibility. While power to detect genome-wide significance was excellent for most of these loci (>80% for an OR of 1.15 and a minor allele frequency down to 0.05, as estimated with the Genetic Power Calculator, http://pngu.mgh.harvard.edu/~purcell/gpc/), power was lower for many other meta-analyses owing to smaller sample sizes and allele frequencies (see Table S1 for details). Thus, no single statistic can summarize the overall power of our study.
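The power figures above were obtained with the Genetic Power Calculator. As a rough, independent illustration of how power depends on sample size, OR, allele frequency, and the significance threshold, the sketch below approximates the power of a two-sided per-allele case-control test using a normal approximation to the log OR of the 2×2 allele table (Woolf standard error, expected allele counts under the alternative). This is a simplification we introduce here, not the calculator's model: it ignores disease prevalence, genotype-level models, and between-study heterogeneity, so its numbers need not match those reported in Table S1.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, odds_ratio, control_freq, alpha=5e-8):
    """Approximate power of a two-sided per-allele case-control test.

    Normal approximation to the log odds ratio of the 2x2 allele table, using
    expected allele counts under the alternative (a simplification: no disease
    prevalence, genotype model, or between-study heterogeneity is modeled).
    """
    nd = NormalDist()
    # Risk-allele frequency in cases implied by the per-allele odds ratio
    odds_cases = odds_ratio * control_freq / (1 - control_freq)
    case_freq = odds_cases / (1 + odds_cases)
    # Expected allele counts (two alleles per subject)
    a = 2 * n_cases * case_freq
    b = 2 * n_cases * (1 - case_freq)
    c = 2 * n_controls * control_freq
    d = 2 * n_controls * (1 - control_freq)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of the log OR
    z_crit = -nd.inv_cdf(alpha / 2)                 # two-sided critical value
    shift = abs(math.log(odds_ratio)) / se          # expected |Z| under H1
    # Probability that |Z| exceeds the critical value (both tails)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)
```

For example, `allelic_power(5000, 5000, 1.15, 0.20)` gives the approximate power at the genome-wide threshold for a hypothetical study of 5,000 cases and 5,000 controls.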
The above list includes an intronic polymorphism in ITGA8, located on chromosome 10p13, for which we identified novel evidence for genome-wide significant association with PD risk (OR 0.88, P = 1.3×10−8, I2 = 0; see Table 2 and Figure 2). This SNP had previously been proposed to be associated with PD risk at sub-genome-wide significance by Simon-Sanchez et al. After obtaining and meta-analyzing GWAS data for ~1,400 additional SNPs in this region derived from all Caucasian GWAS datasets, rs7077361 remained the most significantly associated SNP in the region (Figure S3).
Figure 2. Forest plot of the meta-analysis of rs7077361 in ITGA8.
Study-specific allelic odds ratios (ORs, black squares) and 95% confidence intervals (CIs, lines) were calculated for each included dataset. The summary OR and CI were calculated using the DerSimonian-Laird random-effects model (grey diamond). C = Caucasian ancestry. doi:10.1371/journal.pgen.1002548.g002
In addition to using random-effects models, we also performed exploratory fixed-effect meta-analyses on all eligible polymorphisms. These analyses did not reveal genome-wide significant effect sizes for any additional locus, except ACMSD/TMEM163 (most significant SNP rs6723108, OR 0.91, P = 1.3×10−9, I2 = 46% [95% CI 0–73%], Figure S4, panel 1) and HLA (most significant SNP chr6:32609909, OR 0.78, P = 8.8×10−15, I2 = 84% [95% CI 70–91%], Figure S4, panel 2), both of which have been reported to be associated with PD risk at genome-wide significance in previous work. In both instances, the lack of genome-wide significance in the random-effects models (Table S1) was due to relatively pronounced heterogeneity of effect estimates across studies. However, the heterogeneity across the 11 datasets in the ACMSD/TMEM163 meta-analysis almost entirely reflects differences in the magnitude of effect estimates that point in the same direction (see Figure S4, panel 1), making it likely that ACMSD/TMEM163 represents a genuine PD risk locus. For the SNP tested in the HLA locus (chr6:32609909, Figure S4, panel 2), heterogeneity is more pronounced and more complex, with ORs on either side of 1. This could be due to a number of reasons, e.g. subtle and uncorrected population substructure and/or different LD patterns between the analyzed SNP and the actual functional variant(s). Thus, although the evidence is currently less conclusive than for ACMSD/TMEM163, it still appears quite possible that there are one or more PD association signals in the HLA region. Regardless of these considerations, additional data are needed to more firmly assess the role of both loci in PD susceptibility.
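The contrast between fixed-effect and random-effects results above hinges on the estimated between-study variance. The following sketch, written by us for illustration, pools allelic log ORs with the inverse-variance fixed-effect estimator and with the DerSimonian-Laird random-effects estimator, and reports Cochran's Q and I². It assumes per-dataset log ORs and standard errors are already available and uses a normal approximation (z = 1.96) for the 95% CIs; it does not compute the confidence interval for I² quoted in the text.

```python
import math
from statistics import NormalDist


def meta_analysis(log_ors, ses):
    """Pool allelic log odds ratios across datasets.

    Returns fixed-effect (inverse-variance) and DerSimonian-Laird random-effects
    summaries (OR, 95% CI, two-sided P), plus Cochran's Q, I^2 (%) and tau^2.
    """
    nd = NormalDist()
    k = len(log_ors)
    w = [1 / s ** 2 for s in ses]                    # inverse-variance weights
    sw = sum(w)
    fixed_beta = sum(wi * b for wi, b in zip(w, log_ors)) / sw
    # Cochran's Q: weighted dispersion of study estimates around the fixed mean
    q = sum(wi * (b - fixed_beta) ** 2 for wi, b in zip(w, log_ors))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # DL moment estimator
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    random_beta = sum(wi * b for wi, b in zip(w_re, log_ors)) / sum(w_re)

    def summarize(beta, se):
        return {
            "or": math.exp(beta),
            "ci": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
            "p": 2 * nd.cdf(-abs(beta / se)),
        }

    return {
        "fixed": summarize(fixed_beta, math.sqrt(1 / sw)),
        "random": summarize(random_beta, math.sqrt(1 / sum(w_re))),
        "q": q, "i2": i2, "tau2": tau2,
    }
```

When the studies agree (tau² = 0) both summaries coincide; as heterogeneity grows, the random-effects interval widens and its P-value grows, which is why strongly heterogeneous signals such as the HLA SNP can reach genome-wide significance only under the fixed-effect model.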
Ethnicity-specific meta-analysis results
SNCA, LRRK2, BST1, and PARK16 show evidence for genome-wide significance in separate meta-analyses restricted to Caucasian and to Asian populations (Table 2). Furthermore, data obtained from the GEO-PD consortium suggest that the effect estimates for some of the recently discovered PD loci (i.e. CCDC62/HIP1R, MCCC1, and STK39) may be comparable in Caucasian and Asian populations (Table S1), although additional datasets are needed to establish genome-wide significance for these loci in populations of Asian descent. In contrast, insufficient data are currently available to assess the effects of GAK and SYT11/RAB25 on PD risk in Asians: GAK rs6599388 violated Hardy-Weinberg equilibrium (HWE) in Asian datasets from the GEO-PD consortium and was thus excluded from further analyses in that ethnic group, and SYT11/RAB25 chr1:154105678 was excluded from all analyses for technical reasons in the study by the GEO-PD consortium. Moreover, none of the reported SYT11/RAB25 and GAK SNPs from the recent GWAS meta-analysis were captured directly or by proxy (r2≥0.8) in the Japanese GWAS dataset. Finally, populations of Asian descent cannot be appropriately assessed for PD association with the MAPT H1/H2 haplotype, rs10928513 in ACMSD, and rs7077361 in ITGA8, owing to monomorphism at these sites.
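Whether a SNP is captured by proxy depends on the pairwise r² between the reported SNP and a typed SNP. A minimal sketch of the standard r² formula from a haplotype frequency and the two allele frequencies is shown below; the function names and the 0.8 default are ours (the threshold mirrors the one quoted above), and in practice r² is usually looked up from reference panels (e.g. via SNAP) rather than computed by hand. The monomorphic case raises an error, which mirrors why such sites cannot be assessed.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two bi-allelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def is_proxy(p_ab, p_a, p_b, threshold=0.8):
    """True if the second SNP tags the first at r^2 >= threshold (hypothetical helper)."""
    return ld_r2(p_ab, p_a, p_b) >= threshold
```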
Evaluating the credibility of significant associations
To estimate the epidemiologic credibility of associations with polymorphisms showing sub-genome-wide significant association with PD (P>5×10−8), we applied two “credibility” measures to each such result. First, we calculated Bayes factors (BF, expressed here as log10-values, “logBF”) assuming an average non-null odds ratio of 1.15, as an approximation of a typical “complex disease effect size”, and a spike-and-smear prior distribution of effects. Our second assessment was based on the Human Genome Epidemiology Network's (HuGENet) interim criteria for the assessment of cumulative epidemiologic evidence in genetic association studies. The results of these analyses are summarized in Table S1.
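One way to compute a Bayes factor with a spike-and-smear prior is Wakefield's approximate Bayes factor, sketched below. This choice is an assumption on our part; the exact calculation behind the logBF values in Table S1 may differ. Here the estimated log OR is treated as normal with variance se², the "spike" places the null at a log OR of exactly 0, and the "smear" is a normal prior N(0, W) on the log OR. W is chosen so that the mean absolute non-null effect equals log(1.15), which is one plausible reading of "an average non-null odds ratio of 1.15". Positive logBF values favor association.

```python
import math


def log10_bayes_factor(log_or, se, prior_sd=None, mean_abs_or=1.15):
    """Approximate log10 Bayes factor for association versus no association.

    Wakefield-style approximation: the estimate is N(beta, se^2); under the
    null beta = 0 (spike), under the alternative beta ~ N(0, prior_sd^2)
    (smear). Unless prior_sd is given, it is chosen so that the mean |beta|
    of the smear equals log(mean_abs_or), an assumption of this sketch.
    """
    if prior_sd is None:
        # For a zero-mean normal, E|beta| = sd * sqrt(2 / pi)
        prior_sd = math.log(mean_abs_or) * math.sqrt(math.pi / 2)
    v = se ** 2
    w = prior_sd ** 2
    z2 = (log_or / se) ** 2
    # ln BF10 = 0.5 * ln(V / (V + W)) + z^2 * W / (2 * (V + W))
    ln_bf = 0.5 * math.log(v / (v + w)) + z2 * w / (2 * (v + w))
    return ln_bf / math.log(10)
```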
There was strong epidemiologic support in both assessments for all loci showing genome-wide significant association. This included several additional polymorphisms in these same loci that showed only sub-genome-wide significant association. However, no additional locus with sub-genome-wide significant association received unequivocally strong support from both credibility assessments (Table S1). Among these loci, the strongest support was assigned to SNP chr6:32588205 in the HLA locus, which received the best possible grade under the HuGENet criteria (grade A) but only moderate support in the Bayesian analyses (logBF = 4.4). However, this assessment should be interpreted with caution, as the underlying analysis was based on only four GWAS datasets.
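The HuGENet (Venice) interim criteria grade each association on amount of evidence, replication, and protection from bias, and combine the three grades into an overall credibility call. The sketch below encodes these grading rules as we understand them (amount: more than 1,000, 100 to 1,000, or fewer than 100 subjects in the smallest genetic group of interest, e.g. minor-allele carriers; replication: I² below 25%, 25 to 50%, or above 50%). The function names are hypothetical, and the protection-from-bias grade, which requires expert judgment, is simply passed in.

```python
def venice_amount_grade(minor_group_n):
    """Amount of evidence from the size of the smallest genetic group."""
    if minor_group_n > 1000:
        return "A"
    if minor_group_n >= 100:
        return "B"
    return "C"


def venice_replication_grade(i2_percent):
    """Replication grade from between-study heterogeneity I^2 (in %)."""
    if i2_percent < 25:
        return "A"
    if i2_percent <= 50:
        return "B"
    return "C"


def venice_credibility(amount, replication, bias):
    """Strong if all three grades are A, weak if any is C, else moderate."""
    grades = (amount, replication, bias)
    if "C" in grades:
        return "weak"
    if all(g == "A" for g in grades):
        return "strong"
    return "moderate"
```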
The PDGene database represents a comprehensive, regularly updated, and freely available online research synopsis of genetic association studies in PD. Detailed summaries of the most compelling findings are provided within an easy-to-use, dedicated online framework, displaying forest plots, cumulative meta-analyses, and an up-to-date ranking of “Top Results”. To allow comparison of PDGene results with association findings from other complex diseases and to facilitate their interpretation with respect to functional genetics data, all meta-analysis results have been ported as a customized track onto the UCSC Genome Browser. This will also allow for the integration and visualization of association results from large-scale resequencing data (e.g. from whole-exome or whole-genome studies) into PDGene once these become available.
To the best of our knowledge, our study represents the most comprehensive research synopsis in the field of PD genetics. In addition, it represents the first disease-specific genetic database that allows a systematic and exhaustive inclusion of GWAS data, and it may serve as a model for similar databases in other complex genetic diseases. Owing to our multi-pronged data retrieval and analysis protocol, we were able to perform meta-analyses on the vast majority of PD risk-gene candidates, including those “featured” as top association results in all published GWAS. In particular, this includes the five novel loci featured in the recent GWAS meta-analysis. Through collaboration with other PD genetics laboratories, we obtained independent summary data for these and 142 additional SNPs, substantially extending the hitherto available evidence. Taken together, our analyses provide unequivocal evidence that BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, ITGA8, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25 represent genuine PD risk loci, while the role of several other loci (e.g. ACMSD/TMEM163 and the HLA locus) remains to be determined. The unpublished data aggregated here from various PD genetics groups for selected candidate genes represent the first step towards a systematic meta-analysis across the full GWAS datasets from the same populations. Once completed, the results of this “mega” meta-analysis will be posted on the PDGene database, allowing users to browse the complete results via the customized genome browser track already in place.
Of particular interest are loci with unusually large effect sizes. While most loci in PDGene have only small effects on PD risk (with ORs ranging from 1.10 to 1.35, which are typical for complex diseases), for some loci much larger ORs were estimated (i.e. GBA [OR 3.51 in Caucasians], LRRK2 [OR 2.23 in Asians], and SYT11/RAB25 [OR 1.73 in Caucasians], see Table 2). The risk-allele frequencies at these polymorphisms are typically rather small (i.e. below 0.05), resulting in low population attributable risks for these loci (for the above mentioned loci individually less than 2%).
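The small population attributable risks quoted above follow from combining a large OR with a low frequency of the risk exposure. A minimal sketch using Levin's formula is shown below; it treats the OR as an approximation of the relative risk (a reasonable approximation for a relatively rare disease) and takes the population frequency of the risk exposure as input. The function name and the exposure frequency in the usage note are hypothetical and not taken from PDGene.

```python
def population_attributable_risk(exposure_freq, odds_ratio):
    """Levin's population attributable fraction, with the OR standing in for
    the relative risk (rare-disease approximation)."""
    excess = exposure_freq * (odds_ratio - 1)  # excess risk weighted by exposure
    return excess / (1 + excess)
```

For instance, a hypothetical exposure frequency of 0.5% combined with an OR of 3.51 gives a PAR of about 1.2%, illustrating how a large OR can still translate into a small population-level contribution.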
Interestingly, the meta-analysis results for GBA N370S and LRRK2 rs34778348 are based solely on candidate-gene approaches, since these SNPs are not on any of the current GWAS arrays or imputation reference panels. Thus, even in the “GWAS era”, smaller-scale, non-GWAS but “focused” genetic studies will likely continue to play an important role. This is also true when it comes to providing independent replication of proposed disease associations and/or validating imputation-derived results by direct genotyping in sufficiently sized datasets. PDGene systematically integrates all these different types of data into one database framework, vastly facilitating an assessment of the overall evidence for any given SNP or locus.
The strength of our approach is further exemplified by the identification of genome-wide significant association between disease risk and a SNP in ITGA8, which was not featured as a relevant PD gene in any previous study. ITGA8 (encoding integrin alpha 8, a type-I transmembrane protein) is functionally interesting, as it is expressed in brain, mediates cell-cell interactions, and regulates neurite outgrowth of sensory and motor neurons. Additional studies are needed to further assess the potential role of this gene in PD pathogenesis. Furthermore, PDGene shows that two additional loci not highlighted by the recent GWAS meta-analysis yield genome-wide significant results in the PDGene meta-analyses: PARK16, originally implicated as a PD susceptibility locus in an Asian GWAS but not highlighted in the recent GWAS meta-analysis of Caucasian samples, and GBA, a gene that was identified solely by candidate-gene approaches. Another strength of our study is that it combines genetic data from more than 50 different countries, allowing a systematic assessment of genetic associations across populations of different ethnic descent. For instance, these analyses suggest that variants in BST1, LRRK2, the PARK16 locus, and SNCA show genome-wide significant association with PD risk in both Caucasian and Asian-descent samples. Furthermore, the recently described Caucasian GWAS loci CCDC62/HIP1R, MCCC1, and STK39 also show similar effect size estimates in populations of Asian descent. PD association data originating from other ethnic groups are still relatively scarce; however, they could easily be added to the existing data on the respective polymorphisms available on PDGene.
In summary, we have created a continuously updated online resource for genetic association studies in the field of PD. Synthesizing essentially all available data in the field led to the identification of ITGA8 as a novel potential PD risk locus. Our quantitative approach to data integration across a multitude of different study designs can be readily scaled to include large-scale resequencing efforts that will emerge over the coming years, making the complex field of PD genetics accessible to a broad range of investigators.
Note that the following section only provides a brief summary of the methods applied to our study. A much more detailed description can be found in Text S1.
For inclusion in PDGene, a study has to meet three criteria: 1) it must evaluate the association between a bi-allelic genetic polymorphism (minor allele frequency ≥0.01 in the healthy control population of at least one study) and Parkinson's disease (PD) risk in datasets comprising both affected (defined as clinically and/or neuropathologically diagnosed “Parkinson's disease”) and unaffected individuals; 2) it must be published in a peer-reviewed journal; 3) it must be published in English. For this manuscript, we also included data on ten SNPs generated in the GEO-PD Consortium datasets and obtained data for the newly identified SNP rs7077361 in ITGA8 from the Japanese GWAS dataset.
In brief, genetic association data of the following studies were excluded from the meta-analyses (see Text S1 for details): family-based studies without available subject-level data (however, unrelated case-control data enriched for familial cases were not excluded), studies investigating only disease controls, multi-allelic polymorphisms, and studies of polymorphisms in mitochondrial DNA. We also excluded genetic data of apparently “poor” quality if discrepancies could not be resolved after contacting the study authors (e.g. inadequate genotyping/sequencing protocols or discrepancies in terms of allele names or frequencies when compared with public databases; more details can be found in Text S1).
Our literature searches until March 31st, 2011, yielded 27,210 articles, which were screened for eligibility using the title, abstract, or full text, as necessary. Additional screening of bibliographies in reviews, published meta-analyses, and original genetic association studies was also performed. Overall, full-text versions of 1,534 articles were obtained. Following the inclusion and exclusion criteria outlined above, 828 articles were included in PDGene by March 31st 2011 (also see Figure 3).
Figure 3. Flowchart of literature search, data extraction, and analysis strategies applied for PDGene. doi:10.1371/journal.pgen.1002548.g003
Random-effects allelic meta-analyses were performed if a minimum of four independent datasets existed per polymorphism. Summary odds ratios [ORs] and 95% confidence intervals [CIs] were calculated irrespective of ethnic descent as well as for distinct ethnic groups (i.e., Caucasians and Asians) if sufficient data were available. In addition, we performed a number of sensitivity analyses (excluding the initial studies and datasets in which HWE was violated in control individuals), systematically assessed between-study heterogeneity (via I2), and assessed the credibility of each at least nominally significant meta-analysis result by calculating Bayes factors (BF; here expressed as log10(BF) = “logBF”) and by determining a grading score developed by the Human Genome Epidemiology Network (HuGENet).
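The sensitivity analyses above exclude datasets whose control genotypes deviate from HWE (P<0.05 is the threshold given in the forest-plot legend). The sketch below shows the classical 1-df chi-square goodness-of-fit test from genotype counts, one common way to flag such datasets. It is our illustration rather than the exact procedure used for PDGene, and exact tests are often preferred when genotype counts are small.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of the 1-df chi-square test for Hardy-Weinberg equilibrium
    from genotype counts (e.g. in controls)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square: P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))
```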
Assessment of small-study bias/publication bias.
This is of particular importance in meta-analyses of published association data and was carefully addressed here. First, we added publicly available GWAS data to the vast majority of SNPs. Since these data are released irrespective of their association results, they are not subject to selective publication, which should decrease the potential for small-study bias/publication bias. Second, for 147 SNPs of the core PDGene meta-analyses that showed statistically suggestive results (P≤0.1), we obtained additional data from all currently published, but not publicly available, GWAS datasets, further decreasing the potential impact of small-study bias/publication bias. Third, we directly assessed the evidence for small-study bias by applying a recently proposed regression test to all nominally significant (P<0.05) meta-analysis results. The results of these analyses are fully displayed in Table S1.
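The regression test cited above is not reproduced here. For orientation only, the sketch below implements the classical Egger regression, a related and widely used (but different) test for funnel-plot asymmetry, in which each study's standardized effect is regressed on its precision and a non-zero intercept suggests small-study effects. It returns the t statistic and degrees of freedom; converting these to a P-value requires a t-distribution CDF (for example from SciPy), which we leave out to keep the sketch dependency-free. It also assumes at least three datasets with differing standard errors.

```python
import math


def egger_test(log_ors, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (log OR / SE) on precision (1 / SE) by
    ordinary least squares and returns (intercept, SE of intercept,
    t statistic, degrees of freedom).
    """
    k = len(log_ors)
    if k < 3:
        raise ValueError("need at least three studies")
    y = [b / s for b, s in zip(log_ors, ses)]  # standardized effects
    x = [1 / s for s in ses]                   # precisions
    mx = sum(x) / k
    my = sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = k - 2
    sigma2 = sum(r * r for r in resid) / df    # residual variance
    se_int = math.sqrt(sigma2 * (1 / k + mx * mx / sxx))
    return intercept, se_int, intercept / se_int, df
```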
We obtained individual-level genotype data for all publicly available PD GWAS datasets from NCBI's “dbGaP” database (a total of three at the time of the datafreeze, March 31st, 2011). Genotype data were cleaned using standard procedures, followed by imputation of untyped genotypes (using reference panels from HapMap and the 1000 Genomes Project), and association analyses incorporating imputation uncertainty (case-control datasets only), age, sex, and population stratification. Overall, this procedure yielded a total of 7,723,931 unique SNPs, 7,123,920 of which were present in at least two, and 711,271 in at least three datasets. Meta-analyses (either combining effect estimates and standard errors using random-effects models, or combining P-values weighted by sample size; see Text S1 for more details) were performed on the 7,123,920 SNPs present in at least two of the GWAS datasets.
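The second meta-analysis scheme mentioned above, combining P-values weighted by sample size, is commonly implemented as a weighted Z-score (Stouffer) method; this is what METAL's sample-size scheme does. The sketch below is our minimal version: each two-sided P-value is converted into a Z-score signed by the direction of effect for a common reference allele and weighted by the square root of the study's sample size. The argument names are hypothetical, and refinements such as effective sample sizes for unbalanced case-control designs are ignored.

```python
import math
from statistics import NormalDist


def sample_size_weighted_z(p_values, directions, sample_sizes):
    """Combine two-sided P-values across studies with sqrt(N) weights.

    directions: +1 or -1, the sign of each study's effect for the same allele.
    Returns (combined Z, two-sided combined P).
    """
    nd = NormalDist()
    num = 0.0
    den = 0.0
    for p, sign, n in zip(p_values, directions, sample_sizes):
        z = -nd.inv_cdf(p / 2) * sign  # signed Z from a two-sided P-value
        w = math.sqrt(n)
        num += w * z
        den += w * w
    z_comb = num / math.sqrt(den)
    return z_comb, 2 * nd.cdf(-abs(z_comb))
```

Because the weights depend only on sample size, this scheme does not require effect estimates on a common scale, which is one reason it is used alongside inverse-variance pooling.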
After completion of all data-management and analysis steps, all study-specific variables, genotype data (except for GWAS), and meta-analysis plots are posted on a dedicated, publicly available online adaptation of the PDGene database that uses the same software and code as our databases for Alzheimer's disease and schizophrenia. The online database is hosted by the “Alzheimer Research Forum” and can be accessed via its own designated URL (http://www.pdgene.org).
The database software can easily be ported to other genetically complex diseases and will be made available on a collaborative basis to interested researchers upon request.
QQ plots showing the distribution of expected versus observed P-values for the GWAS-only meta-analysis results. Analyses were performed using the METAL software (ref.  in Text S1). The excess of small observed P-values (Figure S1, panel 1) is entirely due to association signals in the SNCA, MAPT, LRRK2, and DGKQ/GAK loci, as shown in Figure S1, panel 2, which displays the P-value distribution after removal of 18,622 SNPs in these regions (genomic inflation factor λ = 1.007).
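The λ value above is the genomic inflation factor. A minimal sketch, assuming the usual definition (the median of the 1-df χ² statistics implied by the P-values, divided by the median of the χ²₁ distribution, about 0.455), is shown below. The function name is hypothetical, and removing the SNCA, MAPT, LRRK2, and DGKQ/GAK regions would be done by filtering the input list beforehand.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df (about 0.4549): z(0.25) ** 2.
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided P-values (0 < p <= 1)."""
    # Using inv_cdf(p / 2) avoids 1 - p / 2 rounding to 1.0 for tiny P-values.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / _CHI2_1DF_MEDIAN
```

Uniformly distributed null P-values give λ close to 1, whereas a genome-wide excess of small P-values pushes λ above 1.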
Forest plots of allelic meta-analyses for SNPs showing genome-wide significant association (P<5×10−8) with PD susceptibility in the March 31st 2011 datafreeze. Study-specific allelic odds ratios (ORs, black squares) and 95% confidence intervals (CIs, lines) were calculated for each included dataset. The summary OR and CI were calculated using random-effects models (grey diamond). Whenever multiple polymorphisms in the same locus showed genome-wide significant association after stratification for Caucasian or Asian ancestry, only the variant with the smallest P-value is shown. For a complete list of meta-analyses performed for the datafreeze, see Table S1. Panels 1–12 and panels 13–16 display the SNP showing the most significant genome-wide association in datasets of Caucasian ancestry and Asian ancestry, respectively. Details and references of all included studies can be found on the PDGene database (http://www.pdgene.org). I² = estimate of the percentage of between-study heterogeneity that is beyond chance; “excl initial” = summary OR and 95% CI after exclusion of the initial study; C = Caucasian ancestry, A = Asian ancestry, H = Hispanic descent, D = African descent; “•” = initial study (applies to candidate-gene studies); “†” = no data provided or data not eligible for inclusion in meta-analysis; “‡” = study excluded due to overlap; “#” = HWE violation in controls (P<0.05; not applicable to quality-controlled GWAS datasets, see Text S1); “i” = SNP monomorphic in the respective dataset; “ø” = meta-analysis after exclusion of the initial study not applicable.
Locus plot of the ITGA8 region on chromosome 10p13 (15346353–15801533 bp, hg18). The figure displays association results for ~1,400 SNPs in the ITGA8 region for which at least four independent datasets were available. SNPs are color-coded based on linkage disequilibrium (r²) estimates from the CEU 1000G dataset (release June 2010). All LD estimates refer to the most significantly associated SNP, rs7077361. SNPs shown in grey lack LD estimates in the CEU dataset. Recombination rates were estimated from the CEU dataset and are displayed as a blue line in the background. Gene annotations are based on RefSeq and the UCSC Genome Browser. Locus plots were generated using the LocusZoom Stand-alone package (http://genome.sph.umich.edu/wiki/LocusZoom_Standalone).
Forest plots of fixed-effect meta-analyses for SNP rs6723108 in the ACMSD/TMEM163 locus and chr6:32609909 in the HLA locus. Symbols are the same as for Figure S2 (see above).
Overview of all 867 polymorphisms meta-analyzed in the March 31st 2011 datafreeze using random-effects allelic models. Random-effects allelic meta-analyses were performed on polymorphisms for which four or more independent datasets were available. Meta-analyses stratified by ethnic descent were performed if at least three independent datasets were available in the respective stratum (applicable only to samples of European and Asian descent). Each nominally significant meta-analysis result (P<0.05) was graded according to the HuGENet interim criteria; for details on how these criteria were applied, see Text S1. Meta-analysis results in this table are ordered by genomic location. OR = odds ratio, CI = confidence interval, N minor = number of minor alleles; ethnicities: C = Caucasian, A = Asian, D = African descent, H = Hispanic, O = other/mixed; Low OR = OR < 1.15 for risk-increasing effects or OR ≥ 0.87 for protective effects; F = loss of significance in the respective meta-analysis after exclusion of the first study; HWE = loss of significance after excluding studies violating HWE (P<0.05); Regr = evidence for small-study/publication bias using a modified regression test (see Text S1); grades: A = Grade A (‘strong’ epidemiologic credibility), B = Grade B (‘modest’ epidemiologic credibility), C = Grade C (‘weak’ epidemiologic credibility); logBF = log10 of the Bayes factor (see Text S1). “*” denotes SNPs that were supplemented by additional data after the datafreeze (147 SNPs in total; see Text S1 for a description of the included datasets).
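The grading step can be sketched as below. The thresholds are those of the HuGENet interim guidelines as we understand them: amount of evidence from the number of minor alleles (A > 1,000, B 100–1,000, C < 100), replication from I² (A < 25%, B 25–50%, C > 50%), and an overall grade of A only if all three criteria are A, C if any is C, and B otherwise. Treating any of the flags Low OR, F, HWE, or Regr as grade C for protection from bias is a simplifying assumption; Text S1 describes the rules actually applied, and the function names are hypothetical.

```python
def is_low_or(odds_ratio):
    """'Low OR' flag: OR < 1.15 for risk effects, OR >= 0.87 for protective effects."""
    return odds_ratio < 1.15 if odds_ratio >= 1 else odds_ratio >= 0.87


def hugenet_grade(n_minor, i2, odds_ratio, lost_sig_excl_first=False,
                  lost_sig_excl_hwe=False, small_study_bias=False):
    """Return (overall grade, (evidence, replication, protection from bias))."""
    evidence = "A" if n_minor > 1000 else "B" if n_minor >= 100 else "C"
    replication = "A" if i2 < 25 else "B" if i2 <= 50 else "C"
    # Assumption: any bias flag downgrades protection from bias to C.
    flags = (is_low_or(odds_ratio), lost_sig_excl_first,
             lost_sig_excl_hwe, small_study_bias)
    protection = "C" if any(flags) else "A"
    grades = (evidence, replication, protection)
    if all(g == "A" for g in grades):
        overall = "A"
    elif "C" in grades:
        overall = "C"
    else:
        overall = "B"
    return overall, grades
```

For example, a result based on 5,000 minor alleles with I² = 30% and no bias flags would be graded B under these assumptions.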
Investigation of the extent of statistical inflation assuming sample overlaps of 1%, 5%, and 10% across cases and controls in datasets originating from the same country. Hypothetical sample overlap was assumed between different candidate-gene/replication studies, and between candidate-gene/replication studies and GWAS datasets, if they originated from the same country. These analyses were performed applying random-effects models and adding the sum of weighted covariances of overlapping datasets to the overall study variance (see ref.  in the main text). Note that the assumption of undetected overlapping samples does not apply to overlap between individual GWAS (and was therefore not modeled here), because duplicate samples in these datasets were removed prior to meta-analysis. Nor does it apply to independent datasets reported in the same publication, where duplicate samples had been removed by the authors prior to analysis and publication. We emphasize that this table describes hypothetical scenarios, because the geographical origin of each study had been investigated extensively and potentially overlapping datasets had been excluded as part of PDGene's data inclusion protocol. Thus, overlap across geographically distinct datasets within the same country is limited to accidental recruitment of the same subjects more than once in different datasets, and can be expected to be less than ~1%. This estimate is based on data from the GEO-PD consortium, for which sufficient data were centrally available on 6,072 subjects from 20 geographically distinct sites in 13 countries; these subjects had been investigated for potential duplicates across sites, and no duplicate subjects (neither between nor within countries) were identified when matching on ethnicity, birth, sex, and genotype.
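The principle of adding weighted covariances to the summary variance can be illustrated with a minimal inverse-variance sketch. Here the correlation between two datasets' estimates is supplied directly as a hypothetical input; translating an assumed overlap of 1%, 5%, or 10% of cases and controls into such a correlation follows Lin and Sullivan (2009, in the reference list) and is not reproduced. For random-effects models, each variance would be the within-study variance plus τ². The function name is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def pooled_with_overlap(betas, variances, corr):
    """Inverse-variance pooled estimate with and without overlap covariance.

    betas, variances: per-dataset log ORs and their variances.
    corr: {(i, j): rho} with i < j, the assumed correlation between the
          estimates of datasets i and j due to shared subjects.
    Returns (beta, se_independent, se_with_overlap, p_independent, p_with_overlap).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    beta = sum(wi * bi for wi, bi in zip(w, betas)) / sw
    # Var(sum w_i b_i / W) = (sum w_i^2 v_i + 2 sum_{i<j} w_i w_j cov_ij) / W^2,
    # and w_i^2 v_i = w_i for inverse-variance weights.
    cov_sum = sum(2 * w[i] * w[j] * rho * math.sqrt(variances[i] * variances[j])
                  for (i, j), rho in corr.items())
    se_indep = math.sqrt(1 / sw)
    se_overlap = math.sqrt((sw + cov_sum) / sw ** 2)
    p_indep = 2 * _STD_NORMAL.cdf(-abs(beta) / se_indep)
    p_overlap = 2 * _STD_NORMAL.cdf(-abs(beta) / se_overlap)
    return beta, se_indep, se_overlap, p_indep, p_overlap
```

In the extreme case of two fully duplicated datasets (ρ = 1), the corrected standard error equals that of a single dataset, so the duplicate adds no information; positive correlations always enlarge the standard error and the P-value.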
The investigation of overlap was not applicable here for Asian datasets, as they originated from different countries and/or were cleaned by the respective authors prior to publication.
Supplementary material. This file includes supplementary methods and references as well as the list of members of the GWAS consortia, the GEO-PD Consortium, and consortia-specific acknowledgements.
23andMe acknowledges Elizabeth Dorfman, Amy K. Kiefer, Emily M. Drabant, Uta Francke, Joanna L. Mountain, David Hinds, and Anne Wojcicki from 23andMe, as well as Samuel M. Goldman, Caroline M. Tanner, and J. William Langston from the Parkinson's Institute, Sunnyvale, CA, USA. We also acknowledge the contribution of Mitsutoshi Yamamoto, Nobutaka Hattori, and Miho Murata to sample collection in the Japanese GWAS 1.0. We are grateful to the Alzheimer Research Forum (in particular to June Kinoshita, Colin Knep, Paula Noyes, and Gabrielle Ströbel) for hosting PDGene on their website. We also thank the many PD researchers who have kindly provided us with genotype data and helpful information beyond those included in the original publications. Finally, we would like to thank the many PD patients and control subjects who volunteered to participate in the individual studies.
Conceived and designed the experiments: CM Lill, MB McQueen, JPA Ioannidis, L Bertram. Performed the experiments: CM Lill, JT Roehr, S Bagade, B-M Schjeide, E Meissner, U Zauft, NC Allen, KJ Anderson, G Beecham, D Berg, JM Biernacka, A Brice, AL DeStefano, CB Do, N Eriksson, SA Factor, MJ Farrer, T Foroud, T Gasser, T Hamza, JA Hardy, P Heutink, C Klein, JC Latourelle, DM Maraganore, ER Martin, M Martinez, RH Myers, H Payami, WK Scott, M Sharma, AB Singleton, K Stefansson, T Toda, JY Tung, J Vance, NW Wood, CP Zabetian, 23andMe, GEO-PD, IPDGC, Parkinson's Disease GWAS, WTCCC2. Analyzed the data: CM Lill, JT Roehr, MB McQueen, FK Kavvoura, L Bertram. Wrote the paper: CM Lill, JPA Ioannidis, L Bertram. Helped write the manuscript: E Meissner, MJ Farrer, T Foroud, T Gasser, C Klein, DM Maraganore, H Payami, AB Singleton, M Sharma, F Zipp, H Lehrach. Helped analyze the data: S Bagade, T Liu, M Schilling, CB Do, N Eriksson, T Hamza, EM Hill-Burns, MA Nalls, N Pankratz, W Satake, M Sharma. Interpretation of results: CM Lill, JPA Ioannidis, L Bertram. Study coordination: CM Lill, T Foroud, JA Hardy, H Payami, AB Singleton, P Young, RE Tanzi, MJ Khoury, F Zipp, H Lehrach, JPA Ioannidis, L Bertram. Literature searches and data entry: CM Lill, S Bagade, B-M Schjeide, E Meissner, U Zauft, NC Allen.
- 1. de Lau LML, Breteler MMB (2006) Epidemiology of Parkinson's disease. Lancet Neurol 5: 525–535. doi:10.1016/S1474-4422(06)70471-9.
- 2. Hardy J, Lewis P, Revesz T, Lees A, Paisan-Ruiz C (2009) The genetics of Parkinson's syndromes: a critical review. Curr Opin Genet Dev 19: 254–265. doi:10.1016/j.gde.2009.03.008.
- 3. Vilariño-Güell C, Wider C, Ross OA, Dachsel JC, Kachergus JM, et al. (2011) VPS35 mutations in Parkinson disease. Am J Hum Genet 89: 162–167. doi:10.1016/j.ajhg.2011.06.001.
- 4. Zimprich A, Benet-Pagès A, Struhal W, Graf E, Eck SH, et al. (2011) A mutation in VPS35, encoding a subunit of the retromer complex, causes late-onset Parkinson disease. Am J Hum Genet 89: 168–175. doi:10.1016/j.ajhg.2011.06.008.
- 5. Chartier-Harlin M-C, Dachsel JC, Vilariño-Güell C, Lincoln SJ, Leprêtre F, et al. (2011) Translation initiator EIF4G1 mutations in familial Parkinson disease. Am J Hum Genet 89: 398–406. doi:10.1016/j.ajhg.2011.08.009.
- 6. Maraganore DM, de Andrade M, Elbaz A, Farrer MJ, Ioannidis JP, et al. (2006) Collaborative analysis of alpha-synuclein gene promoter variability and Parkinson disease. JAMA 296: 661–670. doi:10.1001/jama.296.6.661.
- 7. Zabetian CP, Yamamoto M, Lopez AN, Ujike H, Mata IF, et al. (2009) LRRK2 mutations and risk variants in Japanese patients with Parkinson's disease. Mov Disord 24: 1034–1041. doi:10.1002/mds.22514.
- 8. Goris A, Williams-Gray CH, Clark GR, Foltynie T, Lewis SJG, et al. (2007) Tau and alpha-synuclein in susceptibility to, and dementia in, Parkinson's disease. Ann Neurol 62: 145–153. doi:10.1002/ana.21192.
- 9. Sidransky E, Nalls MA, Aasly JO, Aharon-Peretz J, Annesi G, et al. (2009) Multicenter analysis of glucocerebrosidase mutations in Parkinson's disease. N Engl J Med 361: 1651–1661. doi:10.1056/NEJMoa0901281.
- 10. Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ, et al. (2005) High-resolution whole-genome association study of Parkinson disease. Am J Hum Genet 77: 685–693. doi:10.1086/496902.
- 11. Fung H-C, Scholz S, Matarin M, Simón-Sánchez J, Hernandez D, et al. (2006) Genome-wide genotyping in Parkinson's disease and neurologically normal controls: first stage analysis and public release of data. Lancet Neurol 5: 911–916. doi:10.1016/S1474-4422(06)70578-6.
- 12. Pankratz N, Wilk JB, Latourelle JC, DeStefano AL, Halter C, et al. (2009) Genomewide association study for susceptibility genes contributing to familial Parkinson disease. Hum Genet 124: 593–605. doi:10.1007/s00439-008-0582-9.
- 13. Simón-Sánchez J, Schulte C, Bras JM, Sharma M, Gibbs JR, et al. (2009) Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nat Genet 41: 1308–1312. doi:10.1038/ng.487.
- 14. Satake W, Nakabayashi Y, Mizuta I, Hirota Y, Ito C, et al. (2009) Genome-wide association study identifies common variants at four loci as genetic risk factors for Parkinson's disease. Nat Genet 41: 1303–1307. doi:10.1038/ng.485.
- 15. Edwards TL, Scott WK, Almonte C, Burt A, Powell EH, et al. (2010) Genome-wide association study confirms SNPs in SNCA and the MAPT region as common risk factors for Parkinson disease. Ann Hum Genet 74: 97–109. doi:10.1111/j.1469-1809.2009.00560.x.
- 16. Hamza TH, Zabetian CP, Tenesa A, Laederach A, Montimurro J, et al. (2010) Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson's disease. Nat Genet 42: 781–785. doi:10.1038/ng.642.
- 17. Spencer CCA, Plagnol V, Strange A, Gardner M, Paisan-Ruiz C, et al. (2011) Dissection of the genetics of Parkinson's disease identifies an additional association 5′ of SNCA and multiple associated haplotypes at 17q21. Hum Mol Genet 20: 345–353. doi:10.1093/hmg/ddq469.
- 18. Saad M, Lesage S, Saint-Pierre A, Corvol J-C, Zelenika D, et al. (2011) Genome-wide association study confirms BST1 and suggests a locus on 12q24 as the risk loci for Parkinson's disease in the European population. Hum Mol Genet 20: 615–627. doi:10.1093/hmg/ddq497.
- 19. Simón-Sánchez J, van Hilten JJ, van de Warrenburg B, Post B, Berendse HW, et al. (2011) Genome-wide association study confirms extant PD risk loci among the Dutch. Eur J Hum Genet 19: 655–661. doi:10.1038/ejhg.2010.254.
- 20. Evangelou E, Maraganore DM, Ioannidis JPA (2007) Meta-analysis in genome-wide association datasets: strategies and application in Parkinson disease. PLoS ONE 2: e196. doi:10.1371/journal.pone.0000196.
- 21. Nalls MA, Plagnol V, Hernandez DG, Sharma M, Sheerin U-M, et al. (2011) Imputation of sequence variants for identification of genetic risks for Parkinson's disease: a meta-analysis of genome-wide association studies. Lancet 377: 641–649. doi:10.1016/S0140-6736(10)62345-8.
- 22. Do CB, Tung JY, Dorfman E, Kiefer AK, Drabant EM, et al. (2011) Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease. PLoS Genet 7: e1002141. doi:10.1371/journal.pgen.1002141.
- 23. Sharma M, Ioannidis JPA, Aasly JO, Annesi G, Brice A, et al. (n.d.) Large-scale replication and heterogeneity in Parkinson disease genetic loci. Neurology in press.
- 24. Lin D-Y, Sullivan PF (2009) Meta-analysis of genome-wide association studies with overlapping subjects. Am J Hum Genet 85: 862–872. doi:10.1016/j.ajhg.2009.11.001.
- 25. Ioannidis JPA (2008) Effect of formal statistical significance on the credibility of observational associations. Am J Epidemiol 168: 374–383; discussion 384–390. doi:10.1093/aje/kwn156.
- 26. Ioannidis JPA, Boffetta P, Little J, O'Brien TR, Uitterlinden AG, et al. (2008) Assessment of cumulative evidence on genetic associations: interim guidelines. Int J Epidemiol 37: 120–132. doi:10.1093/ije/dym159.
- 27. Khoury MJ, Bertram L, Boffetta P, Butterworth AS, Chanock SJ, et al. (2009) Genome-wide association studies, field synopses, and the development of the knowledge base on genetic variation and human diseases. Am J Epidemiol 170: 269–279. doi:10.1093/aje/kwp119.
- 28. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D (2010) BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26: 2204–2207. doi:10.1093/bioinformatics/btq351.
- 29. Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, et al. (2007) A survey of genetic human cortical gene expression. Nat Genet 39: 1494–1499. doi:10.1038/ng.2007.16.
- 30. Varnum-Finney B, Venstrom K, Muller U, Kypta R, Backus C, et al. (1995) The integrin receptor alpha 8 beta 1 mediates interactions of embryonic chick motor and sensory neurons with tenascin-C. Neuron 14: 1213–1222. doi:10.1016/0896-6273(95)90268-6.
- 31. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7: 177–188. doi:10.1016/0197-2456(86)90046-2.
- 32. Harbord RM, Egger M, Sterne JAC (2006) A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med 25: 3443–3457. doi:10.1002/sim.2380.
- 33. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE (2007) Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat Genet 39: 17–23. doi:10.1038/ng1934.
- 34. Allen NC, Bagade S, McQueen MB, Ioannidis JPA, Kavvoura FK, et al. (2008) Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet 40: 827–834. doi:10.1038/ng.171.