A Pooled Genome-Wide Association Study of Asperger Syndrome

  • Varun Warrier,

    vw260@medschl.cam.ac.uk (VW); sb205@cam.ac.uk (SB-C)

    ‡ These authors are joint first authors on this work.

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Bhismadev Chakrabarti,

    ‡ These authors are joint first authors on this work.

    Affiliations Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom, School of Psychology and Clinical Language Sciences, Centre for Integrative Neuroscience and Neurodynamics, University of Reading, Reading, United Kingdom

  • Laura Murphy,

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Allen Chan,

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Ian Craig,

    Affiliation MRC Centre for Social, Genetic and Developmental Psychiatry, King’s College London, Institute of Psychiatry, London, United Kingdom

  • Uma Mallya,

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Silvia Lakatošová,

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Karola Rehnstrom,

    Affiliation The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom

  • Leena Peltonen †,

    † Deceased.

    Affiliation The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, United Kingdom

  • Sally Wheelwright,

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Carrie Allison,

    Affiliation Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom

  • Simon E. Fisher,

    Affiliations Max Planck Institute for Psycholinguistics, 6500 AH, Nijmegen, The Netherlands, Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, The Netherlands

  • Simon Baron-Cohen

    vw260@medschl.cam.ac.uk (VW); sb205@cam.ac.uk (SB-C)

    Affiliations Autism Research Centre, Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom, CLASS Clinic, Cambridgeshire and Peterborough NHS Foundation Trust (CPFT), Cambridge, United Kingdom

Abstract

Asperger Syndrome (AS) is a neurodevelopmental condition characterized by impairments in social interaction and communication, alongside the presence of unusually repetitive, restricted interests and stereotyped behaviour. Individuals with AS have no delay in cognitive and language development. It is a subset of Autism Spectrum Conditions (ASC), which are highly heritable and have a population prevalence of approximately 1%. Few studies have investigated the genetic basis of AS. To address this gap in the literature, we performed a genome-wide pooled DNA association study to identify candidate loci in 612 individuals (294 cases and 318 controls) of Caucasian ancestry, using the Affymetrix GeneChip Human Mapping version 6.0 array. We identified 11 SNPs that had a p-value below 1 x 10−5. These SNPs were independently genotyped in the same sample. Three of the SNPs (rs1268055, rs7785891 and rs2782448) were nominally significant, though none remained significant after Bonferroni correction. Two of our top three SNPs (rs7785891 and rs2782448) lie in loci previously implicated in ASC. However, investigation of the three SNPs in the ASC genome-wide association dataset from the Psychiatric Genomics Consortium indicated that they were not significantly associated with ASC. The effect sizes of the variants were modest, indicating that our study was not sufficiently powered to identify causal variants with precision.

Introduction

Asperger Syndrome (AS) is a neurodevelopmental condition and a subset of Autism Spectrum Conditions (ASC) [1]. Individuals with ASC have difficulties in social interaction and communication, alongside unusually repetitive and stereotyped behaviour and unusually narrow interests. In AS, language and cognitive development proceed on time. ASC is highly heritable [2], with a monozygotic twin heritability rate estimated at 73–95% [3], and has a prevalence of approximately 1% [4]. ASC is characterized by high clinical and aetiological heterogeneity. Environmental, epigenetic and genetic factors have been implicated in ASC [3, 5–7]. Currently, more than 660 genes are implicated in ASC (https://gene.sfari.org/autdb/HG_Home.do), though no single gene or variant accounts for more than 1–2% of cases [5,8]. Additionally, several large copy number variants that duplicate or delete multiple genes have been identified in association with ASC [3,9].

Though no common variant has been consistently associated with ASC across multiple Genome-wide Association Studies (GWAS) [3,5], it is now clear that common variants contribute considerably to the variation in ASC [10, 11]. Two recent studies found that common inherited variation accounts for 40–60% of the variance in ASC [10, 11]. However, although most of the variance is attributable to common inherited variants, GWAS have failed to consistently identify causative variants. One explanation for this lack of success is that GWAS in ASC may be underpowered to detect small effect sizes; the largest ASC GWAS had fewer than 9000 participants (cases and controls) and, although this seems large, it has been argued that much larger sample sizes (in the range of tens to hundreds of thousands) are needed to successfully identify causative variants [9]. An alternative view is that the inability to consistently identify causative common variants is due to the underlying genetic and phenotypic heterogeneity. At a phenotypic level, delay and difficulties in language development are an important source of heterogeneity in ASC. Language delay in individuals with ASC is associated with changes in brain volume, both in total grey matter and in specific brain regions [12]. This difference in brain architecture points to different biological and genetic networks being involved in different forms of ASC. As mentioned earlier, AS is a subset of ASC in which individuals have no language delay, suggesting it may have a genetic architecture distinct from the rest of ASC.

Only a few studies have specifically investigated the genetics of AS. In one of the first such studies, we tested for associations between 216 SNPs across 68 candidate genes. We identified nominal associations between SNPs in 14 genes and AS [13]. In the current study, we performed a pooled DNA genome-wide association study in individuals with AS and controls to identify SNPs in a hypothesis-free way. DNA pooling is a rapid, efficient and economical method to identify genetic associations in various conditions [14]. We hypothesized that genome-wide DNA pooling would detect differences in allele frequencies between individuals with AS and controls. SNPs whose p-values were below a pre-defined threshold were then individually genotyped in the same sample, using an established approach reported in several previous studies [15–17].

Methods

Participants

612 individuals were genotyped in the pooled genotyping stage. There were 294 cases (males = 254, females = 40, reflecting the male bias in AS [5]) and 318 controls (males = 250, females = 68). 607 of these individuals were individually genotyped. 5 (1 case and 4 controls) individuals were not genotyped at this stage due to poor DNA quality. All participants reported Caucasian ancestry for at least 2 generations. All cases were recruited from the Cambridge Autism Research Database (CARD) at www.autismresearchcentre.com, and reported that they had a clinical diagnosis of AS according to DSM IV or ICD-10 criteria. Clinical diagnostic assessment was done by independent clinicians. Control participants were recruited through advertisement and reported that they were free of psychiatric and neurological conditions. Written consent was obtained from all participants. Ethical approval was obtained from the National Health Service Research Ethics Service (NRES).

Pooled DNA Genotyping

DNA from each participant was extracted from buccal swabs and anonymized. DNA was then suspended in Tris-EDTA and quantified using PicoGreen double-stranded DNA quantification reagent (Invitrogen, USA). 100 ng of DNA from each individual was added to their respective pool. The cases were divided into 7 pools with 5 pools for males and 2 pools for females. On average, there were 42 participants in each pool, though the numbers ranged from 12 to 59. Two additional pools with 24 female cases and 44 male cases were genotyped, but were not included in the analysis or taken forward for individual genotyping due to DNA contamination. The controls were divided into 9 pools with an average of 35 participants per pool. The number of participants per control pool ranged from 14 participants to 57.

Genotyping was performed using the Affymetrix GeneChip Human Mapping version 6.0 array (Affymetrix, California, USA) according to the protocol recommended by the company. Cell intensity (.cel) files were generated using GeneChip Scanner 3000 7g. The files generated were converted into relative allele signal scores (RAS) using a custom made script (snpmap.R [18]).
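As a rough illustration of what a relative allele signal represents, the sketch below assumes the definition commonly used in the DNA pooling literature, RAS = A / (A + B), where A and B are the probe intensities for the two alleles of a SNP. The function name and the intensity values are hypothetical; the actual conversion in this study was done by snpmap.R [18], whose internals are not described here.

```python
def relative_allele_signal(intensity_a: float, intensity_b: float) -> float:
    """Return A / (A + B), a rough estimate of the allele A frequency in a pool.

    Assumed definition, for illustration only; not taken from snpmap.R.
    """
    total = intensity_a + intensity_b
    if total <= 0:
        raise ValueError("summed probe intensity must be positive")
    return intensity_a / total


# Hypothetical probe intensities for one SNP on one pooled array.
ras = relative_allele_signal(1500.0, 500.0)
print(f"RAS = {ras:.2f}")
```

A RAS near 0.5 would suggest that the two alleles are about equally common in the pool, under the assumption that both probes hybridize equally well.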

To test for differences in allele frequencies for each SNP between the cases and the controls, independent t-tests (equal variance assumed) were performed using the mean RAS scores from the pools. In addition, Levene’s test was performed to check for equality of variance. A threshold of significance was chosen a priori at p = 1 x 10−5. This threshold was chosen to reduce the risk of false negatives due to the loss of power from DNA pooling [19], and it is typically used in the discovery phase of GWAS. All SNPs were screened for quality control. After taking the power loss due to DNA pooling into consideration, the study design had approximately 38% power to detect variants with an effect of 1.3 at the given threshold of significance. The frequency of both the marker and the effect allele was set to 0.5 for the power calculation. SNPs were rejected if they had a minor allele frequency (MAF) below 0.01 in the Caucasian population according to the HapMap project, and if the coefficient of variation (calculated as SD/mean) in more than 50% of the pools was greater than 0.2. All SNPs that passed quality control and had a p-value below the threshold of significance were taken forward for individual genotyping to verify the result from the pooled association. Nominally significant SNPs in the individual genotyping stage were further investigated using summary genome-wide association data of the ASC cohort available from the Psychiatric Genomics Consortium (PGC, http://www.med.unc.edu/pgc/). The PGC analysed genome-wide SNPs using DNA from 161 cases, 526 controls, 4788 trio cases and 4788 trio pseudocontrols, all of Caucasian ancestry. A crucial difference between the PGC cohort and our study cohort is that the PGC cohort was not stratified for language delay (and hence includes cases of autistic disorder/childhood autism as well as AS).
Additional details of methods, statistical analyses and participant ancestry are provided elsewhere [20].
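The per-SNP comparison described above can be sketched as a Student's t statistic with a pooled variance estimate, computed from one mean RAS value per pool. This is a minimal illustration, not the authors' script: the function name and the RAS values are hypothetical, each pool is weighted equally regardless of how many participants it contains, and the p-value (a two-sided lookup in the t distribution with the returned degrees of freedom) is left to a statistics library.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(case_ras, control_ras):
    """Student's t (equal variances assumed) and its degrees of freedom."""
    n1, n2 = len(case_ras), len(control_ras)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two pools")
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(case_ras)
                  + (n2 - 1) * variance(control_ras)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("no variation between pools; t is undefined")
    t = (mean(case_ras) - mean(control_ras)) / std_err
    return t, n1 + n2 - 2


# Hypothetical mean RAS values for one SNP: 7 case pools and 9 control pools.
cases = [0.52, 0.55, 0.53, 0.56, 0.54, 0.51, 0.57]
controls = [0.47, 0.49, 0.46, 0.50, 0.48, 0.45, 0.47, 0.49, 0.46]
t_stat, dof = pooled_t_statistic(cases, controls)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

With 7 case pools and 9 control pools, a test of this form has 14 degrees of freedom, which is one reason pooled designs lose power relative to individual genotyping.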

Individual Genotyping and functional annotation

Individual genotyping was performed by Geneservices UK Ltd using the Sequenom MassARRAY iPLEX platform (Sequenom, San Diego, USA). 5 (1 case and 4 controls) individuals who were genotyped at the pooled DNA analysis stage were not included in this stage due to poor DNA quality. Total genotyping rate was 97%. MAF in the genotyped sample for all the SNPs was above 0.05. Allelic association testing was performed using Plink v1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/) [21]. Functional annotation was performed using Haploreg v2 (http://www.broadinstitute.org/mammals/haploreg/) [22] and SNPnexus (http://www.snp-nexus.org/) [23].
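For readers unfamiliar with allelic association testing, the sketch below shows the kind of calculation involved: a Pearson chi-square test with one degree of freedom on a 2 x 2 table of allele counts, together with the allelic odds ratio. It is not PLINK's code, the function name is hypothetical, and the counts are invented for illustration.

```python
import math


def allelic_association(case_a1, case_a2, control_a1, control_a2):
    """Chi-square (1 df, no continuity correction), p-value and odds ratio.

    Arguments are allele counts (two alleles per genotyped individual).
    """
    cells = (case_a1, case_a2, control_a1, control_a2)
    if min(cells) <= 0:
        raise ValueError("this sketch requires all four counts to be positive")
    n = sum(cells)
    cross_diff = case_a1 * control_a2 - case_a2 * control_a1
    chi2 = n * cross_diff ** 2 / (
        (case_a1 + case_a2) * (control_a1 + control_a2)
        * (case_a1 + control_a1) * (case_a2 + control_a2)
    )
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a1 * control_a2) / (case_a2 * control_a1)
    return chi2, p_value, odds_ratio


# Invented allele counts for one SNP.
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```

An odds ratio above 1 indicates that the first allele is more frequent in cases than in controls, and an odds ratio below 1 indicates the reverse.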

Validation of DNA pooling and replication

To assess the accuracy of DNA pooling in predicting differences in allele frequency, we individually genotyped 12 random SNPs in all the participants. These included 11 SNPs that did not reach the predefined threshold in the pooled association stage and one SNP, rs7785891, which did reach the threshold. Pearson’s correlation coefficient between the mean RAS scores and the allele frequency was r = 0.65. This correlation is considerably higher than that reported in another study that used pooled DNA obtained from cheek swabs on the same platform, though lower than the correlation reported for DNA obtained from blood samples [24].
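The validation step amounts to correlating two lists of per-SNP values: the mean RAS score from the pools and the allele frequency from individual genotyping. A minimal sketch of Pearson's r follows; the function name and both lists of values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-SNP values: pooled mean RAS vs. individually genotyped frequency.
mean_ras = [0.21, 0.35, 0.48, 0.52, 0.67, 0.80]
allele_freq = [0.18, 0.40, 0.41, 0.60, 0.62, 0.85]
print(f"r = {pearson_r(mean_ras, allele_freq):.2f}")
```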

Results

In the DNA-pooling stage, 11 SNPs passed the threshold of significance and quality control. Additionally, 5 SNPs with p-values below 1 x 10−5 failed quality control at the pooling stage (Fig 1). All 11 SNPs were individually genotyped, and all passed quality control in the individual genotyping phase. Three SNPs were nominally significant at p < 0.05 in this stage (rs7785891, rs1268055, and rs2782448). None of the SNPs survived Bonferroni correction for multiple testing. None of the three nominally significant SNPs from the individual genotyping stage was significant in the PGC ASC dataset. Results are summarised in Table 1. A Q-Q plot of the results from the pooling stage is provided in Fig 2.
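To make the gap between nominal significance and Bonferroni-corrected significance concrete, the sketch below assumes the correction was applied across the 11 individually genotyped SNPs at a family-wise alpha of 0.05; the text does not state the number of tests explicitly. The SNP labels and p-values are placeholders, not results from Table 1.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def classify(p_values: dict, alpha: float = 0.05) -> dict:
    """Label each test as 'corrected', 'nominal' or 'not significant'."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    labels = {}
    for name, p in p_values.items():
        if p < threshold:
            labels[name] = "corrected"
        elif p < alpha:
            labels[name] = "nominal"
        else:
            labels[name] = "not significant"
    return labels


# Placeholder p-values for 11 hypothetical SNPs (not the study's results).
placeholder = {f"snp_{i}": p for i, p in enumerate(
    [0.01, 0.02, 0.04, 0.08, 0.15, 0.22, 0.31, 0.45, 0.58, 0.73, 0.90], start=1)}
print(f"threshold = {bonferroni_threshold(0.05, len(placeholder)):.5f}")
print(classify(placeholder))
```

Under this assumption, a SNP would need a p-value below roughly 0.0045 to survive correction, which is stricter than the nominal 0.05 cut-off by a factor of 11.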

Fig 1. Manhattan plot of the SNPs tested in the pooled DNA association stage.

https://doi.org/10.1371/journal.pone.0131202.g001

Fig 2. Quantile-quantile plot of the SNPs tested in the pooled DNA association stage.

https://doi.org/10.1371/journal.pone.0131202.g002

Discussion

The current study used pooled DNA analysis to identify common variants associated with AS. Using pooled DNA, we scanned the genome for SNPs that differed in allele frequency between the case groups and the control groups. A threshold of 1 x 10−5 was selected in the pooling stage because of the loss of power during DNA pooling and to control for false negatives. SNPs with p-values below the pre-defined threshold were treated as candidate SNPs and genotyped individually in the same group of individuals to estimate allele frequencies more accurately. Of the 11 SNPs that crossed the threshold of significance in the pooling stage, only three remained nominally significant after the individual genotyping stage. These three SNPs were not significantly associated with ASC in a larger, more heterogeneous ASC cohort from the PGC.

rs7785891, the top-performing SNP at the pooling stage, is an intronic SNP in DOCK4, a gene previously associated with ASC [25, 26]. rs2782448 is an intergenic SNP at 13q21. It is 371 kb from KLHL1 and 7.5 kb from the 3’ end of RP11-459J23.1, a lincRNA identified by the Gencode project. 13q21 has previously been implicated in both autism [27, 28] and Specific Language Impairment [29]. The third nominally significant SNP, rs1268055, is an intronic SNP in ARMC2, a gene with uncertain function in humans.

The major limitation of this study is power. First, DNA pooling retains only 68% of the power [19]. Second, even after restricting the sample to individuals with AS, no SNP remained significant after Bonferroni correction. This indicates that larger sample sizes are required to detect causative alleles of small effect size. The correlation between the two stages of analysis is considerable, yet of the eleven SNPs selected for individual genotyping, only three remained nominally significant at this stage. The top two associated SNPs that passed quality control in the pooled DNA analysis stage were both nominally significant at the individual genotyping stage. However, they did not remain significant after correction for multiple comparisons. Of the three nominally significant SNPs, rs7785891 has an odds ratio above 1, whereas rs1268055 and rs2782448 have odds ratios below 1 (see Table 1). Finally, while all our participants reported Caucasian ancestry for at least two generations, population stratification can confound the results and lead to false positives [30]. There are currently no known methods to correct for population stratification in pooled DNA association studies while taking into account the polygenicity of the condition.

While the current study tested for association with AS, we also checked whether the three nominally significant SNPs were significant in an ASC cohort. The direction of effect for all three SNPs was similar to the direction of effect in our sample. However, none of the three SNPs was nominally significant in the PGC ASC cohort. This may be due to a) the heterogeneity of the PGC cohort compared to our study cohort, since the former was not stratified for language delay, and/or b) the different design of the association study (a family-based association study using trios vs a population-based study), which may lead to different signal-to-noise ratios. However, it needs to be highlighted that the effect sizes for the SNPs in both samples were small. This underscores the need for larger sample sizes to effectively identify common variants.

In conclusion, we report the identification of three SNPs (rs1268055, rs7785891 and rs2782448) as nominally associated with AS using a genome-wide pooled DNA association study. rs7785891 and rs2782448 lie in genetic loci previously implicated in ASC. None of the SNPs remained significant after Bonferroni correction, underscoring the need for larger sample sizes to uncover alleles with small effect sizes. This is the first genome-wide case-control association study to test common variants for association with AS.

Acknowledgments

We are grateful to Jon Breidbord, Lindsey Kent and Frank Dudbridge for help, advice, and discussions. Professor Peltonen kindly facilitated the collaboration with the Wellcome Trust Sanger Institute, but tragically passed away before the study was completed.

Author Contributions

Conceived and designed the experiments: BC IC LP SB-C. Performed the experiments: UM SW CA KR BC. Analyzed the data: VW AC SL LM BC. Contributed reagents/materials/analysis tools: SEF SB-C. Wrote the paper: VW BC SB-C.

References

  1. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders: DSM-IV. 4th edition. Washington, DC: American Psychiatric Association; 1994.
  2. Hallmayer J, Cleveland S, Torres A, Phillips J, Cohen B, Torigoe T, et al. Genetic heritability and shared environmental factors among twin pairs with autism. Arch Gen Psychiatry. 2011;68: 1095–102. pmid:21727249
  3. Persico AM, Napolioni V. Autism genetics. Behav Brain Res. 2013; 251:95–112. pmid:23769996
  4. Baron-Cohen S, Scott FJ, Allison C, Williams J, Bolton P, Matthews FE, et al. Prevalence of autism-spectrum conditions: UK school-based population study. Br J Psychiatry. 2009;194: 500–9. pmid:19478287
  5. Lai M-C, Lombardo MV, Baron-Cohen S. Autism. Lancet. 2013;13: 61539–1
  6. Mbadiwe T, Millis RM. Epigenetics and Autism. Autism Res Treat. 2013;2013: 826156.
  7. Samaco RC, Hogart A, LaSalle JM. Epigenetic overlap in autism-spectrum neurodevelopmental disorders: MECP2 deficiency causes reduced expression of UBE3A and GABRB3. Hum Mol Genet. 2005;14: 483–92. pmid:15615769
  8. Bill BR, Geschwind DH. Genetic advances in autism: heterogeneity and convergence on shared pathways. Curr Opin Genet Dev. 2009;19: 271–8. pmid:19477629
  9. Geschwind DH. Advances in autism. Annu Rev Med. 2009;60: 367–80. pmid:19630577
  10. Klei L, Sanders SJ, Murtha MT, Hus V, Lowe JK, Willsey AJ, et al. Common genetic variants, acting additively, are a major source of risk for autism. Mol Autism. 2012;3:9. pmid:23067556
  11. Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP, Lee AB, et al. Most genetic risk for autism resides with common variation. Nat Genet. 2014;46:881–5. pmid:25038753
  12. Lai MC, Lombardo MV, Ecker C, Chakrabarti B, Suckling J, Bullmore ET, et al. Neuroanatomy of Individual Differences in Language in Adult Males with Autism. Cereb Cortex. 2014. [Epub ahead of print].
  13. Chakrabarti B, Dudbridge F, Kent L, Wheelwright S, Hill-Cawthorne G, Allison C, et al. Genes related to sex steroids, neural growth, and social-emotional behavior are associated with autistic traits, empathy, and Asperger syndrome. Autism Res. 2009;2: 157–77. pmid:19598235
  14. Szelinger S, Pearson JV, Craig DW. Microarray-based genome-wide association studies using pooled DNA. Methods Mol Bio. 2011;700: 49–60.
  15. Butcher LM, Meaburn E, Liu L, Fernandes C, Hill L, Al-Chalabi A, et al. Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav Genet. 2004;34:549–55. pmid:15319578
  16. Butcher LM, Meaburn E, Knight J, Sham PC, Schalkwyk LC, Craig IW, et al. SNPs, microarrays and pooled DNA: identification of four loci associated with mild mental impairment in a sample of 6000 children. Hum Mol Genet. 2005;4:1315–25.
  17. Meaburn EL, Harlaar N, Craig IW, Schalkwyk LC, Plomin R. Quantitative trait locus association scan of early reading disability and ability using pooled DNA and 100K SNP microarrays in a sample of 5760 children. Mol Psychiatry. 2008;13:729–40. pmid:17684495
  18. Davis OSP, Plomin R, Schalkwyk LC. The SNPMaP package for R: a framework for genome-wide association using DNA pooling on microarrays. Bioinformatics. 2009;25: 281–3. pmid:19008252
  19. Barratt BJ, Payne F, Rance HE, Nutland S, Todd JA, Clayton DG. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann Hum Genet. 2002;66: 393–405. pmid:12485472
  20. Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet. 2013;381:1371–9. pmid:23453885
  21. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81: 559–75. pmid:17701901
  22. Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40: D930–4. pmid:22064851
  23. Chelala C, Khan A, Lemoine NR. SNPnexus: a web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics. 2008;25: 655–661. pmid:19098027
  24. Schosser A, Pirlo K, Gaysina D, Cohen-Woods S, Schalkwyk LC, Elkin A, Korszun A, Gunasinghe C, Gray J, Jones L, Meaburn E, Farmer AE, Craig IW, McGuffin P. Utility of the pooling approach as applied to whole genome association scans with high-density Affymetrix microarrays. BMC Res Notes. 2010;3:274. pmid:21040578
  25. Pagnamenta AT, Bacchelli E, de Jonge MV, Mirza G, Scerri TS, Minopoli F, et al. Characterization of a family with rare deletions in CNTNAP5 and DOCK4 suggests novel risk loci for autism and dyslexia. Biol Psychiatry. 2010;4: 320–8.
  26. Maestrini E, Pagnamenta AT, Lamb JA, Bacchelli E, Sykes NH, Sousa I, et al. High-density SNP association study and copy number variation analysis of the AUTS1 and AUTS5 loci implicate the IMMP2L-DOCK4 gene region in autism susceptibility. Mol Psychiatry. 2010;9: 954–68.
  27. Bartlett CW, Flax JF, Logue MW, Smith BJ, Vieland VJ, Tallal P, et al. Examination of potential overlap in autism and language loci on chromosomes 2, 7, and 13 in two independent samples ascertained for specific language impairment. Hum Hered. 2004;57: 10–20. pmid:15133308
  28. Talebizadeh Z, Arking DE, Hu VW. A Novel Stratification Method in Linkage Studies to Address Inter- and Intra-Family Heterogeneity in Autism. PLoS One. 2013;8: e67569. pmid:23840741
  29. Bartlett CW, Flax JF, Logue MW, Vieland VJ, Bassett AS, Tallal P, et al. A major susceptibility locus for specific language impairment is located on 13q21. Am J Hum Genet. 2002;71: 45–55. pmid:12048648
  30. Tian C, Gregersen PK, Seldin MF. Accounting for ancestry: population substructure and genome-wide association studies. Hum Mol Genet. 2008;17:R143–50. pmid:18852203