
Assessing the Impact of Copy Number Variants on miRNA Genes in Autism by Monte Carlo Simulation

  • Maurizio Marrale,

    Affiliation Dipartimento di Fisica e Chimica, Università di Palermo, Palermo, Italy

  • Nadia Ninfa Albanese,

    Affiliation Dipartimento di Fisica e Chimica, Università di Palermo, Palermo, Italy

  • Francesco Calì,

    Affiliation U.O.C. di Genetica Medica Laboratorio di Genetica Molecolare, Associazione Oasi Maria SS. (I.R.C.C.S.), Troina, Italy

  • Valentino Romano

    valentino.romano@unipa.it

    Affiliations Dipartimento di Fisica e Chimica, Università di Palermo, Palermo, Italy, U.O.C. di Genetica Medica Laboratorio di Genetica Molecolare, Associazione Oasi Maria SS. (I.R.C.C.S.), Troina, Italy

Abstract

Autism Spectrum Disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins. Previous studies have investigated the role of de novo Copy Number Variants (CNVs) and microRNAs as important but distinct etiological factors in ASD. We developed a novel computational procedure to assess the potential pathogenic role of microRNA genes overlapping de novo CNVs in ASD patients. Here we show that for chromosomes 1, 2 and 22 the actual number of miRNA loci affected by de novo CNVs in patients is significantly higher than that estimated by Monte Carlo simulation of random CNV events. Out of 24 miRNA genes over-represented in CNVs from these three chromosomes, only hsa-mir-4436b-1 and hsa-mir-4436b-2 have not been detected in CNVs from non-autistic subjects as reported in the Database of Genomic Variants. Altogether, the results reported in this study represent a first step towards a full understanding of how dysregulated expression of these 24 miRNA genes affects neurodevelopment in autism. We also propose that the procedure used in this study can be effectively applied to CNV/miRNA gene association data in other genomic disorders beyond autism.

Introduction

The Autism Spectrum Disorders (ASDs, MIM: 209850) are a heterogeneous group of childhood diseases characterized by abnormalities in social behaviour and communication, as well as patterns of restricted and repetitive behaviors [1]. The spectrum of autism reflects dimensional variability of each core symptom as well as the occurrence of co-morbid conditions (intellectual disability, epilepsy, dysmorphisms etc.). In part due to improved diagnostic tools, the global prevalence of autism and other pervasive developmental disorders has increased over the years, reaching the current figure of 1/160 children [2]. Twin studies have demonstrated a much higher concordance rate for the disease in monozygotic twins (92%) than in dizygotic twins (10%) [3], [4], indicating a strong genetic basis for autism susceptibility, which is also supported by the presence of autistic features in several monogenic disorders (e.g., Fragile X syndrome, Tuberous sclerosis). However, despite this progress, the identity of the genetic factors remains unknown in the majority of patients, and it is likely that, overall, the causes of autism are more complex than previously thought, involving an interaction of genetic, epigenetic and environmental factors that all interfere with the normal course of neurodevelopment [5]–[7].

In 2007 Copy Number Variants (CNVs) were for the first time recognized as important genetic factors in ASD [8]. CNVs are a class of inherited or de novo genomic mutations duplicating (“Gains”) or deleting (“Losses”) DNA segments >1 Kb, thus altering the normal dosage of the overlapping genes. According to their frequency they can be divided into unique, rare or recurrent. Since 2007, many studies have investigated CNVs in autism to assess their functional impact, the biological networks in which the genes in these CNVs are involved, and the general burden of CNVs in these individuals (e.g. see references [9]–[11]). Extended, multiplex families are more likely to carry heritable risk factors while, in contrast, sporadic ASD families have shown a higher rate of de novo CNVs [8], [12]. A striking observation was the high prevalence of de novo CNVs in both sporadic and familial cases of ASD compared with controls. Rare, de novo or inherited CNVs were observed in 5–10% of idiopathic ASD cases.

Studies performed so far have highlighted the pathogenic role of CNVs in terms of dosage change for protein-coding genes [10], [13]–[15], without taking into account the potential involvement of non-coding RNAs, particularly microRNA genes (see reference [16] for a recent exception).

miRNAs are an important class of post-transcriptional regulators, each governing the expression of tens or even hundreds of proteins both in differentiated cells and during development. miRNA transcripts undergo several processing steps occurring first in the nucleus, then in the cytoplasm, where the mature 22–25 nucleotide-long miRNA [17], [18] enacts mRNA translational repression or cleavage by binding to the 3′-untranslated region of its target mRNAs [19]. So far, only a few studies have investigated the ASD transcriptome in post-mortem brain samples [20]–[22]. Other studies have used mRNA from lymphoblastoid cell lines or peripheral blood cells isolated from patients [23]–[26]. Disruption of miRNA expression has been repeatedly reported in several microarray studies and is believed to be linked to pathogenesis in autism [27]–[31]. However, lymphoblastoid cell lines are not the best proxy for neural tissue, and only very few miRNAs displayed consistent dysregulation among different studies and patients.

In this paper, we aim to expand our knowledge of the pathogenic role of miRNAs in autism by investigating the associations between de novo CNVs and miRNA genes. To this end, we developed a novel computational procedure based on Monte Carlo randomization [32] and applied it to several published de novo CNV datasets from patients. Our positive findings consist of the identification of 24 miRNA genes over-represented in de novo CNVs from chromosomes 1, 2 and 22 and therefore likely to play a pathogenic role in autism.

Results

Study design and preliminary analyses

The general strategy and steps of our study are outlined in the flow-chart of Figure 1. We developed the MAPCNVMIR programme to map all human miRNA genes from miRBase to 178 de novo CNVs from 192 autistic patients (“APL datasets”, see Table S1). Only 64 out of 178 CNVs were found to overlap at least one miRNA gene. In addition, 145 miRNA genes were identified within distinct or partly overlapping de novo CNVs (Table S1) spread over 20 chromosomes. No miRNA genes included in de novo CNVs were detected in chromosomes # 11, 13, 14, Y.

Figure 1. Overview of this study and source data.

In this study we have used previously published data from 192 autistic patients (the “APL datasets” of Table S1) bearing overall 178 de novo CNVs (118 CNV_Losses and 60 CNV_Gains) with unique start and end positions.

https://doi.org/10.1371/journal.pone.0090947.g001

For each chromosome we compared the number and the length of all human microRNA genes reported so far in miRBase with the length of the chromosome. The fractional lengths of miRNA genes over the size of the chromosome (R1 ratios) are of the same order of magnitude for all chromosomes except chromosome 19 (see Table 1). This result shows that the fraction of the chromosome length covered by microRNA genes is very similar for all chromosomes (except chr. 19). On the other hand, the fractional lengths of CNVs over the size of the chromosome (R2 ratios), computed separately for CNV_Gains and CNV_Losses (Table 1) using the MAPCNVMIR software developed by us (see below), differ widely among the various chromosomes, from a value of 0.0006 up to a value of 0.2618. This highlights that the fraction of a chromosome covered by CNVs can be very dissimilar among chromosomes. Indeed, there are several instances (e.g., chr. 10, 12, 15 and 22 for CNV_Gains and chr. 7, 12, 16, 18, 21 and 22 for CNV_Losses) of CNVs affecting large regions of a particular chromosome. Therefore, we analysed the relationship between the number of miRNA genes overlapping de novo CNVs and the R2 ratios in order to find possible correlations. Furthermore, the number of miRNA genes is also variable. Figures 2a and 2b show that, for many chromosomes, the number of miRNA genes included in CNVs follows a linear trend as a function of the total length of both CNV_Gains and CNV_Losses. To quantify this correlation, correlation coefficients were calculated and were found to be larger for CNV_Gains (r = 0.75765) than for CNV_Losses (r = 0.32732). This finding is consistent with the larger spread of CNV_Losses compared to CNV_Gains. On the other hand, some chromosomes do not follow the linear trend and are characterized by a large number of miRNA genes associated to CNVs (see Figure 2 and Table 1). 
For example, chromosome 22 for CNV_Gains and chromosome 2 for CNV_Losses appear as “outliers” since they include a much higher number of miRNA genes in de novo CNVs than expected. However, this analysis does not allow us to classify the observed dissimilarities according to a criterion of statistical significance.

Figure 2. Correlation graph between the no. of miRNA genes in de novo CNVs and the CNV/Chr. lengths ratio (Ratio #2).

For each chromosome, the number of miRNA genes associated to CNVs is plotted as a function of the fractional length of CNVs over the chromosome's size for Gains (a) and Losses (b), respectively. The graphs show that, whereas the majority of data points lie very close to the best-fit line, indicating that the two variables are positively correlated, a few chromosomes instead behave as outliers in which certain CNVs appear to affect a no. of miRNA genes higher than expected (data used for the graphs were taken from Table 1).

https://doi.org/10.1371/journal.pone.0090947.g002

Table 1. Fractional lengths of miRNA genes and CNVs in relation to chromosome's size.

https://doi.org/10.1371/journal.pone.0090947.t001

Monte Carlo randomization

In order to assess which “outliers” with a high number of overlaps are actually significantly different from the other chromosomes, we developed the SIMCNVMIR programme, which implements a numerical analysis procedure for the identification of all instances (i.e., chromosomes) where the number of miRNA genes overlapping de novo CNVs is significantly higher than expected under a random distribution of (simulated) CNVs. The steps of our computational analysis are reported in Figure 3. The results of this analysis are reported in Figure 4 and Table 2 and show that for chromosomes 1, 2, and 22 the actual number of miRNA loci affected by de novo CNVs in patients is significantly higher (FDR-adjusted p-values < 0.05) than that estimated from the simulated random CNV events. Specifically, CNV_Gains in chromosome 22 and CNV_Losses in chromosomes 1 and 2 display an over-representation of microRNA genes (see Table 2 and Figure 4). In Table 2, note that for CNV_Loss in chromosome 2 the number of “hits” is higher than the number of distinct miRNA genes (“Unique”), implying that the same miRNA gene is affected by more than one CNV. Overall, there are 24 miRNA genes overlapping de novo CNVs in the three positive chromosomes (see Table 3). Only two, hsa-mir-4436b-1 and hsa-mir-4436b-2, have not yet been detected in CNVs from the general population (i.e., the DGV database).

Figure 3. Schematic representation of the counting process of miRNA genes included in de novo CNVs of autistic patients.

The small black rectangles close to the chromosome are the miRNA genes, whereas the segments above represent the various CNVs within the chromosome. A “hit” is an overlap between a CNV and a miRNA gene. a) Four “hits” are shown. b) Four examples of random distributions of simulated CNVs within the chromosome, keeping the length of each CNV fixed and changing its start/end positions. Clockwise from top left, the numbers of “hits” are 2, 0, 6 and 1, respectively. For each chromosome we carried out 10^6 simulations. c) Finally, the histograms displaying the relative frequency of miRNA genes included in randomly located CNVs (“hits”) are obtained and the experimental data are compared with the computed Monte Carlo distribution. Red lines correspond to the no. of miRNA genes detected in de novo CNVs from patients. The p-values reported in Table 2 are the areas of the histogram to the right of the red line. (See also Figure 4).

https://doi.org/10.1371/journal.pone.0090947.g003

Figure 4. Histograms displaying the relative frequency of miRNA genes included in randomly located CNVs.

For each chromosome, the SIMCNVMIR program computes the no. of miRNA genes affected in each realization of randomly distributed CNVs and plots the frequency distribution corresponding to 10^6 realizations. The analyses were performed separately for CNV_Gains (a) and CNV_Losses (b).

https://doi.org/10.1371/journal.pone.0090947.g004

Table 2. de novo CNVs from autistic patients with an overrepresented no. of miRNA genes.

https://doi.org/10.1371/journal.pone.0090947.t002

Table 3. List of 24 microRNA genes overrepresented in de novo CNVs.

https://doi.org/10.1371/journal.pone.0090947.t003

Code scripts of the MAPCNVMIR and SIMCNVMIR programmes are available to readers from the following URL: http://fisicaechimica.unipa.it/cnvmirna/

Pathway analysis

The two autism-specific miRNA genes (hsa-mir-4436b-1 and hsa-mir-4436b-2) are deleted in a patient who bears one CNV of 332,304 bp. Interestingly, they encode the same 3p and 5p mature microRNAs. According to miRWalk [33], no validated targets are yet known for these two miRNAs. We therefore performed a functional enrichment analysis on the predicted target mRNAs to highlight the significance of these two microRNAs in ASD pathogenesis. The results of this analysis, reported in Table 4, show that several pathways identified by miRPath have already been implicated in autism by previous studies (in bold in Table 4): Lysine degradation [34], Drug metabolism - cytochrome P450 [35], Notch signaling pathway [31], [36], HIF-1 signaling pathway [37], Vasopressin-regulated water reabsorption [38], Natural killer cell mediated cytotoxicity [39]. NSD1 [40] and AMT [41] have been previously identified as autism candidate genes.

Table 4. KEGG pathways enriched for targets of miRNAs hsa-mir-4436b-3p and -5p identified by miRPath.

https://doi.org/10.1371/journal.pone.0090947.t004

Discussion

In recent years, Copy Number Variants and microRNAs have emerged as potentially important etiological factors in ASD. However, until recently, these two topics have been investigated separately in nearly all autism studies. The pathogenic role of CNVs has been interpreted in terms of their effect on the function of the overlapping protein-coding genes. On the other hand, miRNAs have been studied with the aim of uncovering changes in their level of expression in cells isolated from patients vs. control cells (see the Introduction for references). In our study the focus is instead on the potential pathogenic role played by miRNA genes located in de novo CNVs. For this reason, we developed a new computational procedure, implemented in a Fortran programme, which detects over-representation of miRNA genes in de novo CNVs in each chromosome. By this computational analysis based on Monte Carlo simulations we found that in positive chromosomes (FDR-adjusted p-values < 0.05, see Table 2 for details) there is a probability of less than 5% of finding, by chance, a number of miRNA genes included in CNVs (gain and/or loss) equal to or higher than that actually detected in patients. Overall, twenty-four candidate autism-susceptibility miRNA genes were identified in our study.

Hereafter, we discuss several potentially critical aspects of this new procedure that may have biased our results and interpretation. Firstly, the results do not appear to be biased by the different distribution of miRNA genes among chromosomes, as all chromosomes display very similar ratios of miRNA gene length to chromosome length. The only exception was chr. 19, which has the highest ratio (see Table 1) but did not score positive in the simulation analysis. Secondly, we considered whether different ratios between the length of de novo CNVs and the length of the chromosome may have accounted for the detection of positive chromosomes. In general, we would expect the number of miRNA genes duplicated or deleted by a CNV to increase linearly with CNV size. However, such linearity is not always observed, even for chromosomes displaying very similar ratios. A typical example is the pair formed by CNV_Gains in chromosome 10 (ratio = 0.0822) and CNV_Gains in chromosome 22 (ratio = 0.0893), which consists of one negative (chr. 10) and one positive (chr. 22) chromosome. Thirdly, we could not perform a simulation analysis, to be used as a negative control, for CNVs detected in individuals from the general population, since the data stored in the Database of Genomic Variants generally refer to blood donors only and not to their parents, thus preventing ascertainment of de novo CNVs. However, it is worth mentioning here the results of a recent study by Marcinkowska et al. [42], which are consistent with our findings. Indeed, these authors found that miRNA loci are under-represented in highly polymorphic and well-validated CNVs from the general population (i.e., the Database of Genomic Variants). Fourthly, in Table 1 the absence of “hits” for both CNV_Gain and CNV_Loss for chromosomes 11, 13, 14 and Y is simply due to the lack of autistic patients bearing de novo CNVs (Loss or Gain) overlapping miRNA genes (see Table S1). 
In turn, it can be speculated that the lack of this type of patient may be ascribed to various factors such as the sample size, the use of low-resolution aCGH platforms, or the occurrence of “protective” miRNA loci for autism in these chromosomes. Finally, a more suitable analysis could have involved more homogeneous CNV data from subjects (patients and unaffected individuals) of the same ethnicity analyzed with aCGH platforms of similar resolution. This was indeed the case for the autistic sample (APL) we used, which was homogeneous in relation to ethnicity in that all patients from the APL dataset were “Caucasians” (white North Americans and Europeans). In our study, the heterogeneity of the data concerns instead the different aCGH platforms used (APL and DGV datasets) and the mixed ethnicity of individuals reported in the Database of Genomic Variants. We decided to use such heterogeneous data to increase the chance of collecting a higher number of patients with de novo CNVs. This decision had its strengths and drawbacks. For instance, the use of different aCGH platforms may have caused an under-estimation of the number of CNV/miRNA gene associations in positive chromosomes from the APL dataset. In contrast, the use of samples with mixed ethnicity from the DGV database does not seem to have limited the identification of miRNA gene/CNV associations in common between the DGV and APL datasets. In conclusion, though the use of heterogeneous CNV data may have limited the identification of additional miRNA/CNV associations, it did not prevent the identification of chromosomes with an enrichment of CNVs overlapping miRNA genes.

In our study, the occurrence of the same 22 deleted or duplicated miRNA genes in both patients and unaffected individuals (i.e., DGV) strongly suggests that they are low-penetrance risk factors for autism. Differences in penetrance for such duplicated/deleted miRNA genes could be explained by a variety of factors including: (i) prenatal exposure to environmental risk factors [43], (ii) presence/absence of functional SNPs in protein-coding autism susceptibility genes [44], (iii) epistasis [45], (iv) epigenetic factors [6], (v) number and type of protein-coding genes co-existing in different CNVs overlapping the same miRNA. It is reassuring that other CNV studies have linked several miRNA genes from this group to autism, including hsa-mir-1306, hsa-mir-185, hsa-mir-1286 and hsa-mir-649 [16], [46], hsa-mir-200a and hsa-mir-429 [16], [47], and hsa-mir-200b and hsa-mir-149 [16]. Furthermore, hsa-miR-185 displays a 1.44-fold upregulation in lymphoblastoid cell lines from autistic patients [48]. Interestingly, this latter finding is consistent with the presence of 3 copies of the hsa-mir-185 gene in the 2 patients from our APL dataset.

To avoid over-interpreting our results we have adopted a stringent, conservative criterion according to which the 22 miRNA genes shared by unaffected individuals and patients should be considered provisional candidate miRNA genes in ASD. On the other hand, hsa-mir-4436b-1 and hsa-mir-4436b-2 appear at present to be strong pathogenic candidates in ASD. Unfortunately, no validated targets have yet been identified for these two miRNAs. However, functional annotation analysis carried out on predicted mRNA targets for these two miRNAs revealed that 43% (6/14) of the statistically significant KEGG pathways obtained with hsa-miR-4436b-3p have already been implicated in autism in previous studies (referenced in Table 4), a finding which supports a pathogenic role for this miRNA.

During the preparation of our manuscript, Vaishnavi et al. published an article also addressing the impact of microRNAs present in autism-associated Copy Number Variants [16]. Despite several differences (methodology, type of CNV data used) distinguishing our study from that of Vaishnavi et al., it is worth noting that 8 miRNA genes were found in common between the two studies (chr. 1: hsa-mir-429, hsa-mir-200a, hsa-mir-200b; chr. 2: hsa-mir-149; chr. 22: hsa-mir-185, hsa-mir-1306, hsa-mir-1286, hsa-mir-649).

Conclusions

In summary, the positive findings of our study include the identification of 24 miRNA genes over-represented in de novo CNVs from 3 chromosomes. Two miRNA genes from this group, hsa-mir-4436b-1 and hsa-mir-4436b-2, are likely to play a significant pathogenic role in autism since they have not been found in CNVs from unaffected individuals. We hope these results will guide experimental research towards a better understanding of the role played by miRNAs in autism. Finally, we propose that the novel procedure used in this study can be effectively applied to CNV/miRNA gene association data from other genomic disorders beyond autism.

Methods

Data and databases used

Data on de novo CNVs detected in autistic patients were downloaded from three different sources: (i) 71 CNVs from the Autism Chromosome Rearrangements Database [49], [50], (ii) 51 CNVs from Suppl. Table 8 of [10] and (iii) 75 CNVs from Table S1 (document S2) of [13]. Throughout our paper the three combined sets of data are named “APL datasets”. The CNV data (“APL datasets”) used in this study are reported in Table S1. CNVs and indels detected in individuals of the general population were downloaded from the Database of Genomic Variants (DGV vers. July 2013) [51]. In our paper this latter dataset is referred to by the acronym “DGV”. Names, genomic coordinates and chromosomal positions for 1,523 human microRNA genes were obtained from miRBase (vers. 2012). Readers are referred to the article of Griffiths-Jones et al. [52] for an explanation of the symbols and nomenclature used for miRNAs and their genes. Genomic coordinates for the start and end of CNVs, indels and miRNA genes were all from Build 37. When necessary, conversion of genomic coordinates between different genome versions was done using the LiftOver tool of the UCSC Genome Browser [53], [54]. Finally, the list of potentially pathogenic miRNAs was obtained by retaining only those miRNA genes over-represented in de novo CNVs from patients that do not overlap CNVs from the Database of Genomic Variants (DGV). miRWalk software was used to look for experimentally validated mRNA targets for miRNAs [33]. miRPath [55], [56] was used to identify statistically significant KEGG pathways enriched in the list of predicted miRNA targets (p<0.05; p-values were corrected to account for the False Discovery Rate).
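The DGV filtering step mentioned in this paragraph can be illustrated with a short sketch. It is not the authors' code: the function names, the dictionary layout and the toy coordinates are assumptions made for illustration, and coordinates are taken to be inclusive positions on a single chromosome.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def filter_against_dgv(mirna_genes, dgv_cnvs):
    """Keep the miRNA genes that overlap no CNV from the general population.

    mirna_genes: dict mapping gene name -> (start, end) on one chromosome.
    dgv_cnvs: list of (start, end) intervals on the same chromosome.
    """
    return sorted(
        name
        for name, (m_start, m_end) in mirna_genes.items()
        if not any(overlaps(m_start, m_end, c_start, c_end)
                   for c_start, c_end in dgv_cnvs)
    )


# Hypothetical toy values, not taken from miRBase or DGV.
toy_genes = {"mir-a": (100, 120), "mir-b": (500, 520)}
toy_dgv = [(90, 110)]
candidates = filter_against_dgv(toy_genes, toy_dgv)
```

In this toy case only `mir-b` is retained, because `mir-a` overlaps the single general-population CNV.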

MAPCNVMIR program (Python)

Data pre-processing included: (i) computation of the total DNA length accounted for by all miRNA genes (L1) and CNVs (L2) in each chromosome; (ii) computation of R1 (R1 = L1/chromosome length) and R2 (R2 = L2/chromosome length); (iii) counts of the number of miRNA loci overall included in distinct or overlapping CNVs (“hits”). This analysis is performed separately for CNV_Gains and CNV_Losses. Thus, for a given chromosome, “hits” may consist of distinct and/or identical miRNA genes associated to CNVs; on the other hand, we indicate as “unique” the distinct miRNA genes overlapping de novo CNVs in patients.
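As a minimal sketch of the quantities defined above (not the MAPCNVMIR code itself), the ratios and the distinction between “hits” and “unique” genes can be written as follows. The function names and the toy numbers are hypothetical.

```python
def fractional_length(total_bp, chromosome_bp):
    """R1 or R2: fraction of a chromosome covered by total_bp bases."""
    return total_bp / chromosome_bp


def count_hits_and_unique(pairs):
    """pairs: iterable of (cnv_id, mirna_name) overlaps on one chromosome.

    Returns (hits, unique): hits counts every CNV/miRNA overlap, while
    unique counts each distinct miRNA gene only once.
    """
    pairs = list(pairs)
    hits = len(pairs)
    unique = len({mirna for _, mirna in pairs})
    return hits, unique


# Hypothetical toy values, not taken from Table 1 or Table S1.
example_pairs = [("cnv_a", "mir-x"), ("cnv_b", "mir-x"), ("cnv_b", "mir-y")]
hits, unique = count_hits_and_unique(example_pairs)
r2 = fractional_length(total_bp=5_000_000, chromosome_bp=50_000_000)
```

Here `mir-x` is hit by two CNVs, so the number of hits (3) exceeds the number of unique genes (2).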

We developed the MAPCNVMIR programme in Python to achieve a two-fold task: (i) to calculate for each chromosome the total length of DNA corresponding to the de novo CNV regions and (ii) to map the microRNA genes within the de novo CNVs detected in patients using their genomic coordinates (Build 37). To achieve the first task, the programme counts DNA regions shared by different CNVs once only. The programme first initializes an empty array for each chromosome and puts the numeric values corresponding to the first ($s_1$) and last ($e_1$) nucleotide positions (“start” and “end”) of the first CNV (of the total list of CNVs reported in Table S1) into a sub-array. Afterwards, the code considers another CNV and compares its initial ($s_2$) and final ($e_2$) nucleotide positions with those of the first CNV. Three different cases can occur:

  1. the first CNV is totally included in the second one (i.e. $s_2 \le s_1$ and $e_2 \ge e_1$) and the values inside the sub-array are replaced by these new ones;
  2. the second CNV is partially or totally included in the first one [i.e. ($s_1 \le s_2$ and $e_2 \le e_1$) or ($s_2 < s_1$ and $s_1 \le e_2 \le e_1$) or ($s_1 \le s_2 \le e_1$ and $e_2 > e_1$)] and the sub-array is composed of the minimum between $s_1$ and $s_2$ and the maximum between $e_1$ and $e_2$;
  3. the second CNV does not overlap the first one and in this case a new sub-array with $s_2$ and $e_2$ is added to the initial array.

Then another CNV is analyzed and its start and end values are compared with the values of the first and second CNVs, and the values of the array are modified according to the above-described procedure. This procedure is carried out for all CNVs, finally yielding an array with the “start” and “end” values of the non-overlapping DNA regions covered by the different CNVs. The total length of CNVs in each chromosome is the sum of the lengths of the corresponding DNA regions. Regarding the mapping of the microRNA genes within the de novo CNVs detected in patients, for each chromosome the programme first initializes a counter variable to zero and then compares the initial and final nucleotide positions of each CNV with the corresponding nucleotide positions of each miRNA gene. Let us name $s_{CNV}$, $e_{CNV}$, $s_M$, $e_M$ the numeric values corresponding to the first and last nucleotide positions of each CNV and microRNA gene (M), respectively, in a particular chromosome. If the condition $s_{CNV} \le s_M$ and $e_M \le e_{CNV}$ is verified then the count of the number of miRNA genes is increased by one, otherwise the variable remains unchanged. By repeating this procedure for each CNV and each miRNA gene, the total number of microRNA genes overlapping a CNV is obtained for each chromosome.
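The two tasks described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the MAPCNVMIR source: it merges regions with a sort-then-sweep pass instead of the incremental array comparison described in the text (the merged regions are the same), it assumes inclusive coordinates, and it counts a miRNA gene only when it lies entirely inside a CNV. All function names are hypothetical.

```python
def merge_cnv_regions(cnvs):
    """Merge (start, end) CNV intervals so shared DNA is counted once."""
    merged = []
    for start, end in sorted(cnvs):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it to the larger end.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: open a new region.
            merged.append([start, end])
    return [tuple(region) for region in merged]


def total_cnv_length(cnvs):
    """Total number of bases covered by CNVs (inclusive coordinates assumed)."""
    return sum(end - start + 1 for start, end in merge_cnv_regions(cnvs))


def count_hits(cnvs, mirnas):
    """Count CNV/miRNA pairs where the gene lies entirely inside the CNV."""
    hits = 0
    for c_start, c_end in cnvs:
        for m_start, m_end in mirnas:
            if c_start <= m_start and m_end <= c_end:
                hits += 1
    return hits


# Hypothetical toy coordinates, not taken from Table S1 or miRBase.
toy_cnvs = [(100, 200), (150, 300), (500, 600)]
toy_mirnas = [(160, 180), (550, 560), (700, 720)]
regions = merge_cnv_regions(toy_cnvs)
covered = total_cnv_length(toy_cnvs)
n_hits = count_hits(toy_cnvs, toy_mirnas)
```

In the toy data the first two CNVs collapse into one region, and the gene at 160 to 180 is counted twice because it lies inside two distinct CNVs.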

Correlation analysis

Using the data obtained by the MAPCNVMIR programme (see above), we evaluated the possible correlation between the number of miRNA genes overlapping de novo CNVs (“hits”) and the fractional length of CNVs in relation to the size of the chromosome (R2 ratios). Briefly, for each chromosome the number of “hits” was plotted against the R2 ratio, the linear best-fit function was obtained and the correlation coefficient was calculated. These analyses were performed separately for CNV_Gains and CNV_Losses.
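For readers who want to reproduce this kind of analysis, a minimal sketch of a Pearson correlation coefficient and a least-squares line is given below. The text does not specify which software or estimator was used, so the choice of Pearson's r and ordinary least squares is an assumption, as are the function name and the toy data.

```python
import math


def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    slope = sxy / sxx                    # least-squares slope
    intercept = mean_y - slope * mean_x  # least-squares intercept
    return r, slope, intercept


# Hypothetical R2 ratios and hit counts lying exactly on a line.
toy_r2 = [0.01, 0.02, 0.03, 0.04]
toy_hits = [1, 2, 3, 4]
r, slope, intercept = pearson_and_fit(toy_r2, toy_hits)
```

With real data, one pair (R2 ratio, number of hits) per chromosome would be passed in, once for CNV_Gains and once for CNV_Losses.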

SIMCNVMIR program (FORTRAN)

The SIMCNVMIR programme was developed to perform Monte Carlo randomization analyses, separately for CNV_Gains and CNV_Losses in each chromosome. These analyses included: (i) simulation of random CNV events, (ii) generation of a frequency distribution of “hits” in simulated CNVs, (iii) computation of p-values and FDR-adjusted p-values and (iv) selection of chromosomes displaying over-representation of miRNA genes in de novo CNVs from patients.

The over-representation of miRNA genes in CNVs was evaluated by means of a computational simulation procedure implemented in an in-house FORTRAN code. The null hypothesis underlying our investigation is that the distribution of CNVs within the chromosome is completely random, that is, they can occur anywhere along the whole length of a chromosome. Therefore, for each chromosome, the sizes (number of nucleotides) of all CNVs reported in the APL datasets (Table S1) are computed and various realizations of random distributions of these CNV regions within each chromosome are simulated. Once a new distribution of CNVs is obtained, the programme computes the number of miRNA genes overlapping the simulated CNVs. This procedure was repeated 10^6 times for each chromosome. Data are then plotted as histograms displaying how frequently each number of miRNA genes is found to overlap CNVs randomly distributed in each chromosome. These distributions take into account many factors such as the size of the chromosome, the number and positions of miRNA genes inside the chromosome, and the size of the CNVs. For each chromosome, the number of miRNA genes overall associated to the experimentally observed CNVs (i.e., the CNVs reported in the APL datasets) was compared to the corresponding histogram obtained by the simulation. In order to evaluate whether the number of miRNA genes included in de novo CNVs in patients is significantly larger than expected with a random distribution of CNVs in each chromosome, we estimated the probability (p-value) of obtaining a number of miRNA genes associated to the simulated CNVs at least as large as that seen with experimental CNVs. This probability is calculated by summing the area under the histogram for a number of miRNA genes included in CNVs larger than or equal to the experimental value. 
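The randomization described above can be sketched in Python as follows. This is not the SIMCNVMIR FORTRAN code: the function names, the uniform placement of start positions, the containment test used for a “hit” and the toy coordinates are all assumptions made for illustration, and the number of realizations is kept small for speed.

```python
import random


def count_hits(cnvs, mirnas):
    """Count CNV/miRNA pairs where the gene lies entirely inside the CNV."""
    return sum(
        1
        for c_start, c_end in cnvs
        for m_start, m_end in mirnas
        if c_start <= m_start and m_end <= c_end
    )


def monte_carlo_p_value(cnv_lengths, mirnas, chrom_length, observed_hits,
                        n_sim=10_000, seed=0):
    """Estimate P(simulated hits >= observed_hits) under random CNV placement."""
    rng = random.Random(seed)
    at_least_observed = 0
    for _ in range(n_sim):
        simulated = []
        for length in cnv_lengths:
            # Keep the CNV size fixed and draw a new start uniformly so that
            # the CNV stays inside the chromosome (1-based, inclusive).
            start = rng.randint(1, chrom_length - length + 1)
            simulated.append((start, start + length - 1))
        if count_hits(simulated, mirnas) >= observed_hits:
            at_least_observed += 1
    return at_least_observed / n_sim


# Hypothetical toy chromosome, not data from Table S1 or miRBase.
toy_mirnas = [(1_000, 1_080), (1_500, 1_590), (40_000, 40_070)]
p = monte_carlo_p_value([5_000, 2_000], toy_mirnas, 100_000, observed_hits=2)
```

The returned fraction corresponds to the area of the simulated histogram at or to the right of the observed value.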
The p-value is very small if the number of miRNA genes included in experimental CNVs is much larger than the mean value of the simulated distribution. This means that if the distribution of the CNVs on a given chromosome were random (see the above-mentioned null hypothesis) we would have a low probability of finding an equal or greater number of miRNA genes associated to CNVs. In other words, a small p-value indicates that, in autistic patients, CNVs tend to be more frequent in chromosomal regions where miRNA genes are present than in other regions of the chromosome. The analyses described above were performed twice: (i) for CNV_Gains and (ii) for CNV_Losses. To account for multiple hypothesis testing, we used the false discovery rate procedure developed by Benjamini and Hochberg to control the expected proportion of incorrectly rejected null hypotheses [57]. In particular, we used a spreadsheet available on-line [58] which calculates FDR-adjusted p-values from the p-values for the various chromosomes. We set the maximum acceptable FDR to 0.05 (which is the default value of the spreadsheet).
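The authors performed the FDR adjustment with an on-line spreadsheet. As an alternative illustration, the standard Benjamini and Hochberg step-up adjustment can be written directly in Python. The function name and the example p-values below are hypothetical and are not taken from Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-chromosome p-values.
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adjusted = benjamini_hochberg(toy_p)
# Chromosomes whose adjusted p-value falls below the 0.05 threshold.
positive = [i for i, q in enumerate(toy_adjusted) if q < 0.05]
```

In this toy example only the first test remains below 0.05 after adjustment.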

Supporting Information

Table S1.

The APL dataset of de novo CNVs and the overlapping miRNA genes.

https://doi.org/10.1371/journal.pone.0090947.s001

(PDF)

Author Contributions

Conceived and designed the experiments: MM VR. Performed the experiments: MM NNA FC. Analyzed the data: MM VR. Contributed reagents/materials/analysis tools: MM. Wrote the paper: VR.

References

  1. 1. Abrahams B, Geschwind D (2008) Advances in autism genetics: On the threshold of a new neurobiology. Nature Reviews Genetics 9: 341–355.
  2. 2. Elsabbagh M, Divan G, Koh YJ, Kim Y, Kauchali S, et al. (2012) Global Prevalence of Autism and Other Pervasive Developmental Disorders. Autism Research 5: 160–179.
  3. 3. Bailey A, Le Couteur A, Gottesman I, Bolton P, Simonoff E, et al. (1995) Autism as a strongly genetic disorder: Evidence from a British twin study. Psychological Medicine 25: 63–77.
  4. 4. Steffenburg S, Gillberg C, Hellgren L, Andersson L, Gillberg I, et al. (1989) A twin study of autism in Denmark, Finland, Iceland, Norway and Sweden. Journal of Child Psychology and Psychiatry and Allied Disciplines 30: 405–416.
  5. 5. Schaaf C, Zoghbi H (2011) Solving the Autism Puzzle a Few Pieces at a Time. Neuron 70: 806–808.
  6. 6. Miyake K, Hirasawa T, Koide T, Kubota T (2012) Epigenetics in autism and other neurodevelopmental diseases. Advances in Experimental Medicine and Biology 724: 91–98.
  7. 7. LaSalle J (2011) A genomic point-of-view on environmental factors influencing the human brain methylome. Epigenetics 6: 862–869.
  8. 8. Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, et al. (2007) Strong association of de novo copy number mutations with autism. Science 316: 445–449.
  9. 9. Glessner J, Wang K, Cai G, Korvatska O, Kim C, et al. (2009) Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature 459: 569–572.
  10. 10. Pinto D, Pagnamenta A, Klei L, Anney R, Merico D, et al. (2010) Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466: 368–372.
  11. 11. Sanders S, Ercan-Sencicek A, Hus V, Luo R, Murtha M, et al. (2011) Multiple Recurrent De Novo CNVs, Including Duplications of the 7q11.23 Williams Syndrome Region, Are Strongly Associated with Autism. Neuron 70: 863–885.
  12. 12. Morrow E, Yoo SY, Flavell S, Kim TK, Lin Y, et al. (2008) Identifying autism loci and genes by tracing recent shared ancestry. Science 321: 218–223.
  13. 13. Levy D, Ronemus M, Yamrom B, Lee YH, Leotta A, et al. (2011) Rare De Novo and Transmitted Copy-Number Variation in Autistic Spectrum Disorders. Neuron 70: 886–897.
  14. 14. Cuscó I, Medrano A, Gener B, Vilardell M, Gallastegui F, et al. (2009) Autism-specific copy number variants further implicate the phosphatidylinositol signaling pathway and the glutamatergic synapse in the etiology of the disorder. Human Molecular Genetics 18: 1795–1804.
  15. 15. Ronemus M, Iossifov I, Levy D, Wigler M (2014) The role of de novo mutations in the genetics of autism spectrum disorders. Nature Reviews Genetics 15: 133–141.
  16. 16. Vaishnavi V, Manikandan M, Tiwary BK, Munirajan AK (2013) Insights on the Functional Impact of MicroRNAs Present in Autism-Associated Copy Number Variants. PLoS one 8: e56781:1–13.
  17. 17. Bartel D (2004) MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell 116: 281–297.
  18. 18. Lee Y, Ahn C, Han J, Choi H, Kim J, et al. (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415–419.
  19. 19. van den Berg A, Mols J, Han J (2008) RISC-target interaction: Cleavage and translational suppression. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 1779: 668–677.
  20. 20. Abu-Elneel K, Liu T, Gazzaniga F, Nishimura Y, Wall D, et al. (2008) Heterogeneous dysregulation of microRNAs across the autism spectrum. Neurogenetics 9: 153–161.
  21. 21. Garbett K, Ebert P, Mitchell A, Lintas C, Manzi B, et al. (2008) Immune transcriptome alterations in the temporal cortex of subjects with autism. Neurobiology of Disease 30: 303–311.
  22. 22. Purcell A, Jeon OH, Pevsner J (2001) The Abnormal Regulation of Gene Expression in Autistic Brain Tissue. Journal of Autism and Developmental Disorders 31: 545–549.
  23. 23. Baron C, Liu S, Hicks C, Gregg J (2006) Utilization of lymphoblastoid cell lines as a system for the molecular modeling of autism. Journal of Autism and Developmental Disorders 36: 973–982.
  24. 24. Gregg J, Lit L, Baron C, Hertz-Picciotto I, Walker W, et al. (2008) Gene expression changes in children with autism. Genomics 91: 22–29.
  25. 25. Hu V, Nguyen A, Kim K, Steinberg M, Sarachana T, et al. (2009) Gene expression profiling of lymphoblasts from autistic and nonaffected sib pairs: Altered pathways in neuronal development and steroid biosynthesis. PLoS ONE 4.
  26. 26. Nishimura Y, Martin C, Vazquez-Lopez A, Spence S, Alvarez-Retuerto A, et al. (2007) Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Human Molecular Genetics 16: 1682–1698.
  27. 27. Chan AW, Kocerha J (2012) The path to microRNA therapeutics in psychiatric and neurodegenerative disorders. Frontiers in Genetics 3.
  28. 28. Sarachana T, Zhou R, Chen G, Manji H, Hu V (2010) Investigation of post-transcriptional gene regulatory networks associated with autism spectrum disorders by microRNA expression profiling of lymphoblastoid cell lines. Genome Medicine 2.
  29. 29. Talebizadeh Z, Butler M, Theodoro M (2008) Feasibility and relevance of examining lymphoblastoid cell lines to study role of microRNAs in autism. Autism research : official journal of the International Society for Autism Research 1: 240–250.
  30. 30. Abu-Elneel K, Liu T, Gazzaniga FS, Nishimura Y, Wall DP, et al. (2008) Heterogeneous dysregulation of micrornas across the autism spectrum. Neurogenetics 9: 153–161.
  31. 31. Ghahramani Seno MM, Hu P, Gwadry FG, Pinto D, Marshall CR, et al. (2011) Gene and mirna expression profiles in autism spectrum disorders. Brain research 1380: 85–97.
  32. 32. Kalos M, Whitlock P (2008) Monte Carlo Methods. Wiley.
  33. 33. Dweep H, Sticht C, Pandey P, Gretz N (2011) MiRWalk - Database: Prediction of possible miRNA binding sites by “walking” the genes of three genomes. Journal of Biomedical Informatics 44: 839–847.
  34. 34. James S, Shpyleva S, Melnyk S, Pavliv O, Pogribny I (2013) Complex epigenetic regulation of engrailed-2 (en-2) homeobox gene in the autism cerebellum. Translational psychiatry 3: e232.
  35. 35. Correia C, Almeida J, Santos P, Sequeira A, Marques C, et al. (2009) Pharmacogenetics of risperidone therapy in autism: association analysis of eight candidate genes with drug efficacy and adverse drug reactions. The pharmacogenomics journal 10: 418–430.
  36. 36. Griswold AJ, Ma D, Cukier HN, Nations LD, Schmidt MA, et al. (2012) Evaluation of copy number variations reveals novel candidate genes in autism spectrum disorder-associated pathways. Human molecular genetics 21: 3513–3523.
  37. 37. Burstyn I, Wang X, Yasui Y, Sithole F, Zwaigenbaum L (2011) Autism spectrum disorders and fetal hypoxia in a population-based cohort: Accounting for missing exposures via estimation-maximization algorithm. BMC medical research methodology 11: 2.
  38. 38. Miller M, Bales KL, Taylor SL, Yoon J, Hostetler CM, et al. (2013) Oxytocin and vasopressin in children and adolescents with autism spectrum disorders: Sex differences and associations with symptoms. Autism Research
  39. 39. Bressler JP, Gillin PK, O'Driscoll C, Kiihl S, Solomon M, et al. (2012) Maternal antibody reactivity to lymphocytes of offspring with autism. Pediatric neurology 47: 337–340.
  40. 40. Buxbaum JD, Cai G, Nygren G, Chaste P, Delorme R, et al. (2007) Mutation analysis of the nsd1 gene in patients with autism spectrum disorders and macrocephaly. BMC medical genetics 8: 68.
  41. 41. Yu TW, Chahrour MH, Coulter ME, Jiralerspong S, Okamura-Ikeda K, et al. (2013) Using whole-exome sequencing to identify inherited causes of autism. Neuron 77: 259–273.
  42. 42. Marcinkowska M, Szymanski M, Krzyzosiak W, Kozlowski P (2011) Copy number variation of microRNA genes in the human genome. BMC Genomics 12.
  43. 43. Gardener H, Spiegelman D, Buka S (2009) Prenatal risk factors for autism: Comprehensive meta-analysis. British Journal of Psychiatry 195: 7–14.
  44. 44. Sanders S, Murtha M, Gupta A, Murdoch J, Raubeson M, et al. (2012) De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 484: 237–241.
  45. 45. Coutinho A, Sousa I, Martins M, Correia C, Morgadinho T, et al. (2007) Evidence for epistasis between SLC6A4 and ITGB3 in autism etiology and in the determination of platelet serotonin levels. Human Genetics 121: 243–256.
  46. 46. Xu B, Karayiorgou M, Gogos JA (2010) Micrornas in psychiatric and neurodevelopmental disorders. Brain research 1338: 78–88.
  47. 47. Qiao Y, Badduke C, Mercier E, Lewis SM, Pavlidis P, et al. (2013) mirna and mirna target genes in copy number variations occurring in individuals with intellectual disability. BMC genomics 14: 544.
  48. 48. Sarachana T, Zhou R, Chen G, Manji HK, Hu VW (2010) Investigation of post-transcriptional gene regulatory networks associated with autism spectrum disorders by microrna expression profiling of lymphoblastoid cell lines. Genome Med 2: 23.
  49. 49. Marshall C, Noor A, Vincent J, Lionel A, Feuk L, et al. (2008) Structural Variation of Chromosomes in Autism Spectrum Disorder. American Journal of Human Genetics 82: 477–488.
  50. 50. Autism Chromosome Rearrangements Database (version 2012). Available: http://projects.tcag.ca/autism/.
  51. 51. Zhang J, Feuk L, Duggan G, Khaja R, Scherer S (2006) Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenetic and Genome Research 115: 205–214.
  52. 52. Griffiths-Jones S, Grocock R, van Dongen S, Bateman A, Enright A (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic acids research 34: D140–144.
  53. 53. James Kent W, Sugnet C, Furey T, Roskin K, Pringle T, et al. (2002) The human genome browser at UCSC. Genome Research 12: 996–1006.
  54. 54. Liftover Accessed June 2012 (2002). Tool of the UCSC Genome browser. Available: http://genome.ucsc.edu/cgi-bin/hgLiftOver.
  55. 55. Vlachos IS, Kostoulas N, Vergoulis T, Georgakilas G, Reczko M, et al. (2012) Diana mirpath v. 2.0: investigating the combinatorial effect of micrornas in pathways. Nucleic acids research 40: W498–W504.
  56. 56. miRPath version 2.0 (2012) Available: http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=mirpath/index. Accessed 2013 Dec.
  57. 57. Benjamini Y, Hochberg Y (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics 25: 60–83.
  58. 58. Pike N. Spreadsheet for calculating the FDR-adjusted p-values starting from p-values. Available: http://users.ox.ac.uk/~npike/#programs. Accessed 2013 Oct 26.