Caenorhabditis elegans Operons Contain a Higher Proportion of Genes with Multiple Transcripts and Use 3′ Splice Sites Differentially

  • Fei Wang,

    Affiliation State Key Laboratory of Medical Genetics, Central South University, Changsha, China

  • Shi Huang,

    Affiliation State Key Laboratory of Medical Genetics, Central South University, Changsha, China

  • Long Ma

    longmace@gmail.com

    Affiliation State Key Laboratory of Medical Genetics, Central South University, Changsha, China

Abstract

RNA splicing generates multiple transcript isoforms from a single gene and enhances the complexity of eukaryotic gene expression. In some eukaryotes, operons exist as an ancient regulatory mechanism of gene expression that requires strict positional and regulatory relationships among the genes of an operon. It remains unknown whether operonic genes generate transcript isoforms in the same manner as non-operonic genes, whose expression is less likely to be constrained by their positions and relationships with surrounding genes. We analyzed the number of transcript isoforms of Caenorhabditis elegans operonic genes and found that C. elegans operons contain a much higher proportion of genes with multiple transcript isoforms than non-operonic genes do. For genes that express multiple transcript isoforms, there is no apparent difference between the number of isoforms in operonic and non-operonic genes. C. elegans operonic genes also show a different preference among the 20 most common 3′ splice sites compared to non-operonic genes. Our analyses suggest that C. elegans operons enhance expression complexity by increasing the proportion of genes that express multiple transcript isoforms and maintain splicing efficiency by differential use of common 3′ splice sites.

Introduction

RNA splicing generates multiple transcript isoforms from a single gene and is believed to be a driving force for biological complexity in evolution [1], [2]. In C. elegans, over 13% of genes are alternatively spliced [3]. In humans, most genes are alternatively spliced [4], [5], [6]. Compared to RNA splicing, operons provide a different regulatory form of gene expression. An operon is a cluster of genes that are transcribed from a single promoter and controlled by the same regulatory sequences [7]. Operons are abundant in prokaryotes and are also found in eukaryotes, including the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and some mammals [7], [8]. In C. elegans, it was initially estimated that about 15% of genes reside in roughly 1000 operons, with an average of 2.8 genes per operon [9], [10]. Recently the number of annotated operons in the C. elegans genome has increased to approximately 1250 (WormBase Release 205), which gives an average of 2.3 genes per operon, given that the number of operonic genes remains largely unchanged (around 2880, see the Results). In C. elegans, genes in an operon form a closely spaced cluster with an intergenic distance of ∼100 bp [10]. However, it is not known how operonic genes increase expression complexity, e.g., by RNA splicing, to adjust to the pressure of evolution while maintaining their positional and regulatory relationships. C. elegans has a large number of operonic genes that are alternatively spliced, which provides an interesting model for understanding the relationship between operons and RNA splicing.

Results

We examined the average number of transcript isoforms per gene for genes of the whole genome, for all non-operonic genes and for all operonic genes. As shown in Figure 1A, non-operonic genes had about 1.26 transcript isoforms per gene, which was similar to the average of 1.31 transcript isoforms per gene for the whole genome. Operonic genes had 1.68 transcript isoforms per gene, which was over 30% (roughly one-third) more than that of the non-operonic genes.

Figure 1. C. elegans operons contain a higher proportion of genes that express multiple transcript isoforms.

(A) C. elegans operonic genes express more transcript isoforms per gene than non-operonic genes do. (B) C. elegans operons contain a higher proportion of genes that express multiple transcript isoforms than non-operonic genes do. (C) Alternatively spliced C. elegans operonic genes and non-operonic genes have a similar number of transcript isoforms per gene. A Z-test was performed (Figure 1A and 1C) to evaluate the significance of the difference between the mean transcript numbers. Error bars represent standard deviations.

https://doi.org/10.1371/journal.pone.0012456.g001

One possible explanation for the higher number of transcript isoforms per gene in operonic genes is that operons contain a higher proportion of genes that generate multiple transcript isoforms. Indeed, about 40% of all operonic genes have multiple transcript isoforms, whereas only 14% of non-operonic genes and 17% of all genes do (Figure 1B and Table 1). We next examined whether the average number of isoforms differs among genes that have multiple transcript isoforms. For all such non-operonic genes, there were about 2.81 isoforms per gene. For all such operonic genes, there were 2.71 isoforms per gene (Figure 1C). For all such genes of the whole genome, this number was 2.78, which was similar to that of operonic and non-operonic genes (Figure 1C). These results suggest that alternatively spliced operonic and non-operonic genes do not differ appreciably in the number of transcript isoforms they generate. Therefore, operonic genes may utilize the splicing machinery as efficiently as non-operonic genes do to enhance their expression complexity.
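The three quantities compared above (mean isoforms per gene, the proportion of genes with multiple isoforms, and mean isoforms among multi-isoform genes) can be illustrated with a minimal Python sketch. The original analysis was carried out in MS Excel, so this is not the authors' procedure. The function names, gene identifiers and counts below are hypothetical and serve only to show the calculation.

```python
from statistics import mean

def split_by_operon(transcripts_per_gene, operonic_ids):
    """Split genes into operonic and non-operonic groups.

    Non-operonic genes are all genes minus the operonic ones.
    """
    operonic = {g: n for g, n in transcripts_per_gene.items() if g in operonic_ids}
    non_operonic = {g: n for g, n in transcripts_per_gene.items() if g not in operonic_ids}
    return operonic, non_operonic

def isoform_summary(transcripts_per_gene):
    """Summarize isoform counts for one non-empty group of genes.

    transcripts_per_gene maps a gene ID to its number of annotated transcripts.
    """
    counts = list(transcripts_per_gene.values())
    multi = [n for n in counts if n > 1]  # genes with multiple isoforms
    return {
        "genes": len(counts),
        "mean_isoforms": mean(counts),
        "fraction_multi": len(multi) / len(counts),
        "mean_isoforms_multi": mean(multi) if multi else float("nan"),
    }

# Toy data: hypothetical gene IDs and transcript counts, not WormBase values.
all_genes = {"geneA": 1, "geneB": 3, "geneC": 2, "geneD": 1, "geneE": 1, "geneF": 4}
operonic_ids = {"geneB", "geneC", "geneD"}

operonic, non_operonic = split_by_operon(all_genes, operonic_ids)
operonic_stats = isoform_summary(operonic)
non_operonic_stats = isoform_summary(non_operonic)
print(operonic_stats)
print(non_operonic_stats)
```

Applied to the real annotation, `fraction_multi` would correspond to the proportions in Figure 1B and `mean_isoforms_multi` to the values in Figure 1C.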

Table 1. The numbers of genes and transcripts we analyzed.

https://doi.org/10.1371/journal.pone.0012456.t001

To investigate whether operonic introns utilize 3′ splice sites differently from non-operonic introns, we analyzed the nucleotide sequences at positions −7 to −1 of C. elegans introns. This sequence (the 3′ splice site) is recognized by the large and small subunits of the splicing factor U2AF and plays important roles in regulating splicing efficiency and alternative splicing [11], [12], [13], [14]. Among all 3′ splice sites, the 20 most commonly used sites were found in over 80% of introns (Table 2), suggesting that these sites are responsible for the splicing of the majority of introns. As shown in Figure 2, operonic introns use ttttcag, atttcag, tttccag and tttgcag significantly more frequently than non-operonic introns do; the frequency of tttgcag usage in operonic introns was over 30% higher than that in non-operonic introns. The other 16 sites were used equally or less frequently in operonic introns. Among them, the frequencies of tttttag, gtttcag, ctttcag, attttag and tgttcag were significantly reduced compared to those in non-operonic introns.
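As an illustration of how 3′ splice sites can be tallied, the following Python sketch takes the last seven nucleotides of each intron (positions −7 to −1), counts identical sites, and reports the share of introns covered by the most common ones. It is a sketch under stated assumptions and not the software used in this study. The function names and toy intron sequences are hypothetical, and real input may need extra handling (for example, ambiguous bases).

```python
from collections import Counter

def three_prime_site(intron, length=7):
    """Return the last `length` nucleotides of an intron (positions -7 to -1).

    The sequence is lowercased; introns shorter than `length` yield None.
    """
    seq = intron.strip().lower()
    if len(seq) < length:
        return None
    return seq[-length:]

def count_sites(introns):
    """Count how often each 3' splice site occurs in a list of intron sequences."""
    counts = Counter()
    for intron in introns:
        site = three_prime_site(intron)
        if site is not None:
            counts[site] += 1
    return counts

def top_site_coverage(counts, n=20):
    """Return the n most common sites and the fraction of introns they cover."""
    total = sum(counts.values())
    top = counts.most_common(n)
    covered = sum(c for _, c in top)
    return top, (covered / total if total else 0.0)

# Toy intron sequences (hypothetical, for illustration only).
introns = [
    "gtaagttattttcag",
    "gtgagtaaatttcag",
    "gtaagaattttcag",
    "gtatgtttttttag",
    "gta",  # too short to contain a 7-nt site; skipped
]

counts = count_sites(introns)
top, coverage = top_site_coverage(counts, n=1)
print(counts)
print(top, coverage)
```

With the full intron set, calling `top_site_coverage(counts, n=20)` would give the kind of coverage figure summarized in Table 2.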

Figure 2. Common 3′ splice sites are used differentially by C. elegans operonic genes.

The proportion of each 3′ splice site (X axis) in operonic and non-operonic genes was compared to that in all genes of the whole genome and is presented as a fold change (Y axis). A pairwise Z-test was performed (see Table 2) to evaluate the significance of the difference between the proportions of each 3′ splice site in operonic genes and non-operonic genes. *: p≤0.01.

https://doi.org/10.1371/journal.pone.0012456.g002

Table 2. The proportions and numbers of the 20 most frequently used 3′ splice sites in different groups of genes.

https://doi.org/10.1371/journal.pone.0012456.t002

Discussion

It is a challenge for operonic genes to increase expression complexity and maintain splicing efficiency while keeping strict positional and regulatory relationships. C. elegans operons may achieve these goals by at least two approaches. First, C. elegans operons contain a significantly higher proportion of genes that express multiple transcript isoforms (Figure 1). However, for genes that express multiple transcript isoforms, there is no apparent difference between the number of isoforms in operonic and non-operonic genes. This result suggests that C. elegans operons are more permissive than non-operonic regions in allowing their genes to increase expression complexity by RNA processing. By increasing the proportion of genes that express multiple transcript isoforms, C. elegans operons may compensate for stricter transcriptional regulation and achieve the goal of expression complexity. Alternatively, C. elegans operonic genes may be under greater evolutionary pressure to enhance their transcript complexity, e.g., in order to perform more complex biological functions. Second, C. elegans operonic genes use four of the 20 most abundant 3′ splice sites (ttttcag, atttcag, tttccag and tttgcag) more frequently and use the other 3′ splice sites equally or less frequently (Figure 2). The differential usage of common 3′ splice sites may help maintain efficient splicing of operonic genes, which are often highly expressed and have essential biological functions [9], [10]. The differential usage of common 3′ splice sites by operonic genes is also consistent with the notion that transcription and RNA splicing are coupled processes [1], [2]. Compared to individual genes, it is plausible that the coupling of transcription and splicing of multiple genes in an operon presents a more challenging task for the splicing machinery, which may favor those 3′ splice sites that optimize the splicing process and result in a differential use of common 3′ splice sites by operonic genes.

The expression of transcript isoforms by C. elegans operonic genes may also depend on other regulatory mechanisms, e.g., the use of different splicing silencers or enhancers and the generation of alternative 5′ and 3′ untranslated regions (UTRs). Further analysis of these possibilities will provide a more comprehensive picture of the expression complexity of C. elegans operonic genes.

Methods

We downloaded C. elegans gene names and annotated transcripts from WormMart (WormBase Release 195) as html files. The data were processed using MS Excel to identify genes with different numbers of transcripts. Non-operonic genes were identified by subtracting operonic genes from all genes of the whole genome. A random examination of over 100 operonic genes that are annotated to have multiple transcript isoforms indicated that the isoforms of each gene share at least one coding exon.

The total number of each analyzed 3′ splice site (positions −7 to −1) for the whole genome was obtained from the Intronerator (http://genome-test.cse.ucsc.edu/Intronerator/) [15]. We downloaded 16,087 unique operonic intron sequences from WormMart (WormBase Release 195) and processed the sequences using software written in the C programming language and Microsoft Excel. Identical 3′ splice sites (positions −7 to −1) were grouped and the proportion of each site was determined. The number of each 3′ splice site for non-operonic genes was obtained by subtracting the number of the same site for operonic genes from the number for the whole genome. The online calculator used for the pairwise Z-test analysis is available at http://www.dimensionresearch.com/resources/calculators/ztest.html.
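The subtraction and comparison steps described above can be sketched in Python as follows. The sketch assumes a pooled two-proportion Z-test with a two-sided p-value, which is a common form of the test; the exact formula used by the online calculator is not stated here, so this is an assumption. The function names and all counts in the example are hypothetical.

```python
import math

def non_operonic_count(genome_count, operonic_count):
    """Sites in non-operonic introns = whole-genome count minus operonic count."""
    return genome_count - operonic_count

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided p-value.

    x1 of n1 introns in group 1 and x2 of n2 introns in group 2 use the site.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def fold_change(group_count, group_total, genome_count, genome_total):
    """Proportion of a site in one group relative to its whole-genome proportion."""
    return (group_count / group_total) / (genome_count / genome_total)

# Hypothetical counts for a single 3' splice site (not values from Table 2).
genome_total, genome_site = 1000, 200      # all introns / introns using the site
operonic_total, operonic_site = 400, 100   # operonic introns / those using the site

non_op_total = non_operonic_count(genome_total, operonic_total)
non_op_site = non_operonic_count(genome_site, operonic_site)

z, p = two_proportion_z(operonic_site, operonic_total, non_op_site, non_op_total)
fold_op = fold_change(operonic_site, operonic_total, genome_site, genome_total)
fold_non_op = fold_change(non_op_site, non_op_total, genome_site, genome_total)
print(z, p, fold_op, fold_non_op)
```

In this toy case the site is enriched in operonic introns (fold change above 1) and depleted in non-operonic introns (fold change below 1), which mirrors how the proportions are presented in Figure 2.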

Acknowledgments

We thank Linfeng Xia for processing C. elegans intron sequences.

Author Contributions

Analyzed the data: FW LM. Contributed reagents/materials/analysis tools: FW. Wrote the paper: SH LM.

References

  1. Graveley BR (2001) Alternative splicing: increasing diversity in the proteomic world. Trends Genet 17: 100–107.
  2. Maniatis T, Tasic B (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418: 236–243.
  3. Zahler AM (2005) Alternative splicing in C. elegans. WormBook.
  4. Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, et al. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 302: 2141–2144.
  5. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40: 1413–1415.
  6. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, et al. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456: 470–476.
  7. Blumenthal T (1998) Gene clusters and polycistronic transcription in eukaryotes. Bioessays 20: 480–487.
  8. Blumenthal T (2004) Operons in eukaryotes. Brief Funct Genomic Proteomic 3: 199–211.
  9. Blumenthal T, Evans D, Link CD, Guffanti A, Lawson D, et al. (2002) A global analysis of Caenorhabditis elegans operons. Nature 417: 851–854.
  10. Blumenthal T, Gleason KS (2003) Caenorhabditis elegans operons: form and function. Nat Rev Genet 4: 112–120.
  11. Hollins C, Zorio DA, MacMorris M, Blumenthal T (2005) U2AF binding selects for the high conservation of the C. elegans 3′ splice site. RNA 11: 248–253.
  12. Kent WJ, Zahler AM (2000) Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res 10: 1115–1125.
  13. Ma L, Horvitz HR (2009) Mutations in the Caenorhabditis elegans U2AF large subunit UAF-1 alter the choice of a 3′ splice site in vivo. PLoS Genet 5: e1000708.
  14. Zhang H, Blumenthal T (1996) Functional analysis of an intron 3′ splice site in Caenorhabditis elegans. RNA 2: 380–388.
  15. Kent WJ, Zahler AM (2000) The intronerator: exploring introns and alternative splicing in Caenorhabditis elegans. Nucleic Acids Res 28: 91–93.