Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution
Figure 3
Variation in 3′ splice site features is associated with differences in NAGNAG splicing.
(A) A simple biophysical model of NAGNAG splicing accurately models mean isoform usage across tissues as a function of the difference in 3′ splice site score. Each point represents a single human NAGNAG, and the solid and dashed black lines show the mean ψ (across values for individual NAGNAGs with similar splice site score difference, with a sliding window of 3.25 bits) and the standard deviation about the mean. The solid red line shows the prediction based on the model for parameters Q = 0.55 and B = 0.58, and the dashed red line indicates the standard deviation about the model mean expected from measurement error. The horizontal and vertical dashed lines indicate the splice site score difference (approximately 1 bit) at ψ = 50%.
(B) The −3 bases largely determine whether a NAGNAG is alternatively spliced. We grouped NAGNAGs in the human genome according to their −3 bases and computed the fraction of each group that expressed the proximal (black) or distal (blue) isoform at ≥5% in at least one tissue.
(C) Constitutive 3′ splice sites (top, YAG), YAGYAGs that express the proximal isoform at ≥75% in all tissues (middle, YAGYAG proximal major), YAGYAGs that express the distal isoform at ≥75% in all tissues (middle, YAGYAG distal major), and strongly regulated YAGYAGs (bottom, YAGYAG strongly regulated) all exhibit distinct upstream sequence preferences. The x-axis shows the position relative to the 3′ splice site (YAG) or proximal 3′ splice site (YAGYAG), and arrows indicate the 3′ splice site that is predominantly used. The figure was created with WebLogo [53]. Human and mouse YAGYAGs were grouped together to increase the statistical signal for (C–F).
(D) Distal major YAGYAGs have shorter polypyrimidine tracts (p<0.001 relative to proximal major class, Kolmogorov-Smirnov test). Plot shows the median length of the polypyrimidine tract, estimated as the first stretch of ≥5 consecutive pyrimidines upstream of the −3 position. Error bars indicate the standard deviation of the median, estimated by bootstrapping (the error bars for “CJ” were too small to be visible).
(E) Distal major YAGYAGs have higher CT and TC dinucleotide content (p<0.005 relative to proximal major class, Kolmogorov-Smirnov test). Plot shows the median CT and TC dinucleotide content of the polypyrimidine tract, computed as the fraction of the polypyrimidine tract composed of CT dinucleotides, with an optional T at the beginning or C at the end. Error bars indicate the standard deviation of the median, estimated by bootstrapping.
(F) The AG exclusion zone [57] is more distally located in distal major YAGYAGs (p<0.001 relative to proximal major class, Kolmogorov-Smirnov test). The position of the first AG dinucleotide upstream of the −15 position is shown. Thick bars indicate the median positions, and boxes extend from the first to third quartiles.
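The sequence features summarized in (D–F) can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions that the legend does not state: the input string is intronic sequence written 5′→3′ whose last base is position −1 (the G of the 3′ splice site AG); the "first" pyrimidine stretch is the one closest to the splice site; CT and TC content is read as the fraction of the tract covered by alternating repeats matching `T?(CT)+C?`; the AG must lie entirely upstream of position −15 and is reported by the position of its A; and the bootstrap uses 1,000 resamples. All function names and the toy sequence are hypothetical.

```python
import random
import re
import statistics

PYRIMIDINE_RUN = re.compile(r"[CT]{5,}")   # >=5 consecutive pyrimidines
CT_REPEAT = re.compile(r"T?(?:CT)+C?")     # CT repeats, optional leading T / trailing C


def polypyrimidine_tract(upstream):
    """Return the run of >=5 pyrimidines closest to the splice site, upstream of -3."""
    region = upstream.upper()[:-3]                 # positions -4 and further upstream
    match = PYRIMIDINE_RUN.search(region[::-1])    # reversed: scan away from the site
    return match.group()[::-1] if match else ""    # restore 5'->3' orientation


def ct_content(tract):
    """Fraction of the tract covered by alternating CT repeats (assumed reading)."""
    if not tract:
        return 0.0
    covered = sum(len(m.group()) for m in CT_REPEAT.finditer(tract))
    return covered / len(tract)


def first_ag_upstream(upstream, boundary=15):
    """Position (negative) of the A of the first AG lying upstream of -boundary."""
    seq = upstream.upper()
    idx = seq[:-boundary].rfind("AG")              # nearest AG within positions <= -(boundary+1)
    return None if idx == -1 else idx - len(seq)


def bootstrap_sd_of_median(values, n_boot=1000, seed=0):
    """Standard deviation of the median over bootstrap resamples (n_boot assumed)."""
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    ]
    return statistics.stdev(medians)


# Toy sequence (hypothetical), ending in the 3' splice site CAG.
toy = "AAGGTTAGGAACCGA" + "TTCTCTCTTTT" + "GA" + "CAG"
tract = polypyrimidine_tract(toy)
print(tract, len(tract))          # TTCTCTCTTTT 11
print(ct_content(tract))          # 7 of 11 tract bases lie in a CT repeat
print(first_ag_upstream(toy))     # -25
```

In practice, one would compute these quantities per YAGYAG, group them by class (proximal major, distal major, and so on), and pass each group's values to `bootstrap_sd_of_median` to obtain error bars for the class medians.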