Over-representation of polyT residues at critical sites

PolyT tracts are known to occur in metazoan introns [Lorkovic et al., 2000], as well as in S. pombe and S. cerevisiae introns [Spingola et al., 1999; Lopez & Seraphin, 1999; Kaufer & Potashkin, 2000], notably in the region between the branch site and the 3' splice site (S2). The role of T-rich sequences in intron recognition has been well documented in metazoans: such sequences are essential for efficient splicing and may help to distinguish genuine splice sites from cryptic ones [Lorkovic et al., 2000].

These patterns were searched for in the seven yeast species [J. van Helden, unpublished results] using the program position-analysis [van Helden et al., NAR, 2000]. This program takes as input a set of sequences aligned on a reference point (in our case, the branch motif and the 3' motif, respectively) and counts the occurrences of each oligonucleotide as a function of its position relative to this reference point. Note that the intron sequences were clipped, i.e. the splice site motifs themselves were excluded, to avoid biasing the analyses. For each oligonucleotide, the observed positional profile is compared with a homogeneous distribution using a χ² statistic, and the program returns the oligonucleotides whose positional distribution departs significantly from the homogeneous one. Oligonucleotides of all sizes from 1 to 7 were analysed. Positions were grouped into classes of 5, 10 and 20 nucleotides in order to obtain a sufficient number of occurrences in each class.
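The procedure can be sketched in a few lines. This is a minimal illustration, not the position-analysis implementation: the function names are hypothetical, the reference point is assumed to be the last nucleotide of each clipped sequence, and the "homogeneous" expectation is assumed to be proportional to the number of positions each class covers (so that introns of different lengths are handled); the actual program may define it differently. The P-value uses the closed form of the χ² upper tail, which is exact only for an even number of degrees of freedom.

```python
import math
from collections import Counter


def positional_profile(seqs, oligo, class_width):
    """Count occurrences of `oligo` as a function of their distance to a
    reference point (assumed here: the last nucleotide of each sequence),
    with positions grouped into classes of `class_width` nucleotides.
    Overlapping occurrences are all counted."""
    k = len(oligo)
    oligo = oligo.upper()
    observed = Counter()  # class -> occurrences of the oligonucleotide
    coverage = Counter()  # class -> positions where an oligo could start
    for seq in seqs:
        seq = seq.upper()
        n = len(seq)
        for start in range(n - k + 1):
            cls = (n - 1 - start) // class_width  # distance to the reference
            coverage[cls] += 1
            if seq[start:start + k] == oligo:
                observed[cls] += 1
    classes = sorted(coverage)
    return classes, [observed[c] for c in classes], [coverage[c] for c in classes]


def chi2_vs_homogeneous(observed, coverage):
    """Chi-square statistic against a homogeneous distribution in which each
    class expects occurrences in proportion to the positions it covers.
    Returns (chi2, degrees of freedom = number of classes - 1)."""
    total_obs, total_cov = sum(observed), sum(coverage)
    chi2 = 0.0
    for obs, cov in zip(observed, coverage):
        expected = total_obs * cov / total_cov
        if expected > 0:
            chi2 += (obs - expected) ** 2 / expected
    return chi2, len(observed) - 1


def chi2_pvalue_even_df(chi2, df):
    """Upper-tail P-value, exact only for an even number of degrees of
    freedom: P = exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = chi2 / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!, built incrementally
        total += term
    return math.exp(-half) * total
```

With classes of 20 nucleotides, 15 classes give 14 degrees of freedom and 5 classes give 4. As a check, chi2_pvalue_even_df(25.75, 4) returns about 3.55e-5, matching the 3.6e-5 quoted for the region upstream of the branch site, and chi2_pvalue_even_df(208, 14) returns about 1.3e-36, of the same order as the reported 1.16e-36; the small gap is consistent with the statistic having been rounded to 208. For odd degrees of freedom, a general routine such as scipy.stats.chi2.sf would be needed instead.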

The analyses confirmed that S. cerevisiae introns show an over-representation of polyT elements (n=4) at strategic sites: not only in the S2 region (class interval of 20; observed χ² = 208 with 14 degrees of freedom, P-value = 1.16e-36), as previously reported [Spingola et al., 1999; Lopez & Seraphin, 1999], but also in the region immediately upstream of the branch site (class interval of 20; observed χ² = 25.75 with 4 degrees of freedom, P-value = 3.6e-5) [BON et al., 2003]. These patterns mainly consist of polyT islands (up to 15 residues) of varying lengths separated by C or A residues, such as the pattern 'ATTTTATTCCTTTTTTTTTTT' found upstream of the 3' splice motif of the YPL143w intron [BON et al., 2003]. However, this bias is clearly not present in all S. cerevisiae introns: some introns are purine-rich in this region, while others exhibit a balanced pyrimidine/purine content.
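To make the notion of polyT islands concrete, the sketch below extracts the maximal runs of T from the YPL143w example and the residues that separate them. It is an illustration under our own assumptions: the helper names and the minimum run length (min_len=2) are hypothetical choices, not the criteria used in [BON et al., 2003]. The pyrimidine fraction is included only to contrast T-rich regions with purine-rich or balanced ones.

```python
import re


def polyt_runs(seq, min_len=2):
    """(start, length) of every maximal run of at least `min_len` T residues,
    with 0-based start positions. The greedy pattern consumes a whole run."""
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def spacers(seq, runs):
    """Residues separating consecutive polyT runs."""
    seq = seq.upper()
    return [seq[s1 + l1:s2] for (s1, l1), (s2, _) in zip(runs, runs[1:])]


def pyrimidine_fraction(seq):
    """Fraction of C + T residues in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(seq.count(b) for b in "CT") / len(seq) if seq else 0.0


example = "ATTTTATTCCTTTTTTTTTTT"  # upstream of the YPL143w 3' splice motif
runs = polyt_runs(example)
print(runs, spacers(example, runs), round(pyrimidine_fraction(example), 2))
```

On this example the sketch finds three T runs, the first two separated by a single 'A' and the last two by 'CC', and only the two A residues are purines.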

Although not clearly demonstrated, such a bias is strongly suspected in the 'Saccharomyces-Kluyveromyces' species and in C. tropicalis on the basis of visual inspection [BON et al., 2003]. However, additional sequence data are required to establish whether this bias is statistically significant in the other hemiascomycetous species.

This suggests that yeast and metazoan introns may share more similarities than previously assumed.

Last modified: Tue Feb 10 11:04:51 CET 2004