ANNOTATION OF 3' SIGNALS IN THE YEAST GENOME
1. STATISTICAL PREDICTIONS
Analysis of the oligonucleotide composition of the sequences located downstream of the stop codon of all yeast genes, and of 1352 yeast EST clones, identifies several patterns (six-mer oligonucleotides, named 'words') that are likely to play a role as 3' end signals. Each kind of signal consists of a series of variants. The degeneracy of the signals probably results from flexibility in the specificity of the DNA-protein interactions (van Helden et al. 2000. NAR 28: 1000-1010).
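The word counting described above can be sketched in Python. This is only a minimal illustration of counting six-mers and binning their positions relative to a reference point; it does not reproduce the statistical scoring of van Helden et al. The function names, the bin size and the toy sequences are assumptions made for the example.

```python
from collections import Counter

WORD_LEN = 6  # the text describes six-mer "words"


def count_words(sequences, word_len=WORD_LEN):
    """Count every overlapping word of length word_len across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - word_len + 1):
            counts[seq[i:i + word_len]] += 1
    return counts


def positional_profile(sequences, word, bin_size=10):
    """Histogram of the start positions of `word`, binned by distance from
    the reference point (index 0 of each sequence, e.g. the first base
    after the stop codon). The bin size is a hypothetical choice."""
    profile = Counter()
    for seq in sequences:
        seq = seq.upper()
        start = seq.find(word)
        while start != -1:
            profile[start // bin_size * bin_size] += 1
            start = seq.find(word, start + 1)  # allow overlapping matches
    return dict(sorted(profile.items()))


# Toy downstream sequences (invented for illustration, not real yeast data).
downstream = [
    "GCGCTATATAGCGC",
    "TATATACCCCCCCC",
]

word_counts = count_words(downstream)
profile = positional_profile(downstream, "TATATA", bin_size=4)
```

Words whose profiles peak at the same distance from the reference point are the natural candidates for grouping into one signal.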
Cluster analysis of the positional profiles groups words with similar sequences together, which supports the hypothesis that they share a common function. For the set of all yeast genes, the analysis used the stop codon as the reference point, and two main clusters appear:
- D1, including TATATA-like elements, showing a strong peak around +35 from the stop codon
- D2, including TTTTTT-like elements, with a peak around +55 from the stop codon
The analysis of the EST sequences, done using the poly(A) site as the reference point, yielded mainly three kinds of signals:
- E1, including TATATA and many single-base substitutions and single-base shifts. This cluster strongly overlaps with the signal D1 extracted from downstream sequences. It shows a broad peak between -50 and -30 bp relative to the poly(A) site.
- E2, including T-rich words. This cluster has many words in common with D2. These words show a bimodal distribution with a strong peak at the poly(A) site and a second upstream peak around -60 from it, separated by a valley at -25 bp.
- E3, including a series of A-rich words that show a sharp peak 25 bp upstream of the cleavage site.
There is a strong similarity between signals D1 and E1, as well as between signals D2 and E2. Thus, the strong peak around +35 bp from the stop codon (D1 signals) would correspond on average to a peak at -40 from the poly(A) site (E1 signals). Likewise, the D2 peak around +55 would correspond to the peak located at around -60 for E2 signals. Each of the pairs D1/E1 and D2/E2 would therefore represent the same signal. The word marked with an asterisk (TATTTA*) appeared both as an E1 and as a D2 signal in the analysis. Given this correspondence, the signals were grouped by positional identity as follows (Signal Type):
- S1, comprises signals (3-Prime-Signal) D1 and E1 (AAATAG, ACATAC, ACATAT, ATAAAT, ATACAT, ATAGAT, ATATAC, ATATAT, ATATGT, ATCTAT, ATGTAC, ATGTAT, ATGTGT, ATTTAT, CATATA, GTAAAT, GTATAC, GTATAT, GTATGT, TAAATA, TAAGTA, TAATTA, TACATA, TACGTA, TAGATA, TAGTTA, TATACA, TATATA, TATCTA, TATGTA, TATTTA*, TGTACA, TGTATA, TGTGTA, TTAAAT)
- S2, comprises signals (3-Prime-Signal) D2 and E2 (ATTATT, TAATTG, TAGTTT, TATTAT, TATTCT, TATTTA*, TCATTT, TCTATT, TTAATT, TTATTA, TTATTC, TTATTT, TTCATT, TTTATT, TTTCTT, TTTTCT, TTTTGT, TTTTTT)
- S3, comprises signals (3-Prime-Signal) E3 (AAAAAA, AAATAA, AATAAA, AATAGA, AATTAA, AGTTAA)
In some cases annotated positions overlap but correspond to different signals (different WORD composition). Such overlapping signals could constitute a single signal, although this cannot be confirmed without experimental data.
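As a rough illustration of how the word lists above could be used, the following Python sketch scans a downstream sequence and reports which signal type each matching word belongs to. Only a few words from the S1 and S2 lists are included for brevity (S3 is complete), and the function name and the 1-based offset convention are assumptions, not part of the annotation itself.

```python
# Subset of the word lists given above; TATTTA belongs to both S1 and S2.
SIGNAL_WORDS = {
    "S1": {"TATATA", "ATATAT", "TACATA", "TATGTA", "TATTTA"},
    "S2": {"TTTTTT", "TTTATT", "TTATTT", "TATTAT", "TATTTA"},
    "S3": {"AAAAAA", "AAATAA", "AATAAA", "AATAGA", "AATTAA", "AGTTAA"},
}


def find_signals(downstream_seq):
    """Return (offset, word, signal_types) for every listed word found.

    offset is the 1-based position of the word's first base, counted from
    the first base after the stop codon (an assumed convention).
    """
    seq = downstream_seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        word = seq[i:i + 6]
        types = sorted(t for t, words in SIGNAL_WORDS.items() if word in words)
        if types:
            hits.append((i + 1, word, types))
    return hits


# Toy sequence (invented): TATTTA starts at offset 3 and matches S1 and S2.
hits = find_signals("GGTATTTAGG")
```

Reporting every matching type, rather than picking one, keeps overlapping or shared words visible, since the text notes that such cases cannot be resolved without experimental data.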
2. ANNOTATION OF POLY A SITES
Most of these sites were obtained from the TIGR EST database, as published by Graber et al. (1999), Nucleic Acids Res. 27: 888-894.
Others have been obtained from original papers.
For each poly(A) site, the distance from the STOP codon (PolyA_UTR_Length) and the chromosome location (PolyA_Position) are indicated. Because the exact chromosome location of a poly(A) site is very difficult to determine, small differences (a few nucleotides) in the reported chromosome location may be observed.
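The relation between the two fields can be sketched as follows. This is a hedged example: the coordinate convention (1-based chromosome positions, stop_end being the last base of the stop codon in transcript orientation), the function names and the 3-nucleotide tolerance are assumptions for illustration, not the rules used to build the annotation.

```python
def polya_utr_length(stop_end, polya_position, strand):
    """Distance in nucleotides from the last base of the stop codon to the
    poly(A) site. On the '-' strand, downstream means decreasing
    chromosome coordinates (assumed convention)."""
    if strand == "+":
        return polya_position - stop_end
    if strand == "-":
        return stop_end - polya_position
    raise ValueError("strand must be '+' or '-'")


def same_site(pos_a, pos_b, tolerance=3):
    """Treat two reported positions as the same poly(A) site when they
    differ by at most `tolerance` nucleotides (hypothetical threshold)."""
    return abs(pos_a - pos_b) <= tolerance


# Invented coordinates, for illustration only.
plus_len = polya_utr_length(1000, 1080, "+")
minus_len = polya_utr_length(1000, 920, "-")
```

A tolerance-based comparison of this kind is one way to accommodate the few-nucleotide differences mentioned above when matching sites from different sources.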
3. EXPERIMENTALLY DETERMINED REGULATORY ELEMENTS
Regulatory elements, such as the Efficiency Element (EE) and the Positioning Element (PE), are known only for a very small number of genes for which experimental data have been obtained. Numbers indicate nucleotide positions relative to the STOP codon.
Last modified: Tue Feb 10 10:55:52 CET 2004