ANNOTATION OF 3' SIGNALS IN THE YEAST GENOME

1. STATISTICAL PREDICTIONS

Analysis of oligonucleotide composition in the sequences located downstream of the stop codon of all yeast genes, and in 1352 yeast EST clones, identifies several patterns (six-mer oligonucleotides, called 'words') that are likely to act as 3' end signals. Each kind of signal consists of a series of variants. The degeneracy of the signals probably results from flexibility in the specificity of the DNA-protein interactions (van Helden et al. 2000. NAR 28: 1000-1010).
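The word-counting step described above can be sketched as follows. This is a minimal illustration, not the published method: the function name `positional_word_counts`, the convention that each input sequence starts at the first nucleotide after the stop codon, and the toy sequences are all assumptions made for the example. It only tallies how often each six-mer occurs at each offset; it does not perform the statistical assessment of over-representation.

```python
from collections import Counter, defaultdict

WORD_LEN = 6  # six-mer 'words', as in the text


def positional_word_counts(downstream_seqs, word_len=WORD_LEN):
    """Count each word at each offset downstream of the stop codon.

    downstream_seqs: iterable of strings, each assumed to start at the
    first nucleotide after the stop codon (hypothetical convention).
    Returns {word: Counter({offset: count})} with 1-based offsets.
    """
    profiles = defaultdict(Counter)
    for seq in downstream_seqs:
        seq = seq.upper()
        for i in range(len(seq) - word_len + 1):
            word = seq[i:i + word_len]
            # Skip words containing N or other ambiguity codes.
            if set(word) <= set("ACGT"):
                profiles[word][i + 1] += 1
    return profiles


# Toy input: invented sequences, not real yeast data.
toy = ["TTTATATATGCA", "GGTATATAAACC", "TATATACCCGGG"]
profiles = positional_word_counts(toy)

# Positional profile of one word chosen arbitrarily for illustration.
print(dict(profiles["TATATA"]))
```

The per-word counters returned here are the kind of positional profile that could then be compared between words.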

When words are clustered by the similarity of their positional profiles, words with similar sequences group together, which supports the hypothesis that they share a common function. The analysis used the stop codon as the reference point. For the set of all yeast genes, two main clusters appear:

The analysis of the EST sequences, which used the poly(A) site as the reference point, yielded three main kinds of signals:

There is a strong similarity between signals D1 and E1, as well as between signals D2 and E2. The strong peak around +35 bp from the stop codon (D1 signals) would correspond, on average, to the peak at around -40 bp from the poly(A) site (E1 signals). Likewise, the D2 signals, with a peak around +55, would correspond to the peak at around -60 for the E2 signals. Each of the pairs D1/E1 and D2/E2 would therefore represent a single signal. Given this correspondence, the signals were grouped by positional identity as follows (Signal Type):

This signal appeared as both an E1 and a D2 signal in the analysis.
In some cases positions overlap but correspond to different signals (different word composition). These could constitute a single signal, although this cannot be confirmed without experimental data.
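The D1/E1 and D2/E2 correspondence rests on simple coordinate arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that both positions are offsets in nucleotides along the same transcript: positive when measured downstream of the stop codon, negative when measured upstream of the poly(A) site. Under that assumption, a feature seen at both positions implies a stop-to-poly(A) distance equal to their difference. The peaks are averages over different sequence sets, so the resulting figures are only indicative.

```python
def implied_stop_to_polya_distance(pos_from_stop, pos_from_polya):
    """Distance (nt) from stop codon to poly(A) site implied by one
    feature observed at pos_from_stop (positive, downstream of the stop
    codon) and at pos_from_polya (negative, upstream of the poly(A) site).
    """
    return pos_from_stop - pos_from_polya


# Approximate peak positions quoted in the text.
pairs = {"D1/E1": (35, -40), "D2/E2": (55, -60)}

implied = {
    name: implied_stop_to_polya_distance(from_stop, from_polya)
    for name, (from_stop, from_polya) in pairs.items()
}
for name, distance in implied.items():
    print(name, distance)
```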

2. ANNOTATION OF POLY A SITES

Most of these sites were obtained from the TIGR EST database and were published by Graber et al. (1999), Nucleic Acids Res. 27: 888-894.
Others were obtained from original papers.
For each poly(A) site, the distance from the stop codon (PolyA_UTR_Length) and the chromosome location (PolyA_Position) are given. Because the exact chromosome location of a poly(A) site is difficult to determine, reported locations for the same site may differ by a few nucleotides.
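The relationship between the two fields can be sketched as below. This is an illustration under stated assumptions, not the database's actual procedure: the coordinate convention (the stop codon is represented by the chromosome coordinate of its last nucleotide), the strand labels, the function names, the toy coordinates, and the 3 nt tolerance are all hypothetical. Only the field names PolyA_UTR_Length and PolyA_Position come from the text.

```python
def polya_utr_length(stop_end, polya_position, strand):
    """Distance in nt from the stop codon to the poly(A) site.

    stop_end: chromosome coordinate of the last nucleotide of the stop
        codon (assumed convention)
    polya_position: chromosome coordinate of the poly(A) site
        (the PolyA_Position field)
    strand: '+' or '-' (coordinates decrease downstream on '-')
    """
    if strand == "+":
        return polya_position - stop_end
    if strand == "-":
        return stop_end - polya_position
    raise ValueError("strand must be '+' or '-'")


def same_site(pos_a, pos_b, tolerance=3):
    """Treat two reported poly(A) coordinates as one site if they differ
    by at most `tolerance` nt (arbitrary illustrative cutoff)."""
    return abs(pos_a - pos_b) <= tolerance


# Toy coordinates, not taken from the database.
print(polya_utr_length(stop_end=10_000, polya_position=10_120, strand="+"))
print(polya_utr_length(stop_end=10_000, polya_position=9_880, strand="-"))
print(same_site(10_120, 10_122))
```

The tolerance check reflects the remark that reported locations may differ by a few nucleotides; a suitable cutoff would depend on the data.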

3. EXPERIMENTALLY DETERMINED REGULATORY ELEMENTS

Regulatory elements, such as the Efficiency Element (EE) and the Positioning Element (PE), are known only for the very small number of genes for which experimental data have been obtained. Numbers indicate nucleotide positions relative to the stop codon.

Last modified: Tue Feb 10 10:55:52 CET 2004