The sequences for Human (H. sapiens, X04215), Rat (R. norvegicus, M10270), Mouse (M. musculus, M29240), Chicken (G. gallus, X04213), Fly (D. melanogaster, K03096), SC (S. cerevisiae, M16510), SB (S. bayanus, U03476), and SP (S. pombe, X15504) were taken from NCBI. The sequences for ZR (Z. rouxii, AR0AA020B08CP1), KL (K. lactis, BA0AB004C05LP1), and KM (K. marxianus, AZ0AA010E03T1(r)) were extracted from the Génolevures RST libraries by using the SC U5 sequence as bait [Bon et al., 2003]. Note that the S. cerevisiae U5 molecule is found in two forms. The shorter form ends at 179 nt and thus lacks stem/loop II [Guthrie & Patterson, Annu. Rev. Genet., 1988, 22, 387-419]. The long form ends at 214 nt and is the one shown in the CLUSTALX multiple alignment.
The RNA has been divided into two domains, 5' terminal and 3' terminal, according to Guthrie & Patterson [Guthrie & Patterson, 1988]. Functional sites are indicated: the IBP-box, the region of the 5' terminal domain that recognises the intron 3' splice site by base pairing, and the SmBP-box, the region of the 3' terminal domain that binds the Sm proteins.
The size and general organisation of U5 are highly conserved in eukaryotic organisms, with lengths ranging from 113 (H. sapiens) to 147 (S. pombe) nucleotides. The hemiascomycetous yeasts are an exception: they have larger U5 snRNAs with several supplementary domains that are phylogenetically variable and distributed along the whole RNA sequence [Guthrie & Patterson, 1988; Bon et al., 2003].
(r) Reverse complement of the sequence
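The (r) note refers to reverse-complementing a read so that it can be aligned in the same orientation as the other sequences. A minimal sketch of that operation follows, assuming the read is a plain DNA string over A, C, G, T and N. The function name is illustrative and does not come from the source or from any alignment tool.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
# Both upper- and lower-case letters are handled.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    # Complement every base, then reverse the resulting string.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCAA"))  # prints TTGCAT
```

Applying the function twice returns the original string, which is a quick sanity check on the orientation of a read.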