The sequences for Human (H. sapiens, M15957), Rat (R. norvegicus, K00782), Fly (D. melanogaster, D00043), Nematode (C. elegans, X07828), Chicken (G. gallus, M14136), and SC (S. cerevisiae, M17238) were taken from NCBI. The sequences for SK (S. kluyveri, AU0AA006A07T1), DH (D. hansenii, BC0AA001A12T2), and YL (Y. lipolytica, AW0AA001A07Dr) were extracted from the Génolevures RST libraries, and the sequence for CA (C. albicans, Contig6_2246r) from Stanford's C. albicans sequence assembly 6, using the SC U4 sequence as bait [Bon et al., 2003]. Note that only partial sequences are available for DH.
The snRNAs have been divided into three domains according to Guthrie & Patterson [Guthrie & Patterson, Annu. Rev. Genet., 1988, 22, 387-419]: 5' terminal, central, and 3' terminal. Functional sites are indicated: the U6BP-box, the region of the 5' terminal domain that base-pairs with the U6 snRNA, and the SmBP-box, the region of the 3' terminal domain that binds the Sm proteins.
Unlike U1, U2, and U5, the size and general organisation of U4 are highly conserved in all organisms, probably reflecting evolutionary constraints imposed by the interaction between U4 and U6 [Guthrie & Patterson, Annu. Rev. Genet., 1988, 22, 387-419], as evidenced by the high conservation of the U6 base-pairing site sequence 'TGCTPuPuTT' in the U4 snRNA homologues.
(r) Reverse-complemented sequence
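
As an illustration of how the conserved U6 base-pairing site 'TGCTPuPuTT' could be located in a candidate U4 sequence, the following is a minimal sketch, not the procedure used to build the alignment. It assumes that Pu denotes a purine (A or G), that RNA input may be written with U instead of T, and that sequences marked (r) must be reverse-complemented before searching. The function names and the toy sequences are hypothetical and are not taken from the accessions listed above.

```python
import re

# Assumption: "Pu" in 'TGCTPuPuTT' stands for a purine, i.e. A or G.
U6BP_MOTIF = re.compile(r"TGCT[AG][AG]TT")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def find_u6bp_sites(seq, reverse=False):
    """List (0-based start, matched text) for each motif occurrence.

    The sequence is upper-cased and U is converted to T so that RNA and DNA
    spellings are treated alike. With reverse=True the sequence is
    reverse-complemented first, as for entries marked (r).
    """
    dna = seq.upper().replace("U", "T")
    if reverse:
        dna = reverse_complement(dna)
    return [(m.start(), m.group()) for m in U6BP_MOTIF.finditer(dna)]


# Toy sequences for illustration only (not real U4 data).
forward_hits = find_u6bp_sites("AAATGCTAGTTCC")
reverse_hits = find_u6bp_sites("GGAACTAGCATTT", reverse=True)
```

Here the second toy sequence is the reverse complement of the first, so both calls report the same site. The regular expression only finds exact matches to the consensus; a site carrying a substitution would need a more tolerant search.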