Assessing the Extent of Substitution Rate Variation of Retrotransposon Long Terminal Repeat Sequences in Oryza sativa and Oryza glaberrima
© Springer Science + Business Media, LLC 2010
Received: 26 February 2010
Accepted: 8 July 2010
Published: 31 July 2010
Long Terminal Repeat retrotransposons (LTR-RTs) are a major component of several plant genomes. Important insights into the evolutionary dynamics of these elements in a genome are provided by the comparative study of their insertion times. These can be inferred by the comparison of pairs of LTRs flanking intact LTR-RTs in combination with an estimated substitution rate. Over the past several years, different substitution rates have been proposed for LTRs in crop plants. However, very little is known about the extent of substitution rate variation and the factors contributing to this variation, so the rates currently used are generally considered rough estimators of actual rates. To evaluate the extent of substitution rate variation in LTRs, we identified 70 orthologous LTRs on the short arms of chromosome 3 of both Oryza sativa and Oryza glaberrima, species that diverged ∼0.64 Ma. Since these orthologous sequences were present in a common ancestor prior to species divergence, nucleotide differences identified in comparing these regions must correspond to mutations accumulated post-speciation, thereby giving us the opportunity to study LTR substitution rate variation in different elements across these short arms. As a control, we analyzed a similar amount of non-repeat-related sequences collected near the orthologous LTRs. Our analysis showed that substitution rate variation in LTRs is greater than 5-fold, is positively correlated with G+C content, and tends to increase near centromeric regions. We confirmed that in the vast majority of cases, LTRs mutate faster than their corresponding non-repeat-related neighboring sequences.
Long Terminal Repeat (LTR) retrotransposons (RTs) are ubiquitous in the plant kingdom (Flavell et al. 1992; Voytas et al. 1992; Suoniemi et al. 1998), where they constitute significant portions of many genomes (Feschotte et al. 2002; IRGSP 2005; Tuskan et al. 2006; Jaillon et al. 2007; Zuccolo et al. 2007; Ming et al. 2008). They contribute actively to genome size variation (Hawkins et al. 2006; Piegu et al. 2006; Neumann et al. 2006; Ammiraju et al. 2007) and to changes in gene expression (Varagona et al. 1992; Leprince et al. 2001; Kashkush et al. 2003) and are involved in genome rearrangements (Ma et al. 2005). Because of their potentially mutagenic effects, transposable elements (TEs) are strictly regulated by epigenetic silencing mechanisms (Lisch 2009) and are major targets for DNA methylation in plant genomes (Bender 2004).
Useful information about the “history” of LTR-RTs in a genome comes from the comparative study of their insertion times, which can be inferred by comparing the two LTRs of each individual element. Due to the mechanisms of retrotranscription and insertion, LTR-RTs contain two identical LTRs at the moment of insertion (Lewin 1997). Over time, each LTR in a pair accumulates independent mutations and the two diverge (SanMiguel et al. 1998). Thus, the sequence divergence between the two LTRs of an element, combined with an appropriate substitution rate, allows one to estimate when the insertion in the host genome occurred.
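The dating logic described above can be illustrated with a minimal Python sketch that converts the divergence between the two LTRs of one element into an insertion time, T = K / (2r), where K is the distance between the two LTRs and r is a substitution rate per site per year. The factor of 2 reflects that both LTRs accumulate mutations independently. The function name and the example rate of 1.3e-8 are assumptions made purely for illustration; they are not values reported in this section.

```python
def insertion_time_years(ltr_distance, rate_per_site_per_year):
    """Estimate insertion time as T = K / (2 * r).

    ltr_distance: corrected distance K between the two LTRs of one element.
    rate_per_site_per_year: assumed substitution rate r (hypothetical input).
    """
    if ltr_distance < 0:
        raise ValueError("distance must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ltr_distance / (2.0 * rate_per_site_per_year)


# Illustrative call: K = 0.026 and an assumed rate of 1.3e-8 per site per year.
example_time = insertion_time_years(0.026, 1.3e-8)
```

Because T scales inversely with r, any uncertainty in the assumed rate carries over proportionally to the estimated insertion time.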
Although substitution rates have been estimated for genes (Gaut et al. 1996) and LTR-RTs (Ma and Bennetzen 2004; Vitte et al. 2004), very little is known about the extent of LTR-RT substitution rate variation or about the factors contributing to it. To better understand both, we took advantage of a unique within-genus sequence data set of chromosome 3 short arms from two cultivated Oryza species: Oryza sativa ssp. japonica (Asian rice) and Oryza glaberrima (West African rice). The two chromosome 3 short arms studied are 17,111,432 bp long in O. glaberrima and 19,401,704 bp long in O. sativa. We identified all orthologous LTR insertions between O. sativa and O. glaberrima, inferred their substitution rates, and then compared these rates with those of flanking sequences unrelated to TEs. We found that the substitution rates in LTRs vary by more than 5-fold, that this variation is positively correlated with G+C content, and that it tends to increase near the centromere.
Materials and methods
The short arm of chromosome 3 (Chr3S RefSeqs) in O. sativa was obtained from http://rgp.dna.affrc.go.jp/IRGSP/Build4/chr03.fasta.gz (IRGSP 2005). The corresponding orthologous sequences of O. glaberrima were obtained from BACs whose GenBank accession numbers are in Supplementary Table 1.
Mining orthologous LTR retrotransposon sequences from O. sativa and O. glaberrima
Sequence data analyses
The orthologous LTR+NRR tracts were aligned using the program “Stretcher” (EMBOSS package; Rice et al. 2000) and were edited using the program JalView 2.3 (Clamp et al. 2004). Genetic distances between orthologous tracts (i.e., LTRs and non-repeat flanking sequences) were estimated using the Kimura two-parameter method (Kimura 1980) as implemented in the program “Distmat” (EMBOSS package; Rice et al. 2000). Mutations were analyzed using the program DnaSP v.4 (Rozas et al. 2003). The G+C content was determined using a custom Perl script available upon request.
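For readers who wish to reproduce the distance calculation outside EMBOSS, the sketch below computes the Kimura two-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)] (Kimura 1980), from a pairwise alignment, where P and Q are the proportions of transitional and transversional differences. It also computes G+C content. This is neither the Distmat implementation nor the authors' Perl script; the function names and the choice to skip gapped or ambiguous columns are assumptions.

```python
import math

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: columns with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def gc_content_percent(seq):
    """Percentage of G and C among unambiguous bases."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return 100.0 * sum(c in "GC" for c in bases) / len(bases)
```

The logarithm is undefined when the sequences are saturated with differences, which is not a concern at the low divergences considered here.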
All the statistical analyses were carried out using scripts implemented in R language (R Development Core Team 2009).
Comparison of the O. glaberrima and O. sativa Chr3S sequences (see “Materials and methods”) revealed the presence of 70 orthologous LTR insertions, comprising 28 LTRs from 14 complete retroelements, 24 solo LTRs, and 18 LTRs from truncated LTR-RTs, totaling 103,963 bp from orthologous LTRs and 91,954 bp from orthologous NRRs. The 14 complete elements were checked for the presence of target site duplications (TSDs): all but one have TSDs. Orthologous LTR sequences were collected along with the genomic sequences (NRRs) flanking each LTR sequence (similar in size to each LTR sequence; Fig. 1). Since the Oryza genome is riddled with repetitive elements and their remnants (IRGSP 2005), all flanking NRR sequences were carefully inspected to identify and remove any detectable repetitive sequence. The orthologous tracts (LTR+NRR) were aligned and manually inspected to identify all substitutions that had accumulated in the LTR and NRR sequences since speciation took place (Supplementary Figure 1). Since the orthologous tracts were present in a common ancestor of O. sativa and O. glaberrima before species divergence, substitutions identified in the LTR and NRR sequences of both species should have accumulated during the same amount of time. This is true for all the orthologous tracts isolated and thus offers a unique opportunity to study substitution rate variation in LTRs across the Chr3S.
Assessing the extent of nucleotide distance variation in LTRs and NRRs
Two thirds of LTR nucleotide distances fell within a 2-fold range (0.0162–0.0332), whereas 60% of the nucleotide distances for the NRR sequences spanned a 3-fold range (0.005–0.0152). This was reflected in the coefficients of variation, which were 37.6% and 55.5% for LTR and NRR distances, respectively, indicating a greater degree of variation for nucleotide distances in the NRR flanking regions. To reduce the possible contribution of LTRs from complete LTR-RTs to the homogenization of nucleotide distances for LTRs alone, we recalculated the coefficient of variation for LTRs by removing all three LTRs from the complete elements. A very similar value for the coefficient of variation was obtained: 38.52%. The ratio of nucleotide distances in LTR–NRR pairs varied between 0.422 and 14.609, reflecting the lack of correlation between variation in nucleotide distance for LTR and NRR regions.
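The two summary measures used above, the fold range and the coefficient of variation, can be computed as in the following sketch. The use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used, and the function names are illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation_percent(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return 100.0 * stdev(values) / m


def fold_range(values):
    """Ratio of the largest to the smallest value."""
    lowest = min(values)
    if lowest <= 0:
        raise ValueError("fold range requires positive values")
    return max(values) / lowest


# The interval bounds quoted in the text for LTR and NRR distances.
ltr_fold = fold_range([0.0162, 0.0332])
nrr_fold = fold_range([0.005, 0.0152])
```

Applied to the interval bounds quoted in the text, the fold range is slightly above 2 for LTRs and slightly above 3 for NRRs.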
Characterization of the forces behind nucleotide distance variation
Frequency of Different Substitutions
Parsing a total of 341 informative sites, we found a preponderance (315 to 26) of C to T (or G to A) mutations over the reverse, suggesting that the mutation path from C to T remains the most common scenario in LTR retrotransposon-related sequences. This possibly reflects the effects of cytosine methylation (Duncan and Miller 1980).
All remaining substitution types did not exhibit significant differences between NRRs and LTRs.
The G+C content of LTRs was, on average, greater than that of the NRRs (45.66% vs. 41.56%; p = 9e-04, derived from a t test comparing distributional means). When each LTR was compared with its flanking NRR, the G+C content was greater for the LTR in 73% of the cases (51 of 70). For the remaining 19 cases, five had higher substitution rates in NRRs than in LTRs, suggesting a possible effect of G+C content on substitution rate.
Multiple Linear Regression Analysis of LTR Nucleotide Distance Variation (analysis of variance table; 2 and 67 DF)
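A regression of this kind, with two explanatory variables and 70 observations (hence 2 and 67 degrees of freedom), could be sketched as follows. The analyses in the paper were run in R; this stand-alone Python version is only illustrative. The choice of G+C content and chromosomal position as the two predictors is an assumption based on the factors discussed in the text, and all names are hypothetical.

```python
from statistics import mean


def fit_two_predictor_ols(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 with an overall F statistic.

    y: response (e.g., LTR distances); x1, x2: predictors (assumed here to be
    G+C content and position along the chromosome arm).
    """
    n = len(y)
    if not (n == len(x1) == len(x2)) or n < 4:
        raise ValueError("need at least 4 observations of equal length")
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are collinear")
    # Solve the 2x2 normal equations for the slopes, then the intercept.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_total = sum((c - my) ** 2 for c in y)
    ss_resid = sum((c - (b0 + b1 * a + b2 * b)) ** 2
                   for a, b, c in zip(x1, x2, y))
    ss_model = ss_total - ss_resid
    df_model, df_resid = 2, n - 3
    if ss_resid > 0:
        f_stat = (ss_model / df_model) / (ss_resid / df_resid)
    else:
        f_stat = float("inf")  # perfect fit
    return {"b0": b0, "b1": b1, "b2": b2, "f": f_stat,
            "df_model": df_model, "df_resid": df_resid}
```

With 70 observations the residual degrees of freedom are 70 - 3 = 67, matching the "2 and 67 DF" reported for the table.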
Kimura 2p Distance Variation within LTR Retrotransposon Families
Evaluating the errors introduced by the current LTR-RT dating methodology
Kimura 2p Nucleotide Distances of Complete Retrotransposons
In addition to the significant variation in K2p distances between pairs of orthologous LTRs from the two species, we also observed that the two LTRs of the same element accumulated different numbers of mutations during the same post-speciation period. For example, in one case the K2p distances of the two LTRs differed by more than 2-fold (0.0442 vs. 0.0203), and in only two cases out of 14 did the K2p distances of the two LTRs differ by less than 10% (Table 4).
The half-life of LTR retrotransposable elements in cereals has been estimated to be approximately six million years (Ma et al. 2004); thus, efforts to measure LTR substitution rate variation across genera (e.g., rice-maize-sorghum) are virtually impossible. Here, we used a within-genus model system to assess the extent of LTR substitution rate variation by scanning highly accurate Sanger-sequenced Chr3S pseudomolecules from O. sativa and O. glaberrima for orthologous transposable elements and their derivatives. These species diverged from a last common ancestor 0.64 Ma (Ma et al. 2004) and offer an ideal evolutionary vista to analyze nucleotide rate variation in both LTR-RTs and neighboring sequences.
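Given a known divergence time, a per-lineage substitution rate can be recovered from an orthologous distance as r = K / (2T). The sketch below applies this to the two LTR distance bounds quoted in the results (0.0162 and 0.0332) with T = 0.64 million years. Dividing by 2T assumes that the distance is split evenly between the two lineages; this convention and the function name are assumptions, not a description of the authors' exact procedure.

```python
DIVERGENCE_TIME_YEARS = 0.64e6  # O. sativa / O. glaberrima split, from the text


def rate_from_distance(distance, divergence_time_years=DIVERGENCE_TIME_YEARS):
    """Per-site, per-year substitution rate r = K / (2 * T)."""
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_time_years)


# Bounds of the interval containing two thirds of the LTR distances.
low_rate = rate_from_distance(0.0162)
high_rate = rate_from_distance(0.0332)
fold_difference = high_rate / low_rate
```

Under these assumptions the two bounds correspond to rates of roughly 1.27e-8 and 2.59e-8 substitutions per site per year, so even the central two thirds of the LTRs imply rates that differ about 2-fold.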
The orthologous tracts used in this analysis comprised two contiguous LTR and NRR sequences of approximately the same lengths and were assumed to be subjected to the same “environmental pressures” since both sequence types are physically linked in the same genomic location. Thus, different mutational behavior between LTR and NRR sequences could be ascribed directly to the different nature of the sequences (i.e., LTR-RT vs. intergenic). Comparison of nucleotide distances in LTRs versus NRRs showed that in most cases (61 of 70) LTRs were evolving more rapidly than NRRs. We found only nine cases where NRR sequences were evolving faster, five of which had higher G+C contents that could explain the higher K2p distance rates. For the four remaining cases, an alternative hypothesis could be that the flanking LTRs are under evolutionary constraint or that the NRR regions actually contain uncharacterized repeats that were not detected during similarity searches.
Calculated nucleotide distances showed significant variation across the Chr3S RefSeq for both LTR and NRR sequences, with a tendency to increase towards the centromeric region. The magnitude of variation in the case of LTRs spanned an almost 6-fold range. Variation of nucleotide distances is largely expected for coding genes (Wolfe et al. 1989; Zhang et al. 2002) and has recently proved to be significant also for intergenic regions in Arabidopsis (DeRose-Wilson and Gaut 2007). The nucleotide distances calculated for LTRs were, in most of the cases studied here, higher than those of the nearby NRR sequences; however, the degree of variation for nucleotide distances was smaller for LTRs than for NRRs. This can be explained by considering that the species studied are close enough in evolutionary time that an appreciable amount of variance among loci is going to be due to coalescence processes, even if every locus has an identical substitution rate. However, if the LTR-RT elements are active or have been active in recent evolutionary time, new or recent insertions can only have a lower variance due to coalescence. On the other hand, old insertions, just like NRR loci, can have long coalescences. Thus, on average, LTRs will have less variance among loci than NRRs (assuming homogeneous mutation rates among loci). Yet, the amount of variability in substitution rate among LTRs remains high, to the point that even LTRs belonging to the same element could accumulate mutations at rates differing by 2-fold. The clear lack of a relationship between the nucleotide distances in LTRs and in flanking NRR sequences suggests that whatever mechanism(s) act on LTRs, the effects do not extend to the immediate flanking sequences (as far as mutations are concerned). Our data also demonstrated that the mechanism(s) inducing these mutations was not specific to one LTR-RT family over another.
We identified at least two features of LTRs that appear to be important contributors to nucleotide distance variation, namely G+C content and sequence position along the chromosome. Nucleotide distances are positively correlated with G+C content, not only in LTRs but also in NRRs. Our results are consistent with previous findings that showed a positive correlation between G+C content and nucleotide substitution rates in Arabidopsis thaliana and Arabidopsis lyrata (DeRose-Wilson and Gaut 2007). Similarly, substitution rate variation for both LTRs and NRRs increases towards the centromeric regions. This agrees with recent findings in A. thaliana, where a higher mutation rate in pericentromeric regions has been demonstrated (Ossowski et al. 2010). However, G+C content and position along the chromosome alone cannot explain all the variation. CG dinucleotides and CpNpG and CpHpH trinucleotides are known to be among the major targets for DNA methylation (Gruenbaum et al. 1981), and DNA methylation in turn is one of the methods used by host genomes to control the chaotic effects of transposable element proliferation (Kumar and Bennetzen 1999; Zilberman and Henikoff 2004). Importantly, a certain amount of direct correlation between methylation and the position of genes along the chromosome has been demonstrated in A. thaliana, where genes near centromeres were found to have a higher likelihood of being methylated (Zilberman et al. 2006). Our analysis demonstrated that the vast majority of differences in the frequency of substitution between LTRs and flanking NRRs could be ascribed to mutations possibly affecting cytosines (G:C → A:T), which are targets of DNA methylation. In contrast, when the frequency of substitutions is compared between bases that are normally not methylated, no significant differences could be identified between LTRs and flanking NRR tracts.
It is easy to speculate that the major driving force behind the mutation rate variation patterns described in this paper, including the positive correlation with G+C content and the increasing trends towards centromeres, is possibly associated with methylation. This hypothesis can be tested by performing a detailed characterization of methylation patterns in the LTR-RT pool which is beyond the scope of this work.
The molecular paleontology method of LTR-RT insertion dating has never claimed to provide rigorous and exact insertion time estimates because the pitfalls of using a single substitution rate for a population of LTR retrotransposons are well known (SanMiguel et al. 1998; Pereira 2004; Ma and Bennetzen 2006; Piegu et al. 2006). This work provides the first assessment of the extent of substitution rate variation affecting LTRs in a population of LTR retrotransposons in two closely related species. It should, however, be noted that this work, because of its experimental design, focused only on elements older than 0.64 million years. The amount of nucleotide distance variation in younger elements remains to be assessed. Our data support the cautious approach that has characterized the use of molecular paleontology dating so far. Since LTR substitution rates were shown to span a nearly 6-fold range, LTR-RT insertion time dating that relies on a very general and approximate substitution rate is prone to severe errors. Such errors occur not only when different elements are analyzed in the same species but also, although to a lesser extent, when the same element is studied in two different but closely related species. These limitations indicate that LTR-RT insertion time estimates should be considered a general qualitative assay rather than a quantitative estimation.
This work was supported by the National Science Foundation (Grant DBI-0638541 to R.A.W., S.J., and S.R.) and the Bud Antle Endowed Chair (to R.A.W.).
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
- Ammiraju JS, Zuccolo A, Yu Y, Song X, Piegu B, Chevalier F, et al. Evolutionary dynamics of an ancient retrotransposon family provides insights into evolution of genome size in the genus Oryza. Plant J. 2007;52(2):342–51.
- Bender J. DNA methylation and epigenetics. Annu Rev Plant Biol. 2004;55:41–68.
- Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java Alignment Editor. Bioinformatics. 2004;20:426–7.
- DeRose-Wilson LJ, Gaut BS. Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata. BMC Evol Biol. 2007;7:66.
- Duncan BK, Miller JH. Mutagenic deamination of cytosine residues in DNA. Nature. 1980;287(5782):560–1.
- Feschotte C, Jiang N, Wessler SR. Plant transposable elements: where genetics meets genomics. Nat Rev Genet. 2002;3:329–41.
- Flavell AJ, Dunbar E, Anderson R, Pearce SR, Hartley R, Kumar A. Ty1-copia group retrotransposons are ubiquitous and heterogeneous in higher plants. Nucleic Acids Res. 1992;20:3639–44.
- Gaut BS, Morton BR, McCaig BC, Clegg MT. Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc Natl Acad Sci USA. 1996;93(19):10274–9.
- Gruenbaum Y, Naveh-Many T, Cedar H, Razin A. Sequence specificity of methylation in higher plant DNA. Nature. 1981;292(5826):860–2.
- Hawkins J, Kim H, Nason JD, Wing RA, Wendel JF. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 2006;16:1252–61.
- Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449(7161):463–7.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
- Kashkush K, Feldman M, Levy AA. Transcriptional activation of retrotransposons alters the expression of adjacent genes in wheat. Nat Genet. 2003;33(1):102–6.
- Kimura M. Simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980;16:111–20.
- Kumar A, Bennetzen JL. Plant retrotransposons. Annu Rev Genet. 1999;33:479–532.
- Leprince AS, Grandbastien MA, Meyer C. Retrotransposons of the Tnt1B family are mobile in Nicotiana plumbaginifolia and can induce alternative splicing of the host gene upon insertion. Plant Mol Biol. 2001;47:533–41.
- Lewin B. Genes VI. Oxford: Oxford University Press; 1997.
- Lisch D. Epigenetic regulation of transposable elements in plants. Annu Rev Plant Biol. 2009;60:43–66.
- Ma J, Bennetzen JL. Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA. 2004;101(34):12404–10.
- Ma J, Bennetzen JL. Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proc Natl Acad Sci USA. 2006;103(2):383–8.
- Ma J, Devos KM, Bennetzen JL. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 2004;14(5):860–9.
- Ma J, SanMiguel P, Lai J, Messing J, Bennetzen JL. DNA rearrangement in orthologous orp regions of the maize, rice and sorghum genomes. Genetics. 2005;170(3):1209–20.
- Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008;452(7190):991–6.
- Neumann P, Koblizkova A, Navratilova A, Macas J. Significant expansion of Vicia pannonica genome size mediated by amplification of a single type of giant retroelement. Genetics. 2006;173:1047–56.
- Ossowski S, Schneeberger K, Lucas-Lledó JI, Warthmann N, Clark RM, Shaw RG, et al. The Rate and Molecular Spectrum of Spontaneous Mutations in Arabidopsis thaliana. Science. 2010;327(5961):92–4.
- Pereira V. Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol. 2004;5(10):R79.
- Piegu B, Guyot R, Picault N, Roulin A, Sanyal A, Kim H, et al. Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res. 2006;16(10):1262–9.
- R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2009.
- Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–7.
- Rozas J, Sanchez-Delbarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003;19:2496–7.
- SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–5.
- Sonnhammer EL, Durbin R. A dot matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995;167(1–2):GC1–10.
- Suoniemi A, Tanskanen J, Schulman AH. Gypsy-like retrotransposons are widespread in the plant kingdom. Plant J. 1998;13:699–705.
- The International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800.
- Tuskan GA, Difazio S, Jansson S, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313(5793):1596–604.
- Varagona MJ, Purugganan M, Wessler SR. Alternative splicing induced by insertion of retrotransposons into the maize waxy gene. Plant Cell. 1992;4:811–20.
- Vitte C, Ishii T, Lamy F, Brar D, Panaud O. Genomic paleontology provides evidence for two distinct origins of Asian rice (Oryza sativa L.). Mol Genet Genomics. 2004;272(5):504–11.
- Voytas DF, Cummings MP, Konieczny A, Ausubel FM, Rodermel SR. Copia-like retrotransposons are ubiquitous among plants. Proc Natl Acad Sci USA. 1992;89:7124–8.
- Wolfe KH, Sharp PM, Li WH. Rates of synonymous substitution in plant nuclear genes. J Mol Evol. 1989;29(3):208–11.
- Zhang L, Vision TJ, Gaut BS. Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol Biol Evol. 2002;19(9):1464–73.
- Zilberman D, Henikoff S. Silencing of transposons in plant genomes: kick them when they’re down. Genome Biol. 2004;5:249.
- Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet. 2006;39:61–9.
- Zuccolo A, Sebastian A, Talag J, Yu Y, Kim H, Collura K, et al. Transposable element distribution, abundance and role in genome size variation in the genus Oryza. BMC Evol Biol. 2007;7:152.