Morphological Changes Caused by Rice Genome Doubling
To study the impact of WGD without interspecies hybridization effects, we induced autotetraploid rice (4x) from a diploid cultivar (2x), Oryza sativa ssp. indica cv. 93–11, which had been independently self-pollinated for over 10 generations. The ploidy of the diploid and autotetraploid rice was confirmed by flow cytometry (Additional file 1: Figure S1). The autotetraploid rice exhibited remarkable differences in morphological traits compared with the diploid rice, such as decreased leaf size, root length and whole plant size during dynamic growth (Fig. 1a and Additional file 1: Figure S2a). Moreover, at the 12-day seedling stage, a significantly thicker leaf midvein (Fig. 1b) and greater cortex cell area (Fig. 1c) were observed in 4 × rice. At mature stages, the 4 × rice displayed increased plant height and larger panicles and grains (Additional file 1: Figure S2b–S2e). These observations are consistent with previous reports that adult autotetraploids have increased branch numbers, enlarged cells, larger stomata and seeds, and lower fertility (Zhang et al. 2015; Zhang et al. 2019; Li et al. 2020), as well as lower growth rates in young seedlings (Dudits et al. 2016; Allario et al. 2011).
Increasing Densities of Euchromatin Accessibility in the Autotetraploid Rice Genome
Studies in Arabidopsis have shown that genome doubling affects chromatin architecture; in particular, autotetraploids presented more interchromosomal interactions and fewer short-range chromatin interactions than diploid progenitors (Zhang et al. 2019). However, the relationship between ACRs and autopolyploidization remains undetermined. Therefore, we applied ATAC-seq to aboveground tissues containing young leaves and stems of both autotetraploid rice and its diploid parent. As a result, 29,412, 37,998, and 24,886 peaks for 2 × rice and 36,135, 39,383, and 33,633 peaks for 4 × rice were called from three replicates of ATAC-seq, and the replicates were strongly correlated (Pearson correlation, R2 > 0.9) (Additional file 2: Table S1; Additional file 1: Figure S3a and b), suggesting that the data are reliable and highly reproducible. Subsequently, we identified 17,433 and 18,013 ACRs derived from these peaks in diploid and autotetraploid rice, respectively (Additional file 3: Table S2). Moreover, the repeatability of these ATAC-seq datasets is also illustrated by their consistent signal patterns across an ~80-kb chromatin region (Additional file 1: Figure S3f). Together, our data provide an overview of repeatable and reliable ACRs in 2 × and 4 × rice.
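As an illustration of the replicate check described above, the following minimal sketch computes a Pearson correlation coefficient between two replicates. It is not the pipeline used in this study; the per-bin read counts and the names `rep1` and `rep2` are hypothetical values chosen only for demonstration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-bin read counts for two replicates (illustrative only).
rep1 = [12, 40, 7, 55, 23, 0, 31]
rep2 = [10, 43, 9, 50, 25, 1, 28]
r = pearson_r(rep1, rep2)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

In practice, the inputs would be normalized read counts over a common set of genomic bins or peaks, one vector per replicate.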
In rice chromosomes, the majority of heterochromatin is distributed in the pericentromeric regions, with chromosome 4 having a distinct pattern in which the entire left (short) arm is highly heterochromatinized (Cheng et al. 2001). We observed that ACRs tend to be enriched in euchromatic regions (Fig. 2a), consistent with our previous observations in the sorghum genome (Zhou et al. 2020). For comparison, we analyzed genome-wide 5mC (a heterochromatic gene mark; Fig. 2a; GSE121274) and H3K4me3 (a euchromatic gene mark; Fig. 2a; SRR6781461) in the same tissues of wild-type plants. While more H3K4me3 peaks were found in euchromatic regions, 5mC was relatively more enriched in the heterochromatic and pericentromeric regions (Fig. 2a), confirming previous data (Tan et al. 2016). The analysis revealed that rice autopolyploidization resulted in clear gains of ACRs in euchromatic regions of the chromosomes and, to a lesser extent, slight losses in heterochromatic regions (Fig. 2a). Thus, our data indicated that genome duplication alters chromatin accessibility in both heterochromatic and euchromatic regions.
In detail, we found that a higher frequency of ACRs occurred in autotetraploid rice genes (17.8%) than in diploid rice genes (15.5%), while this phenomenon was not observed in 2 × and 4 × rice repeats (Fig. 2b). To study the effects of duplication on the chromatin accessibility of different genomic elements, we calculated the percentages of ACR-associated promoters, 5′ untranslated regions (5′UTRs), 3′UTRs, coding exons, introns and intergenic regions. In general, ACRs in 2 × and 4 × rice were both located in much higher proportions in the promoter, intergenic, and 5′UTR regions than in other regions in gene bodies (Fig. 2b), which was similar to the distribution pattern of ACRs in sorghum gene elements (Zhou et al. 2020). Moreover, ACRs in the autotetraploid rice promoters and 5′UTRs exhibited a clear increase relative to diploid rice (Fig. 2b), which suggested that genome doubling may be a mechanism that promotes the expression of genes in 4 × rice in response to genome-dosage effects following WGD. To analyze the autopolyploidization effects on ACR density in different regions of rice protein-coding genes, we calculated the average ACR density for every 100-bp interval of each gene and its 2-kb upstream and downstream flanking regions in 2 × and 4 × plants. The analysis revealed that, in both 2 × and 4 × rice, ACRs were mainly present around TSSs in contrast to other genic regions (Fig. 2c), which was in line with the observation of much lower occurrences of the ACRs in the 3′UTR (Fig. 2b). Clearly, autotetraploid rice displayed a higher ACR density than diploid rice (Fig. 2c). Collectively, these data indicated that rice autopolyploidization may have a function in modulating chromatin accessibility.
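The per-interval density calculation described above can be sketched as follows for a single gene. This is a simplified illustration under stated assumptions, not the authors' implementation: coordinates are 0-based and half-open, ACRs are assumed not to overlap one another, strand orientation is ignored, and the averaging across genes (including any scaling of gene bodies to a common length) is omitted. The function name `acr_density_bins` and its parameters are hypothetical.

```python
def acr_density_bins(gene_start, gene_end, acrs, flank=2000, bin_size=100):
    """Fraction of each bin covered by ACRs across a gene plus its flanks.

    Coordinates are 0-based, half-open [start, end). `acrs` is a list of
    (start, end) tuples on the same chromosome, assumed non-overlapping.
    """
    region_start = max(0, gene_start - flank)
    region_end = gene_end + flank
    densities = []
    for bin_start in range(region_start, region_end, bin_size):
        bin_end = min(bin_start + bin_size, region_end)
        # Sum the bases of this bin that fall inside any ACR.
        covered = sum(
            max(0, min(bin_end, a_end) - max(bin_start, a_start))
            for a_start, a_end in acrs
        )
        densities.append(covered / (bin_end - bin_start))
    return densities


# Hypothetical gene at 2000-3000 with one ACR spanning its start site.
profile = acr_density_bins(2000, 3000, [(1950, 2050)])
print(len(profile), profile[19], profile[20])
```

Averaging such profiles over all protein-coding genes, after orienting each gene by strand, would give a density curve of the kind plotted around TSSs.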
Positional ACR Genes are Enriched in Specific Biological Pathways as a Result of Autotetraploidization in Rice
ACRs function in transcriptional regulation by providing physical scaffolds to recruit transcriptional coregulators and/or chromatin remodelers (Klemm et al. 2019; Zhou et al. 2020; Lu et al. 2019). However, whether and how the control of gene expression by genome duplication depends on ACRs remains unclear. To address this issue, two sets of RNA-seq data were obtained from three replicates of the same stages of 2 × and 4 × rice (R2 > 0.9; Additional file 1: Figure S3c and d; Additional file 2: Table S1; Additional file 4: Table S3). In addition, to reevaluate the differentially expressed genes (DEGs) recognized by DESeq2 (Love et al. 2014), we introduced Cuffdiff (Trapnell et al. 2010) and edgeR (Robinson et al. 2010), which are alternative tools for RNA-seq analysis. The vast majority of DEGs (Additional file 4: Table S3) called by DESeq2 overlapped with DEGs called by edgeR (above 95%), but only one-third of the DEGs called by Cuffdiff overlapped with the DEGs from DESeq2 and edgeR (Additional file 1: Figure S4a and S4b). In contrast with Cuffdiff, the fold changes of DEGs from DESeq2 and edgeR were strongly correlated (Pearson correlation, R2 = 0.994; Additional file 1: Figure S4c). Moreover, the RNA-seq data were validated by qPCR tests of randomly selected genes (Additional file 1: Figure S4d), which revealed a high correlation with the RNA-seq results (R2 = 0.655; Additional file 1: Figure S4e). Thus, DESeq2 should be suitable for calculating differentially expressed genes in 2 × and 4 × rice with the classical normalization method according to previously described protocols (Zhang et al. 2019).
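The comparison of DEG calls between tools amounts to a set overlap. The sketch below shows one way such an overlap fraction could be computed; the gene identifiers are hypothetical placeholders, and the function `overlap_fraction` is introduced here for illustration rather than taken from the study.

```python
def overlap_fraction(reference, other):
    """Fraction of genes in `reference` that are also present in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & other) / len(reference)


# Hypothetical DEG lists from two tools (placeholder gene identifiers).
degs_tool_a = {"gene01", "gene02", "gene03", "gene04"}
degs_tool_b = {"gene02", "gene03", "gene04", "gene05"}
print(overlap_fraction(degs_tool_a, degs_tool_b))
```

Note that the fraction is asymmetric: it depends on which tool's list is taken as the reference.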
These data revealed that active genes that showed higher expression levels (top 20%) exhibited typical sharp peaks of ACRs at TSSs; however, no obvious peaks were observed in inactive genes (bottom 20%), which showed weak expression levels (Fig. 3a). These data indicated that the accumulation of ACRs around the TSS contributed to positive control of gene transcription in rice. Moreover, compared to the promotion of ACR in modulating transcriptional processes in diploid rice, ACR in autotetraploid rice displayed a much greater ability to positively control gene expression (Fig. 3a). Overall, these analyses indicated that gene expression in autotetraploid rice was associated with chromatin accessibility.
Next, we classified ACRs based on their proximity to the nearest annotated genes. As a result, 6555 (37.6%) and 7641 (42.4%) of the ACRs in 2 × and 4 × rice, respectively, were designated genic ACRs (gACRs) because they overlapped with the nearby annotated genes by at least 1 bp (Fig. 3b; Additional file 3: Table S2). In addition, 10,878 (62.4%) and 10,372 (57.5%) of the ACRs in 2 × and 4 × rice, respectively, were designated intergenic ACRs (iACRs) because they were at least 1 bp away from nearby genes (Fig. 3b; Additional file 3: Table S2). Accordingly, the number of gACR-associated genes increased noticeably in 4 × rice (6603) compared with 2 × rice (5751), whereas no obvious change was observed in the numbers of all ACR-associated or iACR-associated genes (Fig. 3b; Additional file 5: Table S4). Next, we characterized the basic genomic features of the various types of ACRs. In general, the sizes of full-length 4 × rice ACR transcripts were shorter than those of 2 × rice transcripts (Additional file 1: Figure S5a). The results showed no obvious differences in A/T-rich regions between autotetraploid rice and its diploid parent (Additional file 1: Figure S5b). Together, these differences between the ACRs of 2 × and 4 × rice indicate that ACRs may have a distinct function during rice autopolyploidization.
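The genic versus intergenic classification rule (an overlap of at least 1 bp with an annotated gene) can be sketched as below. This is a minimal illustration that assumes 0-based, half-open coordinates on a single chromosome and treats a gene as its annotated interval only; whether flanking sequence counts as genic is not specified here, so that choice is an assumption. The name `classify_acrs` is hypothetical.

```python
def classify_acrs(acrs, genes):
    """Split ACRs into genic (>= 1 bp overlap with any gene) and intergenic.

    `acrs` and `genes` are lists of (start, end) tuples, 0-based and
    half-open, all on the same chromosome.
    """
    genic, intergenic = [], []
    for acr_start, acr_end in acrs:
        # Two half-open intervals share at least 1 bp when each one
        # starts before the other ends.
        overlaps = any(
            acr_start < g_end and g_start < acr_end for g_start, g_end in genes
        )
        (genic if overlaps else intergenic).append((acr_start, acr_end))
    return genic, intergenic


# Hypothetical coordinates: one gene and four candidate ACRs.
genes = [(100, 200)]
acrs = [(190, 250), (200, 300), (0, 100), (50, 101)]
genic, intergenic = classify_acrs(acrs, genes)
print(genic, intergenic)
```

The linear scan over genes is adequate for a sketch; for genome-scale data, sorted intervals or an interval tree would avoid the quadratic cost.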
To better understand the correlations of gene expression levels with different types of ACRs in 2 × and 4 × rice, we classified all genes into four categories: (1) genes associated with only gACRs (only gACRs), (2) genes associated with only iACRs (only iACRs), (3) genes associated with gACRs and iACRs (igACRs), and (4) genes without ACRs (nonACRs). Then, we calculated the transcriptional levels of these groups of genes, and the results showed that the average expression levels of genes with only gACRs or only iACRs were significantly higher than those of genes without ACRs (Fig. 3c). In fact, we observed that only gACR-associated genes displayed significantly lower expression levels than igACRs but higher expression levels than only iACRs (Fig. 3c). These results were consistent with previous observations in sorghum (Zhou et al. 2020). The analysis revealed that genic ACRs play predominant roles in controlling gene transcription in the rice genome. Additionally, only gACR-associated genes in 4 × rice displayed higher transcriptional activity than those in 2 × rice, which was not observed in other groups of genes (Fig. 3c). To further refine the correlation of the DEGs and differential ACR-associated genes (DAGs), we defined gACR-associated genes that specifically existed in 2× (2×-specific) or 4× (4×-specific) plants, similar to iACRs (Additional file 1: Figure S6a). In terms of DEGs (Additional file 1: Figure S6b), including 1330 upregulated genes and 1317 downregulated genes, autotetraploid rice had nearly 150 upregulated 4×-specific gACR-associated genes, which was more than the number of downregulated genes (Fig. 3d). In contrast, obvious differences were not found in other overlapping genes between DAGs and DEGs (Fig. 3d). As an example, the expression and ACR distributions of representative transcriptionally normalized reads corresponding to gACRs, iACRs, and igACRs in 2 × and 4 × rice are illustrated (Fig. 3e). 
In total, the analysis showed that transcriptional regulation of ACRs was also associated with their own positional states in the rice genome. In contrast with intergenic ACRs, genic ACRs were able to modulate transcriptional activity, which was much more prominent during rice genome duplication.
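The four gene categories used in the expression comparison (only gACRs, only iACRs, igACRs, and nonACRs) reduce to simple set membership. The following sketch illustrates that logic with hypothetical gene identifiers; it is an illustration of the grouping described in the text, not the code used to produce Fig. 3c.

```python
def categorize_genes(all_genes, gacr_genes, iacr_genes):
    """Assign each gene to one of the four ACR categories."""
    gacr_genes, iacr_genes = set(gacr_genes), set(iacr_genes)
    categories = {"only gACRs": [], "only iACRs": [], "igACRs": [], "nonACRs": []}
    for gene in all_genes:
        has_genic = gene in gacr_genes
        has_intergenic = gene in iacr_genes
        if has_genic and has_intergenic:
            categories["igACRs"].append(gene)
        elif has_genic:
            categories["only gACRs"].append(gene)
        elif has_intergenic:
            categories["only iACRs"].append(gene)
        else:
            categories["nonACRs"].append(gene)
    return categories


# Hypothetical gene identifiers, for illustration only.
result = categorize_genes(
    all_genes=["geneA", "geneB", "geneC", "geneD"],
    gacr_genes=["geneA", "geneC"],
    iacr_genes=["geneB", "geneC"],
)
print(result)
```

Expression values for each group could then be compared, for example with a rank-based test.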
Consistently, Gowinda analysis revealed that genes involved in DNA-binding transcription factor activity, regulation of gene expression, response to biotic stimulus and membrane part were enriched (P value < 0.05) for gACRs that specifically exist in 4 × rice (Additional file 6: Table S5). These processes were also enriched for gACR-associated genes in autotetraploids, whereas none of the genes associated with any GO terms were enriched for those in 2 × rice (Additional file 6: Table S5). In particular, these differentially expressed genes are associated with responses to stimuli in autopolyploid plants (Zhang et al. 2019; Allario et al. 2011). Thus, these analyses reinforced the hypothesis that gACRs were able to promote transcription, which was enhanced in the process of rice genome doubling.
The Involvement of ACRs in Transcriptional Regulation is Associated with H3K36me2 and H3K36me3 During Rice Autopolyploidy
The interrelationship between histone marks and ACRs has been studied in Arabidopsis and other plants, suggesting that the interplay of ACRs and histone marks is worth studying in plants (Lu et al. 2019; Sun et al. 2019; Frerichs et al. 2019; Zhou et al. 2020). However, the crosstalk between ACRs and histone modifications during plant genome autopolyploidization is still unclear. To investigate the epigenomic signatures of chromatin in relation to ACRs in the rice diploid and autotetraploid genomes, we integrated our ATAC-seq data in 2 × and 4 × rice with ChIP-seq and MeDIP-seq data (Additional file 7: Table S6). Pearson correlation coefficients were also used to indicate the degree of co-occurrence of histone marks at transcript regions, which demonstrated that ACRs in either 2 × or 4 × rice were preferentially associated with active marks rather than with the well-known inactive and heterochromatic modifications H3K9me2, H3K27me3, and 5mC (Fig. 4a). These data were in accordance with previous observations that ACRs contributed to the positive control of gene transcription in plants (Zhou et al. 2020; Lu et al. 2019). More interestingly, the correlogram suggested that ACRs in 4 × rice were preferentially associated with the H3K36me2 and H3K36me3 marks, as indicated by their positive correlations with ACRs (Fig. 4a). Likewise, the correlations between ACRs and epi-marks across the whole doubled genome were broadly similar to those in transcriptional regions (Fig. 4a and Additional file 1: Figure S7). Collectively, the analysis supported the view that H3K36me2/3 act as predominant chromatin marks during rice genome autopolyploidization.
To analyze the effect of genome duplication on histone methylation levels at different regions of rice protein-coding genes, we examined whole-genome profiles of two histone marks (H3K36me2 and H3K36me3) from two replicates of ChIP-seq in 2 × and 4 × rice, with non-ChIP genomic DNA included as the input. The replicates were strongly correlated (Pearson correlation, R2 > 0.8; Additional file 1: Figure S3e and S3f). As a result, a large number of histone modification peaks (from 10,836 to 92,633) were called (Additional file 2: Table S1). We observed that the evolutionarily conserved active mark H3K36me3 was abundant near the transcriptional start site (Fig. 4b); in contrast, H3K36me2 was distributed evenly across the transcriptional region (Fig. 4b), indicating that these two types of H3K36me marks may have different roles. Interestingly, augmented levels of H3K36me2 and H3K36me3 were associated with rice autopolyploidization, especially for H3K36me2 (Fig. 4b), which was in line with the observed correlation between multiple epi-marks and ACRs (Fig. 4a). In detail, 20,474 and 4,170 genes were found to be marked only with H3K36me2 and only with H3K36me3, respectively, in 4 × rice (Fig. 4c). In addition, compared to 2 × rice, a dominant proportion (no less than 50%) of gACR- or iACR-specific genes in 4 × rice were marked with H3K36me2 (Fig. 4c). At the same time, the percentages of H3K36me3-marked genes that were associated with gACRs or iACRs were also slightly increased in 4 × rice (Fig. 4c). Unsurprisingly, the ACR density levels of only H3K36me2- and H3K36me3-associated genes in autotetraploid rice were higher than those in diploid rice (Fig. 4d). Additionally, the enrichment of H3K36me2 in either gACR- or iACR-associated genes, which were specific to 4 × rice, was higher than that in 2 × rice; in contrast, H3K36me3 appeared to be slightly increased in gACR-specific genes in 4 × rice compared to 2 × rice (Fig. 4e).
These results indicated that both forms of H3K36 methylation (H3K36me2 and H3K36me3) were required for rice genome duplication and that H3K36me2 may play a dominant role in functional regulation.
Then, we examined how and to what extent ACRs and H3K36me2/3 modulate transcriptional activity in 2 × and 4 × rice. To this end, genes associated with gACRs or iACRs in 2 × and 4 × rice were classified into three chromatin states: only H3K36me2 (me2), a combination of H3K36me2 and H3K36me3 (me2/3), and only H3K36me3 (me3). Overall, for gACR-related or iACR-related genes, me2/3-marked genes displayed significantly higher expression levels (Fig. 4f; P value < 0.01 by Wilcoxon rank sum). On average, me3 genes were correlated with higher gene activity than me2 genes in both 2 × and 4 × rice, suggesting that the combined effect (coenrichment) of H3K36me2 and H3K36me3 ought to be correlated with higher transcriptional activity (Fig. 4f; P value < 0.01 by Wilcoxon rank sum). Interestingly, compared with that of genes covered with 2×-specific genic ACRs, the transcriptional level was obviously enhanced in 4×-specific gACRs associated with H3K36me2 and H3K36me3 co-marked genes or only H3K36me3-marked genes (Fig. 4f), suggesting that doubling the genome induced H3K36me2 and H3K36me3 marks that were related to higher transcriptional activity. However, the expression of only H3K36me3-marked 4×-specific iACR-associated genes showed no significant differences from H3K36me2/3-marked genes and 2×-specific iACR genes (Fig. 4f). These data indicated that, in contrast with intergenic ACRs, genic ACRs play dominant roles in the regulation of gene expression during rice doubling based on the two types of H3K36 histone methylation. Taken together, these analyses suggested that within transcriptional regions, the combination of H3K36me2 and H3K36me3 may be associated with chromatin accessibility in rice autopolyploidization.
Rice Genome Duplication Modulates Global Metabolic Profiling
Autopolyploidization can influence metabolism in some plants (Tan et al. 2019). In rice, however, the effect of autotetraploidization on the accumulation of metabolites has rarely been investigated. To determine whether genome doubling has an impact on rice metabolism, we sampled aerial tissues at the same stage and performed unbiased global metabolic profiling based on HPLC-Q-TOF/MS (Additional file 1: Figure S8a). A total of 119 metabolites that matched known biochemical parameters were detected (Additional file 8: Table S7). The replicates of metabolic data showed strong Pearson correlation coefficients (R2 > 0.85) (Additional file 9: Figure S8b), suggesting that the data outputs were reliable and highly reproducible.
To develop a systematic approach based on global metabolomics profiling, different metabolic patterns were identified for 2 × and 4 × rice leaves. During rice genome doubling, obvious changes (more than twofold; P value < 0.05) were detected for specific compounds (Fig. 5a), including 83 upregulated and 36 downregulated DAMs (differentially accumulated metabolites; Additional file 1: Figure S9; Additional file 8: Table S7). Metabolites in various pathways accumulated to higher levels during rice genome duplication (P value < 0.05). Among these accumulations in 4 × rice, the number of secondary metabolites was the highest (Additional file 1: Figure S9). Intriguingly, secondary metabolites (also called "specialized metabolites") play roles in core plant processes (Yuan and Grotewold 2020), including phenylpropanoid and alkaloid synthesis (Dong and Lin 2021). In detail, more than 75% of the differentially accumulated secondary metabolites were related to phenylpropanoids and alkaloids (Fig. 5a). It was previously observed that flavonoid contents were increased in an autopolyploid Hylocereus line (Cohen et al. 2013), and that upregulated genes were associated with secondary metabolism in Stevia rebaudiana autotetraploids compared to diploids (Xiang et al. 2019). One-third of the identified secondary metabolites were phenylpropanoids (Fig. 5a), which are of vital importance for plant development and survival (Dong and Lin 2021). Among the multiple regulatory mechanisms of phenylpropanoid metabolism, transcriptional regulation plays a central role in the regulation of the biosynthesis of phenylpropanoid metabolites and explains almost all the regulatory effects (Yuan and Grotewold 2020; Dong and Lin 2021).
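The thresholds used to call differentially accumulated metabolites (more than twofold change; P value < 0.05) can be illustrated with the following sketch. It assumes that each metabolite has a mean abundance in 2 × and 4 × samples and a precomputed P value; the statistical test itself is not shown, and the metabolite names and values are hypothetical.

```python
def classify_dams(metabolites, fold_cutoff=2.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) metabolite names in 4x vs 2x.

    `metabolites` maps a name to (mean_2x, mean_4x, p_value).
    """
    up, down = [], []
    for name, (mean_2x, mean_4x, p_value) in metabolites.items():
        # Skip non-significant entries and non-positive abundances,
        # for which a fold change is not meaningful.
        if p_value >= p_cutoff or mean_2x <= 0 or mean_4x <= 0:
            continue
        fold = mean_4x / mean_2x
        if fold > fold_cutoff:
            up.append(name)
        elif fold < 1 / fold_cutoff:
            down.append(name)
    return up, down


# Hypothetical abundances and P values, for illustration only.
example = {
    "metabolite_A": (10.0, 35.0, 0.01),   # 3.5-fold up, significant
    "metabolite_B": (40.0, 10.0, 0.02),   # 4-fold down, significant
    "metabolite_C": (10.0, 30.0, 0.20),   # not significant
    "metabolite_D": (10.0, 15.0, 0.001),  # below the fold cutoff
}
print(classify_dams(example))
```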
To monitor the ACR density and transcriptional levels in diploid and autotetraploid rice leaves, we first mapped DEGs and DAMs in the rice phenylpropanoid pathway (Fig. 5b). We observed that cinnamic acid 4-hydroxylase (C4H), which is a cytochrome P450-dependent monooxygenase, and hydroxycinnamoyl transferase (HCT), which leads to the biosynthesis of two major lignin building units, displayed significantly different transcriptional levels (Fig. 5b). As a result, flavones were detected at much higher levels in autotetraploid rice leaves than in diploid rice leaves (Fig. 5b). Moreover, as shown by the Genome Browser (Fig. 5c), our ATAC-seq and RNA-seq data suggested that the ACR density and expression of C4H were higher in 4 × rice than in 2 × rice; in contrast, the ACR density of the HCT gene and its transcriptional level were higher in diploid rice. DNase-digested PCR and quantitative real-time PCR confirmed these observations (Fig. 5d, e). Together, a positive connection was detected between these metabolites and rice autopolyploidization, suggesting that metabolite accumulation could be used as a biomarker for plant genome duplication and providing some insights into the phenotypic changes induced by rice genome doubling.