Genome-wide analyses of late pollen-preferred genes conserved in various rice cultivars and functional identification of a gene involved in the key processes of late pollen development

Background Understanding late pollen development, including the maturation and pollination process, is a key component in maintaining crop yields. Transcriptome data obtained through microarray or RNA-seq technologies can provide useful insight into those developmental processes. Six series of microarray data from a public transcriptome database, the Gene Expression Omnibus of the National Center for Biotechnology Information, are related to anther and pollen development. Results We performed a systematic and functional study across the rice genome of genes that are preferentially expressed in the late stages of pollen development, including maturation and germination. By comparing the transcriptomes of sporophytes and male gametes over time, we identified 627 late pollen-preferred genes that are conserved among japonica and indica rice cultivars. Functional classification analysis with a MapMan tool kit revealed a significant association between cell wall organization/metabolism and mature pollen grains. Comparative analysis of rice and Arabidopsis demonstrated that genes involved in cell wall modifications and the metabolism of major carbohydrates are unique to rice. We used the GUS reporter system to monitor the expression of eight of those genes. In addition, we evaluated the significance of our candidate genes, using T-DNA insertional mutant population and the CRISPR/Cas9 system. Mutants from T-DNA insertion and CRISPR/Cas9 systems of a rice gene encoding glycerophosphoryl diester phosphodiesterase are defective in their male gamete transfer. Conclusion Through the global analyses of the late pollen-preferred genes from rice, we found several biological features of these genes. First, biological process related to cell wall organization and modification is over-represented in these genes to support rapid tube growth. Second, comparative analysis of late pollen preferred genes between rice and Arabidopsis provide a significant insight on the evolutional disparateness in cell wall biogenesis and storage reserves of pollen. In addition, these candidates might be useful targets for future examinations of late pollen development, and will be a valuable resource for accelerating the understanding of molecular mechanisms for pollen maturation and germination processes in rice. Electronic supplementary material The online version of this article (10.1186/s12284-018-0219-0) contains supplementary material, which is available to authorized users.


Background
The life cycles of all land plants alternate between diploid sporophyte and haploid gametophyte. By the time gametophyte development is finished, a floret of rice contains one carpel with a single egg cell, plus six stamens, each filled with approximately 1000 pollen grains (Prasad et al. 2006;Zhang et al. 2011). Pollen development covers several steps, from stamen meristem specification to pollen grain formation and pollination, and various cell and tissue types are involved. Not only dynamic changes in gene expression but also post-translational control of proteins are important for those processes (Guan et al. 2014). Development of the gametes is modulated by gametophytic and sporophytic gene expression. In the case of the pollen wall, the tapetum contributes to exine and tryphine formation while the gametophytes are responsible for intine formation (Shi et al. 2015). The tapetum, the innermost anther wall layer, serves as a nutritive tissue that secretes pollen wall components, nutrients, and metabolites (Xu et al. 2014). Many tapetum-expressed genes have been shown to regulate anther and pollen wall development (Xu et al. 2014;Shi et al. 2015). In addition, plant hormones such as auxin and gibberellin (GA) control pollen development in sporophytic tissues (Sakata et al. 2014;Cecchetti et al. 2017).
The male gametophyte originates from a pollen mother cell that undergoes meiosis to produce tetrads of haploid microspores (Owen and Makaroff 1995). After their release from those tetrads, the microspores enlarge and then undergo asymmetric mitosis to form bi-cellular pollen with different cell fates (pollen mitosis I). Afterward, the smaller generative cell divides symmetrically to produce two sperm cells (pollen mitosis II). Tri-cellular mature pollen is further processed to complete its development and is then ready to germinate (Bedinger 1992).
High-throughput microarray data provide a powerful tool for identifying genes at the genome scale that control pollen development. Genome-wide investigations have already been performed with the male gametophytic transcriptomes in Arabidopsis and rice (Becker et al. 2003;Honys and Twell 2003;Honys and Twell 2004;Suwabe et al. 2008;Fujita et al. 2010;Wei et al. 2010;Aya et al. 2011). Those studies have revealed that mature pollen grains have a unique transcriptome profile with a higher proportion of selectively expressed genes than in any other tissues (Becker et al. 2003;Wei et al. 2010). This unique transcriptome represents the biological processes required for pollen development, i.e., cell wall metabolism, signalling, and cytoskeleton dynamics. Honys and Twell (2004) and Wei et al. (2010) have reported dynamic changes in the transcriptomes of Arabidopsis and rice, based on their sequential stages of pollen development. Patterns of changes in their transcripts are conserved between those two species. The diversity of these transcripts is greatly decreased as the pollen progresses from uni-cellular microspores to mature grains (Wei et al. 2010). The transition from bi-cellular to tri-cellular pollen was also observed by an increase in the proportion of male gametophyte-specific transcripts (Wei et al. 2010). Competition among pollen grains for an egg cell to produce a diploid zygote is a common phenomenon. Because rice pollen begins to germinate within 2 min after pollination (Chen et al. 2008), most of the required components for this are already present in the mature grains. Analysis by Wei et al. (2010) has revealed that the gene expression profiles of mature grains and germinated grains are significantly and positively correlated (r = 0.99), suggesting that the former stores a set of transcripts for germination. Moreover, inhibition of RNA synthesis by actinomycin D does not interrupt germination, indicating that the RNAs necessary for germination are already present in mature grains (Mascarenhas 1993;Hao et al. 2005).
Success during the late stages of pollen development is critical if crop yields are to be sustained. Applying microarray technologies with in rice, a model system, several research groups have conducted genome-wide transcriptome analyses and identified genes that are preferentially expressed in the pollen (Fujita et al. 2010;Deveshwar et al. 2011;Russell et al. 2012). Although different numbers of genes -453 (Fujita et al., 2010), 735 (Deveshwar et al., 2011), and 1376 (Russell et al., 2012) have been uncovered in those independent studies, the consistency of expression patterns has not been assayed over a range of growing environments and genetic backgrounds. Using Arabidopsis, Suzuki et al. (2008) and Dobritsa et al. (2011) have performed large-scale screening of pollen exine mutants from randomly selected lines and have identified 12 and 14 genes involved in forming the pollen exine. Although numerous gene expression data and genomewide gene-indexed mutant populations are available for rice, no previous attempts have been described to understand the roles of genes in controlling late pollen development on a large scale. Thus, such systematic approaches might help accelerate our identification of rice genes that function in mature pollen development and germination.
A shift in gene expression exists during pollen development and bi-cellular pollen may be a key point for the regulation of shift (Wei et al. 2010). "Late pollen-preferred genes" was defined as preferentially expressed genes at pollen following the bi-cellular stage. In this study, we identified late pollen-preferred genes conserved in japonica and indica cultivars of rice. Here we provide detailed results from our genome-wide analysis of such genes conserved in two subspecies of rice. We also discuss mutants obtained through T-DNA and gene-editing that show defects in gene-transfer through the male gamete.

Results
Meta-expression analysis and genome-wide identification of late pollen-preferred genes conserved in various rice cultivars To identify the late pollen-preferred genes that are conserved among japonica and indica cultivars of rice, we examined publicly available Affymetrix rice microarray data. From the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO; http://www.ncbi.nlm.nih.gov/geo/), we downloaded and analyzed six series of data prepared from developing anthers and pollen grains (Additional file 1: Table S1) (Edgar et al. 2002). The samples were arranged in order according to the process of anther and pollen development in each cultivar. Intensity values for these six series were first normalized with the Affy package in the R program and then log 2 -transformed. As a control, all of the Affymetrix anatomical meta-expression profiles in the Rice Oligonucleotide Array Database (ROAD;), except for two anther samples, were used to check expression patterns in other tissues/organs (Cao et al. 2012). Afterward, we performed K-means clustering (KMC) analysis with the Euclidean Distance Metric and grouped 57,382 probes into 36 clusters based on their expression patterns. Clusters 2 and 35 exhibited the highest expression at the later stages, after bi-cellular pollen (Additional file 2: Figure S1). As a result, we found that 627 genes, represented by 750 probe sets in the rice affymetrix array, showed late pollen-preferred expression patterns in both subspecies (Fig. 1). These gene clusters are listed in Table S2 (Additional file 1).
Expression patterns for late pollen-preferred genes confirmed through the GUS reporter system To verify the patterns of late pollen-preferred expression in our candidate genes, we used two strategies: a promoter trap system and generation of transgenic plants under the control of selected promoters. We have previously described this promoter trap technique using T-DNA that Fig. 1 Heatmap expression analysis of 627 late pollen-preferred genes. Expression patterns conserved among indica (green box) and japonica (red box) rice types are shown over several stages of development. Microarray data numbered 22 for indica rice (5 stages for anthers and 3 for pollen) and 42 for japonica rice (8 stages for anthers and 5 for pollen). Anatomical meta-expression data from ROAD were used to check expression patterns in other tissues. ACF, archesporial cell-forming stage; BG, bi-cellular gametophyte stage; Fl, flowering stage; GP, germinating pollen; Me, meiotic stage; Me1, meiotic leptotene stage; Me2, meiotic zygotene-pachytene stage; Me3, meiotic diplotene-tetrad stage; MP, mature pollen stage; PMe, pre-meiosis; TG, tri-cellular pollen stage; UG, uni-cellular gametophyte stage. Yellow color in heatmap indicates high level of expression; dark-blue, low expression carries the promoterless GUS reporter gene in japonica rice (Jeon et al. 2000). In all, 22 lines were chosen that had an in-frame fusion of the promoterless GUS in the genic region. Expression of that reporter gene was detected at three developmental stages. Subsequently, we identified two lines showing late pollen-preferred GUS-staining patterns ( Fig. 2a and b), as had been expected from our meta-expression analysis depicted in Fig. 1. The efficiency of GUS expression was 9.09% (2/22), which was approximately 16-fold higher than the 0.58% we had previously found (Jeong et al. 2002). This was because, for our current investigation, we chose lines with T-DNA insertions within the late pollen-preferred genes whereas earlier studies used randomly selected samples (Jeong et al. 2002). The T-DNAs were inserted into LOC_ Os11g20384, encoding SacI homology domain-containing protein (Line 1A-13,819), and into LOC_Os07g17310, encoding B12D protein (Line 2D-41,188). Schematic diagrams of those T-DNA insertions are presented in Figure S2 (Additional file 2). In a separate examination, we used transgenic plants harboring promoter and GUS fusion constructs. Six genes in Clusters 2 and 35 were chosen that exhibited expression values above 13 log 2 in mature pollen; their locus numbers and promoter regions are shown in Table S3 (Additional file 1). Their promoter activities had already been tested in Arabidopsis (Oo et al. 2014). LOC_Os11g45730 encodes pectinesterase; LOC_ Os02g50770, plant peroxidase family protein; LOC_ Os01g69020, cell division protein FtsZ family protein; LOC_Os05g46530, plant invertase/pectin methylesterase (PME) inhibitor domain-containing protein; LOC_ Os07g14340, pectinesterase inhibitor domain-containing protein; and LOC_Os04g25190, pollen allergen Lol p2 family protein. The GUS expression patterns within transgenic lines for these genes were monitored in rice at three sequential stages of pollen development ( Fig. 2c-g). Strong GUS activity was detected at the mature pollen stage, as indicated in the heatmap. The pattern of late  -preferred expression by these genes, as shown in Fig. 2, was reconfirmed by real-time PCR analysis. For all eight genes, strong expression was detected at the tri-cellular mature pollen stage (Fig. 3). All of our results demonstrated that the meta-expression data for these genes are a valuable source of novel promoters for controlling traits associated with late pollen development or germination and for discovering the novel functions of such genes.

Biological processes specific to late pollen-preferred genes
We conducted a Gene Ontology (GO) enrichment analysis in ROAD (Jung et al. 2008) to query 627 late pollenpreferred genes belonging to Clusters 2 and 35. Our objective was to examine GO enrichment within the category of 'biological process' with the ROAD GO enrichment tool. As a result, we identified 547 GO terms assigned to 308 genes; the other 319 genes did not have GO annotations (Additional file 1: Table S4). Significant terms in that category were selected with hypergeometric p-values ≤0.05 and enrichment values of at least two-fold. Of these, 16 GO terms were over-represented in late pollen-preferred genes (Table 1). Significantly enriched terms were found for biological processes corresponding to phosphatidylcholine metabolism (22.3 GO fold-enrichment value), clathrin coat assembly (19.7), GA metabolism (18.1), cell wall modifications (17.3), phosphatidyinositol metabolism (12.1), glycogen biosynthesis (10.9), starch biosynthesis (9.7), polysaccharide catabolism (8.5), sexual reproduction (7.4), cytoskeleton organization (7.0), cellulose biosynthesis (6.4), potassium ion transport (3.2), and ATP biosynthesis (2.9). The most abundant terms were for 'protein amino acid phosphorylation' , representing 56 genes. This was followed by 34 genes for transport and 23 for carbohydrate metabolism.

Late pollen-preferred genes functionally classified through MapMan analysis
MapMan allows one to group genes into different functional categories and visualize data through diagrams (Yoo et al. 2015). To classify these late pollen-preferred genes, we primarily used the metabolism and regulation overviews installed in the MapMan tool kit (Fig. 4a). In the metabolism overview, cell wall organization and modification (48 genes), lipid metabolism (10), and metabolism of major carbohydrates (9) were the most (c), LOC_Os02g50770 (d), LOC_Os01g69020 (e), LOC_Os05g46530 (f), LOC_Os07g14340 (g), and LOC_Os04g25190 (h) were analyzed by quantitative real-time PCR for various organs. Transcript levels were normalized to rice Ubi5 and calculated by comparative cycle threshold method. Error bars indicate standard deviation (sd). y-axis, relative expression level to rice Ubi5; x-axis, sample names used for analyses. 3DAPS, seeds at 3 d after pollination; 7DS, 7-day-old shoots; 7DR, 7-day-old roots; ML, mature leaves; MF, mature flower; YP, young panicles (2 cm long); and anthers at bi-cellular gametophyte stage (BG); tri-cellular mature pollen stage (MP); meiosis and tetrad (M/T); and uni-cellular gametophyte stage (UG)  Fig. 4 MapMan analysis of late pollen-preferred genes. a Metabolic overview and (b) regulation overview were analyzed with 627 genes. Cell wall organization and modification, lipid metabolism, and carbohydrate metabolism were dominant MapMan terms in metabolic overview. Red boxes in these overviews indicate late pollen-preferential. Genes associated with transcription factors, protein modification, and protein degradation were identified from regulation overview. In addition, diverse hormone responses, signaling pathway components, and redox responses for late pollen development were visualized. Detailed information about overviews is shown in Table S5.
common (Additional file 1: Table S5). The predominance of the cell wall organization and modification terms was well-matched with our finding that GO terms related to those processes were enriched in the biological process category (Table 1). In particular, cell wall-modifying enzymes such as pectin methylesterases (PMEs) and cellulose synthases were functionally well-characterized during pollen tube growth because the cell walls of those tubes consist of callose in the inner layer plus pectin, cellulose, and hemicellulose in the outer layer (Krichevsky et al. 2007). Our list included 11 PMEs and 4 cellulose synthases, implying that those genes are involved in pollen maturation and germination in rice. Our regulation overview showed that genes associated with transcriptional regulation, protein modification, and protein degradation were the most frequently coupled with late pollen-preferred genes (Additional file 1: Table S5). Of them, 26 genes were assigned to the transcription category and 47 genes were related to protein modification ( Fig. 4b; Additional file 1: Table S5). For protein degradation, 19 genes were found. We also found 12 genes in the category of hormone metabolismfive for ethylene synthesis, three for auxin, and two for GA (Additional file 1: Table S6).

Comparative analysis of late pollen-preferred genes from rice and Arabidopsis
Late pollen-preferred genes in Arabidopsis were identified using a meta-anatomical expression database that we developed from the Arabidopsis Affymetrix array data deposited in the NCBI GEO database (See Materials and Methods). This produced 773 genes (Additional file 2: Figures S3 and S4; Additional file 1: Table S7). We then searched for orthologs between rice and Arabidopsis with a database of orthologous groups among rice, Arabidopsis, Brachypodium, maize (Zea mays), Populus, Vitis vinifera, and Sorghum bicolor from the Rice Genome Annotated Project (RGAP; http://rice.plantbiology.msu.edu/annota-tion_pseudo_ortho.shtml). Rice orthologs for 380 Arabidopsis genes were identified based on the sequence information. Among them, orthologs for 175 Arabidopsis genes exhibited patterns of late pollen-preferential expression (Additional file 1: Table S8). Another search based on 627 rice late pollen-preferred genes led to the identification of 220 Arabidopsis orthologs (Additional file 1: Table  S9). Of them, 133 had Arabidopsis orthologs with late pollen-preferential expression. Therefore, all of these results indicated that functional conservancy was approximately 20%, with 175 (22.64%) of 773 Arabidopsis having rice orthologs and 133 (21.21%) of 627 rice genes having Arabidopsis orthologs. We believe that genes unique to each species, i.e., 407 in rice and 393 in Arabidopsis, will be good targets for uncovering the evolutionary developmental differences in mature and germinating pollens between those two plant systems.

Enriched biological processes in late pollen-preferred genes compared between rice and Arabidopsis
Through our comparative genome and expression analyses, we determined that approximately 20% of late pollen-preferred genes are conserved between rice and Arabidopsis in terms of their sequence homology and expression patterns. We performed MapMan analysis for rice divergent genes without ortholog(s) in Arabidopsis as well as common genes conserved between the two. Several MapMan terms did not overlap between the divergent and common genes. Moreover, MapMan terms for cell wall modification and major carbohydrate metabolism were found only in the rice divergent genes (Fig. 5).

Genetic study of T-DNA insertional mutants for a male-gene transfer defective (MTD1) gene involved in late pollen development
To identify the functional significance of our candidates, we analyzed T-DNA insertion lines for 28 genes Ryu et al. 2004;Jeong et al. 2006). Until now, a mutant line was identified because of distortions in segregation ratios, which were close to 1:1:0 (wild type:heterozygote:homozygote) (Fig. 6a). The gene that causes those distortions in corresponding mutants was named Male-gene Transfer Defective (MTD). We then called the first gene MTD1. MTD1/LOC_Os02g09450 encodes glycerophosphoryl diester phosphodiesterase (GPD), with 767 amino acids. It has nine exons and eight introns (Fig. 6b). We also identified a T-DNA insertional mutant of this gene (mtd1-1), which has a T-DNA insertion in the sixth exon (Fig. 6b).
To clarify how MTD1 participates in gene-transfer through the male gametophyte, we performed reciprocal crosses between heterozygotes and the wild type (WT). From this, we determined that mtd1-1 mutants have a defect in gene-transfer through the male gametophyte (Table 2). That is, a cross between the male (pollen) from an MTD1/mtd1-1 heterozygous plant and the female (ovary) of the WT shows a complete defect (0/35) in genetransfer through the male gametophyte. In contrast, a cross between the female from an MTD1/mtd1 heterozygous plant and a WT male leads to successful genetransfer through the female gametophyte. To examine any morphological defects, we observed mature grains under a bright field microscope but did not found any differences between WT and mtd1-1 mutant pollen.
To confirm the phenotype of mtd1 and understand the function of MTD1, we utilized the CRISPR/Cas9 system, selecting the third exon as a target region of the guide RNA (Fig. 6b). From sequence analysis of the primary transgenic lines, we obtained a homozygote with a 1-bp deletion in the MTD1 coding sequence. That deletion mutant was named mtd1-2. Because we already knew that mtd1-1 has a defect in gene-transfer through the male gametophyte, we checked the mature pollen status of mtd1-2 by staining with auramine O, calcofluor white, and Lugol's iodine (Moon et al. 2013). No differences between WT and mutant pollen were observed with regard to the accumulation of exine (stained by auramine O), intine (calcofluor white), or starch (Lugol's iodine) (Fig. 6c). This meant that MTD1 has no essential role in the production of mature pollen. Therefore, we speculated that this gene might function at the progamic phase, i.e., during pollen germination, pollen tube elongation, and commutation with female organs. We then tested in vitro pollen germination of mtd1-2/mtd1-2. Whereas the WT pollen had elongated tubes, the mtd1-2 pollen exhibited only a small protrusion of those tubes (Fig. 6c). Because of this defect, homozygotic plants of mtd1-2 were sterile.
We also found that there are 13 genes encoding GPD in rice, and nine of them are clustered more closely with MTD1 (Fig. 6d). Of these, MTD1 was predominantly expressed in late pollen stage, which supported our supposition that it has a primary role in this developmental process, as revealed in the loss-of-function mutants (Fig. 6d).

In silico validation of late pollen-preferred genes in various rice cultivars
We delved into the large collection of microarray data focused on anther and pollen development in rice. Having identified candidate genes with late pollen-preferred expression that are conserved among japonica and indica rice enabled us to investigate relevant developmental processes across a more diverse range of varieties than would be possible from examinations with only single variety. We have recently reported that at least 10% of genes showing light-responsive expression patterns differ among japonica ('Nipponbare' , 'TP309' , and 'LG') and indica ('IR24') cultivars (Jung et al. 2008). By analyzing their late pollen-preferred genes, we found that at least 17. 7% of the top 300 in each cultivar do not overlap. This result proved the relevance of our candidate genes for studying late pollen development in japonica and indica rice. In addition, we determined that late pollen-preferred genes that were not conserved among rice types would be good targets for investigating late pollen development within each cultivar.

Hormone metabolism and late pollen-preferred genes
Our MapMan analysis revealed 12 genes within the category of hormone metabolismfive for ethylene synthesis, three for auxin, and two for GA (Additional file 1: Table S6). All of these hormones are involved in late pollen development. For example, in Petunia hybrida, the level of 1-aminocyclopropane-1-carboxylic acid (ACC) in anthers is very low until the day before anthesis, when it then increases by up to 100-fold (Lindstrom et al. 1999), possibly as a result of activity by pPHACS2, an ACC synthase. Research with the ethylene-insensitive Fig. 5 Enriched metabolic pathways in rice late pollen-preferred genes compared to those of Arabidopsis. Red squares, genes unique to rice; green squares, conserved genes in terms of their sequence homology and expression patterns between 2 plant systems mutant Never ripe (Nr) has shown that ethylene plays a significant role in thermotolerance by regulating the level of sucrose in tomato (Solanum lycopersicum) (Firon et al. 2012). Auxin is also necessary for pollen maturation. For example, the auxin biosynthesis-defective yuc2 yuc6 double mutant cannot produce grains (Cheng et al. 2006). The importance of auxin perception has also been demonstrated in a loss-of-function study with auxin receptor genes. There, the pollen grains from tir1 and afb1afb2afb3 multiple mutants age prematurely (Cecchetti et al. 2008). Gibberellic acid also has an essential role; GA-deficient mutants are mainly defective in pollen germination and elongation, while GA-insensitive mutants are mainly defective in pollen development (Chhun et al. 2007). LOC_Os02g17780 encodes entcopalyl diphosphate synthase1 (OsCPS1) while LOC_ Os06g02019 encodes ent-kaurenoic acid oxidase (OsKAO). Both enzymes are active in GA biosynthesis (Sakamoto et al. 2004), and mutants of both genes have significantly reduced frequencies of transmission through the male gametophytes, indicating that pollen germination and elongation essentially depend upon the de novo synthesis of GA in rice (Chhun et al. 2007). In addition, Fig. 6 Estimation of functional dominancy by MTD genes through phylogenomic analysis (a) segregation ratios of heterozygotes and wild-type segregants in mutants of MTD1 by T-DNA insertion, and results of Chi-square test (χ 2 ). b Schematic view of MTD1 structure and mutated position by T-DNA insertions and deletion generated by CRISPR/Cas9. Grey boxes indicate exon; lines, UTR or intron; reverse triangles, T-DNA; red box, target region of CRISPR/Cas9. Genomic DNA sequences from wild type and gtd1-2 are shown. c Phenotypes of pollen grains from wild type and mtd1-2. Germinated pollen and grains stained with Lugol's iodine, auramine O, or calcofluor white were observed under bright field and fluorescence microscopes. Bar = 20 μm. Panicles at filling stage of wild type and mtd1-2 were observed. d Expression patterns according to developmental stages of pollen were analyzed using 22 microarray data from indica rice (5 stages for anthers, 3 for pollen) and 42 microarray data from japonica rice (8 stages for anthers, 5 for pollen). Anatomical meta-expression data from ROAD were used to check expression patterns in other tissues. ACF, archesporial cell-forming stage; BG, bi-cellular gametophyte stage; Fl, flowering stage; GP, germinating pollen; Me, meiotic stage; Me1, meiotic leptotene stage; Me2, meiotic zygotene-pachytene stage; Me3, meiotic diplotene-tetrad stage; MP, mature pollen stage; PMe, pre-meiosis; TG, tri-cellular pollen stage; UG, uni-cellular gametophyte stage. Yellow color in heatmap indicates high level of expression; dark-blue, low expression  (Chhun et al. 2007;Hirano et al. 2008). We also checked the expression patterns of other genes involved in GA biosynthesis and signalling pathways by using our meta-expression data with OsCPS1 and OsKAO (Additional file 2: Fig. S5). Among them, GA3ox1, GA20ox3, and KO2 also showed late pollen-preferred expression patterns. Application of GA suppresses low temperature-induced male sterility (Sakata et al. 2014). Because pollen development is a complicated process accomplished by both microspores and the tapetum, various hormones, e.g., ethylene, auxin, and GA, might be involved in this process. Further experiments should focus on how hormone synthesis and signaling participate in the functioning of late pollen-preferred genes identified here.

Major metabolic processes of cell wall organization during late pollen development in rice and Arabidopsis
During the late stages of pollen development in rice, cell wall organization is the most common MapMan term. Similar results have been reported for Arabidopsis pollenpreferred genes. One of the most distinctive structural features of a pollen grain is its double wall layerintine and exine (Vizcay- Barrena and Wilson, 2006). While the tapetum plays a pivotal role in exine formation, intine synthesis is largely dependent upon activity by the microspore (Nakamura et al. 2010;Yeung et al. 2011). The inner cell wall, intine, is simply composed of cellulose, pectin, and various other proteins (Noher de Halac et al., 2003). During pollen germination, the pollen tubes produce a new wall with two layers, the inner callosic layer and the outer layer, which again contains mainly cellulose, pectin, and other proteins (Donaldson and Knox, 2012). Among the late pollen-preferred genes found in our examination, 45 from Arabidopsis and 48 from rice are related with cell wall synthesis, proving their conserved roles in both species (Additional file 1: Table S10). Our list shows that five genes from Arabidopsis and four from rice are related to cellulose synthesis. Mutations in any of three unique types of components within the cellulose synthase (CESA) complex -CESA1, CESA3, or CESA6result in the deformation of pollen grains owing to a deficiency in cellulose (Persson et al. 2007). Moreover, two Arabidopsis cellulose synthase-like D (CSLD) genes that are preferentially expressed in late pollen -CSLD1 and CSLD4have functions in pollen tube growth (Wang et al. 2011). Proteins from both are located in the Golgi apparatus and are transported to the plasma membrane in the tip region of the elongating tube, where cellulose is actively synthesized (Wang et al. 2011). In csld1 and csld4 mutants, the pollen tube wall is disorganized and exhibits less cellulose deposition. All of those observations suggest that the roles of rice homologs are conserved in their late pollen-preferred expression.
Among the genes investigated here, nine from Arabidopsis and five from rice are involved in the synthesis of cell wall proteins. In Arabidopsis, pollenpreferred Arabinogalactan glycoprotein (AGP)6 and AGP11 encode arabinogalactan glycoproteins associated with the cell wall. More than 50% of the grains from agp6 agp11 are collapsed while the remainder display reduced germination and elongation (Coimbra et al. 2009;Coimbra et al. 2010). Brassica campestris male fertility 8, another pollen-specific AGP gene, functions in maintaining normal intine formation (Lin et al. 2014). One fasciclin-like AGP (FLA) gene, FLA3, is also involved in the development of intine walls (Li et al. 2010). All of this is evidence of conserved, pollen-preferred roles by rice homologs.
The grains of Arabidopsis and rice have 13 and 11 pollen-preferred PMEs, respectively. Among the 60 PME genes in the Arabidopsis ATH1 genome array, 18 are expressed in the pollen (Bosch et al. 2005). In many species, including tobacco (Nicotiana tabacum) and Arabidopsis, the apical end of the pollen tube is one layer composed mainly of pectin (Leroux et al. 2015). Pectins are synthesized in the Golgi apparatus and deposited as highly methyl-esterified polymers (Leroux et al. 2015). The PMEs de-esterify pectins and their activity can lead to cell-wall loosening or stiffening, depending on the apoplastic pH (Tian et al. 2006). Moreover, PME activity in pollen from a vangard1 mutation is only 82% of that in the WT, and grains of the former have reduced tube growth (Jiang et al. 2005). Another pollen-specific PME, AtPPME1, is associated with slower pollen growth and irregularly shaped tubes (Tian et al. 2006). Rapid pollen tube growth after pollination might require strong activity by PME enzymes. In fact, such growth is completed within 50 min for rice while pollen tube discharge occurs in 4. 0 to 9.5 h for Arabidopsis (Faure et al. 2002;Chen et al. 2008). Therefore, for both species, numerous genes related to cell wall organization and modification must be expressed in the mature grains in order to support rapid tube growth.
The biochemical function of MTD1 in plant pollen development has not yet been determined. One such gene from Arabidopsis, SHAVEN3 (SHV3), is required for root hair elongation (Parker et al. 2000;Jones et al. 2006). Furthermore, SHV3 and its paralogs are important factors for primary cell wall organization (Hayashi et al. 2008). Both SHV3-like 4 (SVL4) and SVL5 are specifically expressed in Arabidopsis pollen (Lalanne et al. 2004), suggesting conserved roles in pollen development for MTD1 in rice and SVL4 and SVL5 in Arabidopsis. More detailed functions should be analyzed in future molecular and biochemical studies.
Cell wall modifications and metabolism of major carbohydrates for late pollen development are more important in rice than in Arabidopsis Among the 16 genes related to cell wall modifications, six encode expansin proteins that are known to have cell-wall loosening activity. Although three expansin genes show a pattern of late pollen-preferred expression in Arabidopsis (Mollet et al. 2013), the six in rice are not homologous to those in Arabidopsis. Research with maize has indicated that pollen extracts have expansin-like activity that is unique to the cell walls of grass species but not eudicots (Cosgrove et al. 1997). Expansin facilitates pollen tube penetration through the stigma and style (Tabuchi et al. 2011). Maize pollen β-expansin preferentially binds to xylans and also solubilizes feruloylated arabinoxylan from cell walls in grasses (Tabuchi et al. 2011). Arabinoxylan is enriched in those grass cell walls (Sampedro et al. 2015). Thus, the arabinoxylan-cellulose interaction might be a target of grass pollen β-expansins (Sampedro et al. 2015). These distinct differences in structure between grass and dicot pollen cell walls can explain why MapMan terms related to cell wall modifications are more enriched in rice divergent genes than in those of Arabidopsis (Cho and Kende, 1997;Sampedro et al. 2015).
In Arabidopsis pollen, the development of starch grains begins at the vacuolated microspore stage, and grains are highly accumulated at the bi-cellular stage. At pollen maturity, only a few starch grains remain in the Arabidopsis plastid (Kuang and Musgrave, 1996;Tang et al. 2009). This contrasts with the features of rice pollen, where starch granules are abundant at the mature stage (Zhang et al. 2010). The higher starch content in mature rice pollen might explain why genes related to major carbohydrate metabolism are enriched in rice. Among the genes with MapMan terms related to carbohydrate metabolism that occur only in rice divergent genes, two are involved in the starch synthesis pathway while five genes are part of the pathway for starch degradation. We found one starch synthase and one ADP-glucose pyrophosphorylase (AGPase) in the starch synthesis pathway (Additional file 1: Table  S5). Among them, OsAGPL4 encodes AGPase and is involved in starch accumulation during pollen development. The osagl4 mutant pollen exhibits a starch deficiency in pollen grains that causes defects in gene-transfer through the male gametophyte . Three amylase and two invertase genes participate in the starch degradation pathway. Because rapid growth by pollen tubes is a highly energy-consuming process, storage materials within pollen grains are mobilized for fuel (Goetz et al. 2017). Amylase catalyzes the hydrolysis of starch into sugar and invertase participates in the hydrolysis of sucrose into glucose and fructose. Regarding these genes, one starch degradation-defective mutant, tomato alpha-Glucan water dikinase (LeGWD)/legwd, interrupts gene-transfer through the male gametophyte (Nashilevitz et al. 2009). These differences between rice and Arabidopsis in their storage reserves for late pollen might explain why genes related to the metabolism of major carbohydrates are enriched in rice divergent genes, as coupled with late pollen development.

Conclusion
By comparing the publicly available transcriptomes of sporophytes and male gametes throughout their development, we identified 627 late pollen-preferred genes that are conserved among japonica and indica cultivars of rice. We then performed global analyses of these candidate genes to study the processes of late pollen development, including maturation and germination. Functional classification analysis with a MapMan tool kit revealed that cell wall organization and metabolism are significantly associated with that stage. Comparative analysis of late pollenpreferred genes from rice and Arabidopsis demonstrated that those involved in cell wall modifications and major carbohydrate metabolisms are more important in rice. To evaluate the significance of our candidate genes, we used T-DNA insertional mutant population and the CRISPR/ Cas9 system to determine the function of a rice gene encoding GPD protein by identifying defects in transfer through the male gamete. These candidates might be useful targets for future examinations of late pollen development in rice. Our results provide new tools to understand late pollen development.

Collection of microarray data and identification of mature pollen-preferred genes
We used 64 publicly available rice Affymetrix microarray data prepared from anthers and pollen in NCBI GEO to identify late pollen-preferred genes. For the comparative transcriptome analyses between rice and Arabidopsis, we downloaded Arabidopsis Affymetrix microarray data series GSE5630, GSE5631, GSE5632, GSE5633, GSE6162, GSE6696, GSE12316, GSE17343, and GSE27281. To examine these data, we used the Affy package encoded by R language to normalize the signal intensity and then transformed them to log 2 values. The normalized data with averaged Affymetrix anatomical meta-expression data were then used for further investigations, e.g., KMC analysis, heatmap construction, and identification of the late pollen-preferred genes (Chandran et al. 2016).

MapMan analysis
The MapMan program allows one to group genes into different functional categories and visualize data through various diagrams . To obtain their functional classifications, we uploaded RGAP locus IDs for 627 rice late pollen-preferred genes to the MapMan tool kit. We then investigated the metabolism and regulation overviews based on diverse overviews installed in that kit (Fig. 4). All data are detailed in Table S5 (Additional file 1).

Histochemical GUS assay and microscopic analyses
Histochemical GUS-staining was performed as described by Jefferson et al. (1987) and Dai et al. (1996). The assayed flowers and pollen grains were photographed with an Olympus BX61 microscope (Olympus, Tokyo, Japan).

RNA isolation and RT-PCR analysis
Total RNAs were extracted with Tri Reagent (MRC Inc., Cincinnati, OH, USA). Complementary DNA (cDNA) was synthesized as described previously (Lee and An, 2015). To evaluate the expression patterns of eight pollenpreferred genes that showed GUS activity, we prepared samples from the shoots and roots of 7-day-old seedlings, mature leaves, young panicles (2 cm), mature flowers, developing seeds at 3 d after pollination, and anthers at four different developmental stages. All primers for RT-PCR are listed in Table S11 (Additional file 1). The ubiquitin 5 gene (Ubi5) was used as an internal control.

Rice T-DNA mutant screening
We searched for lines from our T-DNA insertional mutant population that had insertions within our candidate late pollen-preferred genes (Jeon et al. 2000;An et al. 2003). For MTD1, one T-DNA insertional line (mtd1-1) was selected. Seedlings were grown in a half-strength Murashige and Skoog (MS) medium. Their DNA was prepared from 7-day-old plants via the hexadecylotrimethylammonium method. Genotypes were determined by PCR using gene-specific primers and T-DNA primers (Additional file 1: Table S11). The T-DNA insertional lines without homozygous progenies or with rare homologous progenies were further genotyped to check for segregation distortion.

Construction of the CRISPR/Cas9 vector to produce a homozygous mutant
To generate the single guide RNA (sgRNA) and Cas9 expression vector, we synthesized one set of oligomers targeting the third exon of MTD1 and inserted them into the Bsa I sites of the RGEB32 binary vector (Addgene plamid ID: 63142). Ligation products were transformed into Escherichia coli. The RGEB32 vector containing the sgRNA and Cas9 expression cassette was transformed into Agrobacterium tumefaciens LBA4404.

Cytochemical analysis
Pollen grains were immersed in various staining solutions. Grains stained for 15 min with 0.1% calcofluor white were then monitored for the presence of intine under a UV light with an Olympus BX61 microscope. Grains stained with 0.001% auramine O in 17% sucrose were examined for exine, using the FITC channel of the Olympus BX61 microscope. Staining with Lugol's iodine was used to detect the presence of starch.

Additional files
Additional file 1: Table S1. Six series of microarray data comprising 64 slides (GPL2025) associated with anthers/pollen in rice. Table S2. Locus IDs and putative functions of late pollen-preferred genes from rice. Table S3. Locus IDs and promoter regions of genes used for promoter analysis with GUS reporter. Table S4. Classification of GO terms for biological processes associated with late pollen-preferred genes. Table S5. MapMan classification of late pollen-preferred genes. Table S6. Genes related to hormone metabolism term in MapMan. Table S7. Late pollen-preferred genes in Arabidopsis. Table S8. Assignment of rice orthologs to Arabidopsis late pollen-preferred genes. Locus numbers are shown in by red. Table S9. Assignment of Arabidopsis orthologs to rice late pollen-preferred genes. Locus numbers are shown in red. Table S10. MapMan terms related to cell wall organization and modifications in rice and Arabidopsis. Table S11. Primer sequences used in genotyping and real-time PCR. (DOCX 1980 kb) Additional file 2: Figure S1. a. Order of developmental stages for anatomical samples. ACF, formation of archesporial cells; BG, bi-cellular gametophyte; Fl, flowering; Me, meiosis; Me1, meiotic leptotene; Me2, meiotic zygotene-pachytene; Me3, meiotic diplotene-tetrad; MP, mature pollen; PMe, pre-meiosis; GP, germinating pollen; TG, tri-cellular pollen; UG, uni-cellular gametophyte. Red bar, sample containing late pollen. b. Expression graph of 36 clusters after KMC analysis with 57,382 probes. Clusters 2 and 35 exhibited mature pollen-preferential patterns of expression and are marked with red boxes. Figure S2. Schematic representation of 3 promoter trap lines for T-DNA insertions. a. T-DNA was inserted into 17th intron of SacI homology domain-containing protein (LOC_Os11g20384) in Line 1A-13,819 (mtd1-1). b. Line 3A-05916 has T-DNA insertion in B12D protein (LOC_Os07g17310). BL, left T-DNA border; RB, right T-DNA border; Gray boxes, exons; lines, introns. Figure S3. Expression graph after KMC analysis of meta-expression data from Arabidopsis. Clusters marked with red box showed late pollen-preferred patterns. Figure S4. Heatmap for expression profiles of late pollen-preferred genes in Arabidopsis. Figure