
Genome- and Transcriptome-wide Association Studies to Discover Candidate Genes for Diverse Root Phenotypes in Cultivated Rice


Root system architecture plays a crucial role in nutrient and water absorption during rice production. Genetic improvement of the rice root system requires elucidating its genetic control. Genome-wide association studies (GWASs) have identified genomic regions responsible for rice root phenotypes. However, candidate gene prioritization around the peak region often suffers from low statistical power and resolution. Transcriptomics enables other statistical mappings, such as the transcriptome-wide association study (TWAS) and expression GWAS (eGWAS), which improve candidate gene identification by leveraging the natural variation of the expression profiles. To explore the genes responsible for root phenotypes, we conducted GWAS, TWAS, and eGWAS for 12 root phenotypes in 57 rice accessions using 427,751 single nucleotide polymorphisms (SNPs) and the expression profiles of 16,901 genes expressed in the roots. The GWAS identified three significant peaks, of which the most significant peak, responsible for seven root phenotypes (crown root length, crown root surface area, number of crown root tips, lateral root length, lateral root surface area, lateral root volume, and number of lateral root tips), was detected at 6,199,732 bp on chromosome 8. In the most significant GWAS peak region, OsENT1 was prioritized as the most plausible candidate gene because its expression profile was strongly negatively correlated with the seven root phenotypes. In addition to OsENT1, OsEXPA31, OsSPL14, OsDEP1, and OsDEC1 were identified as candidate genes responsible for root phenotypes using TWAS. Furthermore, a cis-eGWAS peak SNP was detected for OsDjA6, which showed the eighth strongest association with lateral root volume in the TWAS. The cis-eGWAS peak SNP for OsDjA6 was in strong linkage disequilibrium (LD) with a GWAS peak SNP on the same chromosome for lateral root volume and in perfect LD with another SNP variant in a putative cis-element 518 bp upstream of the gene.
These candidate genes provide new insights into the molecular breeding of root system architecture.


Rice (Oryza sativa) is an indispensable staple crop for sustainable food production in many Asian countries (Muthayya et al. 2014). Since roots play a crucial role in absorbing nutrients and water from the soil, root system architecture needs to be optimized to improve the yield, particularly in unfavorable environments (Gowda et al. 2011; Ahmadi et al. 2014). To genetically improve the root system architecture, the causal genes for root-related phenotypes should be identified and characterized. Although mutant analysis has identified several root-related genes in rice (Mai et al. 2014; Meng et al. 2019), statistical mapping is also a powerful approach for discovering beneficial alleles. Quantitative trait loci (QTL) mapping leverages natural variations to discover beneficial genes or alleles for molecular breeding, typically using a biparental population developed from two accessions with distinct traits. Some QTL for root phenotypes such as root growth angle, root thickness, and root length have been identified (Uga et al. 2011, 2012; Kitomi et al. 2015; Lou et al. 2015; Li et al. 2015), and two of them have been cloned (Uga et al. 2013; Kitomi et al. 2020). Although the genes and alleles identified in these studies have enhanced the molecular breeding of root phenotypes, the QTL mapping requires labor-intensive and time-consuming crossing, phenotyping, and genotyping, starting with biparental population development and ending with map-based cloning.

A genome-wide association study (GWAS) can identify genomic regions for the phenotype of interest using a diversity panel sequenced and genotyped for single nucleotide polymorphisms (SNPs) or insertion/deletion variants (indels), which is less laborious and time-consuming than creating a biparental population for QTL mapping. Beginning with pioneering studies using approximately 200 rice accessions with approximately 30,000 markers for root phenotypes measured in hydroponic cultivation systems (Clark et al. 2013; Courtois et al. 2013), GWAS have been conducted for various root phenotypes under different conditions and at different growth stages (Biscarini et al. 2016; Phung et al. 2016; Bettembourg et al. 2017; Wang et al. 2018a; Zhao et al. 2018, 2021a, b; To et al. 2019; Xu et al. 2020; Zhang et al. 2020; Anandan et al. 2022; Teramoto et al. 2022; Xiang et al. 2022; Hanlon et al. 2023). However, it is difficult to identify small-effect loci for root phenotypes based only on GWAS because the sample size is often limited as the evaluation of root phenotypes requires digging up the root from the soil, which has extremely low throughput.

The transcriptome, an intermediate phenotype that reflects complex genetic responses to the ambient environment, as well as genetic variation among accessions, presents a new opportunity to reveal the genetic control of terminal and/or fitness-related phenotypes (Kremling et al. 2018, 2019; Groen et al. 2020, 2022). Using the same statistical model as the GWAS, we can statistically test the association between the expression profile of each gene and the phenotypic value (transcriptome-wide association study, TWAS). TWAS has an advantage over GWAS in terms of statistical resolution for identifying candidate genes because linkage disequilibrium (LD) does not affect expression profiles (Kremling et al. 2019; Li et al. 2021a). Owing to affordable RNA-seq technologies, TWAS has become a popular statistical mapping approach for various crop species such as maize (Hirsch et al. 2014; Lin et al. 2017; Kremling et al. 2019; Hershberger et al. 2022; Wu et al. 2022), sorghum (Ferguson et al. 2021; Pignon et al. 2021), soybean (Li et al. 2021a), rapeseed (Harper et al. 2012; Lu et al. 2014; Tang et al. 2021), and rice (Zhang et al. 2018; Liu et al. 2022). Although most studies have focused on the phenotype and expression profiles quantified in shoots or seeds, TWAS has also been applied to rice root phenotypes. Lou et al. (2017) performed TWAS using 40,122 transcripts quantified in the roots of 37 rice accessions and identified several genes related to energy metabolism, production, and consumption that shape the deep or shallow root system architecture. In addition to TWAS, differential expression and gene ontology (GO) enrichment analyses have been applied to root transcriptome data to elucidate the molecular mechanisms of rice root system architecture (Takehisa et al. 2012; Kawakatsu et al. 2021). The results of these studies encourage the use of transcriptome data to accelerate the search for candidate genes responsible for root phenotypes.

The expression profile of the candidate genes from the TWAS can further be statistically associated with genome-wide DNA polymorphisms, which helps us combine the GWAS and TWAS results. Considering the expression profile as the response variable of GWAS or QTL mapping, it is possible to identify a genomic region that regulates the expression profile of the gene of interest. This statistical mapping is called expression GWAS (eGWAS) or expression QTL (eQTL) mapping, and has been used to reveal the genomic basis of transcriptome variations in rice (Wang et al. 2010, 2014a; Horiuchi et al. 2015; Kuroha et al. 2017; Campbell et al. 2020; Kashima et al. 2021; Liu et al. 2022). Significant variants in eGWAS can be classified into cis or trans-effects according to the physical and/or genetic distance to the gene tested in the eGWAS (Wittkopp et al. 2004; Kliebenstein 2009). Interestingly, the cis-effects tend to be stronger than the trans-effects in several species, such as rice (Wang et al. 2010, 2014a; Liu et al. 2022), maize (Wang et al. 2018c), and lettuce (Zhang et al. 2017). Several studies have integrated eGWAS with GWAS and TWAS to explore candidate variants for the phenotype of interest by investigating the colocalization (overlap) between GWAS and cis-eGWAS peaks (Liu et al. 2022; Wu et al. 2022).

Taken together, comprehensive statistical mapping using both genome and transcriptome data is a promising approach for identifying candidate genes responsible for root phenotypes. As a compact set to efficiently investigate the high genetic diversity of rice, the World Rice Core Collection (WRC) was developed and recently resequenced (Kojima et al. 2005; Tanaka et al. 2020). Our previous study quantified the expression profiles and root phenotypes of 57 accessions of the WRC, showing subpopulation-specific stress response mechanisms (Kawakatsu et al. 2021). In this study, we applied GWAS and TWAS to the 12 root phenotypes in these 57 rice accessions to identify novel candidate genes responsible for the natural variation of root system architecture in rice using the available genome, transcriptome, and root phenotype datasets from previous studies (Tanaka et al. 2020; Kawakatsu et al. 2021). Furthermore, eGWAS was applied to the candidate genes from the TWAS, and the eGWAS and GWAS peaks were compared to identify a variant related to the expression profile of the candidate genes responsible for root phenotypes. We identified six candidate genes responsible for the root phenotypes using three statistical mappings.

Materials and Methods

Plant Materials and Field Trial for Phenotyping and Sampling

All statistical analyses were performed on the phenotypic values quantified by Kawakatsu et al. (2021) without any additional mathematical processing (e.g., no transformation was applied to any phenotype). Therefore, we provide only a brief overview of the plant materials, field experiments, and phenotyping methods.

In total, three replicates of 57 accessions from the WRC (Kojima et al. 2005) were evaluated in an upland field at the Institute of Crop Science (National Agriculture and Food Research Organization, Ibaraki, Japan; 36.0289 °N, 140.0997 °E) from June 5 to August 1, 2018. The ratio of deep rooting (RDR) was quantified using plastic mesh baskets by calculating the ratio of the number of crown roots penetrating the lower part of the mesh (53°–90° to the horizontal) to the total number of crown roots penetrating the entire mesh (Uga et al. 2009). Using the WinRHIZO Pro 2017a software (Regent Instruments, Quebec, Canada), root length (RL), root surface area (RSA), root volume (RV), root diameter (RD), and the number of root tips (NRT) were measured for both crown roots (> 0.2 mm diameter roots, represented by the ‘_C’ suffix in the abbreviation) and lateral roots (< 0.2 mm diameter roots, represented by the ‘_L’ suffix in the abbreviation) from root samples collected from the soil using the backhoe-assisted monolith method (Teramoto et al. 2019). Additionally, the root dry weight (RDW) of samples dried at 80 °C for three days was measured. Further details of the field experiments and measurement methods are described in Kawakatsu et al. (2021) and Teramoto et al. (2019).

Transcriptome Data Processing

Total RNA was extracted from the crown roots of three plants per accession from the same field experiment using the HighGI method (Yoshino et al. 2020), and equal amounts of three RNA samples extracted from the same accession were pooled before RNA-seq library preparation. RNA-seq libraries were sequenced on a single lane of S4 flow cells with paired-end 150-bp and unique dual index reads using Illumina NovaSeq6000 at Macrogen, Japan. Reads were mapped to the IRGSP-1.0 genome assembly and MSU7 gene models using STAR aligner (Dobin et al. 2013). Uniquely mapped read counts were quantified using featureCounts version 1.6.4 (Liao et al. 2014). This pipeline yielded a read count matrix of 55,986 genes for 61 accessions. Further details of RNA extraction, sequencing, read mapping, and read quantification methods have been described previously (Kawakatsu et al. 2021).

From the read count matrix, fragments per kilobase of exon per million reads (FPKM) values were calculated based on the trimmed mean of M value normalization in the {edgeR} package version 3.38.1 (Robinson et al. 2010). As reported previously (Kawakatsu et al. 2021), the log2(FPKM + 1) value was calculated and defined as the expression profile. We defined a gene as not expressed in an accession if its expression profile was lower than 1. After calculating the expression profiles, four non-WRC lines were excluded from the dataset. Finally, we excluded 39,085 genes that were not expressed in more than 50% of the WRC accessions. This generated an expression profile matrix of 16,901 genes for the 57 WRC accessions.
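The filtering rule described above can be sketched as follows. This is a minimal illustration rather than the authors' script: it assumes FPKM values have already been computed (the TMM-based normalization in {edgeR} is not reproduced), and the function names, gene IDs, and toy values are hypothetical.

```python
import math

def expression_profile(fpkm):
    """Return log2(FPKM + 1), the expression profile defined in the text."""
    return math.log2(fpkm + 1.0)

def keep_gene(fpkm_values, min_profile=1.0, max_unexpressed_fraction=0.5):
    """Keep a gene unless it is 'not expressed' (profile < 1) in more than
    50% of the accessions."""
    profiles = [expression_profile(v) for v in fpkm_values]
    n_unexpressed = sum(1 for p in profiles if p < min_profile)
    return n_unexpressed / len(profiles) <= max_unexpressed_fraction

# Toy FPKM matrix (hypothetical gene IDs, four accessions each).
toy_fpkm = {
    "geneA": [5.0, 3.2, 0.4, 7.1],  # not expressed in 1 of 4 accessions: kept
    "geneB": [0.2, 0.0, 0.9, 4.0],  # not expressed in 3 of 4 accessions: dropped
}
expressed = {gene: vals for gene, vals in toy_fpkm.items() if keep_gene(vals)}
```

Under this reading, a gene is retained when it reaches an expression profile of at least 1 in at least half of the accessions.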

Genotype Data Processing

The WRC accessions were sequenced as described previously (Tanaka et al. 2020). The paired-end reads were mapped against the Os-Nipponbare-Reference-IRGSP-1.0 (Kawahara et al. 2013) pseudomolecules using bwa mem (Li and Durbin 2009), and the duplicates were removed using Picard MarkDuplicates. Using the GATK Best Practices for germline SNP/indel discovery (Van der Auwera et al. 2013), 2,805,329 SNPs and 357,639 indels were obtained from all 69 WRC accessions after variant calling and filtering as described by Tanaka et al. (2020). In this study, indels were excluded from the association analyses for simplicity.

Considering the small sample size (n = 57) for the GWAS, the statistical power to identify an association between allelic and phenotypic variations was expected to be low, particularly for SNPs with low minor allele frequency (MAF). Therefore, we applied stringent MAF-based filtering to the 57 accessions to retain SNPs with MAF > 10% using VCFtools version 0.1.16 (Danecek et al. 2011), which removed 526,700 SNPs and retained 2,278,629 SNPs. We applied an LD-based SNP pruning method using PLINK version 1.9 (Purcell et al. 2007; Chang et al. 2015) to remove highly collinear SNPs (pairwise LD; r2 > 0.99) within 100 kb by setting the step size of the sliding window to 100 variants. This pipeline generated a SNP genotype dataset comprising 427,751 SNPs for 57 WRC accessions.
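The MAF criterion can be illustrated with a short sketch. It is not the VCFtools implementation: the genotype coding (alternative-allele counts of 0, 1, or 2, with None for missing calls) and the function names are assumptions made for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP from alternative-allele counts (0, 1, 2);
    missing calls are given as None and ignored."""
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)

def passes_maf_filter(genotypes, threshold=0.10):
    """Apply the MAF > 10% rule stated in the text."""
    return minor_allele_frequency(genotypes) > threshold

# With 57 fully homozygous accessions, five carriers of the minor allele give
# MAF = 10/114 (about 0.088), whereas six carriers give 12/114 (about 0.105).
five_carriers = [2] * 5 + [0] * 52
six_carriers = [2] * 6 + [0] * 51

# The SNP counts reported in the text are mutually consistent.
snps_retained = 2_805_329 - 526_700
```

In other words, if all calls are homozygous, a SNP needs its minor allele in at least six of the 57 accessions to pass the filter.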


GWAS analyzes the strength of the statistical relationship between SNPs and phenotypic values to identify the phenotype-associated genomic regions. GWAS was performed for each root phenotype using 427,751 SNPs in the 57 WRC accessions based on the mixed linear model using the GWAS function in the {rrBLUP} package (Endelman 2011). The genomic relationship matrix was calculated on the same SNP set with VanRaden’s first formula (VanRaden 2008) using the A.mat function in the {rrBLUP} package. As the 57 accessions were divided into four or six subpopulations in Kawakatsu et al. (2021), we calculated the Bayesian information criterion (BIC) for the following three models, which differ in whether and how the subpopulation is included as a fixed covariate: (i) without subpopulation; (ii) with four subpopulations of admixed (n = 7), aus (n = 19), indica (n = 21), and japonica (n = 10); and (iii) with six subpopulations by dividing the 10 japonica accessions into admixed-japonica (n = 3), temperate-japonica (n = 3), and tropical-japonica (n = 4). The BIC value in the mixed model was computed based on the likelihood described by Kang et al. (2008), implemented in our in-house R script. The model with the lowest BIC value was selected as the optimal statistical model for each phenotype (Additional File 1: Table S1). Manhattan and quantile-quantile (QQ) plots were drawn using the {qqman} package (Turner 2018), and the false discovery rate-adjusted (FDR-adjusted) P-values were calculated using the p.adjust function with the “fdr” option.
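The “fdr” option of R's p.adjust corresponds to the Benjamini-Hochberg procedure. A minimal sketch of that adjustment is given below; the function name and the toy P-values are hypothetical, and the study itself used the R function rather than this code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    # Visit P-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = bh_adjust(toy_p)
significant = [p_adj < 0.10 for p_adj in toy_adjusted]
```

A SNP would then be called significant when its adjusted value falls below the chosen cutoff, such as the 0.10 threshold used in this study.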

To define the peak loci, we considered the physical distance and LD among the significant SNPs for each root phenotype. First, SNPs with FDR-adjusted P-value < 0.10 were defined to be significantly associated with the phenotype. Then, all significant SNP pairs within 100 kb showing pairwise LD (r2) > 0.50 were merged as a single peak locus, assuming that those SNPs were likely to be in LD with the same causal variant.
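The merging rule can be sketched as a single-linkage grouping of significant SNPs. This is an illustrative reading of the rule, not the authors' code: r2 is computed here as the squared Pearson correlation of genotype dosages, which is an assumption, and the data structure, function names, positions, and genotypes are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric vectors.
    Both vectors must be polymorphic (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def merge_peaks(snps, max_dist=100_000, min_r2=0.50):
    """Group significant SNPs into peak loci. Two SNPs are linked when they
    lie on the same chromosome within max_dist bp and have r2 > min_r2."""
    parent = list(range(len(snps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            a, b = snps[i], snps[j]
            if a["chrom"] != b["chrom"] or abs(a["pos"] - b["pos"]) > max_dist:
                continue
            if pearson_r(a["geno"], b["geno"]) ** 2 > min_r2:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(snps)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical significant SNPs (alternative-allele counts for 8 accessions).
toy_snps = [
    {"chrom": "chr1", "pos": 1_000_000, "geno": [0, 0, 2, 2, 0, 2, 0, 2]},
    {"chrom": "chr1", "pos": 1_030_000, "geno": [0, 0, 2, 2, 0, 2, 0, 0]},
    {"chrom": "chr1", "pos": 9_000_000, "geno": [0, 0, 2, 2, 0, 2, 0, 2]},
    {"chrom": "chr2", "pos": 1_000_000, "geno": [2, 0, 0, 2, 0, 2, 0, 0]},
]
peak_loci = merge_peaks(toy_snps)
```

In the toy data, the first two SNPs are 30 kb apart with r2 = 0.6 and form one locus, while the third SNP has an identical genotype pattern but is too distant to be merged.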

LD analyses were performed at both the regional and genome-wide scales to define a reasonable genomic region to search for a plausible candidate gene responsible for each GWAS peak. First, the r2 statistic was calculated for all SNP pairs within 500 kb of the peak SNP and visualized as a heatmap using the {LDheatmap} package (Shin et al. 2006). We also calculated genome-wide LD decay using the default method of PopLDdecay software (Zhang et al. 2019) on all 57 WRC accessions and each subpopulation. The LD-pruned 427,751 SNPs were used for the former regional LD analysis because showing too many SNPs in a heatmap is computationally difficult; meanwhile, the 2,278,629 SNPs retained before LD-pruning (after the MAF-based filtering on the 57 WRC accessions) were used for genome-wide LD decay analysis. According to the results of the LD diagnoses, the search interval for the candidate gene in the GWAS was set to ± 250 kb from the peak SNP (details are shown in the Results section). We used RAP-DB (Sakai et al. 2013) to obtain annotation information (version: IRGSP-1.0, 2022-09-01) for the genes in the search interval.

To further prioritize the candidate genes from the search interval, we tested the statistical dependence between the expression profile of the candidate genes in root samples and the genotype of the GWAS peak SNP using an analysis of variance (ANOVA). For each GWAS peak locus, the expression profile of each gene in the ± 250 kb search interval was used as the response variable if the gene was expressed in the root. The number of alternative alleles of the GWAS peak SNP (coded as a numerical variable assuming an additive effect) and subpopulation (coded as a four-class factor variable: japonica, indica, aus, or admixed) were included in the model as explanatory variables without considering the interaction between the two variables.

TWAS and GO Enrichment Analysis

TWAS is expected to complement the candidate gene search in GWAS by testing the statistical relationship between expression profiles and phenotypic values. We performed the TWAS using a method similar to that used in previous studies (Kremling et al. 2019; Hershberger et al. 2022; Wu et al. 2022). First, the probabilistic estimation of the expression residuals (PEER; Stegle et al. 2012) analysis was applied to the matrix of the expression profiles of the 16,901 genes for the 57 WRC accessions to reduce the hidden variation caused by experimental confounders. The number of factors in the PEER analysis was set to five based on the visual identification of the “elbow” in the diagnosis plot of the factor relevance (Additional File 2: Figure S1). The statistical model for each root phenotype in the TWAS was identical to the BIC-based optimal model used in the GWAS, by replacing the SNP genotype matrix with the matrix of the residual values from the PEER statistical model. The “P3D” option was set to TRUE in the GWAS function, as the P-values were inflated in the TWAS result if the “P3D” option was set to FALSE (Additional File 2: Figure S2). To identify candidate genes from the TWAS, we first applied the same significance threshold as in the GWAS (FDR-adjusted P-value < 0.10). Furthermore, we investigated the annotation and literature of all genes included in the top 10 strongest associations for each phenotype so as not to miss associations that did not pass our significance threshold but were stronger than the others. Additionally, the Pearson’s correlation coefficient between the expression profile and phenotypic value was calculated for the top 10 genes using all WRC accessions, as well as for each subpopulation.

Since the root system architecture is assumed to be a complex phenotype controlled by many genes, we applied GO enrichment analysis to discover the biological processes strongly related to the genetic variation of the root phenotypes in the WRC panel. For each phenotype, we first extracted the MSU IDs of genes with the top 1% positive and negative associations. As the top 1% associations for NRT, RL, RSA, and RV highly overlapped, particularly within the four phenotypes measured at the same part of the root, we took the union of the top 1% genes for these four phenotypes separately for crown and lateral roots (RS_C and RS_L, respectively; RS stands for root size) (details are shown in the Results section). Gene enrichment for the GO term related to a biological process was tested for the top 1% gene sets using the enricher function in the {clusterProfiler} package with its default parameters (Yu et al. 2012; Wu et al. 2021), with the 16,901 genes expressed in roots as the reference set. The GO for each transcript was obtained from RAP-DB (“IRGSP-1.0_representative_annotation_2022-09-01.tsv”). To use the ontology data assigned to each transcript in RAP-DB, MSU-ID was converted to RAP-ID based on the ID converter file in RAP-DB (“RAP-MSU_2022-09-01.txt”). If MSU-ID and RAP-ID did not have a one-to-one correspondence, the gene was removed from the enrichment analysis. For each converted RAP-ID, the GO terms were obtained from all potential transcript IDs.
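Over-representation analyses of this kind rest on a one-sided hypergeometric test. The sketch below shows only that core calculation under stated assumptions: it omits the multiple-testing adjustment and gene-set size limits that an enrichment tool applies, and the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, M, N):
    """P(X >= k) when drawing n genes without replacement from a reference
    set of N genes, of which M are annotated with the GO term.
    k is the number of annotated genes observed in the gene set."""
    upper = min(n, M)
    numerator = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

# Hypothetical example: a 169-gene set drawn from the 16,901 expressed genes,
# with 30 reference genes annotated to a term and 5 of them in the gene set.
p_example = hypergeom_enrichment_p(k=5, n=169, M=30, N=16_901)
```

Using the expressed genes as the reference set, as described above, avoids inflating significance for terms whose genes are rarely expressed in roots.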

eGWAS for the Candidate Genes in TWAS

To connect the results from TWAS and GWAS, we performed eGWAS for genes possessing the top 10 strongest associations with at least one root phenotype in the TWAS. The statistical method for the eGWAS, including BIC-based model selection, was identical to that used for the GWAS; however, the expression profile was considered as the response variable of the mixed model. If there was at least one significant (FDR-adjusted P-value < 0.10) SNP within 250 kb of the gene position in the eGWAS, the most significant SNP was defined as the cis-eGWAS peak SNP for the gene. As we did not detect any significant trans-eGWAS SNPs for any tested gene, we describe only the method used to compare the GWAS and eGWAS results when a cis-eGWAS peak SNP was detected. We calculated the physical distance (bp) and pairwise LD (r2) between the cis-eGWAS peak SNP and the SNP with the lowest P-value on the same chromosome in the GWAS for each relevant phenotype with which the gene possessed the top 10 associations in the TWAS. If the two SNPs were closer than 250 kb and their pairwise LD was > 0.50, we considered the gene to have an overlapping peak between the eGWAS and GWAS. An overview of the analysis pipeline is provided in Additional File 2 (Figure S3).
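The two decision rules in this paragraph can be written compactly. The sketch below is an interpretation, not the authors' pipeline: "within 250 kb of the gene position" is taken to mean within 250 kb of the gene start or end, which is an assumption, and the function names, tuple layout, and toy values are hypothetical.

```python
def cis_peak_snp(snps, chrom, gene_start, gene_end,
                 fdr_cutoff=0.10, window=250_000):
    """Return the most significant cis SNP for a gene, or None.
    snps: iterable of (chrom, pos, p_value, fdr_adjusted_p) tuples."""
    cis = [
        s for s in snps
        if s[0] == chrom
        and gene_start - window <= s[1] <= gene_end + window
        and s[3] < fdr_cutoff
    ]
    return min(cis, key=lambda s: s[2]) if cis else None

def peaks_overlap(pos_egwas, pos_gwas, r2, max_dist=250_000, min_r2=0.50):
    """True when the cis-eGWAS peak SNP and the top GWAS SNP on the same
    chromosome are closer than 250 kb and in LD with r2 > 0.50."""
    return abs(pos_egwas - pos_gwas) < max_dist and r2 > min_r2

# Hypothetical eGWAS results for a gene at chr1:1,000,000-1,003,000.
toy_results = [
    ("chr1", 1_100_000, 1e-8, 0.004),  # cis and significant
    ("chr1", 900_000, 1e-6, 0.05),     # cis and significant, weaker
    ("chr1", 3_000_000, 1e-9, 0.001),  # significant but outside the window
    ("chr1", 1_200_000, 1e-3, 0.40),   # cis but not significant
]
peak = cis_peak_snp(toy_results, "chr1", 1_000_000, 1_003_000)
```

A gene would be reported as an overlapping candidate only when both checks succeed for at least one phenotype in which it ranked among the top 10 TWAS associations.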

For the genes eventually selected as overlapping candidates among the GWAS, TWAS, and eGWAS, SNPs and indels from 2 kb upstream to the end of the gene region defined by the MSU7 gene model were extracted from the polymorphic genotype dataset of 2,805,329 SNPs and 357,639 indels (Tanaka et al. 2020) to identify a potential cis-variant for the candidate genes. The SnpEff annotations of the extracted variants were investigated in the TASUKE+ database (Cingolani et al. 2012; Kumagai et al. 2019). The PLACE database (Higo et al. 1999) visualized in the JBrowse of RAP-DB (Sakai et al. 2013) was also explored to determine whether any of the upstream variants disrupt a promoter motif.

Results


GWAS Identified Three Genomic Regions Responsible for Root Phenotypes

To identify the candidate genes responsible for diverse root phenotypes, we conducted a GWAS for the 12 root phenotypes in 57 WRC accessions using 427,751 SNPs. At a 10% FDR, three peak SNPs were detected for 7 of the 12 phenotypes, yielding 10 significant SNP–phenotype associations (Table 1; Fig. 1; Additional File 2: Figure S4). The most significant peak SNP for seven root phenotypes (RL_C, RSA_C, NRT_C, RL_L, RSA_L, RV_L, and NRT_L) was identified at 6,199,732 bp on chromosome 8, with the highest −logP value (−logP = 8.07) for RL_C. The second peak SNP, found at 20,665,890 bp on the same chromosome, was significantly associated only with RSA_C (−logP = 5.90; FDR-adjusted P-value = 0.09). The third peak SNP at 17,902,506 bp on chromosome 11 was significantly associated with RSA_L (−logP = 6.27; FDR-adjusted P-value = 0.06) and RV_L (−logP = 6.50; FDR-adjusted P-value = 0.06; Fig. 1B). No significant associations were identified with the remaining five root phenotypes (RV_C, RDR, RDW, RD_C, and RD_L; Additional File 2: Figure S4).

Table 1 Summary of the 10 significant associations detected in GWAS
Fig. 1

Manhattan and quantile-quantile plots for (A) RSA_C and (B) RSA_L. A total of three peak SNPs were identified for seven root phenotypes, including the two phenotypes shown as representative results in this figure. The blue horizontal line in the Manhattan plot represents the 10% FDR cutoff

We analyzed both regional and genome-wide LD patterns to determine reasonable genomic intervals for screening the candidate gene(s) responsible for GWAS peaks. The pairwise LD (r2) within 500 kb around the peak SNP did not show any obvious LD blocks (Additional File 2: Figure S5), likely because of the high genetic diversity and small sample size of the WRC panel. When the r2 values between the peak SNP and others were visualized on a regional Manhattan plot, most of the SNPs in high and moderately high LD (r2 > 0.80 and > 0.60, respectively) with the peak SNPs were located within 60 and 250 kb of the peak SNP, respectively (Additional File 2: Figure S6). Additionally, the genome-wide LD decay was calculated, and the mean r2 value decreased to 0.23 in the entire WRC panel when the distance between SNPs was approximately 250 kb (Additional File 2: Figure S7). Based on the results of the LD analyses, the search region for candidate genes was set to ± 250 kb from each peak SNP. A total of 70 (for the peak at 6,199,732 bp on chromosome 8), 61 (for the peak at 20,665,890 bp on chromosome 8), and 68 (for the peak at 17,902,506 bp on chromosome 11) MSU loci were detected within 250 kb of the peak SNP (Additional File 1: Tables S2–S4).

We further investigated candidate genes in the search region by leveraging their expression profiles in the root. An ANOVA between allelic variation and the expression profile of candidate genes expressed in the roots was conducted (Additional File 1: Tables S2–S4), which revealed that the allelic variation of the most significant peak SNP was the most strongly related (P = 1.43 × 10⁻¹³) to the expression profile of the OsENT1 gene (LOC_Os08g10450), which is located from 6,142,125 to 6,144,418 bp (approximately 55 kb from the GWAS peak SNP) and encodes an equilibrative nucleoside transporter (Fig. 2C). WRC accessions with the alternative allele (thymine) at this peak SNP showed longer RL_C and lower OsENT1 expression profiles than those with the Nipponbare reference type (adenine) in all subpopulations (Fig. 2A and B). Moreover, OsENT1 expression profiles and RL_C phenotypic values were strongly negatively correlated (r = −0.67; Fig. 2D). These results suggest that OsENT1 is the most plausible candidate gene among the 70 candidates in the ± 250 kb region of the most significant GWAS peak responsible for seven root phenotypes including RL_C.

Fig. 2

Relationship between OsENT1 and the GWAS peak SNP at 6,199,732 bp on chromosome 8. We searched for the most plausible candidate gene for the GWAS peak SNP at 6,199,732 bp based on the root transcriptome data. The WRC accessions with the alternative allele (thymine) at the GWAS peak SNP had (A) longer crown root length and (B) lower OsENT1 gene expression profile than those with the reference allele (adenine). (C) Histogram of the −logP values of the candidate genes around the peak SNP based on the ANOVA. (D) Scatter plot between OsENT1 gene expression profile and crown root length. The horizontal bars in the violin plots represent the median values

Except for the most significant peak SNP, the ANOVA did not find any other candidate genes with strong significance and an interpretable annotation. The top ANOVA hit gene for the second peak SNP was LOC_Os08g33440 (P = 2.35 × 10⁻⁶), which putatively encodes a protein similar to dihydrolipoamide S-acetyltransferase. In addition, OsMADS23 (LOC_Os08g33488, P = 3.00 × 10⁻³), which encodes a stress-responsive MADS-box transcription factor and functions as a positive regulator in response to osmotic stress by regulating ABA biosynthesis (Li et al. 2021b), was one of the 12 significant (P < 0.05) genes for the second peak SNP. The top ANOVA hit gene for the third peak SNP was LOC_Os11g31110 (P = 2.42 × 10⁻³), which was annotated as a conserved hypothetical protein.

TWAS Suggested Five Novel Associations Responsible for Root Phenotypes

TWAS can extend GWAS-based candidate gene search by testing the statistical association between phenotypic values and expression profiles instead of the SNP genotype. We applied TWAS for the 12 root phenotypes on the 16,901 genes expressed in the root and identified six significant statistical associations under the threshold of FDR-adjusted P-value < 0.10 for four root phenotypes: three genes (LOC_Os01g04630, LOC_Os03g31480, and LOC_Os03g02750) for RD_C, one gene (LOC_Os02g54580) for RV_C and RDW, and one gene (LOC_Os12g32536) for RD_L. Among the five genes, only LOC_Os03g31480 (OsEXPA31) and LOC_Os03g02750 (OsSub25) had a gene symbol in RAP-DB (Additional File 1; Table S5).

In addition to these significant associations, we focused on the genes with the top 10 strongest associations for each phenotype to identify other candidate genes. Since some genes were repeatedly detected in the top 10 associations for multiple phenotypes, 70 unique genes were involved in the 120 gene–phenotype associations (Additional File 1; Table S6). Candidate genes were screened based on their annotation information and correlation with the associated phenotype. We found that 31/70 genes had at least one gene symbol in RAP-DB, and 24/31 genes showed a moderate or strong correlation (absolute Pearson’s correlation > 0.50) with at least one phenotype in the 57 WRC accessions. Based on previous studies of these 24 genes, we selected five candidate genes possibly associated with root phenotypes (Table 2).

Table 2 Summary of the five genes selected from the top 10 associations in TWAS

OsENT1 was one of the five candidates responsible for NRT_L, which was also a candidate gene according to the GWAS. The other candidate genes according to TWAS (OsEXPA31, OsSPL14, OsDEP1, and OsDEC1) were not candidate genes according to the GWAS, but were listed as promising candidates according to the results of previous studies. The first candidate gene, OsEXPA31, had the strongest association with RD_C and was one of the five genes with a significant (FDR-adjusted P-value < 0.10) association in the TWAS. Although OsEXPA31 has not been functionally characterized, other α-expansin genes have been characterized for their involvement in root phenotypes, such as primary root length or root hair elongation (Ma et al. 2013; Wang et al. 2014b; Che et al. 2016; Yu et al. 2011). The second candidate gene, OsSPL14, was positively associated with RD_C in the WRC accessions (r = 0.76; −logP = 4.32) as well as in each subpopulation (r = 0.70–0.87), which was consistent with the results of a recent mutant-line-based study reporting that the crown roots thickened after increasing OsSPL14 expression in the roots (Song et al. 2022). The third candidate gene, OsDEP1, has pleiotropic effects, including primary root elongation under limited phosphorus conditions (Sun et al. 2014; Zhang et al. 2015; Wang et al. 2021), and the positive association between OsDEP1 and RDR (r = 0.54; −logP = 4.08) found in this study implies an uninvestigated function of the gene for the root architecture. Lastly, OsDEC1 was negatively associated with RL_C (r = −0.64 in the 57 WRC accessions), which seems to be in line with the negative effect of OsDEC1 on internode elongation (Gómez-Ariza et al. 2019; Nagai et al. 2020).

The GO Enrichment Analysis Highlighted Two Biological Processes Responsible for RD_C

If a biological process is related to a root phenotype, the genes involved in that biological process tend to show a strong association with the phenotype in the TWAS. Thus, we applied GO enrichment analysis to the genes with the top 1% positive and negative associations detected in TWAS to identify the biological processes related to the genetic variation of root phenotypes (Additional File 2; Figure S3). When the top 1% positive or negative associations were compared among the 12 root phenotypes, they highly overlapped among the four root-size-related phenotypes (RL, RSA, RV, and NRT) for both crown and lateral roots (Additional File 2; Figure S8). The average overlap rate was 69.5%, ranging from 31.4% (53/169 genes overlapped between the top 1% negative associations for RL_C and RV_C) to 94.1% (159/169 genes overlapped between the top 1% positive associations for RL_L and RSA_L). Therefore, we merged the top 1% associations for the four root size-related phenotypes (RS_C and RS_L for crown and lateral roots, respectively) for enrichment analysis to simplify interpretation. After combining the top 1% associations for the four phenotypes, 303 and 227 genes were positively associated with RS_C and RS_L, respectively, while 320 and 233 genes were negatively associated with RS_C and RS_L, respectively.
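The overlap percentages quoted above can be checked with simple arithmetic. The sketch assumes that each top 1% set contains 169 genes (1% of the 16,901 expressed genes, rounded) and that the overlap rate is the number of shared genes divided by that set size; the function name is hypothetical.

```python
N_EXPRESSED_GENES = 16_901
TOP_FRACTION = 0.01

# Size of each top 1% gene set (169.01 rounds to 169).
top_n = round(N_EXPRESSED_GENES * TOP_FRACTION)

def overlap_rate(n_shared, set_size):
    """Percentage of a top 1% gene set that is shared with another set."""
    return 100.0 * n_shared / set_size

lowest = overlap_rate(53, top_n)    # RL_C vs RV_C, negative associations
highest = overlap_rate(159, top_n)  # RL_L vs RSA_L, positive associations
```

Rounded to one decimal place, these reproduce the 31.4% and 94.1% limits reported in the text.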

GO enrichment analysis identified 11 biological processes involved in the five root phenotypes (Table 3). Only one biological process was related to each of RS_C, RS_L, and RDR, whereas four biological processes were related to each of RD_C and RD_L. In particular, the top 1% genes negatively associated with RD_C were enriched in two explicable biological processes (GO:0009664 and GO:0006979). GO:0009664 is annotated as “plant-type cell wall organization” and was assigned to five genes encoding α-expansins: OsEXPA3 (LOC_Os05g19570), OsEXPA9 (LOC_Os01g14660), OsEXPA18 (LOC_Os03g06040), OsEXPA19 (LOC_Os03g06050), and OsEXPA31 (LOC_Os03g31480). This enrichment is consistent with the known role of expansins in root growth by mediating cell wall loosening (Zhang et al. 2021). GO:0006979 is annotated as “response to oxidative stress” and was assigned to six genes encoding peroxidases: OsPOD (LOC_Os01g19020), OsPRX42 (LOC_Os03g25280), OsPRX43 (LOC_Os03g25300), OsPRX54 (LOC_Os04g34630), OsPRX68 (LOC_Os05g04450), and OsPRX102 (LOC_Os07g31610). Although peroxidases are involved in several physiological processes throughout the plant life cycle, one of their major roles is cell wall modification and loosening by regulating the reactive oxygen species level (Passardi et al. 2004). Collectively, our GO enrichment results imply a potential physiological function of expansins and peroxidases in determining root diameter by regulating cell wall loosening.
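For readers unfamiliar with GO enrichment, a common formulation is a one-sided hypergeometric test. The sketch below is an assumption for illustration only: the text does not state which test or software was used, and the number of background genes annotated with a term (`n_annotated = 40`) is a hypothetical value. Only the background size (16,901 genes), the top 1% set size (169 genes), and the five α-expansin genes come from the text.

```python
from math import comb


def hypergeom_enrichment_p(k, n_background, n_annotated, n_selected):
    """One-sided P(X >= k): probability of drawing at least k annotated genes
    when n_selected genes are sampled without replacement from n_background
    genes, of which n_annotated carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / total


n_background = 16_901   # expressed genes (from the text)
n_selected = 169        # top 1% negative associations for RD_C
k_observed = 5          # α-expansin genes assigned to GO:0009664 (from the text)
n_annotated = 40        # HYPOTHETICAL number of background genes with the term

p_value = hypergeom_enrichment_p(k_observed, n_background, n_annotated, n_selected)
print(f"enrichment P-value under the assumed annotation count: {p_value:.3g}")
```

Exact integer arithmetic with `math.comb` avoids floating-point overflow for gene sets of this size. In a real analysis the P-values would also be corrected for the number of GO terms tested.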

Table 3 The 11 enriched biological processes in the GO enrichment analysis

eGWAS Discovered Two Overlaps between GWAS and TWAS

We applied eGWAS to the 70 candidate genes in the TWAS (Additional File 1: Table S6) to identify a strong cis- or trans-effect variant of the candidate gene, which enabled us to connect the results from the TWAS and GWAS (Additional File 2: Figure S3). None of the 70 TWAS candidate genes tested in the eGWAS showed a significant trans-eGWAS peak SNP, probably because trans-effects are generally smaller than cis-effects (Wang et al. 2010, 2014a; Liu et al. 2022). Significant cis-eGWAS peaks were identified for six of the 70 TWAS candidate genes, which comprised 10 gene–phenotype associations (Additional File 1: Table S7). Two combinations (OsENT1 responsible for NRT_L and OsDjA6 responsible for RV_L) were eventually identified as common associations detected by all three statistical mapping methods (Fig. 3; Additional File 1: Table S7).

Fig. 3 Two overlaps between the eGWAS and GWAS peaks. Chromosome-level Manhattan plots are shown for (A) the eGWAS for OsENT1 and the GWAS for NRT_L on chromosome 8 and (B) the eGWAS for OsDjA6 and the GWAS for RV_L on chromosome 4. The most significant GWAS SNP on each illustrated chromosome is highlighted as a red dot larger than those of the other SNPs. Similarly, the eGWAS peak SNP is highlighted as a larger cyan dot

As expected from the GWAS and TWAS results, one of the two common associations was that between OsENT1 and NRT_L. The eGWAS peak SNP was detected approximately 9 kb upstream of OsENT1 with a highly significant signal (−logP = 14.97) and was located within 50 kb of the GWAS peak SNP responsible for NRT_L (Fig. 3A). We found that 30 of the 43 variants located from 2 kb upstream to the end of the OsENT1 gene region showed high LD (r2 > 0.90) with the eGWAS peak SNP (Additional File 1: Table S8). In particular, a SNP variant at a putative splicing site (7 bp downstream from the first exon) was in almost perfect LD with the eGWAS peak SNP (r2 = 0.96) and showed a visible relationship with the OsENT1 expression profile as well as with the NRT_L phenotypic values (Additional File 2: Figure S9). Altogether, our results suggest that OsENT1 is the most promising candidate gene responsible for NRT_L, although further experimental validation is required because the pairwise LD between the eGWAS and GWAS peak SNPs was only moderate (r2 = 0.56).

The other common association was observed between OsDjA6 (LOC_Os04g46390) and RV_L (Fig. 3B). Five equally significant cis-eGWAS peak SNPs were detected for OsDjA6 from 27,425,399 to 27,820,992 bp on chromosome 4, covering the region of this gene (Additional File 1: Table S7). When the GWAS results for RV_L were compared with the eGWAS results, the most significant GWAS peak SNP at 27,719,033 bp (200 kb downstream of OsDjA6 but between the two eGWAS peak SNPs at 27,625,119 and 27,725,503 bp) on chromosome 4 was in high LD with the eGWAS peak SNPs (r2 = 0.97). Moreover, 24 polymorphic variants were present from 2 kb upstream to the end of the OsDjA6 gene region in the 57 WRC accessions, of which nine variants showed high LD (r2 > 0.90) with both the eGWAS and GWAS peak SNPs (Additional File 1: Table S9). Among these high-LD variants, two were in an exon but were expected to be synonymous variants, three were in an intron, and four were upstream of OsDjA6. We then investigated the four upstream high-LD variants using the PLACE database (Higo et al. 1999) visualized in the JBrowse of RAP-DB, and discovered a SNP variant from adenine (reference allele) to guanine (alternative allele) at 27,504,969 bp (518 bp upstream of the representative MSU7 gene model of OsDjA6) located in a TATA box-like motif, whereas the other three were not located in any putative promoter motif. This SNP exhibited the same segregation pattern as the eGWAS peak SNP in the 57 WRC accessions: eight indica accessions (WRC03, WRC05, WRC07, WRC10, WRC12, WRC13, WRC16, and WRC19) had alternative alleles, whereas the remaining 49 accessions had reference alleles (Additional File 1: Table S10). These eight indica accessions showed a low OsDjA6 expression profile, probably because of the mutation in the TATA box-like motif, which may explain their higher RV_L values than those of the other 13 indica accessions (Additional File 2: Figure S10). Thus, this cis-variant in the putative promoter motif is the most plausible source of the negative relationship between OsDjA6 expression profiles and RV_L in the indica subpopulation. Collectively, GWAS, TWAS, and eGWAS all supported the association between OsDjA6 and RV_L.
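The LD values (r2) quoted above can be illustrated with a short calculation. The sketch below assumes that each accession is a homozygous inbred line, so a biallelic SNP can be coded as 0 (reference) or 1 (alternative), and that r2 is the squared correlation between the two allele indicators. The software and settings actually used to compute LD are not stated in this section, and the ordering of accessions in the vectors is hypothetical; only the counts (8 alternative and 49 reference alleles among 57 accessions) come from the text.

```python
def ld_r2(snp_a, snp_b):
    """Squared correlation (r^2) between two biallelic SNPs coded
    0 = reference allele, 1 = alternative allele.
    Both SNPs are assumed to be polymorphic in the sample."""
    n = len(snp_a)
    p_a = sum(snp_a) / n                                  # alt-allele frequency, SNP A
    p_b = sum(snp_b) / n                                  # alt-allele frequency, SNP B
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n   # frequency of alt-alt haplotype
    d = p_ab - p_a * p_b                                  # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


N_ACCESSIONS = 57
N_ALT = 8

# SNP in the TATA box-like motif: 8 accessions carry the alternative allele.
tata_snp = [1] * N_ALT + [0] * (N_ACCESSIONS - N_ALT)

# An SNP with the identical segregation pattern is in perfect LD.
egwas_peak = list(tata_snp)
r2_identical = ld_r2(tata_snp, egwas_peak)

# HYPOTHETICAL SNP that differs in a single accession: LD drops below 1.
other_snp = list(tata_snp)
other_snp[0] = 0
r2_one_mismatch = ld_r2(tata_snp, other_snp)

print(r2_identical, r2_one_mismatch)
```

Two SNPs that segregate identically across all accessions give r2 = 1, which is what "perfect LD" means for the TATA box-like motif SNP and the eGWAS peak SNP.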


Discussion

To optimize the root system architecture in rice through molecular breeding, statistical mapping is a promising approach for identifying candidate genes by leveraging the natural variation in root phenotypes. Thus, we used GWAS, TWAS, and eGWAS to explore the candidate genes related to the natural variation of the 12 root phenotypes using the genotypes of 424,888 SNPs and the expression profiles of 16,901 genes in 57 rice accessions. Our comprehensive statistical analyses identified OsENT1, OsEXPA31, OsDEC1, OsSPL14, OsDEP1, and OsDjA6 as the candidate genes for root phenotypes. Furthermore, four significant genes (LOC_Os01g04630, LOC_Os03g02750, LOC_Os02g54580, and LOC_Os12g32536) and two weakly significant (FDR-adjusted P-value between 5% and 10%) genomic regions associated with at least one root phenotype were identified using TWAS and GWAS, respectively. In addition, GO enrichment analysis highlighted the importance of genes related to cell wall organization and response to oxidative stress in the natural variation in RD_C. Although the sample size (n = 57) was too small to detect small-effect genes or loci, our statistical analyses dissected the genetic control of root phenotypes in a diverse panel and identified candidate genes for molecular breeding and functional genomics of root system architecture.

All statistical analyses suggested that OsENT1 is a candidate gene responsible for NRT_L. Both GWAS and eGWAS identified a significant peak in proximity to this gene, the gene showed the sixth strongest association with NRT_L in TWAS, and a SNP variant at a putative splicing site showed high LD with the cis-eGWAS peak SNP. Based on the RiceXPro database (Sato et al. 2011b, 2013), OsENT1 (RiceXPro accession ID: AK059439) expression profiles in roots, particularly in the root elongation zone (RXP_5002; Takehisa et al. 2013), were stronger than those in other organs (RiceXPro dataset ID: RXP_0001; Sato et al. 2011a) and exhibited a positive response to cytokinins in both roots (RXP_1005) and shoots (RXP_1010). Four genes (OsENT1–4) encoding potential equilibrative nucleoside transporters have been identified in rice. Although the function of OsENT1 in rice is not well understood, an expression analysis in yeast cells suggested that OsENT2 plays a role as a cytokinin transporter (Hirose et al. 2005). Additionally, two equilibrative nucleoside transporter homolog genes in Arabidopsis, AtENT3 and AtENT8, are involved in nucleoside-type cytokinin transport (Sun et al. 2005). While OsENT1 did not show the ability to transport nucleoside-type cytokinins, its amino acid sequence was the most homologous (45%) with that of AtENT8 (Hirose et al. 2005). Cytokinins control the cell differentiation rate in the root meristem and, therefore, control root meristem size, crown root formation, and lateral root formation (Beemster and Baskin 2000; Dello Ioio et al. 2007; Laplaze et al. 2007; Neogy et al. 2021). Thus, we hypothesize that OsENT1 may participate in cytokinin transport and thereby affect root phenotypes.

TWAS and GO enrichment analyses revealed that five α-expansin genes (OsEXPA3, OsEXPA9, OsEXPA18, OsEXPA19, and OsEXPA31) were negatively associated with RD_C, of which OsEXPA31 showed the strongest association and a strong negative correlation with RD_C (r = −0.69) in the 57 WRC accessions. The expansin proteins, discovered by McQueen-Mason et al. (1992), are a class of cell-wall-loosening proteins that play important roles in mediating plant growth and development (Cosgrove et al. 2002). Although OsEXPA31 has not yet been functionally characterized, the relationships between other OsEXPAs and root morphology have been investigated in rice. OsEXPA8 positively regulates primary root length and the number of lateral roots by mediating cell wall loosening (Ma et al. 2013; Wang et al. 2014b). OsEXPA10 is required for root cell elongation (Che et al. 2016). OsEXPA17 and OsEXPA30 are root hair-specific genes that play crucial roles in root hair elongation (Yu et al. 2011). Interestingly, the protein sequence of OsEXPA31 differs substantially from those of the four expansins mentioned above (He et al. 2015), implying a potential functional divergence of OsEXPA31. Therefore, our results suggest an undiscovered role of OsEXPA31 in crown root diameter through the regulation of cell wall structure.

Most genes with the strongest association with root phenotypes in the TWAS were uncharacterized in rice, whereas three candidate genes had been reported to affect aboveground phenotypes. For instance, OsDEC1, which decelerates internode elongation, was negatively associated with RL_C in our study (r = −0.64; Gómez-Ariza et al. 2019; Nagai et al. 2020). The well-known yield-related gene OsSPL14 showed the fifth strongest association with RD_C in the TWAS. Several studies have demonstrated that OsSPL14 optimizes rice plant architecture and improves abiotic stress tolerance (Miura et al. 2010; Jiao et al. 2010; Zhu et al. 2022). OsSPL14 also confers root elongation by modulating PIN2 and PIN10b (auxin efflux carriers) transcription under low nitrogen supply (Wang et al. 2022). In a recent study, the crown root diameter was enlarged in OsSPL14-promoter-mutant plants that highly expressed this gene specifically in roots (Song et al. 2022), which is consistent with the association detected in our TWAS. Another yield-related gene, OsDEP1, which is positively regulated by OsSPL14 (Lu et al. 2013), was associated with RDR in the 57 WRC accessions. OsDEP1 was first reported to mediate panicle morphology and contribute to grain number improvement (Zhou et al. 2009; Huang et al. 2009), and further studies have demonstrated that it regulates nitrogen use efficiency and drought adaptation (Sun et al. 2014; Zhang et al. 2015). In addition, OsDEP1 modulates root elongation for phosphorus uptake in rice (Wang et al. 2021). Considering the multiple functions of OsDEP1 in plant growth and development (Trusov et al. 2007; Wang et al. 2006; Xu et al. 2016), OsDEP1 may have undiscovered pleiotropic roles in other crucial growth processes in rice. Our results imply that yield-related genes such as OsSPL14 and OsDEP1 may improve yield by modulating both shoot and root phenotypes.

Our comprehensive GWAS, TWAS, and eGWAS analyses identified a negative association between the expression profile of OsDjA6 and RV_L. OsDjA6 was characterized as a negative regulator of rice immunity to the blast fungus that acts by regulating the genes involved in the salicylic acid pathway, including the transcription factor OsWRKY45 (Zhong et al. 2018). Considering the complex OsWRKY45 regulatory mechanism in balancing plant growth and immune responses (Shimono et al. 2007; Wang et al. 2018b; Ichimaru et al. 2022), the association between OsDjA6 and RV_L may imply an unknown pleiotropic function of OsDjA6 through the negative regulation of OsWRKY45.

Our results highlight the advantage of transcriptomics for candidate gene search, as none of the six candidate genes (OsENT1, OsEXPA31, OsSPL14, OsDEP1, OsDEC1, and OsDjA6) could have been identified without transcriptome data. Although OsENT1 was detected within the ±250 kb region around the GWAS peak SNP, it was impossible to shed light on its involvement without testing the statistical relationship between the expression profile and phenotypic value. Similarly, TWAS and eGWAS drew attention to a weak GWAS association around the cis-regulatory region of OsDjA6. Although a GWAS can identify a genomic region associated with phenotypic variations, resolving the GWAS peak to a single candidate gene is often difficult. In contrast, the gene-level associations from the TWAS enabled us to discover novel associations of genes previously characterized for a shoot phenotype, such as OsSPL14, OsDEP1, and OsDEC1, with the root phenotypes.


Conclusions

Association mapping analyses based on both transcriptome and genome data from the 57 WRC accessions revealed six associations between genes and root phenotypes: OsENT1 was associated with NRT_L, OsEXPA31 and OsSPL14 with RD_C, OsDEP1 with RDR, OsDEC1 with RL_C, and OsDjA6 with RV_L. These genes are promising targets for molecular breeding and functional genomics to understand the complex genetic control of root system architecture in rice.

Data Availability

All data used in the present analyses have been published previously. The raw sequence data for SNP and indel discovery were deposited in the DNA Data Bank of Japan Sequence Read Archive in a previous study (Tanaka et al. 2020). Transcriptome data are available from the Gene Expression Omnibus (GSE162313) and root phenotype data are available in the supplementary file of the same publication (Kawakatsu et al. 2021). All codes for data analysis are available in Figshare (



Abbreviations

ANOVA: Analysis of variance
BIC: Bayesian information criterion
eGWAS: Expression genome-wide association study
eQTL: Expression quantitative trait loci
FDR: False discovery rate
FPKM: Fragments per kilobase exon per million reads
GO: Gene ontology
GWAS: Genome-wide association study
Indel: Insertion/deletion variant
LD: Linkage disequilibrium
MAF: Minor allele frequency
NRT_C: Number of crown root tips
NRT_L: Number of lateral root tips
PEER: Probabilistic estimation of the expression residuals
QTL: Quantitative trait loci
RD_C: Crown root diameter
RD_L: Lateral root diameter
RDR: Ratio of deep rooting
RDW: Root dry weight
RL_C: Crown root length
RL_L: Lateral root length
RS_C: Crown root size
RS_L: Lateral root size
RSA_C: Crown root surface area
RSA_L: Lateral root surface area
RV_C: Crown root volume
RV_L: Lateral root volume
SNP: Single nucleotide polymorphism
TWAS: Transcriptome-wide association study
WRC: World rice core collection




We appreciate the editor and anonymous reviewers for their fruitful comments. We would like to thank Editage ( for English language editing.


This work was supported by the Cabinet Office, Government of Japan, Moonshot Research and Development Program for Agriculture, Forestry and Fisheries (funding agency: Bio-oriented Technology Research Advancement Institution, No. JPJ009237); and JST CREST, Japan (JPMJCR17O1).

Author information

Authors and Affiliations



RT, SW, and SY conceptualized this study. SW and RT conducted the formal analyses. TK, ST, MS, NT, and YU acquired and curated the datasets for this study. SW and RT drafted the original manuscript. SY and YU supervised the study and edited the manuscript. All the authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Yusaku Uga or Shiori Yabe.

Ethics declarations

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1: Table S1. BIC values and the selected models for GWAS and TWAS. Table S2. Candidate genes for the GWAS peak identified at 6,199,732 bp on chromosome 8. Table S3. Candidate genes for the GWAS peak identified at 20,665,890 bp on chromosome 8. Table S4. Candidate genes for the GWAS peak identified at 17,902,506 bp on chromosome 11. Table S5. Summary of the top ten gene-phenotype associations for the 12 root phenotypes in TWAS. Table S6. Candidate gene selection from the 70 unique genes in the top-ten gene-phenotype associations in TWAS. Table S7. Summary of the six genes with a significant cis-eGWAS peak. Table S8. Summary of the filtering, GWAS, and eGWAS results for the 43 polymorphic variants from 2 kb upstream to the end of the OsENT1 gene region. Table S9. Summary of the filtering, GWAS, and eGWAS results for the 24 polymorphic variants from 2 kb upstream to the end of the OsDjA6 gene region. Table S10. SNP genotype of the 57 WRC accessions at the putative TATA box-like motif, at the GWAS peak SNP, and at the eGWAS peak SNP

Supplementary Material 2: Figure S1. Diagnostic plot of the factor relevance in the PEER analysis. Figure S2. Quantile-quantile plot of TWAS with or without the P3D option. Figure S3. Overview of the analysis pipeline. Figure S4. Manhattan and quantile-quantile plots for the 12 root phenotypes. Figure S5. LD heatmaps around the GWAS peak SNPs. Figure S6. Regional Manhattan plots colored by pairwise LD (r2) with the peak SNP. Figure S7. Genome-wide LD decay. Figure S8. The number of overlapping genes between the top 1% associations for different root phenotypes. Figure S9. The dependence of (A) the expression profile of OsENT1 and (B) the number of lateral root tips on the SNP at a putative splicing site of OsENT1. Figure S10. The dependence of (A) the expression profile of OsDjA6 and (B) the lateral root volume on the SNP in a TATA box-like motif upstream of OsDjA6 in the indica subpopulation

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Wei, S., Tanaka, R., Kawakatsu, T. et al. Genome- and Transcriptome-wide Association Studies to Discover Candidate Genes for Diverse Root Phenotypes in Cultivated Rice. Rice 16, 55 (2023).
