Identification of an Elite Core Panel as a Key Breeding Resource to Accelerate the Rate of Genetic Improvement for Irrigated Rice

Juma, Roselyne U.; Bartholomé, Jérôme; Thathapalli Prakash, Parthiban; Hussain, Waseem; Platten, John D.; Lopena, Vitaliano; Verdeprado, Holden; Murori, Rosemary; Ndayiragije, Alexis; Katiyar, Sanjay Kumar; Islam, Md Rafiqul; Biswas, Partha S.; Rutkoski, Jessica E.; Arbelaez, Juan D.; Mbute, Felister N.; Miano, Douglas W.; Cobb, Joshua N.

doi:10.1186/s12284-021-00533-5

Original article
Open access
Published: 13 November 2021

Rice volume 14, Article number: 92 (2021)

Abstract

Rice genetic improvement is a key component of achieving and maintaining food security in Asia and Africa in the face of growing populations and climate change. In this effort, the International Rice Research Institute (IRRI) continues to play a critical role in creating and disseminating rice varieties with higher productivity. Due to increasing demand for rice, especially in Africa, there is a strong need to accelerate the rate of genetic improvement for grain yield. In an effort to identify and characterize the elite breeding pool of IRRI’s irrigated rice breeding program, we analyzed 102 historical yield trials conducted in the Philippines during the period 2012–2016 and representing 15,286 breeding lines (including released varieties). A mixed model approach based on the pedigree relationship matrix was used to estimate breeding values for grain yield, which ranged from 2.12 to 6.27 t·ha⁻¹. The rate of genetic gain for grain yield was estimated at 8.75 kg·ha⁻¹ year⁻¹ (0.23%) for crosses made in the period from 1964 to 2014. When the data were restricted to IRRI released varieties only, the rate doubled to 17.36 kg·ha⁻¹ year⁻¹ (0.46%). Regressed against breeding cycle, the rate of gain for grain yield was 185 kg·ha⁻¹ cycle⁻¹ (4.95%). We selected 72 top performing lines based on breeding values for grain yield to create an elite core panel (ECP) representing the genetic diversity in the breeding program with the highest heritable yield values from which new products can be derived. The ECP closely aligns with the indica 1B sub-group of Oryza sativa that includes most modern varieties for irrigated systems. Agronomic performance of the ECP under multiple environments in Asia and Africa confirmed its high yield potential. We found that the rate of genetic gain for grain yield was limited primarily by long cycle times and the direct introduction of non-improved material into the elite pool. Consequently, the current breeding scheme for irrigated rice at IRRI is based on rapid recurrent selection among highly elite lines. In this context, the ECP constitutes an important resource for IRRI and NAREs breeders to carefully characterize and manage that elite diversity.

Introduction

Rice (Oryza sativa L.) is one of the world’s major staple crops, feeding more than 3.5 billion people (Global Rice Science Partnership 2013). It is believed that by 2050 the global population will be approximately 10 billion (United Nations 2019) and much of this population increase will occur in the regions of Africa and Southern Asia, which are highly dependent on rice. As such, rice will be crucial to ensuring equitable food security for the foreseeable future (Peng et al. 2004; Godfray 2014; Li et al. 2018). Challenges posed by climate change as well as increasing consumer demand further highlight the importance of rice to global food security (Silvern and Young 2013). While agricultural intensification using modernized management practices (Garnett et al. 2013) can help boost productivity, rice genetic improvement in the context of these management systems is also an important driver of sustainable productivity (Guimaraes 2009; Atlin et al. 2017). The rate at which this genetic improvement occurs is often referred to as genetic gain. In order to deliver improved varieties to the farmers of the twenty-first century, the rate of genetic gain in rice must accelerate relative to twentieth century levels (Atlin et al. 2017).

With the acceleration of genotyping technologies through the early twenty-first century and the subsequent maturation of genomic selection-based breeding strategies, there has been a renewed interest in the application of quantitative genetics to plant breeding programs (Cobb et al. 2019b; Bernardo 2020). To this end the irrigated rice breeding program at the International Rice Research Institute (IRRI) has spent significant effort to develop a modernized approach to rice breeding to substantially and sustainably increase response to selection (Collard et al. 2019). In addition to implementing accelerated single seed descent strategies (Collard et al. 2017), another major pillar of IRRI’s effort to transform rice breeding is the deep characterization of the elite genetic base from which new products are derived. While the characterization and dissection of rice genetic diversity in public germplasm collections has advanced considerably (Li et al. 2014; McCouch et al. 2016; Sun et al. 2017), to be fully leveraged for varietal improvement, it needs to be paired with an equally in-depth characterization of the elite genetic diversity residing in breeding programs across the world.

The irrigated rice breeding program at IRRI has been a source of elite breeding germplasm for decades (Peng and Khush 2003; Mackill and Khush 2018; Collard et al. 2019). This genetic diversity has been utilized in combination with landraces and local varieties to contribute substantially to the yield improvement achieved in Asia to date. The breeding strategies used to achieve this post-Green Revolution yield improvement, however, frequently varied according to funding priorities, available technology, and evolution of scientific thinking (see Fig. 1). IRRI’s early breeding effort culminated in the development of IR8, the first widely adopted semi-dwarf variety of the Green Revolution (Chandler 1982; Peng et al. 1999; Peng and Khush 2003). Though this variety was high yielding, it lacked acceptable cooking and eating quality and therefore was quickly superseded by other varieties that excelled in both grain yield and marketability (Khush 2001). During this time, a focus on improved disease resistance and continued efforts to increase genetic variation led to many new varieties introgressed with genetics from wild species (Brar and Khush 2002, 2018) that were created using strategies such as backcrossing, top crossing, and pedigree breeding methods. IR36, for instance, resulted from the combination of 13 landraces from six different countries (Khush 2005). This variety displayed good grain quality, early maturity, tolerance to abiotic stresses, and resistance to multiple pests and diseases (Peng and Khush 2003). Further advances in grain quality (soft gel consistency, translucent and long slender grains, intermediate amylose content and intermediate gelatinization temperature) were made with the release of IR64, which resulted from combining extant improved lines with 19 traditional varieties (Mackill and Khush 2018) but which was still heavily based on IR8.
A renewed focus on yield improvement in the late 1980s and 1990s sparked the development of an ideotype breeding strategy known as the new plant type (NPT, Fig. 1) (Cassman 1994; Peng et al. 2004; Yadi et al. 2021). With the advent of molecular marker technologies during the same period, this was quickly followed by selection strategies based on marker-assisted backcrossing to introduce major genes for biotic or abiotic stress tolerance and produce enhanced versions of existing varieties. This effort was recently complemented by an enhanced focus on bio-fortification in order to couple high yield with high nutritional value. However, post-Green Revolution breeding for quality and disease resistance, while successful, has not brought about the realized genetic gain for yield that is needed to meet the projected demand. More recent approaches aim to integrate principles of quantitative genetics into the breeding strategy by focusing the molecular breeding strategy on well-known high-value haplotypes and using a genomics-enabled rapid recurrent selection strategy to improve quantitative traits mainly through accelerated breeding cycles (Fig. 1).

The objective of this study was twofold: (i) estimate gains in breeding value for yield over the entire history of IRRI’s breeding program for irrigated systems, and (ii) identify and characterize a panel of elite lines from among the available germplasm that balances high breeding value for yield with sufficient genetic variance to preserve long term gain from selection. To this end, we gathered historical data from 102 yield trials in the IRRI phenotypic database (Breeding4Results 2021) spanning the years from 2012 to 2016 and combined it with pedigree data from the International Rice Information System (McLaren et al. 2005; Collard et al. 2019) to estimate breeding values for grain yield for all extant or recently extant lines. These trials included most of the existing advanced material of the breeding program as well as many replicated observations of IRRI released varieties, allowing us to estimate the rate of genetic gain over five decades. The same data was then used to identify high yielding lines from the breeding program to form the IRRI irrigated elite core panel (ECP). Seventy-two lines were ultimately chosen to comprise the elite panel and were subjected to extensive genetic and phenotypic characterization to assess suitability for short-cycle recurrent selection.

Results

Estimation of the Genetic Gain for Grain Yield

Genetic gain for grain yield was estimated as a function of change in breeding value over time. Breeding values for 15,286 lines evaluated in 102 trials conducted between 2012 and 2016 were estimated using a two-stage mixed model analysis (Table 1, Additional file 1: Table S1). Most were advanced lines from the breeding program that never achieved varietal status, alongside released varieties from different decades. Eighty percent of the lines originated from crosses that were made after 2009 (Fig. 2A). As expected, the reliability of the breeding values of older lines (generated before 2000) was higher than that of more recent material, with average values of 0.43 (σ = 0.23) and 0.1 (σ = 0.17), respectively. Breeding values for grain yield ranged from 2.12 to 6.27 t·ha⁻¹. The genetic trend as measured by this analysis of the IRRI irrigated rice breeding program since its initiation in 1960 to 2014 is presented in Fig. 2B. Over this period the linearized genetic trend was estimated to be 8.75 kg·ha⁻¹ year⁻¹ (0.23%). Despite the smaller sample size for the earlier historical periods, an upward trend from 1960 to 1980 is apparent, followed by a period of variability in the average breeding value, which eventually plateaus around 4.38 t·ha⁻¹ after 2008. In order to interrogate the drivers of this genetic trend further, the equivalent complete generation (EqG, see “Materials and Methods” section) for each line was calculated as an estimate of the number of effective breeding cycles that had taken place prior to the crossing event. EqG is a key indicator of the rate of introduction of new material and the extent to which improved material is recycled into the breeding program. A similar trend to breeding values was also observed for EqG for the same period. Values exceeded two by the end of the sixties and reached their maximum average value of six in the eighties (Fig. 2C). This was followed by a marked decrease to an average value of four in the nineties, and the average maximal values after 2000 never exceeded six equivalent generations. In addition, a large variance in EqG was found across lines from the most recent decade, with values ranging from 1 to 7.56, highlighting the extensive use of non-improved material in combination with more elite lines.
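To make the EqG statistic concrete, the following minimal sketch assumes the commonly used definition of equivalent complete generations: the sum of (1/2)^n over all known ancestors, where n is the number of generations separating a line from that ancestor. The pedigree and line names are hypothetical, and the paper's own computation (described in its Materials and Methods) may differ in detail.

```python
from functools import lru_cache

# Hypothetical pedigree: line -> (parent1, parent2); None marks an unknown parent.
PEDIGREE = {
    "landrace_A": (None, None),
    "landrace_B": (None, None),
    "landrace_C": (None, None),
    "line_1": ("landrace_A", "landrace_B"),
    "line_2": ("line_1", "landrace_C"),
    "line_3": ("line_2", "line_1"),
}


@lru_cache(maxsize=None)
def eqg(line):
    """Equivalent complete generations of `line`.

    Summing (1/2)**n over all known ancestors is equivalent to the recursion
    used here: each known parent contributes 1/2 * (1 + EqG of that parent).
    """
    total = 0.0
    for parent in PEDIGREE.get(line, (None, None)):
        if parent is not None:
            total += 0.5 * (1.0 + eqg(parent))
    return total


for name in ("landrace_A", "line_1", "line_2", "line_3"):
    print(name, eqg(name))
```

In this toy pedigree a cross between two founders has an EqG of 1, and crossing an improved line back to an unimproved landrace (line_2) adds only half a generation, which is the mechanism by which introductions of non-improved material hold the average EqG down.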

Table 1 Summary of yield trials used to estimate breeding values for grain yield

Eighty-six released varieties included in the dataset were analyzed separately to better characterize the long-term trend in breeding values for yield and its relationship with EqG (Fig. 3). This set includes material dating from the Green Revolution and post-Green Revolution eras (IR8, IR36), the mega-variety IR64, and more recent high performing releases (IRRI 154, IRRI 156). Altogether, these lines covered a long period from 1962 to 2006. Over this period, genetic gain for grain yield based on released varieties was estimated to be 17.36 kg·ha⁻¹ year⁻¹ (0.46%; Fig. 3A). When regressing breeding values on EqG, we observed a significant correlation and estimated the rate of genetic gain per cycle to be 185 kg·ha⁻¹ cycle⁻¹ (4.95%; Fig. 3B).
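The per-year and per-cycle rates above are slopes of a simple linear regression of breeding value on year of cross or on EqG. The sketch below illustrates that calculation with ordinary least squares; the data are hypothetical, and expressing the slope as a percentage of the fitted value at the earliest point is an assumption about how "initial performance" is defined.

```python
def genetic_gain(x, breeding_values):
    """Return (slope, percent_gain) from a least-squares fit of breeding value on x.

    x can be year of cross (gain per year) or EqG (gain per cycle).
    percent_gain expresses the slope relative to the fitted value at min(x),
    which is an assumed definition of the initial performance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(breeding_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, breeding_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    baseline = intercept + slope * min(x)
    return slope, 100.0 * slope / baseline


# Hypothetical breeding values (kg/ha) by year of cross; not taken from the paper.
years = [1965, 1975, 1985, 1995, 2005]
bv_kg = [3800.0, 3950.0, 4150.0, 4300.0, 4500.0]
slope, pct = genetic_gain(years, bv_kg)
print(f"{slope:.2f} kg/ha/year ({pct:.2f}%)")
```

Passing EqG values instead of years as `x` gives the per-cycle rate with the same function.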

Retrospective Analysis of Crosses

The pattern of parental selection among crosses made by IRRI’s irrigated breeding program was analyzed over a period of thirty years (1985–2014) to assess the evolution of the crossing strategy and its relationship to EqG. During this period, 13,190 crosses were made. The number of crosses and the proportion of each type of cross varied substantially from one year to the next (Fig. 4A). However, the total number of crosses has been on a downward trend. During this period, most crosses were single crosses (71.9%) or three-way crosses (24.9%) and a small proportion were backcrosses (2.7%), complex crosses (0.4%) or double crosses (0.1%). The proportion of single crosses varied from 42.9% in 2007 to a high of 99.6% in 1989. To further dissect the impact of parental selection and mating design on EqG, crosses from this period were classified as elite by elite (41.4%), elite by non-elite (34.2%) and non-elite by non-elite (24.4%) based on the EqG of the parents. Since an EqG of four represented lines from the most advanced available breeding cycle in 1985, any line with an EqG of four or greater was considered elite and any line with an EqG of less than four was considered non-elite. Similar to cross type, the three classes of cross varied substantially from one year to the next (Fig. 4B). Notably, from 1991 to 1997 the proportion of non-elite by non-elite crosses increased dramatically, with up to 82% of the crosses falling in this category for that period. This corresponds to a decrease in EqG for the same period and is likely a function of the introduction of new material into the breeding program to achieve the objectives of the NPT initiative (Additional file 2: Fig. S1). During the period 1985–2014, 6,228 unique lines were used as parents and most of them were used only once (65.4%). On the other hand, a few lines were heavily used as parents, with 90 lines being used more than 40 times each during this period. As expected, this list included well-known IRRI varieties such as IR36, IR64, IR72, IRRI 104, IRRI 105, IRRI 118, IRRI 123 and IRRI 154 but also included traditional varieties like Kalimonch, Basmati 370, Shen Nung 89–366 and MD-2 used as donors of alleles with particular value. Interestingly, some of the most used parental lines were crossed during several periods, with sometimes more than 20 years between their first and last use (IR64, IR72, IRRI 104, and IRRI 105). This prominent reuse of old material serves to lengthen effective breeding cycles despite advancing the pedigree and is likely one of the primary limitations on the historical rate of genetic gain for grain yield.
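The classification rule described above (a parent is elite when its EqG is four or greater) can be sketched as follows. The parental EqG values are hypothetical, and the sketch handles only two-parent crosses; how three-way and complex crosses were assigned to a class is not specified in this section.

```python
from collections import Counter

ELITE_EQG = 4.0  # threshold stated in the text: EqG >= 4 is considered elite


def classify_cross(eqg_parent1, eqg_parent2, threshold=ELITE_EQG):
    """Classify a two-parent cross by how many of its parents are elite."""
    n_elite = sum(e >= threshold for e in (eqg_parent1, eqg_parent2))
    labels = {
        2: "elite x elite",
        1: "elite x non-elite",
        0: "non-elite x non-elite",
    }
    return labels[n_elite]


# Hypothetical parental EqG pairs, one tuple per cross.
crosses = [(5.2, 4.0), (6.1, 1.5), (2.0, 3.9), (4.4, 7.0)]
counts = Counter(classify_cross(a, b) for a, b in crosses)
for label, n in counts.items():
    print(f"{label}: {100.0 * n / len(crosses):.1f}%")
```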

Defining the Elite Core Panel (ECP)

The best performing lines in terms of breeding value for yield were selected and filtered based on the reliability of the breeding value estimate and their relatedness to other lines in the dataset based on pedigree (see “Materials and Methods” section). The final ECP was composed of 72 lines falling within the top 2% of breeding values, ranging between 4.93 and 6.01 t·ha⁻¹ with a mean value of 5.27 t·ha⁻¹ (Fig. 5). Most of the selected lines were of medium duration, with breeding values for flowering time (days to 50% heading) averaging 90 days. The average EqG was 5.7, with 90% of the lines having an EqG greater than four. The majority of the lines were developed after 2000, with 37.5% developed in 2010 and onward (Fig. 5). Interestingly, one line (IR05N341) has a much lower EqG of 2.44 but a breeding value of 5158 kg·ha⁻¹, ranking 45th of 72 among ECP breeding values. IR05N341 is an NPT inbred with a number of introduced lines in its pedigree, namely SHEN-NUNG 89–366, KETAN LUMBU, GUNDIL KUNING, and JHUM PADDY 7.
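The selection logic described above (rank by breeding value, keep the top fraction, then filter on reliability and pedigree relatedness) can be illustrated with a greedy sketch. The reliability and kinship thresholds, the data, and the function name are all hypothetical; the actual criteria are given in the paper's Materials and Methods and are not reproduced here. For scale, 2% of 15,286 lines is about 306 candidates, from which 72 were retained.

```python
def select_panel(lines, kinship, top_fraction=0.02,
                 min_reliability=0.3, max_kinship=0.5):
    """Greedy panel selection (illustrative thresholds, not the paper's).

    lines:   list of dicts with keys 'name', 'bv', 'reliability'.
    kinship: dict mapping frozenset({name_a, name_b}) -> pedigree relationship.
    """
    ranked = sorted(lines, key=lambda rec: rec["bv"], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    panel = []
    for cand in ranked[:n_top]:
        if cand["reliability"] < min_reliability:
            continue  # breeding value estimated too imprecisely
        too_close = any(
            kinship.get(frozenset((cand["name"], kept["name"])), 0.0) > max_kinship
            for kept in panel
        )
        if not too_close:
            panel.append(cand)  # keep only lines not too related to earlier picks
    return [rec["name"] for rec in panel]


# Hypothetical candidates (breeding values in t/ha).
candidates = [
    {"name": "A", "bv": 6.0, "reliability": 0.5},
    {"name": "B", "bv": 5.9, "reliability": 0.1},   # dropped: low reliability
    {"name": "C", "bv": 5.8, "reliability": 0.6},   # dropped: too related to A
    {"name": "D", "bv": 5.7, "reliability": 0.4},
    {"name": "E", "bv": 5.0, "reliability": 0.9},   # outside the top fraction
]
pedigree_kinship = {frozenset(("A", "C")): 0.9}
panel = select_panel(candidates, pedigree_kinship, top_fraction=0.8)
print(panel)
```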

Genetic Characterization of the ECP

In order to quantify the genetic variation available to breeding in the ECP, the panel was genotyped with the 1k-RiCA, an amplicon panel of genome-wide markers specifically chosen to be informative among elite indica lines (see “Materials and Methods” section). Using publicly available sequence data, the genome-wide SNPs assayed on the ECP lines were compared to the Xian/indica (XI) subpopulation defined by the sequenced 3000 rice genomes (3K-RG) in order to assess the diversity of the elite germplasm relative to a relevant baseline. Principal component analysis revealed that the ECP lines clustered mainly in the XI-1B group (Fig. 6A). Not surprisingly, this group includes modern rice varieties from diverse origins with a large representation of material generated by IRRI’s breeding program. Importantly, the selected ECP lines were spread across the entire XI-1B group, indicating that the selection of ECP lines based on yield performance was still able to capture a large range of diversity within this sub-group. Using linkage disequilibrium measurements from the 1k-RiCA genotype data, the effective population size (N_e) was calculated to be 22, indicating reasonable genetic diversity considering the census population size of 72. Cluster analysis based on genetic distance among ECP lines revealed two main clusters, which further branched into six sub-clusters (Fig. 6B). These clusters varied in size (9 to 14 lines per cluster) but all clusters were similar in terms of breeding values for yield (averaging 5.03 to 5.36 t·ha⁻¹) and EqG (averaging 5.26 to 6.33), with no significant difference between clusters (p value > 0.05).

In addition to genome-wide genetic diversity, we also assessed the allele frequency of 33 high-value genes contributing to resistance to major biotic stresses in rice (rice blast, bacterial leaf blight (BLB), brown plant hopper (BPH), gall midge (GM), rice stripe virus (RSV), rice tungro virus (RTV), rice yellow mottle virus (RYMV), and sheath rot (SR)). The genes assayed display a wide range of frequencies, from absent to fixed for the favorable allele/haplotype (Fig. 7). For blast, the frequency of the favorable allele was high or fixed for three genes (Pita: 74%, Pi25: 100% and Pid2: 100%) and moderate to low for five genes (Bsr-d1: 6.3%, Pi33: 16.7%, Pi54: 10%, Pii: 19.7% and Ptr: 33.3%). For the remaining genes (including Pi9, Pi35, Pi21) the favorable allele was absent from the ECP. For BLB, Xa4 (100%) and Xa26 (75%) presented high allele frequencies but with some uncertainty due to missing data. Xa5 and sweet13 were also detected in the ECP, with frequencies of 26.9% and 31.8%, respectively. Xa21 and Xa7 were also present but at very low frequencies (below 3%). Xa13, sweet14 and Xa23 were absent from the ECP. Concerning BPH, only BPH17 (15%) and BPH3 (65.5%) were found in the ECP. Finally, favorable alleles for STV11, TBV1, TSV1 and Chit1 were also present in the ECP.

Phenotypic Characterization of the ECP

Blast Disease Screening

The ECP lines were evaluated for their level of resistance against five isolates of Magnaporthe oryzae under controlled conditions. Based on phenotypic measurement, the most virulent isolate was M64-1-3-9-1 and the least virulent was CA89. A wide variation in resistance to the five blast isolates was found in the ECP, with most of the genotypes displaying intermediate resistance to one or more isolates (Additional file 2: Fig. S2). Not surprisingly, the genotypes most resistant to one isolate were usually not the most resistant to the others, suggesting specific isolate-host interactions (Additional file 2: Fig. S3). As the specific combination of favorable alleles at one or several genes associated with blast resistance is likely the primary driver of phenotypic variation, the ECP lines were classified into groups according to their genetic profile at the surveyed blast genes (Table 2). Genes being either fixed positive (Pid2, Pid3) or absent (Pi9, pi21, Pi35, Pi36) were excluded from this analysis. No single allele or combination of alleles was found to perform consistently better across the five isolates. However, the presence of Pi-ta alone or in combination with Ptr, Pi5/Pii, Bsr-d1 or Pi2/Pizt tended to be associated with more resistant phenotypes. For example, the genotype IR12A311, which carries Pi-ta, Ptr and Pi5, was found to be the most resistant across the five isolates. The combination of Pi-ta and Pi2/Pizt was found in IR93346:1-B-13-7-6-1RGA-2RGA-1-B, a line resistant to four isolates. Further, six genotypes (IR09N516, IR12A282, IRRI174, IR13N102, IRRI156, IR12A136) among the ten most resistant carried at least Pi-ta. IR11A341, which contains Pi33, was also highly resistant to three isolates and was among the ten most resistant lines across the five isolates. Further characterization of Pik-h and Pik-m would be useful to refine the classification of the lines and understand the pattern of resistance.

Table 2 Response of the elite core panel to blast disease under controlled environment

Bacterial Leaf Blight Disease Assessment

ECP lines were also screened for resistance to bacterial leaf blight infection against 14 known isolates. The two most virulent isolates were PXO 340 and PXO 99, with average lesion lengths of 21.9 cm and 21.3 cm, respectively (Additional file 2: Fig. S4). The least virulent isolate was PXO 61, with most of the lines displaying few symptoms (average lesion length of 2.4 cm). As expected, the ECP lines displayed a large phenotypic variability compared to the checks (IRBB lines or the susceptible check, IR 24). Resistant checks were found to display consistently lower symptoms than the majority of ECP lines (Additional file 2: Fig. S4). Unlike with blast, the responses to different isolates were significantly correlated, with values ranging from 0.26 to 0.77 (Additional file 2: Fig. S5). Similar to the blast analysis, the presence of a favorable allele for one or several genes associated with BLB resistance was used to classify ECP lines into groups (Table 3). The Xa4 allele was present in most of the ECP material. However, it (alone or in combination with sweet13 and/or Xa26) did not significantly reduce the symptoms compared to the susceptible check (IR 24) or the ECP lines without a known favorable allele for BLB genes (Table 3). Favorable alleles for sweet13 or Xa26 alone did not reduce the severity of the symptoms compared to the check. The presence of xa5 alone or in combination with other genes conferred better resistance to most of the isolates (except PXO 99 and, to a lesser extent, PXO 340). Xa7 (one genotype) and Xa21 were also found in more resistant genotypes across most of the isolates. The five most resistant ECP genotypes across the 14 isolates, with similar values to the IRBB checks, were IR15A4029 (Xa4, xa5), IRRI154 (Xa4, xa5, Xa26), IR12N252 (Xa4, xa5), IR 100,097-B-B RGA-B RGA-8 (Xa4, xa5, Xa26, sweet13) and IR12N249 (Xa4, xa5, sweet13).

Table 3 Response of the elite core panel to bacterial leaf blight under controlled environment

Agronomic Performance in Target Environments

Phenotypic characterization of the ECP lines was conducted in eight locations to evaluate the agronomic performance of each line in several of the target environments for the IRRI breeding program (Table 4, Fig. 8). To standardize the comparison, agronomic performance of the ECP lines in the field was compared with the three IRRI checks (IRRI 154, IR 64, and IR 72) as well as with a few local checks commonly grown in each region. In the Philippines, IRRI 154 serves both as a local check with high yield potential and as an IRRI check, and it is also part of the ECP. The repeatability of the trials was good, with values ranging from 0.45 to 0.97 for days to flowering and from 0.29 to 0.97 for grain yield. As expected, the performance of ECP lines compared to the local checks was influenced by the environment, as genotype by environment interactions are relatively high for traits like grain yield (Additional file 2: Fig. S6). In four environments, ECP lines presented a better grain yield on average compared to local checks (p value < 0.05, Table 4). In the remaining environments, ECP lines did not perform significantly better than local checks on average. However, in all environments the ECP included some genotypes that presented better grain yield than local checks. These results highlight that, despite the Philippines-specific data used to select ECP lines, the material remains relevant to environments outside the Philippines, and they confirm the importance of this panel for the global breeding program.

Table 4 Summary of agronomic trials conducted in target environments to evaluate the performance of the elite core panel

Discussion

Leveraging Historical Data to Estimate Breeding Values

This paper presents a brief but systematic review of the six decades of rice breeding for irrigated environments conducted at IRRI since the Green Revolution. During this time, the drivers of genetic improvement strategy understandably changed as technology, scientific advancement, and funding priorities evolved. While yield gain was the primary outcome of the Green Revolution breeding strategies, the post-Green Revolution era focused more keenly on changes in plant type, grain quality, biotic and abiotic stresses as well as grain yield using a variety of breeding methods (Fig. 1; Peng et al. 1999, 2008; Khush 2001; Peng and Khush 2003). The historical pedigree information available through the International Crops Information System (ICIS) database (Bruskiewich et al. 2003; McLaren et al. 2005; Portugal et al. 2007) permitted the tracking of crosses and the development of new breeding lines back to 1960 and was a powerful resource for making a data driven and quantitative characterization of breeding methodologies. When utilized alongside newer databases for phenotypic information (Collard et al. 2019), the pedigree data allowed the determination of an individual’s breeding value by integrating the correlated response of relatives harboring the same alleles into the analysis (Piepho et al. 2008). This allowed for a ranking of all the available germplasm using an additive metric of genetic merit and generated a better criterion for parental selection than adjusted phenotypic means alone. While the phenotypic dataset used in the meta-analysis was highly unbalanced (i.e. different lines were evaluated in different locations and years), the mixed model approach is generally robust to such imbalance (Damesa et al. 2017) and creates estimates for fixed effects and predictors for random effects that are unbiased and minimize error variance. The two-stage modeling approach further allowed for the integration of varied experimental designs, while the use of the relationship matrix in the second stage of the analysis allowed for the borrowing of information from relatives to further narrow the estimates of uncertainty around an individual’s performance.
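The borrowing of information from relatives rests on the pedigree relationship matrix. As an illustration only, the sketch below builds an additive relationship matrix with the standard tabular method for a hypothetical four-line pedigree. It assumes a simple diploid pedigree with two parents per line and makes no adjustment for the generations of selfing used to derive inbred rice lines, so it should not be read as the exact matrix used in the analysis.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix via the tabular method.

    pedigree: list of (line, parent1, parent2) tuples, ordered so that
              parents appear before their offspring; None = unknown parent.
    Returns (names, A), with A as a symmetric list of lists.
    """
    names = [rec[0] for rec in pedigree]
    idx = {name: i for i, name in enumerate(names)}
    n = len(names)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, p1, p2) in enumerate(pedigree):
        a = idx[p1] if p1 is not None else None
        b = idx[p2] if p2 is not None else None
        # Off-diagonals: average relationship of earlier line j with i's parents.
        for j in range(i):
            val = 0.0
            if a is not None:
                val += 0.5 * A[j][a]
            if b is not None:
                val += 0.5 * A[j][b]
            A[i][j] = A[j][i] = val
        # Diagonal: 1 + inbreeding, i.e. half the relationship between parents.
        both_known = a is not None and b is not None
        A[i][i] = 1.0 + (0.5 * A[a][b] if both_known else 0.0)
    return names, A


# Hypothetical pedigree: F1 from two founders, then F1 backcrossed to P1.
toy_pedigree = [
    ("P1", None, None),
    ("P2", None, None),
    ("F1", "P1", "P2"),
    ("BC1", "F1", "P1"),
]
names, A = relationship_matrix(toy_pedigree)
for name, row in zip(names, A):
    print(name, row)
```

The off-diagonal entries are what allow a line observed in few trials to be informed by better-tested relatives in the second-stage model.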

The Drivers of the Genetic Gain for Grain Yield

The historical breeding values generated by this analysis provided a convenient mechanism for estimating the historical genetic trend for grain yield in the program. Breeding programs often evaluate genetic gain in many different ways (Rutkoski 2018) and present the result in units that are not always easily interpreted. Most often, this takes the form of a percentage. Here, to aid interpretation, we report genetic gain as kilograms per hectare per year or kilograms per hectare per cycle. The percentages these levels of gain represent relative to the initial performance are given parenthetically for context.

The realized genetic gain for the irrigated program calculated by regressing all 15,286 lines on the year of their origin (year the cross was made from which they were derived) was estimated to be 8.75 kg·ha⁻¹ year⁻¹ (0.23%) from 1960 to 2014. When the data set was restricted to only IRRI released varieties, the rate of genetic gain was estimated to be 17.36 kg·ha⁻¹ year⁻¹ (0.46%) and 186.24 kg·ha⁻¹ cycle⁻¹ (4.95%). Previous reports of genetic gain in this program using an era study with a selection of 7 and 12 released varieties have shown gains of 81 and 75 kg·ha⁻¹ year⁻¹ (~ 1%), respectively (Peng et al. 2000). Genetic gain for rainfed and drought stress environments in India has been estimated at 34 kg·ha⁻¹ year⁻¹ (0.68%) and 25 kg·ha⁻¹ year⁻¹ (0.87%) using 214–242 advanced breeding lines (Kumar et al. 2021). The Brazilian rice breeding program for upland rice has also reported low gains for grain yield, with a mean gain of 19.1 kg·ha⁻¹ year⁻¹ (0.67%) over a 26-year period using a meta-analysis of 376 advanced breeding lines (Breseghello et al. 2011). However, in the last decade of their analysis (2002 to 2009), the trend showed an increase in the rate of genetic gain to 45.0 kg·ha⁻¹ year⁻¹ (1.44%). Similar estimates for irrigated rice in Brazil were reported using rapid-cycle recurrent selection and data from 667 selection candidates that were progeny tested in different breeding cycles (766 kg·ha⁻¹ cycle⁻¹; 1.98% per year; Morais Júnior et al. 2017). Interpreting the drivers of genetic trend is not simple, and speculation in the absence of a complete record of activities can often be misleading. As such, it is helpful to focus on long term patterns in the data. The steeper positive slope of the genetic trend for yield that emerges when only released varieties are considered is consistent with previous reports and can be considered a strong indicator of the positive impact the breeding program has had over time as it has identified and commercialized superior genotypes. The roughly tenfold difference between the per year and the per cycle estimates of genetic gain demonstrates that selection for improved yield has been highly effective on a per cycle basis. This selection response indicates that adequate levels of genetic variance for yield, adequate intensity of selection, and reasonable values for heritability have been maintained during the post-Green Revolution era. The high correlation between breeding values for yield among released varieties and the estimated equivalent generations indicates that cycle time (as measured by EqG) is an important driver of the observed genetic trend. This is consistent with the well-established relationship between generation interval and response to selection (Cobb et al. 2019b). While informative, analyses of specific subsets of the breeding germplasm can bias genetic trend estimates and increase the uncertainty around them. Using all 15,286 lines for which digitized phenotypic data exist provides a much stronger foundation for assessing base-line rates of genetic gain than potentially interpretable but arbitrary subsets of the data. As this metric incorporates all breeding material (including historical discards), it is not an effective measure of the genetic gain in commercial releases but can be useful for evaluating the impact of breeding innovations on response to selection over cycles.
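One way to read the gap between the per-year and per-cycle estimates is as an implied average cycle length. The short calculation below uses the two released-variety figures quoted in this section; treating their ratio as a cycle length is an interpretive assumption (it presumes both regressions describe the same varieties and that the trends are linear), not a result reported by the authors.

```python
def implied_years_per_cycle(gain_per_cycle, gain_per_year):
    """Ratio of per-cycle to per-year gain, read as average years per cycle."""
    return gain_per_cycle / gain_per_year


# Released-variety estimates quoted in the text (kg/ha).
years_per_cycle = implied_years_per_cycle(186.24, 17.36)
print(f"Implied cycle length: {years_per_cycle:.1f} years")
```

Under these assumptions the historical program advanced roughly one effective breeding cycle per decade, which is why shortening cycle time is the most direct lever on the annual rate of gain.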

Importance of Developing the Elite Core Panel

The contemporary program has moved to a much more intensive recurrent selection strategy based on quantitative genetics principles to drive genetic gain for yield in the context of a disease resistant and high grain quality genetic background. This approach is a natural progression building on previous eras where the focus was on the identification and integration of genetic variation for yield potential traits (Peng et al. 2008). With that, it becomes necessary to systematically evaluate the existing genotypes in the program and select a number of high performing lines to form the basis of a gene pool upon which selection for high breeding value can, in combination with other innovations, drive improved rates of genetic gain (Xu et al. 2017). The 72 lines selected based on breeding value to be part of the ECP essentially represent the initial founders of the recurrent selection program moving forward. While the phenotypic value of this panel should be quickly eclipsed by successive generations of breeding, every new cohort represents an admixture of allelic variance of the panel. Thus, having clearly maintained seed sources for the original lines offers several distinct advantages, including as an elite source of genetic variation to be evaluated alongside the contemporary cohorts for new traits of interest. Such a panel is also helpful for validating trait markers for high-value haplotypes which reduce the occurrence of type I and type II errors when genotyping the progeny (Platten et al. 2019; Cobb et al. 2019a). Once sequenced, the panel also becomes a powerful resource for determining identity-by-descent (IBD) information among progeny cohorts and potentially reducing the need for routine use of high-density markers through the development of a breeding program specific imputation framework (Browning and Browning 2012; Nyine et al. 2019; Wang et al. 2020).

Genetic Diversity Captured by the Elite Core Panel

A natural concern with limiting the breeding program to crosses among such a small number of lines is the reduction in genetic variation that may occur due to both selection and genetic drift. As the ECP lines were selected based on pedigree-estimated breeding values, the genetic characterization of the panel is a necessary next step to demonstrate its utility as a resource for breeding in a population improvement program (Warburton et al. 2005; Wen et al. 2012). While the panel itself was selected based on breeding value for yield, its mean flowering time does not differ from that of the entire dataset. This is largely a function of including flowering time as a covariate in the model, which factored out confounding effects due to the positive correlation between yield and flowering in rice.

We used the 3K reference genome panel (Wang et al. 2018) to better understand how well the ECP sampled the genetic space within rice genetic diversity at large. Unsurprisingly, it falls within the Xian/indica 1B group which has been generated through breeding activities in Southeast Asia largely by IRRI (Xie et al. 2015; Wing et al. 2018). Presence of sufficient genetic variation among the ECP lines is further supported by the estimated effective population size (Ne) of 22. This value may be underestimated as the markers used for the analysis were specifically designed to be informative in the indica subpopulation and harbored high minor allele frequencies on average (Arbelaez et al. 2019). This level of Ne is similar to what has been calculated in other rice breeding programs. For example, Grenier et al. (2015) showed an Ne in the range of 23–57 in four breeding populations of rice derived through recurrent selection programs. Morais Júnior et al. (2017) observed slightly higher values (40–60) in an irrigated rice breeding population using pedigree data. Values of Ne associate positively with additive genetic variation and the ability of a population to respond to selection for the trait under consideration (Falconer and Mackay 1996). The rate at which variability is depleted in a population is inversely related to Ne, and the time required to deplete the variability or fix one or other alternative allele in a population is a function of Ne, allele frequency (p) in the population, and the selection intensity (Walsh 2003). The theoretical limits of selection response as given by Robertson (1960) postulate that the total response to selection is equal to 2Ne times the initial gain in the first generation assuming genes with additive effects and relatively low selection intensity. This is to say that the Ne of any given generation is equal to the number of effective cycles before half the genetic variability is eroded by selection or drift. 
Therefore, holding the unlikely assumption that no new introductions were to be made into the program moving forward, the ECP could theoretically support at least 22 breeding cycles before half of the genetic variance is eroded. This could be further extended by the implementation of marker-optimized mating designs such as optimum contribution selection (Akdemir and Sánchez 2016; Akdemir et al. 2019) and targeted pre-breeding activities.
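
As a worked illustration of the Robertson (1960) limit quoted above, the expected total response can be written out for the estimated Ne of 22. The symbols R_total (total response) and R_1 (response in the first cycle) are introduced here for illustration only, and the result holds only under the assumptions stated in the text (additive gene action, relatively low selection intensity, and no new introductions):

$$R_{\text{total}} = 2N_{e}R_{1} = 2 \times 22 \times R_{1} = 44\,R_{1}$$

This should be read as a theoretical bound under idealized assumptions rather than as a forecast for the ECP.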

Expected Performance of the ECP Across Target Environments

Many traits in rice are governed by a number of high value, large effect alleles that affect patterns of phenotypic variance across environments (Wei et al. 2021). Some of these alleles (particularly at disease and grain quality loci) are extremely valuable and deserve proactive management of their frequencies (Cobb et al. 2019a). Understanding these allele frequencies among the ECP lines is therefore essential for setting breeding strategy. As might be expected, the ECP displayed a wide range of allele frequencies for genes related to major pathogens and pests. A few were essentially fixed for the positive allele; many of these represent indica/japonica differences, where the indica allele is favorable, such as Pi25 or Pid2. The value of these genes has already been captured by the breeding program, thus further improvement of these traits must rely on other genetic variation. Other genes are absent, such as xa13, Xa23, and the rice yellow mottle virus resistance genes, among others. The lack of these genes in elite material necessitates some pre-breeding effort to introduce them to the elite pool before their value can be leveraged (Cobb et al. 2019a). Between these two extremes are the genes that can actually be selected in existing breeding material, and so are those contributing to variation in the elite pool. These include genes such as Xa7, Xa21 or TBV1 which, although present, are very rare and only available from particular lines, thus delaying their full deployment as the program generally cannot risk bottlenecking a cohort through just a few lines. A few genes are at appreciable frequencies (but not fixed), and so represent diversity that is easily selectable in the existing breeding populations; these include Pita, Ptr, Pii, BPH32, sweet13, xa5 and TSV1.

Given the importance of bacterial blight and blast resistance specifically to germplasm exchange across Asia and Africa, we challenged the ECP lines with several common isolates of both pathogens to check the effectiveness of the gene combinations present in the ECP against blast and bacterial blight in these specific genetic backgrounds. Shanti et al. (2010) reported that a four-gene combination (Xa4, xa5, xa13 and Xa21) is the most effective combination conferring broad-spectrum resistance to bacterial blight. While 30% of the ECP lines contained one or more bacterial blight resistance alleles, none of the ECP lines contained this specific combination, as xa13 in particular is at a frequency of zero. However, it is clear that the gene combinations that are present offer resistance to all tested isolates except PXO 99. PXO 99 is a common Xanthomonas isolate in the Philippines (Tu et al. 2000), indicating that while the ECP lines are directly useful in many geographies, a targeted backcrossing and MAS approach is needed to increase the frequency of high value alleles currently at low frequency among the breeding progeny of the ECP. Likewise, the blast resistance genes present in most of the ECP lines also effectively controlled the manifestation of disease for the five strains tested, likely due to the high frequency and broad spectrum resistance offered by the Pi-ta locus (Jia et al. 2016).

It is not unexpected that a subset of the breeding germplasm selected based on breeding value for yield would require pre-breeding/backcrossing and MAS to fully address the complexity of trait targets in the product concept. In order to avoid further erosion of the genetic variance (particularly in the form of selective sweeps around low frequency loci), some strategic cautions are warranted. Product development is the primary goal of a breeding program; however, when product development is strongly emphasized, it is tempting for breeders to overuse specific high-value lines in a crossing block at the expense of gene pool management. While every cross may not bring together the complete package necessary for a product release, specific crosses generated with the intention of creating progressively improved allelic combinations (and slowly increasing the frequency of major genes) can generate useful lines that can be recycled into the crossing block as parents. Likewise, the use of haplotype-matched backcrossing donors of key genes can be a powerful tool for introducing novel alleles that are at zero or low frequency into the elite gene pool while preserving the availability of genetic variance immediately around them for recombination and the improvement of quantitative traits.

Managing the deployment of single genes within the breeding program, given the limitations imposed by their frequency within the ECP, is not the only issue that future breeding efforts based on this germplasm resource must consider. Since quantitative traits are not governed by single genes, location main effects and genotype by environment interactions must be routinely accounted for in phenotypic analysis strategies to factor out their strong influence on phenotypic outcomes. Because the primary source of data for determining the breeding values for yield that were used to identify the ECP was trials conducted in the Philippines, an understanding of yield performance in other target environments was also necessary. In order to evaluate the performance of the ECP lines relative to local and IRRI checks in relevant geographies outside the Philippines, six breeding trials were conducted in India, Kenya, and Tanzania. BLUP values centered on the mean performance of each location indicate strong performance of the ECP lines relative to the highest yielding local checks in each location. The performance relative to IRRI varieties and the local checks is a strong indication that the observed genetic variance is manifested as phenotypic variance within each location, indicating that crossing and selection among high performing lines within each breeding zone is likely to result in genetic gain for yield. Further analysis of genotype by environment interactions and the genetic correlations between target environments is warranted to help identify a global testing strategy that best leverages the limited public resources available to the IRRI breeding program.

Conclusion

Achieving short- and medium-term genetic gains for yield is a key target for almost every breeding program. In the case of IRRI’s breeding program for irrigated systems, the rate of genetic gain for grain yield was estimated at 17.36 kg·ha⁻¹ year⁻¹ (0.46%) for released varieties. This rate of gain appears to be largely limited by long cycle times and the re-introduction of old material or landraces into the elite pool. This observation highlights the need to optimize the breeding strategy for quantitative traits by using quantitative genetics principles to get closer to the annual 1.5% gain in grain yield needed to cover the expected increase in rice consumption. The elite core panel identified and characterized in this study is a key component of this optimization. Indeed, recurrent selection with short cycles based on elite-by-elite crosses, as implemented at IRRI to deliver a higher rate of genetic gain for grain yield, requires careful management of the genetic diversity, which starts with a comprehensive characterization of the most elite germplasm.

Materials and Methods

Historical Yield Data

Experimental Studies and Pedigree Information

All the yield data from trials conducted by the irrigated breeding program during the period 2012–2016 across multiple locations were retrieved from the IRRI database. From these trials, the following phenotypic information was extracted: plant height, number of days to flowering, grain yield and number of hills per plot. The extracted phenotypic information was filtered based on the following quality criteria: presence of a randomized experimental design, percentage of missing data for grain yield and flowering time lower than 15%, and harvested area greater than 2 m². We considered an environment as the combination of location, year and season. The experimental design varied among environments: row-column, alpha-lattice, augmented randomized complete block, or ordinary randomized complete block design (RCBD). A total of 102 studies were conducted in 23 environments with a total of 17,216 lines, of which 15,286 were retained as irrigated rice lines after filtering (Additional file 1: Table S1). All the studies were conducted in the Philippines, with 51 studies planted during the wet season and another 51 during the dry season. The pedigree information for the selected lines was extracted from the IRRI genealogy management system database (McLaren et al. 2005) using custom scripts. The date of the initial cross was also retrieved for all the breeding lines whose crossing year information was available in the database (16,317 lines). The pedigree information was also used to compute equivalent complete generations (EqG) (Boichard et al. 1997; Gutiérrez et al. 2008; Leroy et al. 2013) for each line. EqG for a given line was calculated as follows:

$$EqG=\sum _{i = 1}^{n}\left(\frac{1}{{2}^{{g}_{i}}}\right)$$

where n is the number of known ancestors of the line in the pedigree and g_i represents the number of generations between the line and its ancestor i (one for the parents, two for the grandparents, etc.).
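
The EqG formula can be sketched in a few lines of Python. This is a minimal illustration, not the custom scripts used in the study: the `PEDIGREE` dictionary, its line names, and the function name `eqg` are hypothetical. It relies on the fact that each known parent contributes 1/2 itself, plus half of its own EqG because all of its ancestors sit one generation further back.

```python
from functools import lru_cache

# Hypothetical toy pedigree: line -> (female parent, male parent).
# None marks an unknown parent; founders have no known parents.
PEDIGREE = {
    "F1": (None, None),
    "F2": (None, None),
    "F3": (None, None),
    "P1": ("F1", "F2"),
    "P2": ("F3", None),
    "LINE": ("P1", "P2"),
}


@lru_cache(maxsize=None)
def eqg(line):
    """Equivalent complete generations: sum of (1/2)**g_i over known ancestors."""
    total = 0.0
    for parent in PEDIGREE.get(line, (None, None)):
        if parent is not None:
            # The parent counts for 1/2; its ancestors are one generation
            # further from `line`, so the parent's own EqG is halved too.
            total += 0.5 * (1.0 + eqg(parent))
    return total
```

In this toy pedigree, `LINE` has two known parents (2 × 1/2) and three known grandparents (3 × 1/4), so `eqg("LINE")` evaluates to 1.75. An ancestor reached through several pedigree paths is counted once per path.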

Estimating Breeding Values for Lines

A two-stage mixed model analysis (Piepho et al. 2008; Smith and Cullis 2018) using grain yield data as response variable was used to estimate the breeding values of each line. The two-stage mixed model analysis was adopted to account for specific experimental design layouts across the environments (Damesa et al. 2017). In the first stage, each trial or environment (combination of location, year and season) was analyzed separately and best linear unbiased predictors (BLUPs) were extracted per environment using the following baseline mixed-model:

$$y_{ij} = \mu + g_{i} + \cdots + \varepsilon_{ij}$$

(1)

where y_ij represents the grain yield of the jth observation of the ith genotype, μ is the overall mean, g_i is the random effect of the ith genotype with iid g_i ∼ N(0, Iσ²_g), where σ²_g is the genetic variance, and ε_ij is the residual error with iid ε_ij ∼ N(0, Rσ²_ε). To account for heterogeneous error variance caused by differences in the number of hills harvested from plot to plot and from trial to trial, the diagonal of R was set to h/h_max, where h is the number of hills harvested and h_max is the maximum number of hills harvested in the environment. The … in the model denotes the blocking factors and a covariate for missing hills, which were conditional on the trial. These terms were included in the model because they were identified as improving model fit during analyses of individual trials. Blocking factors were considered random if they had more than five levels. The possible blocking factors were modelled to determine which factors led to the lowest Bayesian information criterion (Spilke et al. 2010; Piepho et al. 2015). For trials that followed a row-column design, the possible factors were row and column; for those following a partially replicated design, the possible factors were row, column, replicate, and block; for those following an RCBD or augmented RCBD, the possible factor was replicate; and for those following an alpha-lattice design, the possible factors were replicate, block nested within replicate, row, and column. The model with the lowest Bayesian information criterion was selected and used to extract the BLUP of each line and its prediction error variance (PEV) for each environment. Reliabilities of the BLUPs were estimated according to $r^{2} = 1 - \frac{PEV}{{\sigma_{g}^{2} }}$. The process for BLUP estimation per environment was repeated for days to flowering.
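
The three small calculations described in this paragraph (the h/h_max entries on the diagonal of R, the choice of the candidate model with the lowest Bayesian information criterion, and the reliability of a BLUP) can be sketched as follows. This is an illustration only: the function names are hypothetical, the mixed models themselves are not fitted here, and the BIC is written in its textbook form k·ln(n) − 2·ln(L), which may differ in detail from what the fitting software reports.

```python
import math


def hill_weights(hills_harvested):
    """Diagonal of R for one environment: h / h_max for each plot."""
    h_max = max(hills_harvested)
    return [h / h_max for h in hills_harvested]


def bic(log_likelihood, n_params, n_obs):
    """Textbook Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def pick_lowest_bic(candidates, n_obs):
    """Return the name of the candidate model with the lowest BIC.

    `candidates` maps a model label to (log_likelihood, n_params), both
    assumed to come from fits of the same data with the same n_obs.
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


def reliability(pev, sigma2_g):
    """Reliability of a BLUP: r^2 = 1 - PEV / sigma^2_g."""
    return 1.0 - pev / sigma2_g
```

For example, a plot with 18 of a maximum 20 hills receives a diagonal entry of 0.9, and a BLUP with PEV equal to one quarter of the genetic variance has a reliability of 0.75.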

In the second stage model, the BLUPs obtained from the first stage model were de-regressed by dividing by the reliability as described in Garrick et al. (2009), and used as response variable in the second stage pedigree-based mixed model analysis. The de-regressed BLUPs for yield within each environment were modeled according to Bates et al. (2014). The model used is as follows:

$$y_{ij} = \mu + g_{i} + e_{j} + \varepsilon_{ij}$$

(2)

where y_ij is the de-regressed BLUP of line i in environment j, μ is the overall mean, g_i is the random effect of line i with g_i ∼ N(0, Aσ²_g), where σ²_g is the genetic variance and A is the additive genetic relationship matrix based on pedigree, e_j is the fixed effect of environment j, and ε_ij is the residual error with ε_ij ∼ N(0, Rσ²_ε), where R is a matrix proportional to the residual error covariance matrix and σ²_ε is the error variance. To account for heterogeneous error variance, the diagonal of R was set to 1/r². In the above model, yield was adjusted using days to flowering as a covariate. The R packages lme4 (Bates et al. 2015) and pedigreemm (Bates et al. 2014) were used to implement the models.
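
The hand-off between the two stages can be illustrated with a short Python sketch. It covers only the two operations stated in the text, dividing each first-stage BLUP by its reliability and setting the diagonal of R to 1/r². The function names are hypothetical, and the pedigree-based model itself (fitted in the study with lme4 and pedigreemm) is not reproduced here.

```python
def deregress(blup, r2):
    """De-regress a first-stage BLUP by dividing it by its reliability r^2."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return blup / r2


def second_stage_r_diagonal(reliabilities):
    """Diagonal of R in the second stage: 1 / r^2 for each de-regressed BLUP."""
    diagonal = []
    for r2 in reliabilities:
        if not 0.0 < r2 <= 1.0:
            raise ValueError("reliability must lie in (0, 1]")
        diagonal.append(1.0 / r2)
    return diagonal
```

A BLUP estimated with low reliability is therefore scaled up more strongly by de-regression and, at the same time, receives a larger residual variance, so it carries less weight in the second-stage fit.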

Assessment of Rate of Genetic Gain

Genetic gain was assessed using breeding values following the procedure reviewed by Garrick (2010). Briefly, the estimated breeding values were regressed on the year in which the cross was made, and the slope of this regression was taken as the genetic gain trend.
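
The regression step can be sketched with an ordinary least-squares fit in plain Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the reference yield used to express the slope as a percentage is an assumption left to the caller, since this section does not state which baseline the reported percentages were computed against.

```python
def genetic_trend(cross_years, breeding_values):
    """Least-squares fit of breeding value on cross year.

    Returns (slope, intercept). The slope is the gain per year in the
    units of the breeding values (for example kg/ha per year).
    """
    n = len(cross_years)
    if n != len(breeding_values) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(cross_years) / n
    mean_y = sum(breeding_values) / n
    sxx = sum((x - mean_x) ** 2 for x in cross_years)
    if sxx == 0:
        raise ValueError("cross years must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(cross_years, breeding_values)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_gain(slope, reference_yield):
    """Express the slope as a percentage of a caller-chosen reference yield."""
    return 100.0 * slope / reference_yield
```

Replacing the cross year with a cycle index (such as EqG) as the regressor gives a per-cycle rather than a per-year trend with the same code.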