Genetic structure of Thai rice and rice accessions obtained from the International Rice Research Institute

Background Although the genetic structure of rice germplasm has been characterized worldwide, few studies have investigated germplasm from Thailand, the world’s largest exporter of rice. Thailand and the International Rice Research Institute (IRRI) have diverse collections of rice germplasm, which could be used to develop breeding lines with desirable traits. This study aimed to investigate the level of genetic diversity and the genetic structure of Thai and selected IRRI germplasm. Understanding the genetic structure and relationships among these germplasm will be useful for selecting parents in rice breeding programs. Results Of the 98 InDel markers tested for single copy and polymorphism, 19 were used to evaluate 43 Thai and 57 IRRI germplasm, including improved cultivars, breeding lines, landraces, and 5 other Oryza species. The Thai accessions were selected from all rice ecologies: irrigated, deep water, upland, and rainfed lowland ecosystems. The IRRI accessions were groups of germplasm with agronomically desirable traits, including temperature-sensitive genetic male sterility (TGMS), new plant type, early flowering, and biotic and abiotic stress resistances. Most of the InDel markers were derived from genes with diverse functions. These markers produced a total of 127 alleles across all loci, with a mean of 6.68 alleles per locus and a mean Polymorphic Information Content (PIC) of 0.440. Genetic diversity of Thai rice was 0.3665, 0.4479 and 0.3972 for improved cultivars, breeding lines, and landraces, respectively, while genetic diversity of IRRI improved cultivars and breeding lines was 0.3272 and 0.2970, respectively. Cluster, structure, and differentiation analyses showed six distinct groups: japonica, TGMS, deep-water, IRRI germplasm, Thai landraces and breeding lines, and other Oryza species. Conclusions Thai and IRRI germplasm were significantly different. Thus, they can be used to broaden the genetic base and for trait improvement. Cluster, structure, and differentiation analyses showed concordant results with six distinct groups, in agreement with their development and ecologies.
Electronic supplementary material The online version of this article (doi:10.1186/1939-8433-5-19) contains supplementary material, which is available to authorized users.


Background
Genetic diversity and population structure of cultivated rice (Oryza sativa L.) have been studied worldwide (Garris et al. 2005; Yu et al. 2003; Zhao et al. 2010; Ali et al. 2011; Chen et al. 2011). However, only a few Thai rice accessions have been included in these studies (Garris et al. 2005; Yu et al. 2003), and to our knowledge, there is no report on the genetic structure of Thai commercial cultivars grown in the different ecologies of the country. Thus, information on the genetic structure of Thai rice is lacking. Thailand is the world's largest exporter of rice and is famous for high-quality, long-grain white rice, because Thai rice breeding has focused on maintaining good grain characteristics and quality. Thailand has a large collection of diverse rice germplasm, including the famous Thai jasmine rice (Chitrakon and Somrith 2003).
In Thailand, rice ecologies can be classified as irrigated, rainfed lowland, deep water, and upland ecosystems. Rainfed lowland accounts for the majority of the rice-growing area, followed by irrigated, deep water, and upland. The ecologies also differ in rice production. The highest average rice yield is from the irrigated ecosystem, followed by the deep water, rainfed lowland, and upland ecosystems. The average rice yield in the wet season is constant at about 2 t/ha (http://www.fao.org/). By its shape and geography, Thailand can be divided into four regions: the mountains and forests of the North; the vast rice fields of the Central Plains; the semi-arid farmlands of the Northeast plateau; and the tropical islands and long coastline of the peninsular South. Each region has different rice-growing environments. The Northern region produces 25% of the total rice production; upland rice is grown on hilly areas, while lowland rice is grown in lower valleys and some terraced fields. The Central region produces 30% of the total rice production; rice is planted almost everywhere across the area in the wet season, and in the dry season it is planted on about 450,000 ha of irrigated land. The Northeastern region produces 41% of the total rice, mainly rainfed lowland rice. Only a small portion of the total rice is produced in the Southern region, where rice is planted on the west and east coasts of the peninsula (www.irri.org).
Oryza sativa is composed of two major subspecies, Indica and Japonica (both tropical and temperate), and several ecotypes. Several efforts have been made to assess the genetic diversity within Oryza sativa at both phenotypic and molecular levels. To estimate genetic diversity among Oryza species, several types of molecular markers, particularly simple sequence repeats (SSR), have been used (Yu et al. 2003; Hashimoto et al. 2004; Garris et al. 2005; Thomson et al. 2007; Wen et al. 2009; Ishii et al. 2011; Zhang et al. 2011). Polymorphisms in SSR regions are considered to result from differences in the replication of repeated sequences, which give PCR products of different sizes. However, alleles that differ in sequence but have the same length may yield ambiguous results in phylogenetic analysis. Sequencing SSR products can provide clear information on the evolutionary history of these loci (Sunnucks et al. 2000; Provan et al. 2004). Alternatively, single-stranded conformation polymorphism (SSCP), a simple and rapid method for detecting sequence variation in a large number of samples without expensive direct sequencing, has been proposed for genotyping and mapping genetic diversity in crop plants (Kuhn et al. 2008). SSCP is a very sensitive technique for detecting single point mutations between different DNA fragments (Grieu et al. 2004; Muangprom et al. 2005). Recently, SSCP has been used in crop studies such as marker-assisted selection (Borchert and Hohe 2009), comparative genomics (Castelblanco and Fregene 2006), phylogenetics (Rousseau-Gueutin et al. 2009) and fitness effects of crop QTLs (Baack et al. 2008). Furthermore, the recent availability of rice genome sequences provides the opportunity to select genes/sequences distributed across the genome as SSCP markers.
The International Rice Research Institute (IRRI) has large collections of characterized rice germplasm. These germplasm could be used to develop breeding lines with desirable traits, and they are available to other countries. Several IRRI germplasm have been used to improve rice breeding programs in Thailand. Understanding the genetic diversity and genetic relationships among Thai and IRRI germplasm is useful for selecting parents to produce hybrids or to improve rice populations (Moose and Mumm 2008). Although Thailand is famous for its rice, genetic characterization of Thai rice at the molecular level is very limited. Therefore, the aims of this study were to evaluate the level of genetic diversity and to assess the genetic relationships of Thai rice germplasm and of germplasm with desirable traits obtained from IRRI.

InDel marker development and polymorphisms of the SSCP markers
The 98 InDel markers were tested on 4 selected rice accessions, and only markers that were present as a single copy and showed polymorphism in at least 3 of the 4 accessions were selected for genetic analysis. A total of 19 InDel markers were used to evaluate genetic diversity in 101 rice accessions (Table 1). These InDel markers were chosen from 9 of the 12 rice chromosomes, and most of them were genes annotated with diverse functions, as listed in Table 2.

Genetic diversity and genetic difference among groups of populations
Using 19 SSCP InDel markers, genetic diversity of Thai rice, IRRI germplasm, and other Oryza species was 0.436, 0.322, and 0.547, respectively (Table 4). To determine genetic difference among the three groups, we performed AMOVA and pairwise analyses. The AMOVA results showed that 15.06% of the variation was caused by differences among groups, while the remaining 84.94% was caused by differences within groups. The pairwise Fst estimates among these three groups indicated that all three groups were significantly different from each other. Because IRRI germplasm has been used to improve rice breeding in Thailand, we also tested for genetic diversity and genetic difference among groups of Thai and IRRI rice samples to determine the effect of each classification: improved cultivars, breeding lines, and landraces. The results showed that genetic diversity of Thai rice was 0.367, 0.448 and 0.398 for improved cultivars, breeding lines, and landraces, respectively. In contrast, genetic diversity of IRRI improved cultivars and breeding lines was 0.327 and 0.297, respectively (Table 4). To determine genetic difference among the six groups (Thai improved cultivars, Thai breeding lines, Thai landraces, IRRI improved cultivars, IRRI breeding lines, and the other Oryza species), AMOVA and pairwise analyses were performed. The AMOVA results showed that 13.04% of the variation was caused by differences among groups, while the remaining 86.96% was caused by differences within groups. Pairwise Fst estimates among groups ranged from 0.029 to 0.349. There was no significant difference among Thai improved cultivars, Thai local breeding lines and Thai landraces. Similarly, there was no significant difference between IRRI improved cultivars and IRRI breeding lines. However, all Thai groups (Thai improved cultivars, Thai local breeding lines and Thai landraces) were significantly different from IRRI improved cultivars and IRRI breeding lines (Table 5).
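The idea of partitioning variation into among-group and within-group components can be illustrated with a small sketch. The code below does not reproduce the AMOVA or the pairwise Fst estimator used in the study; as a stated assumption, it substitutes Nei's Gst, (Ht - Hs) / Ht, computed at a single locus with equally weighted groups. The function names and the allele frequencies are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity (gene diversity) at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(group_freqs):
    """Nei's Gst for one locus (a simplified stand-in for Fst, not AMOVA).

    group_freqs: list of {allele: frequency} dicts, one per group.
    Groups are weighted equally, which is an assumption of this sketch.
    """
    if not group_freqs:
        raise ValueError("at least one group is required")
    n = len(group_freqs)
    alleles = set().union(*group_freqs)
    # Allele frequencies averaged over groups give the total diversity Ht.
    mean_freqs = {a: sum(g.get(a, 0.0) for g in group_freqs) / n for a in alleles}
    h_t = expected_het(mean_freqs)
    # Mean within-group diversity Hs.
    h_s = sum(expected_het(g) for g in group_freqs) / n
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical frequencies of two alleles in two groups.
example = gst([{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}])
```

A value near 0 means that almost all diversity lies within groups, and a value near 1 means that the groups share almost no alleles.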

Clustering of rice accessions using SSCP InDel markers
The UPGMA cluster diagram differentiated four of the other Oryza species and showed two major groups that correspond to the Indica and Japonica subspecies (Figure 1). Group (G) I was Japonica rice. Using Nipponbare as a representative of temperate Japonica and Azucena (IRRI breeding line) as a representative of tropical Japonica, five Thai accessions and one IRRI accession were grouped with the Japonica rice lines, clustering closer to Azucena. All the Thai accessions in this group are upland rice. Group (G) II comprised clusters of Indica, including four sub-groups (GII-1, GII-2, GII-3, and GII-4) and two additional isolated single accessions. GII-1 had 11 accessions, all of which were Thai landraces and Thai local breeding lines except O. rufipogon and one improved cultivar (No. 8, SRN1). KDML105 (No. 100), the famous Thai jasmine rice, was also sorted into this group, clustering with its derivative (No. 10, RD15) and a Thai landrace (No. 37, KD). GII-2 had eight accessions that were all TGMS types, with the exception of No. 29. GII-3 was the largest group with 60 accessions, most of which were IRRI germplasm. It should be noted that seven of the Thai improved lines and three Thai breeding lines were in this group (Figure 1). GII-4 had eight accessions, the majority of which were deep water lines. The two isolated single accessions were a Thai irrigated line (improved, No. 6, PTT1) and a Thai rainfed central plain line (breeding, No. 13, LPT123).
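For readers unfamiliar with UPGMA, the following minimal sketch shows how average-linkage clustering builds a tree from marker profiles. It is written under stated assumptions: the distance used here is a simple proportion of mismatched loci, which may differ from the genetic distance used in the study, and the accession names and allele calls are hypothetical.

```python
def mismatch_distance(a, b):
    """Fraction of loci at which two marker profiles carry different alleles."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(profiles):
    """Build a UPGMA (average linkage) tree as nested tuples.

    profiles: {accession name: tuple of allele calls, one per locus}.
    """
    names = list(profiles)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    dist = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            dist[(i, j)] = mismatch_distance(profiles[names[i]], profiles[names[j]])
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        merged = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted average keeps every leaf equally weighted (UPGMA).
            merged[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical four-locus profiles for four accessions.
profiles = {
    "japonica_1": ("a", "a", "a", "a"),
    "japonica_2": ("a", "a", "a", "b"),
    "indica_1": ("b", "b", "b", "b"),
    "indica_2": ("b", "b", "b", "a"),
}
tree = upgma(profiles)
```

With these toy profiles the two japonica-like accessions join first, then the two indica-like accessions, and the two pairs are joined last, which mirrors the two-major-group topology described above.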

Genetic structure and differentiation
Using the data from the 19 polymorphic InDel markers, a model-based method was applied to determine the genetic structure among all 101 samples. The Bayesian-based clustering method demonstrated that the highest log likelihood score was obtained when the number of populations (k) was equal to six. The population structure based on k = 6 showed similar results to the UPGMA tree (Figure 1) by sorting rice into 6 different color-coded ... accessions in this group were Thai landraces (No. 30, 31, 35, 37), while the remaining accessions were all Thai commercial lines from the Northeast.
To determine genetic diversity and genetic difference in the rice samples according to population structure, we reanalyzed the data using the groups shown in Figure 2. The results showed that genetic diversity of Japonica, TGMS, deep water and mixtures, IRRI germplasm, the other Oryza species, and Thai landraces and breeding lines was 0.256, 0.239, 0.303, 0.255, 0.565, and 0.355, respectively (Table 6). The AMOVA results showed that 35.28% of the variation was caused by differences among groups, while the remaining 64.72% was caused by differences within groups. Pairwise Fst estimates among groups ranged from 0.204 to 0.680 and indicated that these groups were significantly different from each other.

Discussion
Molecular characterization is an alternative approach that overcomes several limitations of morphological characterization, namely high experimental cost, long evaluation time, and environmental effects. We reported genetic analysis in rice using groups of SSCP InDel markers, most of which were developed from putative rice genes containing short InDels. The SSCP technique can overcome a limitation of the SSR technique, which cannot distinguish different DNA sequences when the DNA fragments are of the same length. DNA separation on SSCP gels is based on both size and conformation, which is determined by the DNA primary structure. Typically, single-copy amplifications showed a two-band SSCP profile, indicating a separation of sense and anti-sense strands.
Figure 2 Estimated population structure using k = 6; each individual rice line is represented by a vertical bar broken into colored segments, with lengths in proportion to Q values: red, Japonica; green, TGMS; dark blue, deep water rice and mixtures; yellow, IRRI germplasm; pink, the other Oryza species; light blue, Thai landraces and breeding lines. The numbers marked below each line indicate the rice accession numbers as detailed in Table 1; 1-26 and 100 are commercial Thai rice lines; 27-37 are landraces with selected traits; 38-42 are the other Oryza species; 43-99 are germplasm with desirable traits from IRRI; 101 is Nipponbare.
Previously, SSR markers have been used to determine genetic variation in rice. The reported number of alleles per locus, genetic diversity, and polymorphism information content (PIC) ranged from 4.8-14.0, 6.2-6.8 and 0.63-0.70, respectively (Ni et al. 2002; Garris et al. 2005; Pessoa-Filho et al. 2007; Ram et al. 2007). Very recently, Ali et al. (2011) genotyped 409 Asian rice accessions originating from 79 countries and representing all the major rice-growing regions of the world using 36 SSR markers. They reported an average of 9.17 alleles per marker (range 2 to 24), a mean genetic diversity of 0.68, and an average PIC of 0.63. In addition, Chen et al. (2011) studied the genetic diversity of 300 rice accessions representing the major geographic areas of rice-growing countries in the world using 372 SNP markers. They detected 744 alleles at 372 markers, an average gene diversity of 0.358, and an average PIC of 0.285. Using 19 SSCP InDel markers to determine genetic variation in 101 rice accessions, we found that the 19 markers produced an average of 6.68 alleles per locus and an average PIC of 0.44. Our results for the average number of alleles per locus, genetic diversity, and PIC were higher than those reported by the study using SNP markers (Chen et al. 2011). Compared with other studies using SSR, our average number of alleles per locus was comparable to that of several studies (Ni et al. 2002; Yu et al. 2003). However, the average PIC produced by our method was lower than that reported by the other studies using SSR, concordant with an earlier study in pearl millet, which reported an average PIC value of 0.49 for SSCP relative to 0.72 for SSR tested on the same genotype panel (Bertin et al. 2005).
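The summary statistics compared above can be made concrete with a short sketch. It assumes the standard definitions, which this section does not state explicitly: gene diversity as Nei's expected heterozygosity, 1 - Σp_i², and PIC in the Botstein-style form, 1 - Σp_i² - Σ_{i<j} 2p_i²p_j². The function names and the allele calls are hypothetical; only the totals of 127 alleles and 19 loci come from the text.

```python
from itertools import combinations


def allele_frequencies(calls):
    """Return {allele: frequency} for a list of allele calls at one locus."""
    counts = {}
    for allele in calls:
        counts[allele] = counts.get(allele, 0) + 1
    total = len(calls)
    return {allele: c / total for allele, c in counts.items()}


def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return gene_diversity(freqs) - cross


# Hypothetical locus with two equally frequent alleles.
freqs = allele_frequencies(["A", "A", "B", "B"])
diversity = gene_diversity(freqs)
locus_pic = pic(freqs)

# Mean alleles per locus from the totals reported in the text.
mean_alleles = 127 / 19
```

Because the cross term is never negative, PIC cannot exceed gene diversity at the same locus, which is consistent with the mean PIC values being lower than the mean gene diversity values in the studies cited above.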
The subset of 19 markers selected from the total of 98 markers produced groups corresponding to the Indica and Japonica subspecies, and these groups correlated quite well with their ecologies and with what is known about their development. In a previous study, genetic diversity in the genus Oryza was determined using 11 ISSR markers selected from 30 ISSR markers (Joshi et al. 2000). Small numbers of markers can be used to estimate genetic diversity, as shown in an earlier study in which a subset of 30 markers gave the same genetic distance matrices and dendrograms as all 111 markers (Ni et al. 2002). Similarly, Ali et al. (2011) showed that a subset of 36 SSR markers gave nearly the same results as 169 SSR markers for population structure analysis.
Here we showed that SSCP InDel markers can be used in plant breeding studies. Gene-based SSCP InDel markers are very specific, and their known positions in the rice genome can be exploited. In addition, SNPs and InDels are abundant in rice (Feltus et al. 2004; Shen et al. 2004; Chen et al. 2011), which allows for the development of InDel markers even in small target areas.
Our results from the analysis of the 3 main rice groups showed that the other Oryza species had the highest genetic diversity, followed by Thai rice lines and IRRI germplasm. Similar to the study using Indian germplasm, the genetic diversity of the other Oryza species in this report is higher than that of the cultivated rice (Ram et al. 2007). However, the genetic diversity of the other Oryza species used in this report (0.55) is higher than the genetic diversity of the 7 wild rice species (0.436) reported by Ram et al. (2007). Our results indicated that Oryza brachyantha (No. 40, FF genome) is the most divergent species among the other Oryza species, which is in agreement with previous studies (Joshi et al. 2000; Jacquemin et al. 2009; Lu et al. 2009). Similarly, Oryza officinalis (No. 42, CC genome) and Oryza latifolia (No. 41, CCDD genome) are grouped together and separated from Oryza brachyantha, supporting their placement in the officinalis complex, which includes diploid CC and tetraploid CCDD genomes (Joshi et al. 2000).
The selected IRRI germplasm (including 57 rice accessions from several countries) showed lower genetic diversity than that of the Thai commercial cultivars (improved and breeding lines) and Thai landraces. The Thai breeding lines had the highest genetic diversity while the IRRI breeding lines had the lowest. Interestingly, genetic diversity of Thai landraces was lower than that of Thai breeding lines, possibly because the Thai landraces were selected only from the North and the Central parts of the country, while the breeding lines were from all over Thailand.
Our cluster analysis showed two major groups of Oryza sativa corresponding to the Indica and Japonica subspecies, similar to other studies (Wen et al. 2009; Chen et al. 2011). Both cluster and structure analyses separated a group of other Oryza species and showed that O. rufipogon (No. 38), which is considered the progenitor of Oryza sativa, was most closely related to Oryza sativa. Interestingly, the famous Thai jasmine rice, KDML105 (No. 100), was grouped with Thai landraces and breeding lines and with O. rufipogon, which indicates that the strain is native to Thailand. Our analyses support the existence of five subpopulations of Oryza sativa, similar to earlier studies (Garris et al. 2005; Zhao et al. 2010; Ali et al. 2011; Chen et al. 2011). Several rice accessions included in our study were used in the report by Zhao et al. (2010), and the classifications of their subpopulations were concordant. However, we did not have known samples of the aus and GroupV subpopulations (Zhao et al. 2010) in this study, so we cannot indicate whether some of our subpopulations were aus or GroupV. Some of the Thai upland rice lines and one IRRI germplasm were grouped with a well-known tropical Japonica, Azucena, which further suggests that these upland rice lines are also tropical Japonica. All irrigated rice lines, which were improved cultivars, have genetic backgrounds of IRRI germplasm, supporting the results of the cluster and structure analyses. The results from diversity analysis showed that the Thai germplasm were more diverse than the tested IRRI germplasm, concordant with the results from structure analysis.
Although Thailand is quite small, 513,115 sq. km (approximately the same size as France), the different ecologies and geography in distinct parts of the country could have some effect on rice diversity. The first three of the four rice ecologies (irrigated, deep water, upland, and rainfed lowland ecosystems) showed distinct groups in both cluster and structure analyses. The irrigated rice lines, planted in irrigated areas in the Central plain, were grouped together and showed a genetic structure similar to that of IRRI germplasm. Most of the tested upland rice lines were planted in hilly areas in the North, and they were grouped with Japonica rice. Deep water rice lines were planted in specific areas of the Central plain with high water levels, and they were clustered together in a distinct group. Interestingly, three of the four rainfed lowland rice lines from the South showed a genetic structure similar to that of the deep water rice. In addition, the groups sorted by population structure also displayed significant genetic differences among them.
All tested TGMS lines, controlled by 3 tgms genes, were grouped together by cluster analysis into one sub-group of Indica. Results from structure analysis also supported this grouping, with the exception of tms2 KDML105, which showed a genetic structure similar to the group of the other Oryza species. Results from cluster and structure analyses indicated that tms2 KDML105 (No. 50) was quite distinct from the other TGMS lines and contained several different genetic components, including the other Oryza species (Q = 0.369), TGMS (Q = 0.266), Thai landraces and breeding lines (Q = 0.141), and IRRI germplasm (Q = 0.101), in agreement with its genetic background containing some part of the Japonica genome (Pitnjam et al. 2008) and its development (Lopez et al. 2003). C21489 (No. 29), a cold-resistant Thai landrace with no available information on sterility, also clustered with the TGMS sub-group. However, the structure analysis showed that it had a lower Q value.
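The reading of Q values in this paragraph can be sketched as a simple rule: assign an accession to the cluster with the largest Q and flag it as admixed when that Q is below a cutoff. The four Q values are those reported above for tms2 KDML105 (No. 50); the function name and the 0.8 cutoff are hypothetical and are not taken from the study.

```python
def summarize_membership(q, threshold=0.8):
    """Return (dominant cluster, its Q value, admixed flag).

    q: {cluster name: Q value}; threshold is a hypothetical cutoff below
    which the accession is treated as admixed.
    """
    cluster = max(q, key=q.get)
    return cluster, q[cluster], q[cluster] < threshold


# Q values reported in the text for tms2 KDML105 (No. 50).
# Values for the two remaining clusters are not reported and are left out.
q_no50 = {
    "other Oryza species": 0.369,
    "TGMS": 0.266,
    "Thai landraces and breeding lines": 0.141,
    "IRRI germplasm": 0.101,
}

dominant, dominant_q, admixed = summarize_membership(q_no50)

# Share of the genome left for the two clusters whose Q is not reported.
unreported = 1.0 - sum(q_no50.values())
```

Under this rule No. 50 is placed with the other Oryza species but flagged as admixed, and about 0.123 of its membership remains for the Japonica and deep water clusters combined.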

Conclusions
Our study showed the utility of SSCP InDel markers for genetic analysis of Thai and IRRI rice germplasm, as an alternative to SSR markers. The resulting genetic structure and differentiation of these samples were in agreement with their ecologies and with what is known about their development. The results indicated that the genetic diversity of Thai commercial rice lines (improved cultivars and local breeding lines) and Thai landraces was higher than that of the tested rice germplasm obtained from IRRI. Our molecular analysis indicated that some of our cultivars were japonica rice, and that genetic diversity is present in this set of available germplasm. Differentiation analysis indicated that groups of IRRI germplasm were significantly different from Thai groups. Thus, these germplasm can be used to broaden the genetic base and for trait improvement in rice breeding programs. Cluster and structure analyses showed concordant results with six distinct groups, and differentiation analysis supported that they were significantly different from each other. The results also indicate that TGMS lines, which could be used as female parents, were different from the other groups, making them good candidates for creating high-yielding rice hybrids. Knowledge of the genetic diversity and genetic relationships among these germplasm will be useful for parental line selection in rice breeding programs and in hybrid production.

Plant materials
A total of 101 rice accessions, comprising 43 Thai accessions, 57 germplasm obtained from the International Rice Research Institute (IRRI), and Nipponbare, were used in this study (Table 1). The Thai accessions included 27 commercial cultivars, 11 landraces, and 5 other Oryza species. The commercial cultivars were grown in all rice ecologies throughout the country: irrigated, rainfed lowland, upland, and deep water. These commercial cultivars comprised 11 improved and 16 local breeding lines. The improved lines were high-yielding cultivars and/or cultivars with agronomically desirable traits. The improved cultivars were classified by their development through crossing among local cultivars and/or with other genetic sources, and their recent pedigrees were known (Table 1). The local breeding cultivars had not been bred through modern breeding procedures and their precise pedigrees were unknown. The landraces were local lines that have not been planted commercially but have different special traits (Table 1). The other Oryza species included O. glaberrima, African cultivated rice, and four species of wild rice: O. rufipogon, O. brachyantha, O. latifolia, and O. officinalis. The rice accessions obtained from IRRI were groups of germplasm with agronomically desirable traits, including temperature-sensitive genetic male sterility (TGMS), new plant type, early flowering, and biotic and abiotic stress resistances (Table 1). Two accessions, KDML 105 (the premium jasmine rice) and Nipponbare, which are well-known Indica and Japonica rice, respectively, were used as controls for genetic diversity analysis and as references to control for allele sizing variation between electrophoresis runs. Nipponbare was not included in