Whole-Genome Sequencing of Rice Mutant Library Members Induced by N-Methyl-N-Nitrosourea Mutagenesis of Fertilized Egg Cells

Although targeted genome editing technology has become a powerful reverse genetic approach for accelerating functional genomics, conventional mutant libraries induced by chemical mutagens remain valuable for plant studies. Plants containing chemically induced mutations are simple yet effective genetic tools that can be grown without regard for biosafety issues. Whole-genome sequencing of mutant individuals reduces the effort required for mutant screening, thereby increasing their utility. In this study, we sequenced members of a mutant library of Oryza sativa cv. Nipponbare derived from treating single fertilized egg cells with N-methyl-N-nitrosourea (MNU). By whole-genome sequencing (WGS) of 266 M1 plants in this mutant library, we identified a total of 0.66 million induced point mutations. This result represented one mutation in every 146 kb of genome sequence in the 373 Mb assembled rice genome. These point mutations were uniformly distributed throughout the rice genome, and over 70,000 point mutations were located within coding sequences. Although this mutant library was a small population, nonsynonymous mutations were found in nearly 61% of all annotated rice genes, and 8.6% (3248 genes) had point mutations with large effects on gene function, such as gaining a stop codon or losing a start codon. WGS showed that MNU mutagenesis of rice fertilized egg cells induces mutations efficiently and is suitable for constructing mutant libraries for an in silico mutant screening system. Expanding this mutant library and its database will provide a useful in silico screening tool that facilitates functional genomics studies with a special emphasis on rice.


Background
Nearly a century has passed since two geneticists, Muller (1927) and Stadler (1928), first reported that X-rays could induce mutations and increase the mutation rate compared with spontaneous mutations. Since this landmark discovery, additional mutagens have been discovered and used in plant breeding and plant genetic research. As of 2021, 3365 cultivars (229 plant species) originating from artificially produced mutants have been developed and released (the Joint FAO/IAEA mutant database administered by the Food and Agriculture Organization/International Atomic Energy Agency, 2021). Of the 3365 cultivars, 25.3% (853 cultivars) are rice (Oryza sativa L.) mutants. This fact indicates that artificial mutants are good sources of genetic variation for cultivar improvement, especially in rice.
† Takahiko Kubo and Yoshiyuki Yamagata contributed equally to this work. *Correspondence: takubo@agr.kyushu-u.ac.jp; Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan.
Chemical mutagens are relatively easy-to-use agents. Alkylating agents such as ethyl methanesulfonate (EMS) and N-methyl-N-nitrosourea (MNU) are frequently used to induce mutations in plants. Alkylating agents alkylate guanine nucleobases, resulting in G/C to A/T transitions irrespective of the genomic region. Mature, dry seed is most commonly used in chemical mutagenesis treatments because alkylating agents are toxic for growing plants. Satoh and Omura (1979) developed an effective mutagenesis method using MNU treatment of fertilized egg cells at the single-cell stage immediately after fertilization. Chemical mutagen treatment of fertilized egg cells yields a higher mutation efficiency than treatment of dry seeds for two reasons: (1) chimeric M1 plants appear less frequently, and (2) selection of mutant cells due to competition between normal and mutant cells is reduced. The mutation efficiency of rice fertilized egg cells at the single-cell stage is twice as high as that of dry seeds (Satoh and Omura 1979).
Many genes responsible for morphological and physiological abnormalities have been identified using the MNU mutant libraries, demonstrating their usefulness in forward genetic approaches (Viana et al. 2019). Since chemical mutagens induce mutations randomly throughout the genome, a mutation screening step is required for each target gene. Targeting Induced Local Lesions In Genomes (TILLING) is one of the major screening methods to identify mutations in the targeted gene. The TILLING method has identified numerous missense and nonsense mutations in genes that had not been previously identified in forward genetic screenings and has helped elucidate gene functions in plants (Kurowska et al. 2011). Thus, the MNU mutant libraries are useful as forward and reverse genetic tools in plant functional genomics.
As mentioned above, the TILLING method is an effective technology for obtaining mutants by reverse genetic screening; however, the conventional TILLING method requires a substantial amount of laboratory work to PCR screen mutants for each target gene (e.g., PCR evaluation of more than 2000-4000 individuals for nonsense or missense mutations). Recent next-generation sequencing (NGS) technology allows cost- and time-efficient determination of whole-genome sequences from multiple samples. NGS has been proposed to promote the mass identification of induced mutation sites and the construction of an in silico TILLING system as an online platform for screening mutations in target genes. Since rice has the smallest genome among cereal species and a high-quality reference genome, genome-wide sequencing of many individuals is becoming feasible. In this study, we sequenced a small MNU mutant library consisting of 266 M1 plants derived from the Japanese rice cultivar Nipponbare to examine the feasibility of an in silico TILLING system.

Plant Materials and Mutagenesis
The rice cultivar Nipponbare (Oryza sativa L. ssp. japonica) maintained at Kyushu University was used for mutagenesis. MNU mutagenesis of fertilized egg cells was conducted according to the method of Suzuki et al. (2008). Briefly, panicles with freshly pollinated flowers were dipped in a 1.0 mM MNU solution for 45 min at approximately 25 °C at 18 h after flowering. The M1 plants were grown and self-pollinated to obtain the leaf samples and M2 seeds in 2017-2019. The seed-setting frequency was measured from three panicles of each M1 plant.

Whole-Genome Sequencing (WGS)
Total genomic DNA from the M1 plants was extracted from leaf samples frozen in liquid nitrogen using the CTAB method with some modifications (Murray and Thompson 1980). DNA libraries were constructed with a TruSeq Nano DNA Library Prep Kit (Illumina Co., Ltd.) and a NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB). Paired-end sequencing (2 × 150 bp) was conducted using an Illumina HiSeq X, HiSeq 2500, and NovaSeq 6000 (Illumina Co., Ltd.).

Validation of the NGS Variants
Validation analyses of the NGS variants were accomplished by PCR and Sanger sequencing on an ABI3730xl DNA analyzer with BigDye Terminator v3.1. The same DNA samples used for the NGS analysis of the M1 plants were used for Sanger sequencing. Template DNAs containing the target mutations were amplified by PCR with KOD-Plus ver.2 polymerase (Toyobo, Osaka, Japan). Primers were designed based on the Nipponbare genome sequence using Primer3 (Untergasser et al. 2012) with default parameters. Primer sequences are listed in Additional file 1: Table S1. The PCR conditions were 35 cycles at 94 °C for 15 s, 60 °C for 20 s and 68 °C for 30 s. Amplicons were purified with NucleoSpin Gel and PCR Cleanup kits (Takara-Bio) and used for Sanger sequencing. An equal amount of leaf tissue from 8-12 individual M2 plants was collected to investigate the transmission frequency of mutations. DNA extracted from the bulked M2 samples was used for PCR and Sanger sequencing.

Mutation Detection by Whole Genome Sequencing
In this study, we refer to MNU-induced single nucleotide substitutions and insertions/deletions as single nucleotide variants (SNVs) and InDels, respectively. To develop a mutant library for screening in silico mutants, cv. Nipponbare was mutagenized with MNU. Since Satoh et al. (2010) reported a weak negative correlation between the seed-setting percentage of M1 plants and the mutation rate, we selected 266 plants with a reduced ability to set seed from 4191 M1 individuals grown in 2017-2019. As a pilot study, we first sequenced the complete genomes of all 266 M1 plants by NGS analysis. The obtained short-read sequences (5.2-19.1 Gb for each individual, average 8.5 Gb) were mapped onto the Nipponbare reference genome to identify small mutations such as SNVs and InDels. The average depth of coverage for each M1 individual genome was 14.9×, and the average breadth of coverage of the reference genome was 95.3% (read depth ≥ 4) (Additional file 1: Table S2). Unique heterozygous variants present only in a single plant were defined as positive variants based on GATK joint genotyping for variant calls of multiple samples. Initially, 1,163,678 putative variants were detected [low-quality variants were cut off at the default quality value (QV) setting of < 30 in the joint genotype VCF output] (Additional file 2: Fig. S1). To validate and determine reliable mutation sites in the M1 plants, 101 SNVs from eight individual M1 plants and their progeny (bulked M2 plant samples) were sequenced by Sanger sequencing (Additional file 2: Fig. S2). These SNVs were randomly selected from a broad range of QVs of SNVs having functional effects on IRGSP-1.0 genes, as described in the section "Functional Characterization of SNVs". Filtering at a threshold of QV > 80 was found to lead to a higher true positive percentage (98.7% at the QV > 80 threshold) (Additional file 2: Fig. S3, Additional file 1: Table S3).
Therefore, we filtered out the lower-quality SNVs (retaining only those with QV > 80) and eventually obtained 656,669 SNVs from 266 M1 plants. The average number of SNVs per individual was 2468.7, ranging from 134 to 13,222, and the mutation rate was estimated to be 6.4 × 10⁻⁶ per nucleotide (6.4 SNVs/Mb) (Additional file 1: Table S2). Of these SNVs, 91.2% were G/C to A/T transitions (Ts), whereas the A/T to G/C transitions and transversions (Tv) were less frequent (8.8%) (Table 1). The Ts/Tv ratio was 14.81. The SNVs were equally distributed across the rice genome, both within and among chromosomes  S4). The false-positive percentage of InDel mutations was higher (31.3%, 5 of 16 InDels) than that of SNVs in the Sanger sequencing validation results (Additional file 1: Table S5).
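The selection rule described above (variants passing a QV threshold that are heterozygous in one plant and absent from all others) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that QV corresponds to the QUAL column of a multi-sample VCF, that missing genotypes in other plants are tolerated, and the function names and example records are hypothetical.

```python
def parse_genotype(sample_field):
    """Return the alleles of a VCF sample field, e.g. '0/1:15' -> ['0', '1']."""
    gt = sample_field.split(":")[0]
    return gt.replace("|", "/").split("/")


def unique_het_variants(vcf_lines, min_qual=80.0):
    """Collect variants with QUAL > min_qual that are heterozygous in exactly
    one sample and carry no alternate allele in any other sample.
    Assumes the input includes the #CHROM header line with sample names."""
    samples = []
    kept = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt, qual = fields[:6]
        if qual == "." or float(qual) <= min_qual:
            continue
        genotypes = [parse_genotype(f) for f in fields[9:]]
        # A carrier has at least one called non-reference allele.
        carriers = [i for i, gt in enumerate(genotypes)
                    if any(a not in ("0", ".") for a in gt)]
        if len(carriers) != 1:
            continue
        gt = genotypes[carriers[0]]
        if "0" in gt:  # heterozygous: reference allele plus alternate allele
            kept.append((chrom, int(pos), ref, alt, samples[carriers[0]]))
    return kept


# Hypothetical records for three plants; not data from this study.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tplantA\tplantB\tplantC",
    "chr01\t100\t.\tG\tA\t95.0\t.\t.\tGT:DP\t0/0:12\t0/1:15\t0/0:10",
    "chr01\t200\t.\tC\tT\t45.0\t.\t.\tGT:DP\t0/1:9\t0/0:11\t0/0:14",
    "chr02\t300\t.\tG\tA\t120.0\t.\t.\tGT:DP\t0/1:13\t0/1:12\t0/0:16",
]
print(unique_het_variants(example))
```

Only the first record is retained in this toy input: the second fails the quality threshold and the third is shared by two plants.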

Functional Characterization of SNVs
We annotated the functional effect of SNVs with the snpEff program. Most of the SNVs were located in intergenic regions (22.6%) or upstream/downstream regions (33.3/33.5%); 4.1% of SNVs were located in exons (Fig. 2). Of approximately 80 thousand SNVs found in the IRGSP-1.0 annotated gene database, 50,405 and 1753 SNVs led to missense and nonsense mutations, respectively (Table 2). More than 60% of all annotated genes we investigated (37,662 genes in IRGSP-1.0 and 55,718 genes in MSU7) had missense mutations, and approximately 5% had nonsense mutations. Of the 37,662 IRGSP-1.0 genes, 3248 genes had SNVs annotated as having high-impact effects, including the loss of start and stop codons or splicing acceptor/donor sites (Additional file 1: Table S6). Therefore, approximately 70% of the annotated rice genes were covered by either nonsense or missense mutations in the 266 M1 mutants. In total, 17,052 genes had a synonymous variant that was classified as having a low-impact effect (Additional file 1: Table S6). We next investigated nucleobase bias in the upstream and downstream (±20 bp) regions of all mutated guanine bases (G). A remarkable bias in nucleotide frequency was found at −1 bp relative to G, where purine bases (A and G) were increased (15.8% for A and 18.7% for G) compared with randomly selected G bases (Additional file 2: Fig. S5). Moreover, a slight increase in T (6.4%) was found at +1 bp. Other than at the −1 and +1 bp positions, there were no remarkable biases in the base composition within the 40 bp region we examined.
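The flanking-base analysis described above can be sketched as follows. This is an illustrative outline rather than the code used in the study: it assumes the reference is available as a dictionary of chromosome sequences, that positions are 1-based, and that a mutated C on the reference strand is a mutated G on the opposite strand (so its window is reverse-complemented). The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flank_context(genome, chrom, pos, ref, flank=20):
    """Return the (2 * flank + 1)-base window centred on a mutated G,
    oriented so that the mutated base reads as G. `pos` is 1-based.
    Returns None if ref is not G/C or the window leaves the sequence."""
    seq = genome[chrom]
    start = pos - 1 - flank
    end = pos + flank
    if ref not in ("G", "C") or start < 0 or end > len(seq):
        return None
    window = seq[start:end].upper()
    return window if ref == "G" else revcomp(window)


def offset_frequencies(windows, flank=20):
    """Count bases at each offset (-flank .. +flank) across all windows."""
    counts = {off: Counter() for off in range(-flank, flank + 1)}
    for window in windows:
        for i, base in enumerate(window):
            counts[i - flank][base] += 1
    return counts
```

Running the same two functions on randomly chosen G positions would give the background composition against which the −1 and +1 bp frequencies are compared.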

Phenotypic Effects of Mutations
To test the utility of our sequenced mutant library, we investigated the transmission frequencies of point  Table S7). For the InDels, 72.7% (8 of 11) were transmitted to the M2 progeny. We assume that some of the reduced transmission frequencies can be attributed to mutations affecting genes responsible for gamete development. Next, we investigated the relationship between the mutation rate and the phenotypic change frequency. The seed-setting percentages of the 266 M1 plants used for WGS analysis ranged from 0 to 87.1% (average 34.7%). A weak negative correlation (r = −0.43, p < 0.001) was observed between the number of SNVs per individual and the seed-setting frequency of the M1 individuals (Fig. 3). In particular, M1 individuals with lower seed-setting percentages (0-20%) tended to have a larger number of SNVs. This observation was consistent with results from a previous study in which the mutation rate was calculated by the occurrence of chlorophyll mutants in the M2 progeny (Satoh et al. 2010). To further test the phenotypic effects of the induced mutations found in the NGS analysis, small M2 populations (N = 15) derived from each M1 individual were grown, and their phenotypes were characterized. The M2 progeny exhibited a wide range of phenotypic variation in vegetative traits such as plant height, tiller number, and leaf color, and in reproductive traits such as spikelet shape and number of spikelets (Additional file 1: Table S8).
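For readers who wish to reproduce this kind of comparison on their own lines, the correlation between SNV count and seed-setting percentage can be computed with a Pearson coefficient, sketched below in plain Python. We assume here that r is a Pearson coefficient; the text does not name the statistic. The numbers in the example are invented for illustration and are not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented toy values (SNVs per plant, seed-setting %); not study data.
snv_counts = [9000, 5000, 3000, 1500, 800]
seed_setting = [5.0, 20.0, 35.0, 60.0, 80.0]
print(round(pearson_r(snv_counts, seed_setting), 2))
```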

Discussion
The conventional method for mutagenesis in plants involves treatment of dry seeds with a chemical mutagen. In recent years, there has been an increasing number of reports on genome-wide mutation analysis of EMS mutants obtained using treated dry seeds (Sidhu et al. 2015; Jiao et al. 2016). However, there have been no reports of genome-wide analysis of mutants obtained using fertilized egg cells treated with mutagens. In this study, we developed the Nipponbare mutant library using mutants induced by MNU treatment of fertilized egg cells, toward the construction of an in silico TILLING system. This is the first study to characterize features of genome-wide DNA mutations induced by MNU treatment of plant fertilized egg cells. In our MNU mutant library, approximately 656 K SNVs and 3.1 K small InDels were identified in 266 M1 plants. This result was equivalent to a mutation rate of one nucleotide change per 146 kb. This mutation rate was two-fold higher than that of other mutant libraries in diploid plant species (one nucleotide change in 294 kb in rice and 367 kb in tomato), indicating the high mutation efficiency of our mutant library and suggesting that it is a promising mutant resource for the construction of an in silico TILLING system. Our data showed that the most frequent changes were G/C to A/T transitions (91.2%). This finding is consistent with a previously proposed mechanism in which guanines are predominantly alkylated, and the guanines mismatched with thymine are replaced with adenines during DNA replication. Other mutation types (A/T to G/C transitions and transversions) were also found but were minor (8.8%). The results from previous studies and our study consistently indicate that chemical mutagenesis with an alkylating agent generates G/C to A/T transitions predominantly, although there are small variations among different plant species (e.g., 62% in wheat (Sidhu et al. 2015), 70% in tomato (Shirasawa et al. 2016), 96% in tobacco (Udagawa et al. 2021), 99% in Arabidopsis).
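The substitution classes compared above can be made explicit with a short sketch that sorts each SNV into G/C to A/T transitions, A/T to G/C transitions, or transversions, and then computes the Ts/Tv ratio. The function names and class labels are illustrative choices, not those of any tool used in the study.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}
GC_TO_AT = {("G", "A"), ("C", "T")}
AT_TO_GC = {("A", "G"), ("T", "C")}


def classify_snv(ref, alt):
    """Classify a single-base substitution by its mutation class."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if (ref, alt) in GC_TO_AT:
        return "GC>AT transition"
    if (ref, alt) in AT_TO_GC:
        return "AT>GC transition"
    return "transversion"


def ts_tv_ratio(snvs):
    """Return (class counts, Ts/Tv ratio) for an iterable of (ref, alt) pairs."""
    counts = Counter(classify_snv(ref, alt) for ref, alt in snvs)
    ts = counts["GC>AT transition"] + counts["AT>GC transition"]
    tv = counts["transversion"]
    ratio = ts / tv if tv else float("inf")
    return counts, ratio
```

As a consistency check on the reported figures, a Ts/Tv ratio of 14.81 means that transitions of both kinds make up 14.81/15.81, or about 93.7%, of all SNVs.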

Sequence Specificity of MNU Targets
We found that purine bases (A and G) were enriched at −1 bp of mutated G nucleotides compared with randomly selected G nucleotides (Additional file 2: Fig. S5). A weak bias for nucleotide T was also found at +1 bp. Studies in other plants have reported several different patterns of nucleotide frequency biases. A sorghum EMS mutant library had a high proportion of C nucleotides at −2 bp and +1 bp relative to the mutation site (Jiao et al. 2016). A TILLING study by Greene et al. (2003) reported an excess of purines at the ±1 positions of mutated G nucleotides in Arabidopsis. This bias at the −1 bp position was consistent with our result, but the bias for the +1 bp position was different. We hypothesize that these differences in biases may be due to the distinctive treatment methods used to mutagenize different plant species.

How Can We Develop the Mutant Library Efficiently for In Silico TILLING?
The mutation frequency was weakly correlated with the seed-setting percentage of M1 individuals, a result consistent with a previous report by Satoh et al. (2010). Notably, M1 individuals with a lower seed-setting percentage (< 60%) tended to have a high mutation rate (some plants had over 7000 SNVs in Fig. 3). This observation suggests that selecting M1 individuals with a low seed-setting percentage (20-60%) is the most efficient way to develop the sequenced mutant library. Approximately half of the M2 lines in our mutant library showed remarkable morphological and physiological changes during growth, such as low germination, leaf color variation, and altered plant height. These phenotypic changes could be caused by any nonsense or missense mutation detected by WGS, and the lines can be used as good sources of germplasm for functional genomics investigations.

Toward the Development of an In Silico TILLING System
Clustered regularly interspaced short palindromic repeats technology (CRISPR/Cas9), a method of gene editing, is a straightforward approach for obtaining mutations in target genes; however, gene editing techniques still need substantial laboratory work and tools, including at least two months to generate the mutants via Agrobacterium transformation for monocot species. In some countries, government regulations dictate that gene-edited plants must be treated in the same way as genetically modified organisms. Unlike gene targeting techniques, the whole-genome sequenced mutant library is ready for use. Researchers can start characterizing interesting mutants without being subject to containment requirements, such as a fully closed greenhouse and the need to autoclave plant waste. We are developing an easy-to-use online screening system that allows the identification of mutations in the genes of interest by an in silico method. Mutant genome sequences will be available to the public for plant functional research through this online screening system. When this database is completed, the working time to find mutants of interest will be shortened from several weeks to several minutes. Additional WGS analysis of the remaining M1 individuals is now being conducted (266 of 1384 mutants with reduced seed-setting levels are reported in our study). Our results show that 266 M1 individuals account for 8.6% of the genes encoded by the rice genome as determined by high-impact mutations. This result implies that approximately 3000 M1 individuals would be sufficient to cover the entire rice genome. Therefore, a suitable population size for the mutant library would be more than 1500 individuals to cover half of all genes with high-impact SNVs and have 3.5 missense mutations per gene.
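The population-size estimate above is consistent with a simple linear scaling of the observed coverage (8.6% of genes with high-impact mutations from 266 plants). The sketch below reproduces that arithmetic; the assumption of linearity and the function name are ours, not a model stated in the text.

```python
def plants_needed(target_fraction, observed_fraction=0.086, observed_plants=266):
    """Linear extrapolation of the number of M1 plants needed for a given
    fraction of genes to carry at least one high-impact SNV.
    Assumes coverage grows in proportion to the number of plants."""
    if not 0 < observed_fraction <= 1:
        raise ValueError("observed_fraction must be in (0, 1]")
    if not 0 <= target_fraction <= 1:
        raise ValueError("target_fraction must be in [0, 1]")
    return target_fraction * observed_plants / observed_fraction


print(round(plants_needed(1.0)))  # all genes
print(round(plants_needed(0.5)))  # half of all genes
```

Because additional plants increasingly hit genes that already carry a high-impact mutation, a linear estimate is a lower bound on the population needed; the true requirement is likely somewhat larger.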

Conclusions
In summary, we resequenced 266 rice mutants derived from MNU treatment of fertilized egg cells and found 0.66 million induced point mutations. Over 60% of all annotated rice gene models harbored nonsynonymous mutations in this mutant library. In the future, this mutant library and its prospective database will allow rice researchers and other plant geneticists to study rice genes without requiring an immense amount of time and effort to make transgenic plants.