Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data
- Yoshihiro Kawahara1,
- Melissa de la Bastide2,
- John P Hamilton3,
- Hiroyuki Kanamori1,
- W Richard McCombie2,
- Shu Ouyang4,
- David C Schwartz5,
- Tsuyoshi Tanaka1,
- Jianzhong Wu1,
- Shiguo Zhou5,
- Kevin L Childs3,
- Rebecca M Davidson3, 6,
- Haining Lin3, 7,
- Lina Quesada-Ocampo3,
- Brieanne Vaillancourt3,
- Hiroaki Sakai1,
- Sung Shin Lee1,
- Jungsok Kim1,
- Hisataka Numa1,
- Takeshi Itoh1,
- C Robin Buell3 and
- Takashi Matsumoto1
© Kawahara et al.; licensee Springer. 2013
Received: 6 November 2012
Accepted: 30 January 2013
Published: 6 February 2013
Rice research has been enabled by access to the high quality reference genome sequence generated in 2005 by the International Rice Genome Sequencing Project (IRGSP). To further facilitate genomic-enabled research, we have updated and validated the genome assembly and sequence for the Nipponbare cultivar of Oryza sativa (japonica group).
The Nipponbare genome assembly was updated by revising and validating the minimal tiling path of clones with the optical map for rice. Sequencing errors in the revised genome assembly were identified by re-sequencing the genome of two different Nipponbare individuals using the Illumina Genome Analyzer II/IIx platform. A total of 4,886 sequencing errors were identified in 321 Mb of the assembled genome indicating an error rate in the original IRGSP assembly of only 0.15 per 10,000 nucleotides. A small number (five) of insertions/deletions were identified using longer reads generated using the Roche 454 pyrosequencing platform. As the re-sequencing data were generated from two different individuals, we were able to identify a number of allelic differences between the original individual used in the IRGSP effort and the two individuals used in the re-sequencing effort. The revised assembly, termed Os-Nipponbare-Reference-IRGSP-1.0, is now being used in updated releases of the Rice Annotation Project and the Michigan State University Rice Genome Annotation Project, thereby providing a unified set of pseudomolecules for the rice community.
A revised, error-corrected, and validated assembly of the Nipponbare cultivar of rice was generated using optical map data, re-sequencing data, and manual curation that will facilitate on-going and future research in rice. Detection of polymorphisms between three different Nipponbare individuals highlights that allelic differences between individuals should be considered in diversity studies.
Keywords: Oryza sativa Nipponbare; Unified rice reference genome; Pseudomolecules; Minimum tiling path; Optical mapping; Genome re-sequencing; Next-generation sequencing
The International Rice Genome Sequencing Project (IRGSP) completed the sequencing of the japonica rice cultivar Nipponbare in 2005 (International Rice Genome Sequencing Project 2005). In this project, the consortium employed a clone-by-clone sequencing strategy after construction of a minimum tiling path (MTP) for each chromosome. Subsequently, two genome assemblies were independently produced: one by the Rice Genome Annotation Project, initially located at The Institute for Genomic Research and now at Michigan State University (MSU), and another by the Rice Annotation Project (RAP) (Ouyang et al. 2007; Tanaka et al. 2008). The two sets of pseudomolecules differed slightly due to differences in the selection of the clones underlying the MTP and in the lengths of gap insertions used by the two projects. Having two genome assemblies has made it difficult for the rice community to move between the resources produced by the two annotation groups. To facilitate genomic-enabled research, a unified, single genome assembly of the Nipponbare rice reference genome was constructed by updating the MTP, validating the final MTP with optical mapping data, and error-correcting the unified assembly using next generation re-sequencing data.
The finished quality of the reference genome sequence of Nipponbare rice in 2005 was estimated to be less than one error in 10 kb (International Rice Genome Sequencing Project 2005). Recent advancements in sequencing technologies have enabled re-sequencing of multiple rice genomes. To date, several groups have performed genome-wide genetic diversity analyses among rice cultivars. The genome of an elite japonica rice cultivar, Koshihikari, which is closely related to Nipponbare, was sequenced using the Illumina platform, and single nucleotide polymorphisms (SNPs) between Nipponbare and Koshihikari were estimated on average to be at least one per 5.7 kb (i.e., 1.8 × 10⁻⁴ per site) (Yamamoto et al. 2010). Sequencing of 517 rice landraces identified approximately 3.6 million SNPs, a frequency of 9.32 per kb (Huang et al. 2010). Since the polymorphism differences between rice cultivars are limited, a high quality reference genome sequence is essential for the comparison of closely related rice cultivars (Zhang et al. 2010; Lu et al. 2010; Huang et al. 2012; Xu et al. 2012; Yang et al. 2012) so that errors can be minimized. Here, we report a single high-quality reference genome sequence for rice from the japonica cultivar Nipponbare (Os-Nipponbare-Reference-IRGSP-1.0) in which the MTP and bacterial artificial chromosome (BAC)/P1 artificial chromosome (PAC) assemblies were validated using optical map data. Fine-scale validation and error correction were performed using whole genome re-sequencing data obtained from two next generation platforms, the Illumina Genome Analyzer II/IIx and Roche GS FLX.
Results and discussion
Construction of a minimum tiling path validated by an optical map
To revise the physical map, clone orders were manually examined using an optical map of the rice genome (Zhou et al. 2007). Eight new clones (six BACs: AC151599.2, AC157835.1, AC161790.1, AC167227.1, AC157500.1, AC174464.1; one PAC: AP004805.1; one PCR product: AC150775.1) were added to the MTP. One BAC clone (AP005604.3) was found to completely overlap with other clones and was excluded. The directions of two fosmid clone sequences (AC151105.2 and AP009057.1) were resolved. Seven Syngenta contig sequences (Goff et al. 2002) were mapped to the ends of continuous sequences (contigs) or in physical gaps (Additional file 1). Fifty-three physical gaps, including novel gaps identified in this study, remained in the genome assembly (Additional file 2); note that one gap was removed by visual inspection of GS-FLX reads (see "Detection of misassembly"). In addition, gaps in the genome sequence at 19 telomeres were annotated by the insertion of Ns at the ends of the chromosomes (Additional file 2).
The final assembled genome was composed of 3,475 genomic sequences: 2,482 BACs, 901 PACs, 37 fosmids, 2 plasmids, 5 PCR products, 41 partial sequences from genomic clones, and 7 Syngenta contigs (Additional file 3). The genome size is 373,173,519 bp after error correction with Illumina whole genome re-sequencing reads (see "Detection of small sequencing errors using Illumina-platform generated reads"). The total gap size between contigs was estimated to be 9,598,219 bp by fluorescence in situ hybridization (FISH; Additional file 4) and 7,290,600 bp by optical mapping (Additional file 5), resulting in total assembly sizes of 382,771,738 bp and 380,464,119 bp, respectively. Since the rDNA regions, which are estimated to span a total of 3.7 Mbp (Ohmido et al. 2000; Oono and Sugiura 1980), are not included in these estimates, the actual genome size of the Nipponbare cultivar is estimated to be 384.2–386.5 Mbp. Thus, the new assembly covers 96.6–97.1% of the entire Nipponbare rice genome.
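The size and coverage figures above follow from simple addition and division. The short sketch below reworks that arithmetic from the numbers quoted in the text; the function and constant names are illustrative only and do not come from the original analysis.

```python
# Figures quoted in the text (all in bp).
SEQUENCED_BP = 373_173_519      # error-corrected pseudomolecule sequence
GAPS_FISH_BP = 9_598_219        # total gap size estimated by FISH
GAPS_OPTICAL_BP = 7_290_600     # total gap size estimated by optical mapping
RDNA_BP = 3_700_000             # rDNA regions, excluded from both gap estimates


def genome_estimate(sequenced_bp, gap_bp, rdna_bp):
    """Return (assembly size, estimated genome size, fraction of genome sequenced)."""
    assembly = sequenced_bp + gap_bp
    genome = assembly + rdna_bp
    return assembly, genome, sequenced_bp / genome


for label, gaps in (("FISH", GAPS_FISH_BP), ("optical map", GAPS_OPTICAL_BP)):
    assembly, genome, covered = genome_estimate(SEQUENCED_BP, gaps, RDNA_BP)
    print(f"{label}: assembly {assembly:,} bp, "
          f"genome {genome / 1e6:.1f} Mbp, coverage {covered:.1%}")
```

The FISH-based gap estimate gives the upper genome size and the lower coverage bound, and the optical-map estimate gives the lower genome size and the upper coverage bound.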
Detection of small sequencing errors using Illumina-platform generated reads
Table: Assessment and processing of re-sequencing datasets used in this study. Rows: original read length (bp); number of reads; initial purity-filtered reads; reads remaining after low-quality trimming, after adaptor trimming, and after pairing. Statistics of mapping results by BWA: number of pre-processed reads; uniquely and properly mapped reads; unpaired PE reads (76 bp). Coverage and number of effective sites for detection of variation: NIAS + CSHL.
We classified all nucleotides into five categories: "reference type," "sequencing error," "allele (within an individual)," "allelic difference between individuals," and "low depth" (see Methods and Additional file 7). The classification is based on the type and frequency of the nucleotides observed at each site in the three datasets. Our survey of SNP-type sequencing errors detected 3,447 sites that showed high frequencies of non-reference type bases and were classified as "sequencing error" (Additional file 8 and Additional file 9). Since BWA generates gapped alignments, small insertion/deletion (indel) type variants (1–4 bp gaps) could also be detected: 642 sites were insertions and 797 were deletions. Thus, 4,886 errors were found in the 321 Mbp of the reassembled reference genome that could be assessed. The IRGSP estimated that the error rate of the reference genome sequence was less than one per 10,000 nucleotides (International Rice Genome Sequencing Project 2005). In fact, the average frequency of sequencing errors estimated from our Illumina re-sequencing data was 0.15 errors per 10,000 nucleotides, substantially lower than the IRGSP estimate. While this may be an underestimate of the error rate, due to our inability to map reads to all nucleotides in the assembled rice genome and the lack of the requisite read depth at every nucleotide, we were able to assess 321 Mb of the 373 Mb assembly for errors.
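As a quick check of the figures above, the error total and the per-10,000-nucleotide rate can be recomputed from the three counts. This is a minimal sketch; the constant names are illustrative, and the assessed length is the rounded 321 Mbp quoted in the text rather than an exact site count.

```python
SNP_ERRORS = 3_447          # SNP-type sequencing errors
INSERTION_ERRORS = 642      # small insertions (1-4 bp)
DELETION_ERRORS = 797       # small deletions (1-4 bp)
ASSESSED_BP = 321_000_000   # rounded value from the text (assumption: exact count not given)


def errors_per_10kb(n_errors, assessed_bp):
    """Number of errors per 10,000 assessed nucleotides."""
    return n_errors / assessed_bp * 10_000


total_errors = SNP_ERRORS + INSERTION_ERRORS + DELETION_ERRORS
rate = errors_per_10kb(total_errors, ASSESSED_BP)
print(f"{total_errors:,} errors, {rate:.2f} per 10,000 nucleotides")
```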
Detection of misassembly
Long read sequences were generated using the Roche GS FLX platform and used to identify misassembly of BAC/PAC clones as well as large indel sequencing errors. A total of 1.0 Gbp from 2,706,353 reads was produced. The reads were filtered to remove low quality reads, yielding 2,705,634 reads with an average read length of 377 bp. To detect large gaps in the reference assembly, these reads were aligned to the pseudomolecules using Megablast (Zhang et al. 2000). We selected reads that mapped unambiguously to two different chromosomal positions within a 1 Mbp interval and for which the two high-scoring segment pairs (HSPs) reported by Megablast together covered >90% of the original read. In this initial set of alignments, 205 reads met these criteria. Of these, 200 reads were removed through manual inspection and five large indels were retained. Three were insertions, indicating that additional nucleotides should be inserted into the reference genome, while the other two were deletions. The two deletion errors were further validated by mapping the Illumina reads to the long reads that matched these regions. In one case, a physical gap that had originally been inserted into the reference was found to be unnecessary based on our long read validation, and this gap was removed. In the other four cases, the erroneous indels were located near tandem repetitive sequences.
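The read-selection rule described above (two alignments on the same chromosome, within 1 Mbp of each other, together covering more than 90% of the read) can be sketched as a small filter. This is an illustration under stated assumptions, not the pipeline used in the study: the `Hsp` record, the function names, and the way coverage is computed as the union of the two query intervals are all hypothetical.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    """One alignment segment of a read (hypothetical record layout)."""
    chrom: str
    q_start: int   # 1-based, inclusive position on the read
    q_end: int
    s_start: int   # 1-based position on the pseudomolecule
    s_end: int


def query_coverage(hsps, read_len):
    """Fraction of the read covered by the union of the HSP query intervals."""
    covered, last_end = 0, 0
    for start, end in sorted((h.q_start, h.q_end) for h in hsps):
        start = max(start, last_end + 1)   # skip bases already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / read_len


def is_split_candidate(hsps, read_len, max_span=1_000_000, min_cov=0.90):
    """True if a read has exactly two HSPs on one chromosome that lie within
    max_span bp of each other and together cover more than min_cov of the read."""
    if len(hsps) != 2:
        return False
    a, b = hsps
    if a.chrom != b.chrom:
        return False
    coords = (a.s_start, a.s_end, b.s_start, b.s_end)
    if max(coords) - min(coords) + 1 > max_span:
        return False
    return query_coverage(hsps, read_len) > min_cov


read = [Hsp("chr01", 1, 200, 1_000, 1_199), Hsp("chr01", 201, 377, 5_000, 5_176)]
print(is_split_candidate(read, read_len=377))
```

Reads passing such a filter would still need the manual inspection described in the text, since most candidates in the study were rejected at that stage.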
Nomenclature, data availability and annotation of the Os-Nipponbare-Reference-IRGSP-1.0 genome
We have named this updated assembly "Os-Nipponbare-Reference-IRGSP-1.0" to signify that it is from rice (Oryza sativa), from the Nipponbare cultivar, a high quality reference assembly, produced by the IRGSP, and version 1.0. We envision that future assemblies of rice will be of draft quality and from other entities and, as is the case with the Nipponbare rice genome, will be updated as new sequencing datasets become available. We propose this nomenclature for other rice genome assemblies as an informative way for the community to readily interpret the origin, quality, and iteration of rice genome sequences. One objective of this study was to provide a single unified set of pseudomolecules for two parallel annotation efforts, the RAP (Tanaka et al. 2008, http://rapdb.dna.affrc.go.jp/) and the MSU Rice Genome Annotation Project (Ouyang et al. 2007, http://rice.plantbiology.msu.edu/). Both annotation projects have now updated their annotations to the underlying Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules and provide them via their project websites.
The annotation of the Rice Annotation Project Database (RAP-DB) was updated for Os-Nipponbare-Reference-IRGSP-1.0. A total of 4,993 loci from the RAP annotation on the IRGSP build 5 genome were deprecated in this release. Of those, 4,766 (95%) loci became obsolete due to the overlaps with repetitive sequences on the new pseudomolecules. The exon-intron structures of genes and their splicing variants were identified or predicted as described previously (Tanaka et al. 2008). As a result, 37,872 loci, of which 35,681 have protein-coding potential, were determined. In addition, 6,642 variants of alternative splicing were found. Furthermore, a number of functional annotations were carefully examined through literature surveys. Manual curation efforts improved the gene structures and functional descriptions of 97 loci. All the information including the repeat-masked genome assembly can be downloaded from the RAP-DB (http://rapdb.dna.affrc.go.jp/download/irgsp1.html). For users' convenience, previous versions based on obsolete genome assemblies are also available at the RAP-DB Legacy database (http://rapdblegacy.dna.affrc.go.jp/).
Annotated genes and gene models from the MSU Rice Genome Annotation Project (Ouyang et al. 2007) were transferred from the previous MSU pseudomolecule build to the Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules and functional annotation updated to reflect new evidence datasets. The current release of the MSU Rice Genome Annotation Project (Release 7) based on the Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules contains 56,081 loci encoding 66,433 gene models. Of these, 39,102 loci (49,110 gene models) are non-transposable element related. A total of 1,240 loci from the MSU Release 6.1 annotation set were deprecated in this new release. Functional annotation of the loci involved searches against the UniRef 100 database, Pfam domain database, Interpro and alignments to a wide range of expression profile datasets including Sanger-derived Expressed Sequence Tags, full-length cDNAs, Massively Parallel Signature Sequences, Digital Gene Expression, and mRNA-seq datasets (http://rice.plantbiology.msu.edu/expression.shtml). The Rice Genome Annotation Project Release 7 database and tools are available for searching, download, and analysis at http://rice.plantbiology.msu.edu/. An install of the Generic Genome Browser (Stein et al. 2002) providing 83 tracks of annotation including the MSU Rice Genome Annotation Project Release 7 data is available at http://rice.plantbiology.msu.edu/cgi-bin/gbrowse/rice/.
Comparison of the RAP and MSU annotation datasets shows a high degree of concordance between the two annotations, with 33,708 loci overlapping by at least 1 bp between the two annotation sets. However, as the two approaches weight ab initio gene predictions and transcript/protein evidence differently (Ouyang et al. 2007; Tanaka et al. 2008), there are differences between the two datasets, with more protein-coding genes predicted in the MSU annotation than in the RAP annotation.
The genome assembly of Oryza sativa (cv. Nipponbare), which had been constructed and provided independently by two groups, was unified. Sequencing errors were thoroughly examined so that in-depth analyses among rice cultivars will be possible. Furthermore, our survey of alleles found variations within the Nipponbare cultivar that are mostly attributable to outcrossing, residual heterozygosity, and/or somatic mutations arising through the standard process of propagation in individual laboratories. Our ability to detect a number of allelic differences in the three Nipponbare rice individuals surveyed suggests that allelic differences should not be readily dismissed in SNP assessments of rice diversity. The high-quality reference genome assembly presented here is an invaluable resource for studies of emerging re-sequencing data and is available from the RAP-DB (http://rapdb.dna.affrc.go.jp/) and the MSU Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/) websites.
Sequence data sources
The IRGSP clone and PCR sequences of the O. sativa (japonica group, cultivar Nipponbare) genome deposited in the International Nucleotide Sequence Databases as of 25 February 2010 were used in construction of the MTP. In addition, sequence reads generated by the Syngenta rice genome sequencing project (Goff et al. 2002) were assembled and used to extend contigs.
For the next-generation DNA sequencing of an NIAS individual, total genomic DNA was prepared from nuclei isolated from young leaves of Nipponbare rice (two weeks after germination) using the CTAB method (Murray and Thompson 1980). The DNA samples were fragmented using a nebulizer or a Branson Sonifier 250 (Danbury, CT). Sequencing libraries were constructed following the protocols of the Illumina Genomic DNA Sample Preparation Kit and the Roche GS DNA Library Preparation Kit for the respective platforms. Illumina genome sequencing was performed on the Illumina Genome Analyzer II/IIx with the Illumina version 2 sequencing kit. GS-FLX genome sequencing was performed using the Roche GS LR70 Sequencing Kit. The sequence reads are available at the DDBJ Sequence Read Archive (DRA000651).
For the CSHL individual, ~5 μg of Nipponbare rice genomic DNA was used as input for standard Illumina libraries. The DNA was sheared by adaptive focused acoustics using the Covaris (Woburn, MA) instrument and end-repaired using T4 DNA polymerase, Klenow fragment, and T4 polynucleotide kinase. Fragments were then treated with Klenow fragment (3'→5' exo–) to add a single 3' deoxyA overhang and ligated to standard paired-end Illumina adapters. Qiagen (Valencia, CA) columns were used for purification between steps. The fragments were size-selected at ~225 bp (including adapters) using agarose gel electrophoresis. The actual insert size excluding adapters was ~150 bp. The library was then PCR amplified using Phusion DNA polymerase in HF buffer for 14 cycles and quantified using the Agilent BioAnalyzer (Santa Clara, CA). All libraries were normalized to 10 nM before loading on the Illumina sequencers. Production sequencing was performed using Illumina GAIIx instruments with paired-end modules using the Illumina version 3 sequencing kits. The library was sequenced with 76 bp paired-end read lengths. Sequence data were processed using the Illumina GAPipeline v1.1 and v1.3.2 (Firecrest/Bustard v1.9.6 and Firecrest/Bustard v1.3.2). The sequence reads are available at the Sequence Read Archive of NCBI (SRX032913).
Syngenta rice genome sequences (Goff et al. 2002) were filtered by similarity searches against the IRGSP rice genomic sequences. The filtered sequences were then assembled; 50 large Syngenta contigs (between 4 kb and 40 kb), totaling 748 kb, were used for potential gap filling.
Construction of the genome assembly based on a minimum tiling path and validation with the optical map
The MTP of the IRGSP clones for each of the 12 pseudomolecules was updated prior to validation with the optical map. We manually checked information on the clone order and clone overlaps in the IRGSP physical map. For the overlapping regions, phase 3 BAC/PAC sequences, which are finished-quality sequences without gaps (see http://www.ncbi.nlm.nih.gov/projects/genome/glossary.shtml), were preferentially chosen. Any ambiguous nucleotides, such as Rs and Ys, were converted to Ns. The lengths of the physical gaps between contigs were estimated by FISH (International Rice Genome Sequencing Project 2005). A total of 1,000 Ns were inserted at each physical gap. Other gaps in the original entries were left unchanged.
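The two sequence-level conventions in this paragraph (ambiguous bases converted to N, and a fixed run of 1,000 Ns at each physical gap) can be illustrated with a short sketch. The function names are hypothetical, and upper-casing the input is an assumption made here for simplicity; it is not stated in the text.

```python
import re

PHYSICAL_GAP = "N" * 1000   # placeholder inserted at each physical gap


def mask_ambiguous(seq):
    """Replace any IUPAC ambiguity code (R, Y, ...) with N.
    Assumption: sequence is upper-cased first, so soft-masking is not preserved."""
    return re.sub(r"[^ACGTN]", "N", seq.upper())


def join_contigs(contigs, gap=PHYSICAL_GAP):
    """Concatenate contigs in tiling-path order with a gap placeholder between them."""
    return gap.join(mask_ambiguous(c) for c in contigs)


pseudomolecule = join_contigs(["ACGRYT", "GGCCA"])
print(len(pseudomolecule))
```

Because every physical gap is represented by the same 1,000-N placeholder, gap lengths in the pseudomolecules do not reflect the FISH or optical-map size estimates.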
There was another BAC/PAC sequence that aligned better against the optical map;
A BAC/PAC/Syngenta contig extended the sequence into a physical gap;
A reverse orientation of a BAC/PAC was aligned better against the optical map;
A BAC/PAC/Syngenta contig was aligned perfectly against the optical map within a physical gap of a pseudomolecule;
A new gap was clearly identified between two neighboring clones;
A gap was eliminated when an overlap of two clones that were supposed to flank a gap was found.
In total, there were 23 modifications made in the tiling paths of the 12 chromosomes. Additional discordances were derived from the sequences of the BACs/PACs, rather than from the pseudomolecule construction; no attempt was made to correct the assembly of the BAC/PAC sequences to match the optical maps.
The lengths of rDNA regions were estimated to span 0.2 Mbp for 5S rDNA on chromosome 11 (Ohmido et al. 2000) and 3.5 Mbp for 17S and 25S rDNA on chromosome 9 (620 daltons/bp; Oono and Sugiura 1980) in a haploid set.
Error corrections with Illumina and 454 reads
To detect small sequencing errors (single nucleotides and indels of 1–4 bases) in the newly assembled reference genome, Illumina Genome Analyzer II/IIx generated reads were used. The genomes of two different individuals of Nipponbare were independently re-sequenced at NIAS and CSHL. Low quality bases (<Q20) were trimmed from both the 5' and 3' ends of each read until two or more consecutive bases with a high quality score (≥Q20) were observed. Next, the Illumina adapter sequences (5' P-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG) and (5' ACACTCTTTCCCTACACGACGCTCTTCCGATCT) were removed using fastx_clipper, which is part of the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Reads that were <32 bp in length were excluded from further analyses. If only one read of a paired-end read set was discarded in these preprocessing steps, the other read was regarded as a single-end read and named "unpaired." All qualified reads were aligned to the reference genome using BWA v0.5.8a with default options (Li and Durbin 2009). The NIAS single-end reads and CSHL unpaired reads were aligned in the single-end mode using the BWA command "samse". The CSHL paired-end reads were aligned in the paired-end mode using the BWA command "sampe". Reads that matched multiple genomic positions were discarded. A pileup alignment file of all uniquely mapped reads with a mapping quality value of ≥20 was generated using SAMtools v0.1.8 (Li et al. 2009). To avoid erroneous detection of variants, only sites with a read depth of 10 or more were selected.
By comparing the Illumina reads with the reference genome, each aligned site was first classified into four categories: "reference type (R)," "non-reference type (N)," "allelic (A)," and "low depth (L)" for each of three sets (NIAS, CSHL and NIAS + CSHL) (Additional file 7). If a site had fewer than 10 reads, the site was "low depth (L)," which means we were unable to assess the site due to low sampling. If ≥80% of the reads were identical to the reference base, the site was classified as "reference type (R)". If ≥80% of the reads were discordant with the reference base, the site was classified as "non-reference type (N)". If there were two alleles with ≥40% read support, the site was classified as "allelic (A)". Since we have two data sets from NIAS and CSHL, the classifications of the three sets (NIAS, CSHL and NIAS + CSHL) were combined and reexamined to decide the genotype for each site (Additional file 7): "reference type", "sequencing error" (Additional file 9), "alleles between individuals" (Additional file 10), "alleles within individuals" (Additional file 11), and "low depth". SNPs classified as allelic variations were annotated based on the RAP-DB gene models using SnpEff v. 3.1 (Cingolani et al. 2012) (Additional file 12).
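One possible reading of the end-trimming rule and the length filter described above is sketched below. It is an interpretation, not the script used in the study: trimming at each end is assumed to stop at the first run of two consecutive bases with quality ≥Q20, and the function names and defaults are hypothetical.

```python
def trim_low_quality(seq, quals, min_q=20, run=2):
    """Trim both ends of a read until `run` consecutive bases with
    quality >= min_q are found. Returns (trimmed_seq, trimmed_quals)."""
    n = len(seq)
    start = None
    # Scan from the 5' end for the first window of high-quality bases.
    for i in range(n - run + 1):
        if all(q >= min_q for q in quals[i:i + run]):
            start = i
            break
    if start is None:            # no acceptable window anywhere in the read
        return "", []
    # Scan from the 3' end for the last window of high-quality bases.
    end = start + run
    for j in range(n, run - 1, -1):
        if all(q >= min_q for q in quals[j - run:j]):
            end = j
            break
    return seq[start:end], quals[start:end]


def passes_length_filter(seq, min_len=32):
    """Reads shorter than min_len after trimming are dropped."""
    return len(seq) >= min_len


seq, quals = trim_low_quality("ACGTACGTAC", [5, 30, 10, 30, 30, 30, 30, 2, 30, 3])
print(seq, passes_length_filter(seq))
```

Note that low-quality bases inside the retained window are kept under this reading; only the read ends are trimmed.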
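The per-site thresholds described above can be expressed as a small classifier. This is a minimal sketch of the first-pass (R/N/A/L) step only; the function name is hypothetical, and the "U" label for sites that meet none of the stated criteria is an assumption added here, since the text does not say how such sites were handled.

```python
from collections import Counter


def classify_site(ref_base, read_bases, min_depth=10, major=0.80, allele=0.40):
    """Classify one aligned site from the bases observed in the reads covering it.

    Returns "L" (low depth), "R" (reference type), "N" (non-reference type),
    "A" (allelic), or "U" (unclassified; label assumed for this sketch).
    """
    depth = len(read_bases)
    if depth < min_depth:
        return "L"
    counts = Counter(read_bases)
    ref_count = counts.get(ref_base, 0)
    if ref_count / depth >= major:
        return "R"
    if (depth - ref_count) / depth >= major:
        return "N"
    top_two = counts.most_common(2)
    if len(top_two) == 2 and all(c / depth >= allele for _, c in top_two):
        return "A"
    return "U"


for bases in ("AAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCAA", "AAAAAGGGGG"):
    print(bases, classify_site("A", bases))
```

The second step in the text, which combines the NIAS, CSHL, and NIAS + CSHL calls into the five final categories, follows the decision table in Additional file 7 and is not reproduced here.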
The genome of the same NIAS individual used in the Illumina re-sequencing was sequenced using the Roche GS FLX platform. Low quality bases (<Q20) were trimmed by the same method as that for the Illumina reads. Repetitive sequences were detected in each read using RepeatMasker Open-3.0 (http://www.repeatmasker.org/) with the MIPS Repeat Element Database (mips-REdat) version 4.3 (http://mips.helmholtz-muenchen.de/plant/genomes.jsp; Spannagl et al. 2007) and the Triticeae Repeat Sequence Database release 10 (http://wheat.pw.usda.gov/ITMI/Repeats/). All preprocessed reads were aligned to the reference genome using Megablast (version 2.2.24) with the following options: -F 'm D' -U T -e 1e-10 (Zhang et al. 2000).
Re-validation and annotation of final assembly
The final, error-corrected pseudomolecules were virtually digested with SwaI and aligned against the rice optical map. A total of 144 major discordances were annotated in the current build of rice pseudomolecules (Additional file 5). Among these discordances, there were 53 physical gaps, including 19 telomeric gaps. The sizes of the gaps range from 0.6 kb to 2.4 Mb, measured by adding up the sizes of the un-matched optical fragments at the gap locations. The discordances were grouped into 5 classes:
Class 1: Physical gaps in the pseudomolecule;
Class 2: Missed fragment or extra fragment: only the discordances with ≥5 kb missed/extra fragment(s) were annotated;
Class 3: Significant size difference: mostly the size differences were ≥5 kb;
Class 4: Multiple different SwaI sites in the same area;
Class 5: Multiple different SwaI sites and multiple un-matched SwaI fragments (the total sizes of the pseudomolecule and the optical map at the location were not comparable, indicating possible misassembly).
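The "virtual digestion" step that precedes alignment to the optical map amounts to finding every SwaI site in a pseudomolecule and listing the resulting ordered fragment sizes. The sketch below illustrates this; the recognition sequence ATTTAAAT with a blunt cut in the middle (ATTT^AAAT) is standard enzyme information that is not stated in the text, and the function name is hypothetical. The alignment of fragment sizes to the optical map (Valouev et al. 2006) is not covered.

```python
def digest_fragment_sizes(seq, site="ATTTAAAT", cut_offset=4):
    """Return the ordered fragment lengths produced by cutting `seq`
    at every occurrence of `site` (cut placed cut_offset bases into the site)."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)   # step by 1 so overlapping sites are found
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


toy = "GGGG" + "ATTTAAAT" + "CCCCCC" + "ATTTAAAT" + "TT"
print(digest_fragment_sizes(toy))
```

Comparing such an in silico fragment list with the measured optical fragments is what exposes the missing, extra, or size-discordant fragments grouped into the classes above.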
Nipponbare Genome Re-sequencing Project
Genome sequencing and assembly team
Yoshihiro Kawahara, Melissa de la Bastide, John P Hamilton, Hiroyuki Kanamori, W Richard McCombie, Shu Ouyang, David C Schwartz, Tsuyoshi Tanaka, Jianzhong Wu and Shiguo Zhou.
MSU annotation team
Kevin L Childs, Rebecca M Davidson, Haining Lin, Lina Quesada-Ocampo and Brieanne Vaillancourt.
RAP annotation team
Hiroaki Sakai, Sung Shin Lee, Jungsok Kim and Hisataka Numa.
Takeshi Itoh, C Robin Buell and Takashi Matsumoto.
Abbreviations
BAC: Bacterial Artificial Chromosome
CSHL: Cold Spring Harbor Laboratory
DDBJ: DNA Data Bank of Japan
IRGSP: International Rice Genome Sequencing Project
MSU: Michigan State University
MTP: Minimum Tiling Path
NIAS: National Institute of Agrobiological Sciences
NCBI: National Center for Biotechnology Information
PAC: P1 Artificial Chromosome
RAP: Rice Annotation Project
SNP: Single Nucleotide Polymorphism
Funding for this work was provided in part by a grant from the U. S. National Science Foundation (DBI-0834043) to CRB, and a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural Innovation, GIR1001).
- Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 2012, 6: 80–92. doi:10.4161/fly.19695
- Eichten SR, Foerster JM, de Leon N, Kai Y, Yeh C-T, Liu S, Jeddeloh JA, Schnable PS, Kaeppler SM, Springer NM: B73-Mo17 near-isogenic lines demonstrate dispersed structural variation in maize. Plant Physiol 2011, 156: 1679–1690. doi:10.1104/pp.111.174748
- Goff SA, Ricke D, Lan T-H, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun W-L, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, et al.: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002, 296: 92–100. doi:10.1126/science.1068275
- Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, Li M, Fan D, Guo Y, Wang A, Wang L, Deng L, Li W, Lu Y, Weng Q, Liu K, Huang T, Zhou T, Jing Y, Li W, Lin Z, Buckler ES, Qian Q, Zhang Q-F, Li J, Han B: Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 2010, 42: 961–967. doi:10.1038/ng.695
- Huang X, Zhao Y, Wei X, Li C, Wang A, Zhao Q, Li W, Guo Y, Deng L, Zhu C, Fan D, Lu Y, Weng Q, Liu K, Zhou T, Jing Y, Si L, Dong G, Huang T, Lu T, Feng Q, Qian Q, Li J, Han B: Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat Genet 2012, 44: 32–39.
- International Rice Genome Sequencing Project: The Map-Based Sequence of the Rice Genome. Nature 2005, 436: 793–800. doi:10.1038/nature03895
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25: 1754–1760. doi:10.1093/bioinformatics/btp324
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25: 2078–2079. doi:10.1093/bioinformatics/btp352
- Lu T, Lu G, Fan D, Zhu C, Li W, Zhao Q, Feng Q, Zhao Y, Guo Y, Li W, Huang X, Han B: Function annotation of the rice transcriptome at single-nucleotide resolution by RNA-seq. Genome Res 2010, 20: 1238–1249. doi:10.1101/gr.106120.110
- McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, Sun Q, Flint-Garcia S, Thornsberry J, Acharya C, Bottoms C, Brown P, Browne C, Eller M, Guill K, Harjes C, Kroon D, Lepak N, Mitchell SE, Peterson B, Pressoir G, Romero S, Oropeza Rosas M, Salvo S, Yates H, Hanson M, Jones E, Smith S, Glaubitz JC, Goodman M, Ware D, et al.: Genetic properties of the maize nested association mapping population. Science 2009, 325: 737–740. doi:10.1126/science.1174320
- Murray MG, Thompson WF: Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res 1980, 8: 4321–4325. doi:10.1093/nar/8.19.4321
- Ohmido N, Kijima K, Akiyama Y, de Jong JH, Fukui K: Quantification of total genomic DNA and selected repetitive sequences reveals concurrent changes in different DNA families in indica and japonica rice. Mol Gen Genet 2000, 263: 388–394. doi:10.1007/s004380051182
- Oono K, Sugiura M: Heterogeneity of the ribosomal RNA gene clusters in rice. Chromosoma 1980, 76: 85–89. doi:10.1007/BF00292228
- Ossowski S, Schneeberger K, Lucas-Lledó JI, Warthmann N, Clark RM, Shaw RG, Weigel D, Lynch M: The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 2010, 327: 92–94. doi:10.1126/science.1180677
- Ouyang S, Zhu W, Hamilton J, Lin H, Campbell M, Childs K, Thibaud-Nissen F, Malek RL, Lee Y, Zheng L, Orvis J, Haas B, Wortman J, Buell CR: The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res 2007, 35: D883-D887. doi:10.1093/nar/gkl976
- Sakai H, Itoh T: Massive gene losses in Asian cultivated rice unveiled by comparative genome analysis. BMC Genomics 2010, 11: 121. doi:10.1186/1471-2164-11-121
- Spannagl M, Noubibou O, Haase D, Yang L, Gundlach H, Hindemitt T, Klee K, Haberer G, Schoof H, Mayer KFX: MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res 2007, 35: D834-D840. doi:10.1093/nar/gkl945
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res 2002, 12: 1599–1610. doi:10.1101/gr.403602
- Tanaka T, Antonio BA, Kikuchi S, Matsumoto T, Nagamura Y, Numa H, Sakai H, Wu J, Itoh T, Sasaki T, Aono R, Fujii Y, Habara T, Harada E, Kanno M, Kawahara Y, Kawashima H, Kubooka H, Matsuya A, Nakaoka H, Saichi N, Sanbonmatsu R, Sato Y, Shinso Y, Suzuki M, Takeda J-I, Tanino M, Todokoro F, Yamaguchi K, Yamamoto N, et al.: The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res 2008, 36: D1028-D1033.
- Teague B, Waterman MS, Goldstein S, Potamousis K, Zhou S, Reslewic S, Sarkar D, Valouev A, Churas C, Kidd JM, Kohn S, Runnheim R, Lamers C, Forrest D, Newton MA, Eichler EE, Kent-First M, Surti U, Livny M, Schwartz DC: High-resolution human genome structure by single molecule analysis. Proc Natl Acad Sci USA 2010, 107: 10848–10853. doi:10.1073/pnas.0914638107
- Valouev A, Li L, Liu Y, Schwartz DC, Yang Y, Zhang Y, Waterman MS: Alignment of Optical Maps. J Comput Biol 2006, 13: 442–462. doi:10.1089/cmb.2006.13.442
- Xu X, Liu X, Ge S, Jensen JD, Hu F, Li X, Dong Y, Gutenkunst RN, Fang L, Huang L, Li J, He W, Zhang G, Zheng X, Zhang F, Li Y, Yu C, Kristiansen K, Zhang X, Wang J, Wright M, McCouch S, Nielsen R, Wang J, Wang W: Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat Biotechnol 2012, 30: 105–111.
- Yamamoto T, Nagasaki H, Yonemaru J-I, Ebana K, Nakajima M, Shibaya T, Yano M: Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms. BMC Genomics 2010, 11: 267. doi:10.1186/1471-2164-11-267
- Yang C-C, Kawahara Y, Mizuno H, Wu J, Matsumoto T, Itoh T: Independent Domestication of Asian Rice Followed by Gene Flow from japonica to indica. Mol Biol Evol 2012, 29: 1471–1479. doi:10.1093/molbev/msr315
- Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7: 203–214. doi:10.1089/10665270050081478
- Zhang G, Guo G, Hu X, Zhang Y, Li Q, Li R, Zhuang R, Lu Z, He Z, Fang X, Chen L, Tian W, Tao Y, Kristiansen K, Zhang X, Li S, Yang H, Wang J, Wang J: Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res 2010, 20: 646–654. doi:10.1101/gr.100677.109
- Zhou S, Bechner MC, Place M, Churas CP, Pape L, Leong SA, Runnheim R, Forrest DK, Goldstein S, Livny M, Schwartz DC: Validation of rice genome sequence by optical mapping. BMC Genomics 2007, 8: 278. doi:10.1186/1471-2164-8-278
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.