Phylogenomics databases for facilitating functional genomics in rice
© Jung et al. 2015
Received: 5 February 2015
Accepted: 14 July 2015
Published: 30 July 2015
The completion of the whole-genome sequence of rice (Oryza sativa) has significantly accelerated functional genomics studies. Prior to the release of the sequence, only a few genes were assigned a function each year. Since sequencing was completed in 2005, the rate has increased exponentially. As of 2014, 1,021 genes have been described and added to the collection at the Overview of functionally characterized Genes in Rice online database (OGRO). Despite this progress, that number is still very low compared with the total number of genes estimated in the rice genome. One limitation to progress is the presence of functional redundancy among members of the same rice gene family; gene families cover 51.6 % of all non-transposable element-encoding genes. There remains a significant portion of rice genes that are not functionally redundant, as reflected in the recovery of loss-of-function mutants. To more accurately analyze functional redundancy in the rice genome, we have developed phylogenomics databases for six large gene families in rice: glycosyltransferases, glycoside hydrolases, kinases, transcription factors, transporters, and cytochrome P450 monooxygenases. In this review, we introduce key features and applications of these databases. We expect that they will serve as a useful guide in the post-genomics era of research.
Functional redundancy remains a large obstacle in functional genomics studies
Progress in functional genomics studies can be significantly inhibited by functional redundancy within a genome. This redundancy can be identified through analysis of genome sequences and transcriptomic data. The whole-genome rice sequence, completed by the International Rice Genome Sequencing Project (IRGSP) consortium, indicates the presence of as many as 3,865 paralogous protein families in rice (IRGSP 2005). These families include 21,998 of the 42,653 non-transposable element (non-TE)-related proteins predicted by the Michigan State University Rice Genome Annotation Project (MSU-RGAP; http://rice.plantbiology.msu.edu/) team (Lin et al. 2008). This suggests that a given non-TE rice gene has a 51.6 % chance of belonging to a paralogous family and thus of being functionally redundant.
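The 51.6 % figure follows directly from the two protein counts quoted above. A minimal arithmetic check in Python, using only those two numbers (the variable names are ours):

```python
# Counts quoted in the text (IRGSP 2005; Lin et al. 2008).
proteins_in_paralogous_families = 21_998
total_non_te_proteins = 42_653

# Share of non-TE proteins that belong to a paralogous family.
redundant_fraction = proteins_in_paralogous_families / total_non_te_proteins
redundant_percent = round(100 * redundant_fraction, 1)

print(f"{redundant_percent} %")
```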
Several plant gene family databases are available for such analyses, including GreenPhyl V4 (http://www.greenphyl.org/cgi-bin/index.cgi) (Rouard et al. 2011), the Plant Gene Family Database (Sakai et al. 2013), and the SALAD Database (http://salad.dna.affrc.go.jp/salad/en/) (Mihara et al. 2010). These databases provide tools for phylogenetic analysis and have been used to assess how similar the assigned functions are within gene families. However, genetic redundancy predicted from intra-family comparisons does not always accurately predict functional redundancy. For example, a genome-wide survey of predicted light-responsive genes in rice, combined with functional analysis of T-DNA insertional mutants, revealed that mutants of four of the tested family members have defects associated with normal growth or chlorophyll biosynthesis (Jung et al. 2008). These genes are highly expressed in leaf tissue. These results suggest that such highly expressed members of a gene family are good targets for functional investigations.
To further explore this idea, we analyzed the phylogenetic relationships and expression patterns of members of 79 gene families with known function (Jung et al. 2015). Of these 79 gene families, 65 carry at least one member that is highly expressed. We found that the redundancy of these families was limited to two or three members of each family. This study confirmed that phylogenomics analysis, which integrates gene expression data within a phylogenetic context, is an effective strategy for selecting genes for functional genomics studies.
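The selection idea described above, picking the highly expressed members of each family as candidates, can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the cited study: the family names, gene labels, expression values, and the cutoff are all invented for the example.

```python
from typing import Dict, List

# Hypothetical expression values (arbitrary units) per family member.
# Family names, gene labels, and numbers are invented for illustration.
FAMILIES: Dict[str, Dict[str, float]] = {
    "familyA": {"geneA1": 950.0, "geneA2": 40.0, "geneA3": 12.0},
    "familyB": {"geneB1": 30.0, "geneB2": 25.0},
    "familyC": {"geneC1": 700.0, "geneC2": 640.0, "geneC3": 8.0},
}

HIGH_EXPRESSION_CUTOFF = 500.0  # assumed threshold, not taken from the source


def highly_expressed_members(family: Dict[str, float], cutoff: float) -> List[str]:
    """Return members at or above the cutoff, strongest first."""
    hits = [gene for gene, value in family.items() if value >= cutoff]
    return sorted(hits, key=lambda gene: family[gene], reverse=True)


# Candidate genes per family, and the families that have at least one.
candidates = {
    name: highly_expressed_members(members, HIGH_EXPRESSION_CUTOFF)
    for name, members in FAMILIES.items()
}
families_with_candidates = [name for name, hits in candidates.items() if hits]
```

In this toy input, two of three families have at least one candidate, and the candidate list within a family is short, which mirrors the pattern reported for the 79 characterized families.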
Construction of phylogenomics databases for six large gene families in rice
Rice Kinase Database
Summary of the types of data integrated in phylogenomics databases (information or data provided, with references where given)
- Locus IDs from MSU-RGAP and The Rice Annotation Project Database (RAP-DB; http://rapdb.dna.affrc.go.jp/), family and sub-family names, domain positions, NCBI BLAST result
- TE-relatedness, existence of EST/cDNA, and Program to Assemble Spliced Alignments (PASA) status
- Orthologs in Plants: orthologs from 12 plant species (Brachypodium distachyon, Panicum virgatum, Sorghum bicolor, Zea mays, Arabidopsis thaliana, Cucumis sativus, Glycine max, Medicago truncatula, Mimulus guttatus, Populus trichocarpa, Ricinus communis, and Vitis vinifera) (Berglund et al. 2008)
- Transmembrane Domain (TM), N-terminal Myristoylation Site (Myrist), N-terminal Signal Peptide (SignalP), Chloroplast Transit Peptide (ChloroP), and predicted Subcellular Localization
- Mutant lines and corresponding flanking sequence tags from eight institutes (Chandran and Jung 2014)
- Experimentally validated network of protein–protein interactions based on Yeast Two-Hybrid (Y2H) and Tandem Affinity Purification (TAP) methods (Ding et al. 2009)
- Digital Northern Data: normalized frequency of ESTs in selected tissues/organs
- MPSS mRNA Data: meta-expression data from 70 libraries (Nakano et al. 2006)
- MPSS Small RNA Data: meta-expression data from six libraries (Nakano et al. 2006)
- Meta-expression data from six microarray platforms, including Affymetrix, Agilent22K, Agilent44K, BGI/YALE60K, NSF20K, and NSF45K (http://ricephylogenomics.ucdavis.edu/description.shtml) (Cao et al. 2012)
Rice GT database
Glycosyltransferases (GTs) constitute a large group of enzymes that form glycosidic bonds by transferring sugars from activated donor molecules to acceptor molecules. They are critical to the biosynthesis of plant cell walls. The Rice GT Database was created to integrate and host functional genomics information for all putative rice GTs (Cao et al. 2008). This database contains information about 609 potential GT genes (loci) that correspond to 769 transcripts (gene models). Those loci were identified from the rice genome through similarity searches that used GT sequences available from the Carbohydrate Active enZymes (CAZy) database (http://www.cazy.org/) (Egelund et al. 2004). Based on domain compositions and sequence similarities, we have classified rice GTs into 41 CAZy families, including one unknown class. Analysis with InParanoid (Sonnhammer and Ostlund 2014) suggests that 282 'rice-diverged' GTs have no orthologs in sequenced dicot species (e.g., A. thaliana, P. trichocarpa, M. truncatula, and R. communis). As with the Rice Kinase Database (RKD), we have developed a platform to display user-selected functional genomics data on a phylogenetic tree. The displayed data include all integrated data types except interactome data (http://ricephylogenomics.ucdavis.edu/cellwalls/gt/).
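The notion of a 'rice-diverged' gene, one with no ortholog in the dicot species examined, amounts to a simple filter over an ortholog table. The sketch below assumes such a table is already available as a Python dictionary; it does not reproduce InParanoid's output format, and the locus labels and ortholog hits are invented. Only the four species names come from the text.

```python
from typing import Dict, Set

# Hypothetical ortholog table: rice GT locus -> dicot species with an ortholog.
# Locus labels and hits are invented; only the species names come from the text.
ORTHOLOG_HITS: Dict[str, Set[str]] = {
    "rice_gt_1": {"A. thaliana", "P. trichocarpa"},
    "rice_gt_2": set(),
    "rice_gt_3": {"M. truncatula"},
    "rice_gt_4": set(),
}

DICOTS = {"A. thaliana", "P. trichocarpa", "M. truncatula", "R. communis"}


def rice_diverged(hits: Dict[str, Set[str]], dicots: Set[str]) -> Set[str]:
    """Return loci with no ortholog in any of the listed dicot species."""
    return {locus for locus, species in hits.items() if not (species & dicots)}


diverged = rice_diverged(ORTHOLOG_HITS, DICOTS)
```

A locus counts as diverged only if its set of dicot hits is empty, so adding further dicot genomes to the comparison can only shrink the list.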
Rice GH database
Glycoside hydrolases (GHs) catalyze the hydrolysis of glycosidic bonds in cell wall polymers and, along with GTs, are major contributors to plant cell architecture (Sharma et al. 2013). GHs have been identified from the rice genome through sequence similarity searches that used GH sequences in the CAZy database. The rice genome encodes 437 GH genes corresponding to 614 gene models that have been classified into 34 families. Using the massive datasets available in public databases, we have created a phylogenomics database of rice GHs (http://ricephylogenomics.ucdavis.edu/cellwalls/gh/) that integrates multiple data types. The database incorporates structural features, orthologous relationships, mutant availability, and gene expression patterns for each GH family within a phylogenomics context (Sharma et al. 2013). Comparison with dicot GHs suggests that 138 GH genes may be monocot-diverged. By integrating and analyzing these phylogenetic and expression data, researchers should be able to identify potential targets for engineering cell wall structure and stress tolerance. Other features of the GH database are similar to those of the GT database.
Rice TF database
A transcription factor (TF) binds to specific DNA sequences and thereby controls the rate of transcription of genetic information from DNA to messenger RNA (Todeschini et al. 2014). Rice TFs have been retrieved from the Plant Transcription Factor Database (http://plntfdb.bio.uni-potsdam.de/v3.0/) (Zhang et al. 2011). The Rice TF Database (http://ricephylogenomics.ucdavis.edu/tf/) hosts 2,385 genes corresponding to 3,119 gene models classified into 80 families. It integrates and provides functional genomics information for all putative rice TFs and other predicted transcriptional regulators. As with the other databases, we have integrated multiple data types, such as structural features, orthologous relationships, mutant availability, and gene expression patterns for each TF family within a phylogenomics context. Other features are similar to those of the GT database.
Rice transporter database
A transporter is a membrane protein involved in the movement of ions or small molecules (Saier et al. 2014). Transporter proteins exist permanently within and span the membrane across which substances are transferred. The rice genome contains 1,211 potential transporter genes (loci) corresponding to 1,754 gene models (Ren et al. 2007). These sequences have been retrieved from the Transporter Protein Analysis Database (TransportDB; http://www.membranetransport.org/), which was created to merge and provide functional genomics information for all putative rice transporters. As with the other databases, we have integrated multiple data types that include structural features, orthologous relationships, mutant availability, and gene expression patterns for each transporter family (http://ricephylogenomics.ucdavis.edu/transporter/). Other features are similar to those of the GT database.
Rice Cytochrome P450 database
Cytochrome P450 monooxygenases (P450s) belong to a superfamily of proteins containing a heme cofactor. They act as terminal oxidases in electron transfer chains. The rice genome has 302 genes that encode P450s, corresponding to 341 transcripts. Rice P450s have been retrieved from the Cytochrome P450 Database (http://drnelson.uthsc.edu/CytochromeP450.html) and mapped onto the MSU-RGAP version 6 genome annotation. The Rice P450 Database (http://ricephylogenomics.ucdavis.edu/p450/) was created to integrate and provide functional genomics information for all putative rice P450s. As with the other databases, we have integrated multiple data types for each P450 family. The other features are similar to those of the GT database.
Applying a phylogenomics approach to estimate functional redundancy within a gene family
OsNIP3;1 (LOC_Os10g33924) is a boron transporter. Plant growth cannot be sustained under boron-deficient conditions when expression of that gene is knocked down using RNAi. A phylogenetic tree indicates that OsNIP3;1, OsNIP3;2, and OsNIP3;3 cluster together. OsNIP3;1 is the dominantly expressed gene family member (Fig. 3) (Liu et al. 2007). Both OsPIP1;2 and OsPIP1;1 are ubiquitously expressed, suggesting functional redundancy between the two. Antisense suppression of OsPIP1;1 causes partially defective phenotypes during seed germination, but overexpression of that gene does not produce a more normal phenotype. Even though OsPIP1;3 is closely related to OsPIP1;1 and OsPIP1;2, meta-expression data indicate that OsPIP1;3 expression is highest in the radicle and root. Rice plants silenced for OsPIP1;3 are defective in seed germination, whereas overexpression of OsPIP1;3 enhances germination (Liu et al. 2007). These results demonstrate that OsPIP1;3 has a major role in seed germination, in contrast to the other genes in that clade. These observations confirm that phylogenomic analysis, which integrates global expression data with phylogenetic analysis, is a useful method for identifying distinct roles for closely related gene family members.
Conclusion and Prospect
Functional genomics studies of rice genes belonging to a single family can be facilitated by phylogenomic analysis that integrates diverse types of biological information in a single page view. Future prospects for advancing the analysis of gene function include coupling phylogenomic analyses with computational predictions of gene function. For example, we have recently generated a probabilistic functional gene network for rice, called RiceNet (Lee et al. 2011; Lee et al. 2015). We used the GH phylogenomics database to identify 17 GH gene family members (Sharma et al. 2013) and then used these 17 candidate GH genes to query RiceNet v1. We found that nine of these genes are strongly predicted to function in the same biological process as cellulose synthase and cellulose synthase-like genes of rice, suggesting a potential role for these nine GH genes in cell wall biosynthesis.
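The kind of network query described above, asking which candidate genes are linked to a set of known pathway genes, can be illustrated with a small Python sketch. It does not use the RiceNet server or its data: the edge list, gene labels, scores, and cutoff below are all hypothetical, and the scoring is reduced to a single threshold.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical weighted edges (gene_a, gene_b, score); not RiceNet data.
EDGES: List[Tuple[str, str, float]] = [
    ("gh_1", "cesa_1", 3.2),
    ("gh_1", "other_1", 1.1),
    ("gh_2", "csl_1", 2.7),
    ("gh_3", "other_2", 2.9),
    ("gh_4", "cesa_1", 0.4),
]

QUERY_GENES = {"gh_1", "gh_2", "gh_3", "gh_4"}  # invented candidate GH labels
CELL_WALL_GENES = {"cesa_1", "csl_1"}           # invented target labels
MIN_SCORE = 1.0                                 # assumed confidence cutoff


def linked_queries(
    edges: List[Tuple[str, str, float]],
    queries: Set[str],
    targets: Set[str],
    min_score: float,
) -> Dict[str, Set[str]]:
    """Map each query gene to the target genes it is linked to at or above min_score."""
    links: Dict[str, Set[str]] = {}
    for gene_a, gene_b, score in edges:
        if score < min_score:
            continue
        # Edges are undirected, so check both orientations.
        for query, partner in ((gene_a, gene_b), (gene_b, gene_a)):
            if query in queries and partner in targets:
                links.setdefault(query, set()).add(partner)
    return links


links = linked_queries(EDGES, QUERY_GENES, CELL_WALL_GENES, MIN_SCORE)
```

Here only a subset of the query genes ends up linked to the target set, which is the same shape of result as nine of the 17 GH candidates being connected to cellulose synthase and cellulose synthase-like genes.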
This work was carried out with the support of the “Cooperative Research Program for Agriculture Science & Technology Development (Project title: Global identification and functional study of rice genes for enhancement of root development and nutrient use efficiency using genome information, Project No. PJ01100401)” of the Rural Development Administration, Republic of Korea; the Ramalingaswami Fellowship from the Department of Biotechnology, Government of India, to RS; and funding from The Joint BioEnergy Institute, the Office of Science, Office of Biological and Environmental Research, U.S. Department of Energy, under Contract No. DE-AC02-05CH11231 to PCR.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A (2011) NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res 39:D1005–1010
- Cao PJ, Bartley LE, Jung KH, Ronald PC (2008) Construction of a rice glycosyltransferase phylogenomic database and identification of rice-diverged glycosyltransferases. Mol Plant 1:858–877
- Cao P, Jung KH, Choi D, Hwang D, Ronald PC (2012) The Rice Oligonucleotide Array Database: an atlas of rice gene expression. Rice 5:17
- Chandran AKN, Jung KH (2014) Resources for systems biology in rice. J Plant Biol 57:80–92
- Dardick C, Chen J, Richter T, Ouyang S, Ronald P (2007) The rice kinase database. A phylogenomic database for the rice kinome. Plant Physiol 143:579–586
- Dash S, Van Hemert J, Hong L, Wise RP, Dickerson JA (2012) PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res 40:D1194–1201
- Ding X, Richter T, Chen M, Fujii H, Seo YS, Xie M, Zheng X, Kanrar S, Stevenson RA, Dardick C, Li Y, Jiang H, Zhang Y, Yu F, Bartley LE, Chern M, Bart R, Chen X, Zhu L, Farmerie WG, Gribskov M, Zhu JK, Fromm ME, Ronald PC, Song WY (2009) A rice kinase-protein interaction map. Plant Physiol 149:1478–1492
- Egelund J, Skjot M, Geshi N, Ulvskov P, Petersen BL (2004) A complementary bioinformatics approach to identify potential plant cell wall glycosyltransferase-encoding genes. Plant Physiol 136:2609–2620
- IRGSP (2005) The map-based sequence of the rice genome. Nature 436:793–800
- Jung KH, Lee J, Dardick C, Seo YS, Cao P, Canlas P, Phetsom J, Xu X, Ouyang S, An K, Cho YJ, Lee GC, Lee Y, An G, Ronald PC (2008) Identification and functional analysis of light-responsive unique genes and gene family members in rice. PLoS Genet 4:e1000164
- Jung KH, Cao P, Seo YS, Dardick C, Ronald PC (2010) The Rice Kinase Phylogenomics Database: a guide for systematic analysis of the rice kinase super-family. Trends Plant Sci 15:595–599
- Jung KH, Kim SR, Giong HK, Nguyen MX, Go HJ, An G (2015) Genome-wide identification and functional analysis of genes expressed ubiquitously in rice. Mol Plant 8:276–289
- Lee I, Seo YS, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC (2011) Genetic dissection of the biotic stress response using a genome-scale gene network for rice. Proc Natl Acad Sci USA 108:18548–18553
- Lee T, Oh T, Yang S, Shin J, Hwang S, Kim CY, Kim H, Shim H, Shim JE, Ronald PC, Lee I (2015) RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res 43:W122–127
- Lin H, Ouyang S, Egan A, Nobuta K, Haas BJ, Zhu W, Gu X, Silva JC, Meyers BC, Buell CR (2008) Characterization of paralogous protein families in rice. BMC Plant Biol 8:18
- Liu HY, Yu X, Cui DY, Sun MH, Sun WN, Tang ZC, Kwak SS, Su WA (2007) The role of water channel proteins and nitric oxide signaling in rice seed germination. Cell Res 17:638–649
- Mihara M, Itoh T, Izawa T (2010) SALAD database: a motif-based database of protein annotations for plant comparative genomics. Nucleic Acids Res 38:D835–842
- Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC (2006) Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res 34:D731–735
- Nguyen MX, Moon S, Jung KH (2013) Genome-wide expression analysis of rice aquaporin genes and development of a functional gene network mediated by aquaporin expression in roots. Planta 238:669–681
- Ren Q, Chen K, Paulsen IT (2007) TransportDB: a comprehensive database resource for cytoplasmic membrane transport systems and outer membrane channels. Nucleic Acids Res 35:D274–279
- Rouard M, Guignon V, Aluome C, Laporte MA, Droc G, Walde C, Zmasek CM, Perin C, Conte MG (2011) GreenPhylDB v2.0: comparative and functional genomics in plants. Nucleic Acids Res 39:D1095–1102
- Saier MH Jr, Reddy VS, Tamang DG, Vastermark A (2014) The transporter classification database. Nucleic Acids Res 42:D251–258
- Sakai H, Lee SS, Tanaka T, Numa H, Kim J, Kawahara Y, Wakimoto H, Yang CC, Iwamoto M, Abe T, Yamada Y, Muto A, Inokuchi H, Ikemura T, Matsumoto T, Sasaki T, Itoh T (2013) Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics. Plant Cell Physiol 54:e6
- Sharma R, Cao P, Jung KH, Sharma MK, Ronald PC (2013) Construction of a rice glycoside hydrolase phylogenomic database and identification of targets for biofuel research. Front Plant Sci 4:330
- Sonnhammer EL, Ostlund G (2014) InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic. Nucleic Acids Res 43:D234–239
- Todeschini AL, Georges A, Veitia RA (2014) Transcription factors: specific DNA binding and specific gene regulation. Trends Genet 30:211–219
- Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F, Wortman J, Buell CR (2005) The institute for genomic research Osa1 rice genome annotation database. Plant Physiol 138:18–26
- Zhang H, Jin J, Tang L, Zhao Y, Gu X, Gao G, Luo J (2011) PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res 39:D1114–1117
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.