Data Sources
- SCC gene data
- Databases: Two pathway databases were used to retrieve SCC-related genes: AraCyc and KEGG. Figure 1 shows an example from the AraCyc database, which provides information on SCC-related genes that were computationally predicted as well as genes that were identified from publications (experiments).
- Co-expression Data: Users can view the co-expression database on the browser page and in the datasets tab of a potential SCC gene description. Three co-expression databases were used to identify genes that are potentially involved in the sulfur biosynthesis pathway: AraNet, ATTED-II and GeneMANIA.
- Publications: Users can view the list of publications used to retrieve SCC-related genes on the browser page and in the datasets tab. A list of references is shown if the gene was retrieved from publications.
- SCCs data
- Databases: KNApSAcK was used to identify SCCs specifically produced by Brassicales plants.
- Publications: Users can view the list of publications used to retrieve SCCs on the browser page and in the datasets tab. A list of references is shown if the compound was retrieved from publications.
- Relevant information data
- AraCyc: AraCyc (https://www.arabidopsis.org/biocyc/) is a tool for visualizing biochemical pathways of Arabidopsis thaliana (Mueller et al. 2003).
- AraNet: AraNet is a genome-scale functional gene network for Arabidopsis thaliana, constructed by integrating 19 types of genomics data. It can be explored through a web server (http://www.inetbio.org/aranet) to identify candidate genes for traits of interest (Lee et al. 2015).
- ATTED: ATTED-II (http://atted.jp) is a coexpression database for plant species to aid in the discovery of relationships of unknown genes within a species (Aoki et al. 2016).
- GeneMANIA: GeneMANIA (http://www.genemania.org/) is a flexible, user-friendly web interface for generating hypotheses about gene function, analyzing gene lists and prioritizing genes for functional assays (Warde-Farley et al. 2010).
- KEGG: Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.genome.jp/kegg/) is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (Kanehisa et al. 2016).
- KNApSAcK: The KNApSAcK Metabolite Activity DB (http://kanaya.naist.jp/KNApSAcK/) is integrated within the KNApSAcK Family DBs to facilitate further systematized research in various omics fields, especially metabolomics, nutrigenomics and foodomics (Nakamura et al. 2013).
- NCBI Gene: NCBI Gene (https://www.ncbi.nlm.nih.gov/gene/) integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
- PubMed: PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) contains more than 27 million publications from MEDLINE, life sciences journals and online books.
- UniProtKB/Swiss-Prot: UniProt DB (https://www.uniprot.org/) supplies a comprehensive, high-quality and freely accessible resource of protein sequence and functional information (The UniProt Consortium 2015).
- References
SuCComBase integrates all the genes related to SCC biosynthesis in A. thaliana from three main sources: databases, co-expression studies and publications.
Figure 1. Example of genes that were computationally identified and experimentally validated.
A total of 147 known SCC genes were used as queries against co-expression data, which identified 778 potential SCC genes across the three co-expression databases (Figure 2).
Figure 2. Identification of potential SCC genes from three co-expression databases.
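The candidate-gene step described above can be sketched in a few lines. This is only an illustrative sketch, not the procedure actually used to build SuCComBase: the function name `find_potential_genes`, the source labels (`source_a` etc.) and the gene labels (`KNOWN_1`, `CAND_1`, ...) are all hypothetical placeholders, and it assumes each co-expression database has already been exported as a mapping from a query gene to its co-expressed partners.

```python
def find_potential_genes(known_genes, coexpression_sources):
    """Collect genes co-expressed with any known gene, per source and pooled.

    known_genes: iterable of gene identifiers already known to be SCC genes.
    coexpression_sources: dict mapping a source name to a dict of
        {query_gene: [co-expressed partner genes]} (assumed export format).
    """
    known = set(known_genes)
    per_source = {}
    for source_name, partners_by_gene in coexpression_sources.items():
        candidates = set()
        for gene in known:
            candidates.update(partners_by_gene.get(gene, ()))
        # A known gene is not a *potential* gene, so remove known genes.
        per_source[source_name] = candidates - known
    # Pool across sources; duplicates collapse because these are sets.
    pooled = set().union(*per_source.values())
    return per_source, pooled


# Toy data with placeholder labels (not real gene identifiers).
known_genes = ["KNOWN_1", "KNOWN_2"]
coexpression_sources = {
    "source_a": {"KNOWN_1": ["CAND_1", "CAND_2", "KNOWN_2"], "KNOWN_2": ["CAND_2"]},
    "source_b": {"KNOWN_1": ["CAND_3"]},
    "source_c": {"KNOWN_2": ["CAND_1", "CAND_4"]},
}

per_source, pooled = find_potential_genes(known_genes, coexpression_sources)
for name in sorted(per_source):
    print(name, sorted(per_source[name]))
print("pooled:", sorted(pooled))
```

Keeping the per-source sets alongside the pooled set makes it straightforward to count how many candidates are shared between sources, which is the kind of overlap a diagram such as Figure 2 would summarise.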
SuCComBase also provides a list of SCCs that are specifically produced in various Brassicales plants: papaya (C. papaya), cabbage (B. rapa), broccoli (B. oleracea) and A. thaliana.
Relevant information data are data related to SCCs and SCC genes. This section lists the sources from which these data were taken.
- Aoki,Y. et al. (2016) ATTED-II in 2016: a plant coexpression database towards special online collection. Plant Cell Physiol., 57, 1–9.
- Bateman,A. et al. (2017) UniProt: The universal protein knowledgebase. Nucleic Acids Res., 45, D158–D169.
- Brown,G.R. et al. (2015) Gene: A gene-centered information resource at NCBI. Nucleic Acids Res., 43, D36–D42.
- Kanehisa,M. et al. (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res., 44, D457–D462.
- Lee,T. et al. (2014) AraNet v2: an improved database of co-functional gene networks for the study of Arabidopsis thaliana and 27 other nonmodel plant species. Nucleic Acids Res., 1–7.
- Mueller,L.A. et al. (2003) AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol., 132, 453–460.
- Nakamura,Y. et al. (2014) KNApSAcK metabolite activity database for retrieving the relationships between metabolites and biological activities. Plant Cell Physiol., 55, 1–9.
- Warde-Farley,D. et al. (2010) The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res., 38, W214–W220.