Sulfur Containing Compound Database

SuCComBase Database

  1. SuCComBase Schema
  2. SuCComBase currently consists of 10 tables. Figure 1 shows all 10 tables and the connections between them.

    Figure 1. SuCComBase schema

  3. Table Information
    1. genes: This is the main table of SuCComBase. It holds basic information for each gene, including the Entrez Gene ID, UniProt ID, gene symbol and gene function. It contains a total of 147 SCC-related genes in A. thaliana, namely genes that encode proteins involved in the glucosinolate and camalexin biosynthetic pathways, based on experiments reported in various publications and on pathway databases (KEGG and AraCyc).
    2. genes_references: This is the pivot table between genes and references.
    3. references: This table contains 196 published references that were used to identify SCCs as well as the known SCC-related genes in A. thaliana, which are the main data in SuCComBase.
    4. results: This table contains a total of 4026 orthologous SCC-related genes in papaya (C. papaya), cabbage (B. rapa) and broccoli (B. oleracea). These orthologues were identified by sequence similarity searches of the known SCC-related genes in A. thaliana against the three Brassicales plants. The cutoff values are a sequence identity over 40% and an E-value less than 1e-50.
    5. organisms: This table lists information on the other Brassicales species in SuCComBase: papaya (C. papaya), cabbage (B. rapa) and broccoli (B. oleracea).
    6. coexpressions: This table contains 778 A. thaliana genes that may be involved in SCC biosynthetic pathways, based on bioinformatic analysis of coexpression data retrieved from AraNet, GeneMANIA and ATTED. These genes are treated as potential glucosinolate (GSL) genes that may also be involved in GSL biosynthesis.
    7. putatives: This table contains 92 computationally predicted glucosinolate and camalexin genes from the pathway databases (KEGG and AraCyc).
    8. putatives_type: This table connects putatives and types.
    9. compounds: This table contains a total of 84 SCCs specifically produced by A. thaliana (42 SCCs), papaya (5 SCCs), cabbage (16 SCCs) and broccoli (21 SCCs). All of the information was derived from publications, and several of the compounds can also be found in the KNApSAcK database.
    10. compounds_references: This table links compounds with the references that corroborate the production of each SCC.
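
The relationships described above can be sketched with an in-memory SQLite database. This is only an illustration, not the actual SuCComBase schema: the table names genes, references, genes_references and results come from the list above, but every column name, the helper functions and all rows are hypothetical placeholders. The sketch shows how a pivot table resolves the many-to-many link between genes and references, and how the stated cutoffs (identity over 40%, E-value less than 1e-50) could be applied to the results table.

```python
import sqlite3

# Table names follow the text; all column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genes (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL
);
-- "references" is an SQL keyword, so the name must be quoted.
CREATE TABLE "references" (
    id INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
-- Pivot table: one row per (gene, reference) pair.
CREATE TABLE genes_references (
    gene_id INTEGER NOT NULL REFERENCES genes(id),
    reference_id INTEGER NOT NULL REFERENCES "references"(id),
    PRIMARY KEY (gene_id, reference_id)
);
CREATE TABLE results (
    id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES genes(id),
    organism TEXT NOT NULL,
    identity REAL NOT NULL,  -- percent sequence identity
    evalue REAL NOT NULL
);
""")

# Placeholder rows only; none of these values come from SuCComBase.
conn.executemany("INSERT INTO genes VALUES (?, ?)",
                 [(1, "GENE_A"), (2, "GENE_B")])
conn.executemany('INSERT INTO "references" VALUES (?, ?)',
                 [(1, "Placeholder reference 1"),
                  (2, "Placeholder reference 2")])
conn.executemany("INSERT INTO genes_references VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])
conn.executemany(
    "INSERT INTO results (gene_id, organism, identity, evalue) "
    "VALUES (?, ?, ?, ?)",
    [(1, "C. papaya", 62.5, 1e-80),    # passes both cutoffs
     (1, "B. rapa", 35.0, 1e-90),      # identity too low
     (2, "B. oleracea", 55.0, 1e-20),  # E-value too high
     (2, "B. rapa", 40.0, 1e-60)])     # identity is not over 40


def references_for_gene(conn, symbol):
    """Return the citations linked to a gene through the pivot table."""
    rows = conn.execute(
        'SELECT r.citation FROM genes AS g '
        'JOIN genes_references AS gr ON gr.gene_id = g.id '
        'JOIN "references" AS r ON r.id = gr.reference_id '
        'WHERE g.symbol = ? ORDER BY r.id',
        (symbol,),
    ).fetchall()
    return [citation for (citation,) in rows]


def orthologues_passing_cutoff(conn, min_identity=40.0, max_evalue=1e-50):
    """Return (gene symbol, organism) pairs that satisfy both cutoffs."""
    return conn.execute(
        "SELECT g.symbol, res.organism FROM results AS res "
        "JOIN genes AS g ON g.id = res.gene_id "
        "WHERE res.identity > ? AND res.evalue < ? ORDER BY res.id",
        (min_identity, max_evalue),
    ).fetchall()


print(references_for_gene(conn, "GENE_A"))
print(orthologues_passing_cutoff(conn))
```

Both comparisons are strict, matching the wording "over 40%" and "less than 1e-50", so a hit with exactly 40% identity is excluded in this sketch. The compounds and compounds_references tables could be linked in the same way as genes and genes_references.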

  4. Data Types Structure organization
  5. SuCComBase holds three types of data: SCC genes, SCCs and references. Figure 2 shows how these data types are organized, with the SCC genes in A. thaliana as the main data in SuCComBase.

    Figure 2. SuCComBase data types structure organization