EzCatDB

EzCatDB Tutorial

A database of Enzyme Catalytic Mechanisms

Description of the EzCatDB Database

The EzCatDB database analyzes and classifies enzyme catalytic mechanisms on the basis of information from literature and data that are derived from entries in the Protein Data Bank (PDB). Each data set contains corresponding enzyme information, such as E.C. number, PDB entries with their annotated ligand information and active site residues, information on catalytic mechanisms, and links to other databases, such as UniProtKB, CATH, KEGG, PDBsum, and PubMed.

References

Please cite the following references:

Nagano N., Nakayama N., Ikeda K., Fukuie M., Yokota K., Doi T., Kato T., Tomii K. (2015) "EzCatDB: the enzyme reaction database, 2015 update." Nucleic Acids Research, 43, D453-458.

Nagano N., Noguchi T., Akiyama Y. (2007) "Systematic Comparison of Catalytic Mechanisms of Hydrolysis and Transfer Reactions Classified in the EzCatDB Database." PROTEINS: Structure, Function, and Bioinformatics. 66, 147-159.

Nagano N., Nakagawa Z., Arita M., Tsukamoto K., Noguchi T. (2007) "EzCatDB" Nucleic Acids Research, Molecular Biology Database Collection entry number 613.
Nucleic Acids Research, Molecular Biology Database on-line compilation 2005, 6.1. Enzymes and Enzyme Nomenclature, 613.

Nozomi Nagano (2005) "EzCatDB: The Enzyme Catalytic-Mechanism Database." Nucleic Acids Research, 33, Database Issue, D407-D412.

Main features of the EzCatDB Database

Search page: The search system of this database allows specification, in various ways, of the enzymes that are sought (see below).

Hierarchic Classification of Catalytic Mechanisms, RLCP: This is a novel classification of enzyme catalytic mechanisms. It classifies catalytic mechanisms at four levels: basic Reaction (R), Ligand group involved in catalysis (L), Catalysis type (C), and residues/cofactors located on Proteins (P). "Basic reactions" are reaction types such as hydrolysis, phosphorolysis, and transfer, which are related mostly to the primary number of the E.C. numbers. Reactive parts of ligand molecules are considered at the second level (L). At the third level (C), the catalytic mechanisms are classified systematically based on information that is related to existence of cofactors, types of nucleophiles, acid, base, stabilizer, and modulator, SN2/SN1 (or associative/dissociative) reactions, and on how these catalytic groups function. At the fourth level (P), the residues/cofactors of proteins involved in catalytic reactions are classified.
(see RLCP Tutorial)

Table of ligand annotation: For each PDB entry, annotation of ligand molecules that are bound to the enzyme structures was performed by hand, then tabulated for corresponding cofactors, substrates, and products. The table also includes intermediate and transition-state data.

CatDom: Enzyme Catalytic Domains; Released (March 2005): This page collects catalytic domain structures from EzCatDB entries, with a hierarchic scheme. These structures are classified based on the CATH classification by Professor Orengo at University College London, which clusters protein domains at four levels: Class (C), Architecture (A), Topology (T), and Homologous superfamily (H). The upper three levels of the hierarchical classification correspond to the so-called domain fold, such as the Rossmann fold (CATH 3.40.50.-) and the TIM barrel fold (CATH 3.20.20.-).
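
As a minimal sketch (not part of EzCatDB itself), the relation between a full four-level CATH code and its fold-level code can be written in Python. The function name `cath_fold` is hypothetical, and the input `3.40.50.300` is only an illustrative four-level code:

```python
def cath_fold(code: str) -> str:
    """Return the fold-level code (Class.Architecture.Topology.-) of a CATH code."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {code!r}")
    c, a, t, _h = parts  # the H level is dropped at the fold level
    for level in (c, a, t):
        if not level.isdigit():
            raise ValueError(f"C, A and T levels must be numeric in {code!r}")
    return f"{c}.{a}.{t}.-"


print(cath_fold("3.40.50.300"))  # 3.40.50.-
```

Two domains share a fold in this sense when `cath_fold` returns the same string for both.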

Compound list; Released (June 2005): This page lists compounds from EzCatDB entries. The compound data are based on KEGG COMPOUND codes. For these data, figures of the chemical structures, produced with ARM Draw (developed by Dr Masanori Arita for the Metabolomics project), are also available at the compound sites (06/06/2006-).
Compound data that are not included in KEGG COMPOUND are assigned six-character codes that begin with "L". For those compound data, a mol format file is also available at the compound sites. Moreover, intermediate data, which have six-character codes that begin with "I", are also available.
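
The prefix convention can be illustrated with a short Python sketch. It assumes that every code is one letter followed by five digits, by analogy with KEGG COMPOUND codes, which begin with "C". The exact format of the "L" and "I" codes is not spelled out here, so the pattern, the function name `classify_compound_code`, and the sample codes are all assumptions:

```python
import re

# Meaning of each prefix, following the description above.
PREFIX_KIND = {
    "C": "KEGG COMPOUND",
    "L": "EzCatDB compound not in KEGG COMPOUND",
    "I": "intermediate",
}


def classify_compound_code(code: str) -> str:
    """Classify a compound code by its prefix (assumed shape: letter + five digits)."""
    match = re.fullmatch(r"([CLI])[0-9]{5}", code.strip())
    if match is None:
        raise ValueError(f"unrecognised compound code: {code!r}")
    return PREFIX_KIND[match.group(1)]


print(classify_compound_code("L00001"))  # EzCatDB compound not in KEGG COMPOUND
```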

E.C. number; Released (August 2005): This page lists E.C. numbers from EzCatDB entries, with a hierarchic scheme. In the fifth level, related enzyme data (DB codes, enzyme name, and information on catalytic domain) are listed for each E.C. number. (see Enzyme Nomenclature at IUBMB)

UniProtKB entry list (released as the Swiss-Prot list in June 2006 and as the UniProt list in August 2014): This page lists UniProtKB accession numbers from EzCatDB entries. (see UniProtKB)

Links

PDB: Protein Data Bank.
UniProtKB: Curated protein sequence database (Protein knowledgebase) at the European Bioinformatics Institute (EBI).
KEGG: Enzyme and Chemical Compound database at KEGG (Kyoto Encyclopedia of Genes and Genomes).
CATH: Protein Structure Classification database at UCL.
PDBsum: PDB data summary database at EMBL-EBI.
PDBj: Protein Data Bank Japan at the Institute for Protein Research, Osaka University.
PubMed: Online bibliographic database at NCBI.
Catalytic Site Atlas (CSA): Database of catalytic residues in enzymes with known 3D structures at EMBL-EBI.
MACiE: Database of enzyme reaction mechanisms at EMBL-EBI.

New links (since April 2016)
Protein-related databases (through UniProtKB)
Pfam: Protein family database at EMBL-EBI.
RefSeq: NCBI Reference Sequence Database.
MEROPS: the Peptidase database at Sanger Center.
CAZy: the Carbohydrate-Active enZYmes Database.
ESTHER: the database of the alpha/beta-hydrolase fold superfamily of proteins at Inra, France.

PoSSuM: Pocket Similarity Search using Multi-Sketches.

Compound-related databases
PubChem: NCBI Compound Database.
ChEBI: Chemical Entities of Biological Interest; the Compound database at EMBL-EBI.

Summary of each EzCatDB entry

An example of an entry: Cytidylate kinase

DB code: entry ID in this database.

RLCP catalysis type: Classification of catalytic mechanism (see above).

CATH: Enzymes in this database are classified based on CATH homologous family (or superfamily). (Catalytic domains are annotated for CATH domains.)

E.C.: Enzyme Commission number.

CSA: Related entries (with literature) & links to Catalytic Site Atlas.

MACiE: Related entry & link to MACiE database.

Related DB codes: Links to other entries in this database that belong to the same homologous family.

Enzyme Name: Nomenclature annotated in the UniProtKB and KEGG Enzyme databases.

KEGG pathways: Related metabolic pathways, along with links to the KEGG metabolic pathway databases.

UniProtKB information: Basic information on activity and cofactor, based on which EzCatDB data are annotated, along with links to the corresponding UniProt data.

Table of ligand annotation: Annotation of ligand molecules of PDB structures has been performed by hand and tabulated for corresponding cofactors, substrates, and products. For multidomain enzymes, the PDB chains are subdivided into several domains based on the CATH domain definition. Ligand-binding sites are annotated based on those domains.

Active-site residues: Catalytic residues, metal-binding residues, modified residues, and information regarding mutation are annotated for each PDB structure. For multidomain enzymes, the PDB chains are subdivided into several domains based on the CATH domain definition. The residues are annotated based on those domains.
3D structures of the active sites of each PDB entry can be viewed with molecular viewing software, such as Rasmol and PyMOL. The active site can be selected for the whole PDB file or at the chain level. If the viewer does not work properly, please see the description of Rasmol usage below.
Rasmol ID: This icon opens a molecular graphics view of the active site of the corresponding PDB ID. Clicking this icon starts a viewer, such as Rasmol.
Rasmol chain: This icon opens a molecular graphics view of the active site of the corresponding PDB chain. Clicking this icon starts a viewer, such as Rasmol.
mmCIF ID: This icon links to the mmCIF data for the active site of the corresponding PDB ID. Clicking this icon downloads the mmCIF data in text format.
mmCIF chain: This icon links to the mmCIF data for the active site of the corresponding PDB chain. Clicking this icon downloads the mmCIF data in text format.

Catalytic mechanism: References on catalytic mechanism are listed.

References: Related references, with links to the PubMed abstract page.

Comments: These give additional information, particularly with regard to catalytic mechanisms. Such information can also be included if the 3D-structures of the catalytic domains are indeterminate.

RDF files: For each entry, RDF files in N-Triples, JSON-LD, XML, and Turtle formats are now available. These files can be downloaded from each entry page.

To view 3D-structures of active-sites
For Rasmol users:
Please install Rasmol on your machine from the Rasmol home page.
On Windows machines, you will have to associate the ".rsm" file extension with raswin.exe.
Click "Default Programs" in the Start Menu, then select "Associate a file type or protocol with a program". Confirm that the "Current Default" for the ".rsm" extension is raswin.exe. If you changed it, click "Save".

In addition to Rasmol, other molecular graphic viewers, such as PyMOL and Mercury, can be used for the active-site viewing.

How to use the

For sequence search, please use either EzCat-BLAST or EzCat-FORTE. EzCat-BLAST is suitable for quick searches, whereas EzCat-FORTE is for searching remote homologues.

DB code: If you already have a DB code for this database, enter it in the "DB code" column and click the "search" button on the right.

You can use the form below to specify the enzymes to look for:

Enzyme Name in UniProtKB: To search by a specific "protein name" or "synonyms" in the UniProtKB data, enter it into this column and click the "search" button below.

Enzyme Name in KEGG: To search by a specific "name" in KEGG enzyme data, enter it in this column and press the "search" button below.

E.C. number: You can input the number in the four rows of "E.C." if you have a specific Enzyme Commission (E.C.) (four-digit) number. Here, you can abbreviate some numbers. For instance, you can search by E.C. number, "3.2.-.-" or "-.2.1.-", by leaving the third and fourth rows or the first and fourth rows empty, respectively (see Enzyme Nomenclature at IUBMB).
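
The wildcard behaviour described above can be sketched in Python as follows. This is only a client-side illustration of the matching rule, not the search engine's actual implementation, which is not documented here. The function name `ec_matches` is hypothetical and the E.C. numbers in the usage lines are illustrative inputs:

```python
def ec_matches(pattern: str, ec_number: str) -> bool:
    """True if ec_number fits pattern; '-' or an empty field in pattern matches anything."""
    p_fields = pattern.split(".")
    e_fields = ec_number.split(".")
    if len(p_fields) != 4 or len(e_fields) != 4:
        raise ValueError("E.C. numbers and patterns need four dot-separated fields")
    return all(p in ("-", "") or p == e for p, e in zip(p_fields, e_fields))


print(ec_matches("3.2.-.-", "3.2.1.17"))  # True
print(ec_matches("-.2.1.-", "3.2.1.17"))  # True
print(ec_matches("3.2.-.-", "3.1.3.1"))   # False
```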

CATH domain (structure classification): You can input CATH (four-digit) numbers in the four rows to search by a specific protein domain structure. Even without such numbers, to look for enzymes that have alpha-structure domain, beta-structure domain, mixed alpha-beta structure domain, or few secondary structures, you can input 1, 2, 3, or 4, respectively, in the first row, and leave the others empty.
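
For a class-only search, the mapping from structural class to the first CATH row can be sketched in Python. The dictionary keys and the function name `cath_rows_for_class` are hypothetical labels chosen for this illustration. Empty strings stand for rows left empty in the form:

```python
# First-row value for each structural class, as listed above.
CATH_CLASSES = {
    "alpha": 1,
    "beta": 2,
    "mixed alpha-beta": 3,
    "few secondary structures": 4,
}


def cath_rows_for_class(class_name: str) -> list:
    """Return the four form rows for a class-only CATH search."""
    try:
        number = CATH_CLASSES[class_name]
    except KeyError:
        raise ValueError(f"unknown CATH class: {class_name!r}") from None
    return [str(number), "", "", ""]


print(cath_rows_for_class("mixed alpha-beta"))  # ['3', '', '', '']
```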

PDB: You can search using a specific PDB entry code (four-character code).

KEGG pathway: You can search for enzymes that are involved in a specific metabolic pathway using a KEGG metabolic pathway code, which is an eight-character code that begins with "map".

UniProtKB; Accession number: You can search for a specific UniProtKB accession number (six-character code).
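
Before a query is submitted, the three identifier shapes described above (PDB, KEGG pathway, and UniProtKB) can be sanity-checked with a small Python sketch. The patterns are deliberately loose: they check only length and character set, and are assumptions rather than the validation rules of the search form. The names `ID_PATTERNS` and `looks_like` are hypothetical, and the sample values are illustrative strings:

```python
import re

# Loose shape checks only; a string that passes is not guaranteed to exist.
ID_PATTERNS = {
    "pdb": re.compile(r"[0-9A-Za-z]{4}"),      # four characters
    "kegg_pathway": re.compile(r"map[0-9]{5}"),  # "map" plus five digits
    "uniprot": re.compile(r"[0-9A-Z]{6}"),     # six characters
}


def looks_like(kind: str, value: str) -> bool:
    """True if value has the expected shape for the given identifier kind."""
    return ID_PATTERNS[kind].fullmatch(value.strip()) is not None


print(looks_like("kegg_pathway", "map00010"))  # True
print(looks_like("pdb", "1abc"))               # True
print(looks_like("uniprot", "P12345"))         # True
print(looks_like("kegg_pathway", "map10"))     # False
```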

Active-site residues: You can input as many as three active-site residues, such as catalytic residues and metal-binding residues, by three-letter codes of amino acids (such as ASP and HIS).
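
A residue query of this kind can be prepared with the following Python sketch. It assumes that only the twenty standard amino-acid codes are accepted, which is not stated above. The function name `normalize_residue_query` is hypothetical:

```python
# Three-letter codes of the twenty standard amino acids.
STANDARD_RESIDUES = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)


def normalize_residue_query(residues):
    """Upper-case the codes, drop blanks, and enforce the three-residue limit."""
    cleaned = [r.strip().upper() for r in residues if r.strip()]
    if len(cleaned) > 3:
        raise ValueError("at most three active-site residues can be specified")
    unknown = [r for r in cleaned if r not in STANDARD_RESIDUES]
    if unknown:
        raise ValueError(f"not a standard three-letter code: {unknown}")
    return cleaned


print(normalize_residue_query(["asp", "His", ""]))  # ['ASP', 'HIS']
```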

PubMed ID: You can search for enzyme data that are cited in a specific paper, by a specific reference ID from the PubMed bibliographic database.

Author name (reference): You can search for enzyme data that are cited in specific papers, by the author family name (such as "Nagano").

Key word (reference title): You can search for enzyme data that are included in specific papers, using the key words in the title.

KEGG compound ID: You can search for enzymes that interact with a specific compound or ligand as cofactors, substrates, or products. This search is conducted using a six-character KEGG Compound code that begins with "C".

Ligand names: You can search for enzymes that interact with a specific ligand as cofactors, substrates, or products. You can input up to three ligand names in the right columns. Those ligand molecules can be specified by selecting "cofactors", "substrates", or "products" from the left menu.

Ligand types: You can search for enzymes that interact with a specific ligand as cofactors, substrates, or products. You can select up to three ligand types (such as "amino acids", "nucleotide", and "polysaccharide") from the right menu. Those ligand molecules can be specified by selecting "cofactors", "substrates", or "products" from the left menu.

Ligand annotation types: You can search for enzyme data in which all ligands (for cofactors, substrates, products, or intermediates) have been annotated as "Unbound" for all PDB data, by selecting "All Unbound" from the right-hand menu, for each category (cofactors, substrates, products, or intermediates), selected from the left-hand menu. The entries, in which any ligand molecule has been annotated as "Unbound" for all the PDB data, can be retrieved by selecting "Unbound", for each category. Furthermore, because it is difficult to annotate some ligands such as water molecules (H2O) and H+ (proton) ion, the entries with any ligand that cannot be annotated can be retrieved by selecting "Not annotated". Selecting "All Unbound/Not annotated" from the right menu can retrieve entries with either 'Unbound' or 'Not annotated' for all ligands.
Moreover, enzyme data are searchable for any ligand (for cofactors, substrates, products, or intermediate) that has been annotated as "Bound" or "Analogue" for any PDB data, by selecting "Bound" or "Analogue Bound", respectively, from the right-hand menu, for each category (cofactors, substrates, products, or intermediates), selected from the left-hand menu. Selecting "Bound or Analogue Bound" can retrieve entries with either "Bound" or "Analogue Bound" similarly.
For all categories (cofactors, substrates, products, or intermediates), "-" can be selected from the left-hand menu. Regarding substrates and products, all entries are presumed to belong either to the "All Unbound/Not annotated" category or to the "Bound or Analogue Bound" one.
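
The menu options above can be read as predicates over the annotations of one category. The Python sketch below is one possible interpretation under a hypothetical data model, not the database's actual schema. In this model each ligand maps to a list of per-PDB statuses ("Bound", "Analogue", or "Unbound"), or to `None` if it could not be annotated:

```python
def _unbound_everywhere(statuses):
    """True if the ligand is annotated and is 'Unbound' in every PDB structure."""
    return bool(statuses) and all(s == "Unbound" for s in statuses)


def matches(category, option):
    """Evaluate one right-hand menu option for one category (e.g. substrates)."""
    ligands = list(category.values())
    if option == "All Unbound":
        return bool(ligands) and all(_unbound_everywhere(s) for s in ligands)
    if option == "Unbound":
        return any(_unbound_everywhere(s) for s in ligands)
    if option == "Not annotated":
        return any(s is None for s in ligands)
    if option == "All Unbound/Not annotated":
        return bool(ligands) and all(
            s is None or _unbound_everywhere(s) for s in ligands
        )
    if option == "Bound":
        return any(s is not None and "Bound" in s for s in ligands)
    if option == "Analogue Bound":
        return any(s is not None and "Analogue" in s for s in ligands)
    if option == "Bound or Analogue Bound":
        return matches(category, "Bound") or matches(category, "Analogue Bound")
    raise ValueError(f"unknown option: {option!r}")


# Hypothetical substrates of one entry, annotated over two PDB structures.
substrates = {"ATP": ["Unbound", "Analogue"], "H2O": None}
print(matches(substrates, "Analogue Bound"))             # True
print(matches(substrates, "Not annotated"))              # True
print(matches(substrates, "All Unbound/Not annotated"))  # False
```

Under this model, "All Unbound/Not annotated" and "Bound or Analogue Bound" are mutually exclusive and together cover every non-empty category, which matches the statement above about substrates and products.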

Catalytic domain: You can search for enzyme data whose catalytic domain structures have been determined and annotated by checking "include". Checking "exclude" omits those enzyme data from the search. "All" is selected as the default to include all data.

Active-site residues annotated: You can search for enzyme data whose active-site residues have been annotated by checking "include". Checking "exclude" omits those enzyme data from the search. "All" is selected as the default to include all data.

Literature for catalytic mechanisms: You can search for enzyme data for which a publication has reported on the catalytic mechanism by checking "include". Checking "exclude" omits those enzyme data from the search. "All" is selected as the default to include all data.

CSA: You can search for enzyme data that have corresponding entries in the Catalytic Site Atlas (CSA) by checking "include". Checking "exclude" omits those enzyme data from the search. "All" is selected as the default to include all data.

MACiE: You can search for enzyme data that have a corresponding entry in MACiE by checking "include". Checking "exclude" omits those enzyme data from the search. "All" is selected as the default to include all data.

RLCP classification: You can search for enzyme data whose catalytic mechanism has been classified in the RLCP classification by checking "include". Checking "exclude" omits those enzyme data from the search. "All" is selected as the default to include all data.

The above information can be combined to specify enzymes in the following way:
To search for enzymes with alpha-beta structures, E.C. 3.1.-.-, and magnesium as a cofactor: input "3" in the first row of CATH, and "3" and "1" in the first and second rows of E.C., respectively; then select "cofactor" from the left menu of ligand name, and input "magnesium" in its right-hand column. You then obtain the enzyme data list by clicking the select button.
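
The same combination of criteria can be sketched as a filter over an in-memory list in Python. The `Entry` record, its field names, and the three sample entries are invented for illustration and do not reflect real EzCatDB records or its actual query interface. Cofactor matching by exact, case-insensitive name is also an assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical, simplified record for one enzyme entry."""
    db_code: str
    cath: str
    ec: str
    cofactors: List[str] = field(default_factory=list)


def _wild(pattern: str, value: str) -> bool:
    """Field-wise match where '-' or an empty pattern field matches anything."""
    return all(
        p in ("-", "") or p == v
        for p, v in zip(pattern.split("."), value.split("."))
    )


def search(entries, cath="-.-.-.-", ec="-.-.-.-", cofactor: Optional[str] = None):
    """Return the DB codes of entries that satisfy every given criterion."""
    hits = []
    for e in entries:
        if not (_wild(cath, e.cath) and _wild(ec, e.ec)):
            continue
        if cofactor is not None and cofactor.lower() not in (
            c.lower() for c in e.cofactors
        ):
            continue
        hits.append(e.db_code)
    return hits


entries = [
    Entry("demo-1", "3.40.50.300", "3.1.3.1", ["magnesium"]),
    Entry("demo-2", "1.10.530.40", "3.1.1.4", ["calcium"]),
    Entry("demo-3", "3.20.20.70", "2.7.1.40", ["magnesium"]),
]
print(search(entries, cath="3.-.-.-", ec="3.1.-.-", cofactor="magnesium"))
# ['demo-1']
```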

Update info

Annotated by

Hiroko Inoura (2017-)

Naoko Nakayama (2012-2016)

Fumi Osawa (2009-2012)

Yuko Chujo (2008-2010)

Yixuian Song (2008-2010)

Aiko Hiraki (2007-2009)

Eri Yoshimoto (2008-2009)

Zen-ichi Nakagawa (2005-2008)

Fuyan Sun (2005-2006)

Mayumi Hisanaga (2005-2006)

Yuko Hasegawa (2002-2005)

Keitarou Nonaka (2003-2004)

Kenji Morita (2002-2003)

Munehiro Sugiyama (2002)

Junko Some (2003)

System developed & maintained by

Shota Matsumoto (Lifematics)(2018 - )

Daisuke Satoh (Level Five Co. Ltd.; Lifematics)(2015 - 2018)

Takuo Doi (Level Five Co. Ltd.; Lifematics)(2013 - )

Dr Kiyonobu Yokota (Level Five Co. Ltd.)(2013 - 2016)

Dr Kazuyoshi Ikeda (Level Five Co. Ltd.)(2013 - 2016)

Masaru Fukuie (Level Five Co. Ltd.)(2013 - 2015)

Naofumi Sakaya (IMS Lab.Inc.)(2002 - 2011)