Two well-known classification schemes are outlined below.
1. SCOP :
The SCOP (Structural Classification of Proteins) database maintained at the MRC Laboratory of Molecular Biology and Centre for Protein Engineering describes structural and evolutionary relationships between proteins of known structure. Because current automatic structure comparison tools cannot reliably identify all such relationships, SCOP has been constructed using a combination of manual inspection and automated methods. The task is complicated by the fact that protein structures show such variety, ranging from small, single domains to vast multi-domain assemblies. In some cases, (e.g., some modular proteins), it may be meaningful to discuss a protein structure at the same time both at the multi-domain level and at the level of its individual domains.
In the SCOP classification scheme proteins are classified in a hierarchical fashion to reflect their structural and evolutionary relatedness. Within the hierarchy there are many levels, but principally these describe the family, superfamily and fold. The boundaries between these levels may be subjective, but the higher levels generally reflect the-dearest structural similarities.
- Family : Proteins are clustered into families with clear evolutionary relationships if they have sequence identities > 30%. But this is not an absolute measure - in some cases (e.g., the globins), it is possible to infer common descent from similar structures and functions in the absence of significant sequence identity (some members of the globin family share only 15% identity).
- Superfamily : Proteins are placed in superfamilies when, in spite of low sequence identity, their structural and functional characteristics suggest a common evolutionary origin.
- Fold: Proteins are classed as having a common fold if they have the same major secondary structures in the same arrangement and with the same topology, whether or not they have a common evolutionary origin. In these cases, the structural similarities could have arisen as a result of physical principles that favor particular packing arrangements and fold topologies.
SCOP is accessible for keyword search via the MRC Laboratory Web server.
http://scop.mrc-lmb.cam.ac.uk/scop/
2. CATH :
The CATH (Class, Architecture, Topology, Homology) database [http://www.cathdb.info/] is a hierarchical domain classification of protein structures maintained at UCL (University College London). The resource is largely derived using automatic methods, but manual inspection is necessary where automatic methods fail. Different categories within the classification are identified by means of both unique numbers (by analogy with the enzyme classification or E.C. system for enzymes) and descriptive names. Such a numbering scheme allows efficient computational manipulation of the data. There are five levels within the hierarchy:
- Class is derived from gross secondary structure content and packing. Four classes of domain are recognized:
- mainly α ,
- mainly β ,
- α-β , which includes both alternating α-β and α+β structures, and
- those with low secondary structure content.
- Architecture describes the gross arrangement of secondary structures, ignoring their connectivities; it is currently assigned manually using simple descriptions of the secondary structure arrangements (e.g., barrel, roll, sandwich, etc.).
- Topology gives a description that encompasses both the overall shape and the connectivity of secondary structures. This is achieved by means of structure comparison algorithms that use empirically derived parameters to cluster the domains. Structures in which at least 60% of the larger protein matches the smaller are assigned to same topology level.
- 35% sequence identity and are thought to share a common ancestor, i" lfo="4" listitemdepth="1" orderedlistitem="true"> Homology groups domains that share > 35% sequence identity and are thought to share a common ancestor, i.e. are homologous. Similarities are first identified by sequence comparison and subsequently by means of a structure comparison algorithm.
- Sequence provides the final level within the hierarchy, whereby structures within homology groups are further clustered on the basis of sequence identity. At this level, domains have sequence identities >35% (with at least 60% of the larger domain equivalent to the smaller), indicating highly similar structures and functions.
CATH is accessible for keyword interrogation via UCL's Biomolecular Structure and Modelling Unit Web server.
The current experiment is designed to investigate the structure of proteins using on-line tools for protein classification viz. CATH and SCOP.