4. Quaternary structure :

Quaternary structure is the arrangement formed by several polypeptide chains, usually called protein subunits, which function together as a single protein complex. The individual subunits are usually not covalently connected, although in some cases they are linked by disulphide bonds. Not all proteins have quaternary structure, since many are functional as monomers. The quaternary structure is stabilized by the same range of interactions as the tertiary structure. Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a multimer is called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. The subunits are usually related to one another by symmetry axes, such as a 2-fold axis in a dimer. Multimers made up of identical subunits may be referred to with the prefix "homo-" (e.g. a homotetramer), and those made up of different subunits with the prefix "hetero-" (e.g. a heterotetramer, such as the two alpha and two beta chains of haemoglobin). Changes in quaternary structure can occur through conformational changes within individual subunits or through reorientation of the subunits relative to each other. It is through such changes, which underlie cooperativity and allostery in multimeric enzymes, that many proteins undergo regulation and perform their physiological function.
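
The naming rules above can be captured in a few lines of Python. This is only an illustrative sketch: the function names `oligomer_name` and `composition`, and the idea of describing a complex as a list of subunit labels, are assumptions made for this example and do not come from any database or library.

```python
from collections import Counter

# Names for the assembly sizes mentioned in the text.
SIZE_NAMES = {1: "monomer", 2: "dimer", 3: "trimer", 4: "tetramer"}


def oligomer_name(subunits):
    """Name a complex from a list of subunit labels, one label per chain."""
    n = len(subunits)
    if n == 0:
        raise ValueError("a complex needs at least one subunit")
    if n == 1:
        return SIZE_NAMES[1]
    # Identical labels -> "homo-", anything else -> "hetero-".
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    size = SIZE_NAMES.get(n)
    if size is None:
        # Sizes not named in the text get a generic description.
        return f"{prefix}-oligomer ({n} subunits)"
    return prefix + size


def composition(subunits):
    """Summarise how many copies of each subunit type are present."""
    counts = Counter(subunits)
    return " ".join(f"{label}{counts[label]}" for label in sorted(counts))


haemoglobin = ["alpha", "alpha", "beta", "beta"]
print(oligomer_name(haemoglobin), composition(haemoglobin))
```

The sketch deliberately ignores symmetry and the nature of the contacts between subunits; it only encodes the counting and homo-/hetero- conventions described in the paragraph.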

Many proteins share structural similarities, reflecting, in some cases, common evolutionary origins. The evolutionary process involves substitutions, insertions and deletions in amino acid sequences. For distantly related proteins, such changes can be extensive, yielding folds in which the numbers and orientations of secondary structures vary considerably. However, where the functions of proteins are conserved, for example, the structural environments of critical active site residues are also conserved.

In an attempt to better understand sequence/structure relationships and the underlying evolutionary processes that give rise to different fold families, a variety of structure classification schemes have been established. The nature of the information presented by a structure classification scheme is entirely dependent on the underlying philosophy of the approach, and hence on the methods used to identify and evaluate structural similarity. Structural families derived, for example, using algorithms that search and cluster on the basis of common motifs will be different from those generated by procedures based on global structure comparison; and the results of such automatic procedures will differ again from those based on visual inspection, where software tools are used essentially to render the task of classification more manageable.

Two well-known classification schemes are outlined below.

1. SCOP :

The SCOP (Structural Classification of Proteins) database, maintained at the MRC Laboratory of Molecular Biology and Centre for Protein Engineering, describes structural and evolutionary relationships between proteins of known structure. Because current automatic structure comparison tools cannot reliably identify all such relationships, SCOP has been constructed using a combination of manual inspection and automated methods. The task is complicated by the fact that protein structures show such variety, ranging from small, single domains to vast multi-domain assemblies. In some cases (e.g., some modular proteins), it may be meaningful to discuss a protein structure both at the multi-domain level and at the level of its individual domains.

In the SCOP classification scheme, proteins are classified in a hierarchical fashion to reflect their structural and evolutionary relatedness. Within the hierarchy there are many levels, but principally these describe the family, superfamily and fold. The boundaries between these levels may be subjective, but the higher levels generally reflect the clearest structural similarities.

  • Family : Proteins are clustered into families with clear evolutionary relationships if they have sequence identities > 30%. However, this is not an absolute measure: in some cases (e.g., the globins), it is possible to infer common descent from similar structures and functions in the absence of significant sequence identity (some members of the globin family share only 15% identity).
  • Superfamily : Proteins are placed in superfamilies when, in spite of low sequence identity, their structural and functional characteristics suggest a common evolutionary origin.
  • Fold: Proteins are classed as having a common fold if they have the same major secondary structures in the same arrangement and with the same topology, whether or not they have a common evolutionary origin. In these cases, the structural similarities could have arisen as a result of physical principles that favor particular packing arrangements and fold topologies.
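
The three levels above can be read as a rough decision sequence. The Python sketch below is a toy illustration only: SCOP relies on expert curation, so the two boolean inputs stand in for judgements a curator would make. The function names and the choice to compute identity over ungapped aligned columns are assumptions made for this example (other denominators are also in common use).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip insertions/deletions
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def scop_level_hint(identity, same_fold, common_origin_suggested):
    """Suggest the most specific level two proteins might share.

    identity                -- percent sequence identity
    same_fold               -- same major secondary structures, arrangement
                               and topology (a structural judgement)
    common_origin_suggested -- structure and function suggest common descent
                               (a curator's judgement)
    """
    if identity > 30.0:
        return "family"
    if common_origin_suggested:
        # Curators may still assign a family here, as with the globins.
        return "superfamily"
    if same_fold:
        return "fold"
    return "no shared level"


identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, scop_level_hint(identity, same_fold=True, common_origin_suggested=True))
```

The 30% cut-off is the guideline quoted in the text, not a hard rule, which is why the low-identity branch carries a comment about the globin exception.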

SCOP is accessible for keyword search via the MRC Laboratory Web server.

http://scop.mrc-lmb.cam.ac.uk/scop/

2. CATH :

The CATH (Class, Architecture, Topology, Homology) database [http://www.cathdb.info/] is a hierarchical domain classification of protein structures maintained at UCL (University College London). The resource is largely derived using automatic methods, but manual inspection is necessary where automatic methods fail. Different categories within the classification are identified by means of both unique numbers (by analogy with the enzyme classification or E.C. system for enzymes) and descriptive names. Such a numbering scheme allows efficient computational manipulation of the data. There are five levels within the hierarchy:

  1. Class is derived from gross secondary structure content and packing. Four classes of domain are recognized:
    1. mainly α,
    2. mainly β,
    3. α-β, which includes both alternating α-β and α+β structures, and
    4. those with low secondary structure content.
  2. Architecture describes the gross arrangement of secondary structures, ignoring their connectivities; it is currently assigned manually using simple descriptions of the secondary structure arrangements (e.g., barrel, roll, sandwich, etc.).
  3. Topology gives a description that encompasses both the overall shape and the connectivity of secondary structures. This is achieved by means of structure comparison algorithms that use empirically derived parameters to cluster the domains. Structures in which at least 60% of the larger protein matches the smaller are assigned to the same topology level.
  4. Homology groups domains that share > 35% sequence identity and are thought to share a common ancestor, i.e. are homologous. Similarities are first identified by sequence comparison and subsequently by means of a structure comparison algorithm.
  5. Sequence provides the final level within the hierarchy, whereby structures within homology groups are further clustered on the basis of sequence identity. At this level, domains have sequence identities >35% (with at least 60% of the larger domain equivalent to the smaller), indicating highly similar structures and functions.
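
To illustrate why a numeric scheme makes the hierarchy easy to manipulate computationally, here is a minimal Python sketch. It assumes that a classification is written as dot-separated integers, one per level in the order listed above; the codes used in the example are illustrative inputs rather than entries checked against the database, and the helper names are hypothetical.

```python
CATH_LEVELS = ("class", "architecture", "topology", "homology", "sequence")


def parse_cath_code(code):
    """Split a dotted code into a {level name: number} mapping."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(CATH_LEVELS):
        raise ValueError(f"expected 1 to {len(CATH_LEVELS)} fields, got {len(parts)}")
    return dict(zip(CATH_LEVELS, (int(p) for p in parts)))


def shared_depth(code_a, code_b):
    """Return the deepest level at which two codes agree, or None."""
    depth = None
    for level, x, y in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if x != y:
            break
        depth = level
    return depth


def same_sequence_cluster(identity_pct, overlap_pct):
    """Sequence-level criterion from the text: identity above 35% with at
    least 60% of the larger domain equivalent to the smaller."""
    return identity_pct > 35.0 and overlap_pct >= 60.0


print(parse_cath_code("3.40.50"))
print(shared_depth("3.40.50.720", "3.40.50.300"))
print(same_sequence_cluster(40.0, 75.0))
```

Because each level is a prefix of the next, finding how closely two domains are related reduces to comparing the leading fields of their codes, which is the kind of manipulation the numbering scheme is meant to make efficient.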

CATH is accessible for keyword interrogation via UCL's Biomolecular Structure and Modelling Unit Web server.