OPERATING INSTRUCTIONS
                          ----------------------

			      PROCHECK v.3.0

    Programs to check the Stereochemical Quality of Protein Structures

	 Roman A Laskowski*, Malcolm W MacArthur*, David K Smith^,
   David T Jones*, E Gail Hutchinson*, A Louise Morris*, Dorica Naylor,
		     David S Moss# & Janet M Thornton*


		* Biomolecular Structure and Modelling Unit
	     Department of Biochemistry and Molecular Biology
			    University College
			       Gower Street
			     London WC1E 6BT,
				 ENGLAND.

		       # Crystallography Department
			     Birkbeck College
			       Malet Street
			      London WC1E 7HX
				 ENGLAND.

			     ^ NMR Laboratory
		      Biomolecular Research Institute
			     381 Royal Parade
				Parkville
			       Victoria 3052
				AUSTRALIA.


      (The preparation of this suite of programs was partly funded by
			   Oxford Molecular Ltd)

			        March 1994



1. Introduction
---------------

These Operating Instructions describe how to run the PROCHECK suite of
programs (Laskowski et al., 1993) for assessing the "stereochemical
quality" of a given protein structure. The instructions assume that the
programs have already been installed on your computer system. If this is
not the case, please refer to the separate Installation Guide which deals
with the installation procedures for your particular system.

The aim of PROCHECK is to assess both the overall stereochemical quality of
a given protein structure, as compared with well-refined structures at the
same resolution, and to give an indication of its local, residue-by-residue
reliability.

To assess a structure, the program makes use of a number of parameters that
have been found to be good indicators of stereochemical quality, described
in detail in Morris et al. (1992). These parameters, which are for the most
part not included in standard refinement procedures (and so are less likely
to be biased by them), are listed in Table 1 of Appendix A.

The checks also make use of "ideal" bond lengths and bond angles, as
derived from a recent and comprehensive analysis (Engh & Huber, 1991) of
small molecule structures in the Cambridge Structural Database, CSD (Allen
et al., 1979) - now numbering over 100,000 structures. These "ideal" values
are listed in Table 2 of Appendix A.

The PROCHECK programs produce a number of plots, together with a detailed
residue-by-residue listing. These are described in Appendices D and E. The
plots are output in PostScript format (Adobe Systems Inc., 1985), and so
can be printed off on a PostScript laser printer, or displayed on a
graphics screen using suitable software (eg GHOSTSCRIPT on Sun
workstations, or PSVIEW on Silicon Graphics IRIS-4D systems). The plots can
be in colour or black-and-white.

The input to PROCHECK is a single file containing the coordinates of your
protein structure. This must be in Brookhaven file format (see Appendix
B). One of the by-products of running PROCHECK is that your coordinates
file will be "cleaned up" by the first of the programs. The cleaning-up
process corrects any mislabelled atoms and creates a new coordinates file
which has a file-extension of .new. The .new file will have the atoms
labelled in accordance with the IUPAC naming conventions (IUPAC-IUB
Commission on Biochemical Nomenclature, 1970).

The suite comprises 7 programs which are described in Appendix C. Below are
given instructions on running and customizing PROCHECK.



---------------------------------------------------------------------------

2. How to run PROCHECK
----------------------

To run PROCHECK, type the following:-

		procheck  filename  [chain-ID]  resolution

where: filename is the name of the file containing your protein structure
and should include the full path unless the file is in the default
directory; [chain-ID] is the single-letter chain ID of the chain to be
analysed (this parameter is optional and can be left out); and resolution
is the resolution at which the structure has been determined.

For example, to run the check on the Brookhaven file 1gcr, which was solved
to 1.6A, you might enter:

		    procheck  /data/pdb/p1gcr.pdb  1.6

For structures solved using NMR techniques, for which the term "resolution"
does not apply, use some nominal value such as 2.0A. For model-built
structures, use the resolution of the structure(s) on which the model has
been based. Note that, in this case, it is also useful to run PROCHECK on
the starting structure beforehand to assess its likely reliability.

As mentioned above, the input file must be in Brookhaven format (Appendix B).
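
The command line above can also be assembled from a script. The following
is a minimal Python sketch and is not part of the PROCHECK distribution: the
function name build_procheck_command is hypothetical, and it assumes that
the procheck command is on your search path.

```python
def build_procheck_command(filename, resolution, chain_id=None):
    """Return the argument list: procheck filename [chain-ID] resolution."""
    if chain_id is not None and len(chain_id) != 1:
        raise ValueError("chain ID must be a single letter")
    args = ["procheck", filename]
    if chain_id is not None:
        args.append(chain_id)      # optional parameter, left out if not given
    args.append(str(resolution))
    return args


# Example from the text: Brookhaven entry 1gcr, solved to 1.6A
print(" ".join(build_procheck_command("/data/pdb/p1gcr.pdb", 1.6)))
```

The returned list can be handed to subprocess.run() if you want the script
to start the job itself.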




Running in batch-mode on a VAX
------------------------------

On VAX computers, you can submit the entire process to a batch queue by
entering:-

  prosub  filename  [chain-ID]  resolution  queue-name  default directory

where queue-name is the name of the batch-queue to which you want the job
submitted, and default directory is the directory in which the files
created by PROCHECK are to be put. Note, this need not be the same
directory as the one containing the structure file, but it must be one to
which you have write access.

Alternatively, if you just type prosub you will be prompted for each of the
5 parameters in turn. If you do not want to specify a chain ID, leave this
entry blank.



---------------------------------------------------------------------------

3. Outputs produced by PROCHECK
-------------------------------

PROCHECK generates a number of output files in the default directory which
have the same name as the original PDB file, but with different extensions,
as follows:-

     PostScript plot files and listing
     ---------------------------------
       _01.ps   Plot files (numbered 01 to nn) in PostScript format
          .                            . . .
          .                            . . .
       _nn.ps                          . . .

        .out    The residue-by-residue listing (ie text file for printing)

     Other files
     -----------
        .lan    Main-chain bond lengths and bond angles used by the
                plotting programs
        .nb     List of atom-pairs making near-neighbour contacts
        .new    "Cleaned-up" version of the original coordinates file
        .pln    Coordinates of atoms in planar groups
        .rin    Residue information used by the plotting programs
        .sco    Main-chain and side-chain properties
        .sdh    Residue-by-residue G-factors


The plot files (_nn.ps) can be sent directly to a PostScript printer or
viewed on a graphics or X Windows terminal. The .out listing file is a text
file which can be sent directly to a line-printer.
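
As an illustration of the naming scheme, the sketch below lists the file
names to expect for a given coordinates file. It is not part of PROCHECK.
It assumes that the shared "name" is the input file name with its directory
and extension removed, and the number of plot files (nn) is supplied by the
caller because it depends on which plots are produced.

```python
import os

# Extensions of the "other files" listed above
OTHER_EXTENSIONS = [".lan", ".nb", ".new", ".pln", ".rin", ".sco", ".sdh"]


def expected_outputs(coordinate_file, n_plots):
    """List the output file names expected for a coordinates file.

    Assumption: the outputs share the input file name, minus its
    directory and extension.
    """
    stem = os.path.splitext(os.path.basename(coordinate_file))[0]
    plots = ["%s_%02d.ps" % (stem, i) for i in range(1, n_plots + 1)]
    listing = [stem + ".out"]
    others = [stem + ext for ext in OTHER_EXTENSIONS]
    return plots + listing + others


for name in expected_outputs("/data/pdb/p1gcr.pdb", 2):
    print(name)
```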

Of the others, most are used internally by the PROCHECK suite, and the only
one that is likely to be of use to you is the .new file. This holds the
"cleaned-up" version of the original PDB file, with any wrong atom-labels
corrected in accordance with the IUPAC naming conventions (IUPAC-IUB
Commission on Biochemical Nomenclature, 1970).

Each program in the suite also produces its own log file. Should the
PROCHECK suite crash, or give strange-looking results, these log files
should be the first place you look for a reason for the problem. The 7
files are:

	  anglen.log      bplot.log       clean.log       nb.log
          pplot.log       secstr.log      tplot.log



---------------------------------------------------------------------------

4. Running just the plotting programs
-------------------------------------

The plots and residue-by-residue listing are produced by the final 3
programs in the suite: TPLOT.F, PPLOT.F and BPLOT.F. Most of the
calculations are performed by the 4 programs that are run beforehand, and
these constitute the most CPU-intensive part of the process. If you need to
produce plots of the same structure several times (perhaps using different
parameters each time), you can save time by not running these other
programs each time; you can run just the plotting programs on their own by
using the command proplot. This has the same two, or three, parameters as
before, namely:

		 proplot  filename  [chain-ID]  resolution


Running just the plotting programs can save a lot of time but, of course,
requires that the entire suite has been run at least once on the current
structure of interest so that the files listed in section 3 above have been
generated.
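
Before running proplot it can be worth confirming that the intermediate
files from an earlier full run are present. The sketch below is one way to
do this in Python; it is not part of PROCHECK. Whether the plotting programs
need every one of these files is not stated here, so the check is
deliberately conservative, and the function name is hypothetical.

```python
import os
import tempfile

# Extensions of the intermediate files listed in section 3
INTERMEDIATE_EXTENSIONS = [".lan", ".nb", ".new", ".pln", ".rin", ".sco", ".sdh"]


def missing_intermediate_files(stem, directory="."):
    """Return the intermediate files for `stem` not found in `directory`."""
    return [stem + ext for ext in INTERMEDIATE_EXTENSIONS
            if not os.path.exists(os.path.join(directory, stem + ext))]


# Demonstration in a scratch directory holding only two of the files
with tempfile.TemporaryDirectory() as scratch:
    for ext in (".rin", ".sco"):
        open(os.path.join(scratch, "p1gcr" + ext), "w").close()
    print(missing_intermediate_files("p1gcr", scratch))
```

An empty list means that all seven files were found.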



---------------------------------------------------------------------------


5. Customizing PROCHECK
-----------------------

The plots and listing produced by PROCHECK can be customized to a limited
extent by amending the parameters in a file called procheck.prm. The file
is created in the default directory when you first run PROCHECK. You can
then amend it using any text editor.

In the file there is a separate section for each of the plots produced, as
well as sections governing general parameters. These will be described
below. The first line of the file gives the PROCHECK version number.  If a
subsequent version has altered the format in any way, the programs will
detect this as an error and you will need to delete your existing copies of
procheck.prm so that the programs can create new versions in the correct
format.


5.1. General options
--------------------

The first few options are general options covering all the plots.

+---------------------------------------------------------------------------+
| Colour all plots?                                                         |
| -----------------                                                         |
| N    <- Produce all plots in colour (Y/N)?                                |
|                                                                           |
| Which plots to produce                                                    |
| ----------------------                                                    |
| Y     <-  1. Ramachandran plot (Y/N)?                                     |
| Y     <-  2. Gly & Pro Ramachandran plots (Y/N)?                          |
| Y     <-  3. Chi1-Chi2 plots (Y/N)?                                       |
| Y     <-  4. Main-chain parameters (Y/N)?                                 |
| Y     <-  5. Side-chain parameters (Y/N)?                                 |
| Y     <-  6. Residue properties (Y/N)?                                    |
| Y     <-  7. Main-chain bond length distributions (Y/N)?                  |
| Y     <-  8. Main-chain bond angle distributions (Y/N)?                   |
| Y     <-  9. RMS distances from planarity (Y/N)?                          |
| Y     <- 10. Distorted geometry plots (Y/N)?                              |
+---------------------------------------------------------------------------+

Produce all plots in colour. This option determines whether all the plots
are to be produced as colour or black-and-white PostScript files. If the
option is set to Y, then colour files will be generated for all plots. If
it is set to N, each plot will be generated in colour or black-and-white
depending on the colour option for that particular plot (see below). The
colours themselves can be altered as will be described below.

Which plots to produce. Each of the 10 plots can be individually switched
on or off by entering Y or N here, respectively.
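
Each option line shown in the box consists of a value, a "<-" marker and a
description. The Python sketch below splits such a line into its two parts.
It only mirrors the layout shown above and is not how the PROCHECK programs
themselves read the file; the function names are hypothetical.

```python
def parse_option_line(line):
    """Split 'VALUE  <- description' into (value, description)."""
    value, marker, description = line.partition("<-")
    if not marker:
        raise ValueError("no '<-' marker in line: %r" % line)
    return value.strip(), description.strip()


def is_yes(value):
    """Interpret a Y/N option value."""
    return value.upper() == "Y"


sample = [
    "N    <- Produce all plots in colour (Y/N)?",
    "Y     <-  1. Ramachandran plot (Y/N)?",
    "Y     <- 10. Distorted geometry plots (Y/N)?",
]
for line in sample:
    value, description = parse_option_line(line)
    print(is_yes(value), description)
```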


5.2. Plot options
-----------------

The next 10 sets of options apply specifically to each of the 10 plots
generated by PROCHECK. The plots are described in detail in Appendix
D.

5.2.1. Ramachandran plot
       -----------------

+---------------------------------------------------------------------------+
| 1. Ramachandran plot                                                      |
| --------------------                                                      |
| Y     <- Shade in the different regions (Y/N)?                            |
| Y     <- Print the letter-codes identifying the different regions (Y/N)?  |
| Y     <- Draw line-borders around the regions (Y/N)?                      |
| N     <- Show only the core region (Y/N)?                                 |
| 1     <- Label residues in: 0=disallow,1=generous,2=allow,3=core regions  |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour                                         |
| WHITE        <- Region 0: Disallowed                                      |
| CREAM        <- Region 1: Generous                                        |
| YELLOW       <- Region 2: Allowed                                         |
| RED          <- Region 3: Most favourable, core, region                   |
| BLACK        <- Colour of markers in favourable regions                   |
| RED          <- Colour of markers in unfavourable regions                 |
+---------------------------------------------------------------------------+

Shade in the different regions. Determines whether the different
regions of the Ramachandran plot are to be shaded in. Without shading, the
regions can still be made out if their borders are drawn in (see below). In
black-and-white, the shading shows the most favourable regions in the
darkest grey, with the less favourable regions being shown in progressively
lighter tones. If you are only interested in the core region, then set the
option Show only the core region (see below) to Y.

Print the letter-codes. Allows you to label the different regions of the
Ramachandran plot using the appropriate codes: A, B, L, etc.

Draw line-borders around the regions. Determines whether borders are drawn
round the different regions of the Ramachandran plot. If you are only
interested in the core region, then set the option Show only the core
region below to Y.

Show only the core region. Allows you to show only the core regions (ie the
most favoured regions: A, B, and L) of the Ramachandran plot. This gives an
easy-to-see guide as to how many residues are inside and outside the cores.

Label: 0=disallow, 1=generous, 2=allowed, 3=core. Determines which
residues are labelled on the Ramachandran plot.  Option 0 labels
only the residues in the "disallowed" regions.  Option 1 labels
residues in the disallowed and "generous" regions.  And so on. In all
cases, Gly residues are left unlabelled and are identified by a triangular
marker.
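
The labelling rule just described can be written as a short Python sketch.
The region numbers follow the option line above (0=disallowed, 1=generous,
2=allowed, 3=core); the function name is hypothetical and the code is an
illustration only, not taken from the PROCHECK source.

```python
def is_labelled(region, label_option, residue_name):
    """Decide whether a residue is labelled on the Ramachandran plot.

    region       -- region the residue falls in (0 to 3)
    label_option -- value of the 'Label residues in' option (0 to 3)
    """
    if residue_name.upper() == "GLY":
        return False                  # Gly residues are never labelled
    return region <= label_option     # option n labels regions 0 to n


print(is_labelled(0, 1, "ALA"))   # disallowed region, option 1
print(is_labelled(2, 1, "ALA"))   # allowed region, option 1
print(is_labelled(0, 3, "GLY"))   # Gly is always left unlabelled
```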

Produce a COLOUR PostScript file. Determines whether a colour or
black-and-white plot is required. Note that if the Colour all plots option
is set to Y above, a colour PostScript file will be produced irrespective
of the setting here.

The colour definitions on the following lines use the 'names' of each
colour as defined in the colour table at the bottom of the file (see
section 5.4 below). If the name of a colour is mis-spelt, or does not
appear in the colour table, then white will be taken instead. Each colour
can be altered to suit your taste and aesthetic judgement, as described in
section 5.4 below.
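
The fall-back to white can be pictured with the following Python sketch. The
colour table here is hypothetical: its RGB values are placeholders and not
the definitions in procheck.prm, and exact (case-sensitive) matching of the
names is an assumption.

```python
WHITE_RGB = (1.0, 1.0, 1.0)

# Hypothetical colour table; the RGB values are placeholders only
COLOUR_TABLE = {
    "BLACK": (0.0, 0.0, 0.0),
    "WHITE": WHITE_RGB,
    "RED": (1.0, 0.0, 0.0),
}


def resolve_colour(name, table=COLOUR_TABLE):
    """Return the RGB value for a colour name, or white if it is unknown."""
    return table.get(name.strip(), WHITE_RGB)


print(resolve_colour("RED"))
print(resolve_colour("REDD"))   # mis-spelt name falls back to white
```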

The first of the defined colours, the Background colour, defines the
background colour of the page on which the plots are drawn.

Colours. The various additional colours on the lines that follow define the
colour of each of the 4 different regions on the Ramachandran plot, and the
colours of the markers that are located in the favourable and unfavourable
regions.


5.2.2. Gly & Pro Ramachandran plots
       ----------------------------

+---------------------------------------------------------------------------+
| 2. Gly & Pro Ramachandran plots                                           |
| -------------------------------                                           |
| -4.0  <- Cut-off value for labelling of residues                          |
| N     <- Plot all 20 Ramachandran plots (Y/N)?                            |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour                                         |
| CREAM        <- Lightest shaded regions on plots                          |
| GREEN        <- Darkest shaded regions on plots                           |
| BLACK        <- Colour of crosses in favourable regions                   |
| RED          <- Colour of crosses in unfavourable regions                 |
+---------------------------------------------------------------------------+

Cut-off value for labelling of residues. Sets the G-factor value below which
residues on the Ramachandran plot will be labelled. The G-factor gives a
measure of how far from the 'normal' regions of the plot each residue
lies. Low G-factors indicate residues in unlikely conformations.

Plot all 20 Ramachandran plots. Allows you to show a separate phi-psi
Ramachandran plot for each of the 20 different residue types, rather than
just for Gly and Pro.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the lightest- and darkest-shaded regions of
the plot, and also the colours of the residue markers in the favourable and
unfavourable regions.

5.2.3. Chi1-Chi2 plots
       ---------------

+---------------------------------------------------------------------------+
| 3. Chi1-Chi2 plots                                                        |
| ------------------                                                        |
| -4.0  <- Cut-off value for labelling of residues                          |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour                                         |
| CREAM        <- Lightest shaded regions on plots                          |
| GREEN        <- Darkest shaded regions on plots                           |
| BLACK        <- Colour of crosses in favourable regions                   |
| RED          <- Colour of crosses in unfavourable regions                 |
+---------------------------------------------------------------------------+

Cut-off value for labelling of residues. Sets the G-factor value below which
residues on the chi1-chi2 plot will be labelled. The G-factor gives a
measure of how far from the 'normal' regions of the plot each residue
lies. Low G-factors indicate residues in unlikely conformations.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the lightest- and darkest-shaded regions of
the plot, and also the colours of the residue markers in the favourable and
unfavourable regions.

5.2.4. Main-chain parameters
       ---------------------

+---------------------------------------------------------------------------+
| 4. Main-chain parameters                                                  |
| ------------------------                                                  |
| Y     <- Background shading (Y/N)?                                        |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour                                         |
| CREAM        <- Background shading on each graph                          |
| PURPLE       <- Colour of band                                            |
+---------------------------------------------------------------------------+

Background shading. Defines whether the background of each plot is to be
lightly shaded when the plot is in black-and-white.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the background of each plot, and the band
representing the range of values expected at each resolution.

5.2.5. Side-chain parameters
       ---------------------

+---------------------------------------------------------------------------+
| 5. Side-chain parameters                                                  |
| ------------------------                                                  |
| Y     <- Background shading (Y/N)?                                        |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour                                         |
| CREAM        <- Background shading on each graph                          |
| PURPLE       <- Colour of band                                            |
+---------------------------------------------------------------------------+

Background shading. Defines whether the background of each plot is to be
lightly shaded when the plot is in black-and-white.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the background of each plot, and the band
representing the range of values expected at each resolution.

5.2.6. Residue properties
       ------------------

+---------------------------------------------------------------------------+
| 6. Residue properties                                                     |
| ---------------------                                                     |
| 1 2 3    < Which 3 main graphs to be printed (see Note 1 for full list)   |
| Y     <- Background shading on main graphs (Y/N)?                         |
| 2.0   <- Number of standard deviations for highlighting                   |
| Y     <- Show shading representing estimated accessibility (Y/N)?         |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour                                         |
| CREAM        <- Background shading on main graphs                         |
| PURPLE       <- Colour of histogram bars on main graphs                   |
| RED          <- Colour of highlighted histogram bars                      |
| BLUE         <- Minimum accessibility colour (buried regions)             |
| WHITE        <- Maximum accessibility colour                              |
| RED          <- Region 0: Disallowed                                      |
| PINK         <- Region 1: Generous                                        |
| GREEN        <- Region 2: Allowed                                         |
| SKY BLUE     <- Region 3: Most favourable, core, region                   |
| YELLOW       <- Colour for favourable G-factor scores                     |
| RED          <- Colour for unfavourable G-factor scores                   |
| YELLOW       <- Colour for schematic of the secondary structure           |
+---------------------------------------------------------------------------+

Which 3 main graphs to be printed. These three options define which 3 out
of the 11 possible graphs are to appear as the main graphs on the plot. The
eleven possibilities are:

1. Absolute deviation of chi1 torsion angle from the "ideal", excluding
Pro.

2. Absolute deviation of omega torsion angle from the "ideal".

3. Absolute deviation of zeta "virtual" torsion angle (defined by the atoms
Calpha-N-C-Cbeta) from the "ideal". This torsion angle provides a measure
of the Calpha chirality and, if negative, signifies a D-amino acid.

4. Absolute deviation of main-chain hydrogen bond energy from the "ideal".

5. B-value of gamma atom (O, C, or S - being whichever is used in
definition of the chi1 torsion angle).

6. Average B-value of main-chain atoms.

7. Average B-value of side-chain atoms.

8. G-factors for phi-psi distributions. Each residue's G(phi-psi)-factor
provides a measure of how 'normal' the phi-psi values for this residue type
are. Highlighted bars indicate low-probability conformations.

9. G-factors for chi1-chi2 distributions. Each residue's
G(chi1-chi2)-factor provides a measure of how 'normal' the chi1-chi2 values
for this residue type are. Highlighted bars indicate low-probability
conformations.

10. Residue-by-residue G-factor. The G-factor plotted here combines the
contributions from the dihedral angle G-factor, G(dih), and the main-chain
bond lengths and bond angles G-factor, G(cov).

11. Approximate accessibility, as estimated by each residue's Ooi
number. An Ooi number is a count of the number of other Calpha atoms within
a radius of, in this case, 14.0A of the given residue's own Calpha
(Nishikawa & Ooi, 1986).

Entering 0 for any of the 3 options will leave a blank region in place of a
graph at the corresponding position on the page.
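
The Ooi number described under graph 11 above is simple to compute from the
Calpha coordinates. The Python sketch below is an illustration only and is
not taken from the PROCHECK source; treating a distance of exactly 14.0A as
"within" the radius is an assumption.

```python
import math


def ooi_numbers(ca_coords, radius=14.0):
    """For each Calpha, count the other Calpha atoms within `radius` A."""
    counts = []
    for i, a in enumerate(ca_coords):
        n = 0
        for j, b in enumerate(ca_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts


# Three Calpha atoms on a line, 10A apart: the middle one sees both
# neighbours, the end ones see only the middle one.
print(ooi_numbers([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]))
# prints [1, 2, 1]
```

High counts correspond to buried residues and low counts to exposed ones.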

Background shading. Defines whether the background of each of the 3 main
plots is to be lightly shaded when the plot is in black-and-white.

Number of standard deviations for highlighting. Determines which histogram
bars in the 3 main plots are to be highlighted. Residues whose plotted
parameter deviates by more than the number of standard deviations specified
here will be highlighted.
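
The highlighting test amounts to a comparison against a multiple of the
standard deviation. A minimal Python sketch follows; the function name is
hypothetical, and the ideal value and standard deviation used in the example
are placeholders rather than values from Appendix A.

```python
def is_highlighted(value, ideal, std_dev, n_std_devs=2.0):
    """True if `value` lies more than n_std_devs standard deviations
    from `ideal`."""
    return abs(value - ideal) > n_std_devs * std_dev


# Placeholder numbers: ideal 10.0, standard deviation 0.2, cut-off 2.0
print(is_highlighted(10.5, 10.0, 0.2))   # deviation 0.5, limit 0.4
print(is_highlighted(10.3, 10.0, 0.2))   # deviation 0.3, limit 0.4
```

Raising the option from 2.0 to 3.0, say, would highlight fewer bars.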

Show shading for accessibility. Determines whether the secondary structure
diagram is to be plotted on a shaded background in which the shading shows
the buried and accessible regions of the structure. At present the
accessibility is estimated from each residue's Ooi number (see above), but
a proper accessibility calculation will be included in PROCHECK very soon.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the various colours to be used for the different parts of
the plot. These include the colouring of the 3 main graphs at the top of
the plot, the accessibility shading and colour of the secondary structure
diagram, the markers representing the different regions of the Ramachandran
plot, and the range of colours used on the chequer-board of the various
G-factors.

5.2.7. Main-chain bond length distributions
       ------------------------------------

+---------------------------------------------------------------------------+
| 7. Main-chain bond length distributions                                   |
| ---------------------------------------                                   |
| Y     <- Background shading on graphs (Y/N)?                              |
| 2.0   <- Number of standard deviations for highlighting                   |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour of page                                 |
| CREAM        <- Background shading on graphs                              |
| PURPLE       <- Colour of histogram bars                                  |
| RED          <- Colour of highlighted histogram bars                      |
+---------------------------------------------------------------------------+

Background shading. Defines whether the background of each plot is to be
lightly shaded when the plot is in black-and-white.

Number of standard deviations for highlighting. Determines which histogram
bars in the plot are to be highlighted. Residues whose plotted parameter
deviates by more than the number of standard deviations specified here will
be highlighted.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the highlighted and unhighlighted histogram
bars, and the background colour of each graph.

5.2.8. Main-chain bond angle distributions
       -----------------------------------

+---------------------------------------------------------------------------+
| 8. Main-chain bond angle distributions                                    |
| --------------------------------------                                    |
| Y     <- Background shading on graphs (Y/N)?                              |
| 2.0   <- Number of standard deviations for highlighting                   |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour of page                                 |
| CREAM        <- Background shading on graphs                              |
| PURPLE       <- Colour of histogram bars                                  |
| RED          <- Colour of highlighted histogram bars                      |
+---------------------------------------------------------------------------+

Background shading. Defines whether the background of each plot is to be
lightly shaded when the plot is in black-and-white.

Number of standard deviations for highlighting. Determines which histogram
bars in the plot are to be highlighted. Residues whose plotted parameter
deviates by more than the number of standard deviations specified here will
be highlighted.

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the highlighted and unhighlighted histogram
bars, and the background colour of each graph.

5.2.9. RMS distances from planarity
       ----------------------------

+---------------------------------------------------------------------------+
| 9. RMS distances from planarity                                           |
| -------------------------------                                           |
| Y     <- Background shading on graphs (Y/N)?                              |
| 0.03  <- RMS distance for highlighting for ring groups                    |
| 0.02  <-  "      "     "       "        "  other groups                   |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| WHITE        <- Background colour of page                                 |
| CREAM        <- Background shading on graphs                              |
| PURPLE       <- Colour of histogram bars                                  |
| RED          <- Colour of highlighted histogram bars                      |
+---------------------------------------------------------------------------+

Background shading. Defines whether the background of each plot is to be
lightly shaded when the plot is in black-and-white.

RMS distance for highlighting. Determines which histogram bars in the plot
are to be highlighted. Bars corresponding to RMS distances from planarity
greater than the values specified here will be highlighted. The two
different values apply separately to ring groups (Phe, Tyr, Trp and His)
and to planar end-groups (Arg, Asn, Asp, Gln and Glu).
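
The choice between the two cut-offs can be sketched in Python as follows.
The residue groupings and the default values of 0.03 and 0.02 are those
given above; the function name is hypothetical and the code is not taken
from the PROCHECK source.

```python
RING_RESIDUES = {"PHE", "TYR", "TRP", "HIS"}
END_GROUP_RESIDUES = {"ARG", "ASN", "ASP", "GLN", "GLU"}


def planarity_highlighted(residue_name, rms_distance,
                          ring_cutoff=0.03, other_cutoff=0.02):
    """True if the RMS distance from planarity exceeds the cut-off
    that applies to this residue type."""
    name = residue_name.upper()
    if name in RING_RESIDUES:
        return rms_distance > ring_cutoff
    if name in END_GROUP_RESIDUES:
        return rms_distance > other_cutoff
    raise ValueError("residue has no planar group: %s" % residue_name)


print(planarity_highlighted("PHE", 0.025))   # ring group, below 0.03
print(planarity_highlighted("ASP", 0.025))   # end-group, above 0.02
```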

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the highlighted and unhighlighted histogram
bars, and the background colour of each graph.

5.2.10. Distorted geometry plots
        ------------------------

+---------------------------------------------------------------------------+
| 10. Distorted geometry plots                                              |
| ----------------------------                                              |
| 0.05  <- Deviation from "ideal" bond-length (A)                           |
| 10.0  <- Deviation from "ideal" bond-angle (degrees)                      |
| 0.04  <- RMS distance from plane for ring atoms                           |
| 0.03  <-  "      "      "    "   for other atoms                          |
| N     <- Produce a COLOUR PostScript file (Y/N)?                          |
| CREAM        <- Background colour of page                                 |
| BLUE         <- Colour of "ideal" bond-lengths and angles                 |
| RED          <- Colour of actual bond-lengths/angles/planes               |
| GREEN        <- Colour for lettering showing differences from ideals      |
+---------------------------------------------------------------------------+

Deviation from 'ideal' bond length (A). Defines which main-chain bonds are
to be plotted as distorted. All bond-lengths that deviate by more than this
amount from the Engh & Huber 'ideal' values will be plotted.

Deviation from 'ideal' bond angle (degrees). Defines which main-chain bond
angles are to be plotted as distorted. All bond-angles that deviate by more
than this amount from the Engh & Huber 'ideal' values will be plotted.
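
As an illustration of how such a cut-off acts, the following Python sketch
flags bonds whose length differs from the Engh & Huber value by more than
the given amount. It is not PROCHECK code: the function name, the data
layout and the example measurements are assumptions made for the sketch.
Only the ideal values (taken from Table A.2 of Appendix A) and the 0.05A
default come from these instructions.

```python
# Sketch only, not PROCHECK code.
# Ideal values copied from Table A.2 (Engh & Huber, 1991).
IDEAL_BOND_LENGTHS = {
    "C-N": 1.329,        # except Pro
    "C-O": 1.231,
    "Calpha-C": 1.525,   # except Gly
    "N-Calpha": 1.458,   # except Gly, Pro
}

def distorted_bonds(measured, max_deviation=0.05):
    """Return (bond, actual, ideal, difference) for bonds beyond the cut-off.

    measured: list of (bond_name, length_in_angstroms) pairs (hypothetical layout).
    max_deviation: the first value in the "Distorted geometry plots" options.
    """
    flagged = []
    for bond, actual in measured:
        ideal = IDEAL_BOND_LENGTHS[bond]
        diff = actual - ideal
        if abs(diff) > max_deviation:
            flagged.append((bond, actual, ideal, round(diff, 3)))
    return flagged

# Made-up measurements for illustration.
example = [("C-N", 1.335), ("C-O", 1.300), ("N-Calpha", 1.400)]
flagged = distorted_bonds(example)
for row in flagged:
    print(row)
```

The same test, with a cut-off in degrees, would apply to the bond angles.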

RMS distance from plane. Defines which planar groups are to be plotted as
distorted. All planar groups that deviate by more than this amount from
planarity will be plotted. The two different values apply separately to
ring groups (Phe, Tyr, Trp and His) and to planar end-groups (Arg, Asn,
Asp, Gln and Glu).

Produce a COLOUR PostScript file. See section 5.2.1 above.

Colours. Define the colours for the 'ideal' and actual bond lengths and
angles and for the lettering showing the differences from the ideals.


5.3. Listing options
--------------------

The listing options affect the output in the .out file which gives the
residue-by-residue listing.

+---------------------------------------------------------------------------+
| Listing options                                                           |
| ---------------                                                           |
| Y   <- Print explanatory text at head of parameters listing (Y/N)         |
| N   <- Print only asterisks denoting deviations, and not the values (Y/N) |
| N   <- Print only highlighted residues (Y/N)                              |
| 0.0    <- Min. deviation from ideal required for parameter to be printed  |
| 66     <- Number of lines per page on parameters listing                  |
| Y   <- List the bad contacts (Y/N)                                        |
+---------------------------------------------------------------------------+

Print explanatory text. Allows you to include or suppress the printing of
the explanatory text which gives details such as the tables of "ideal"
values against which your structure has been compared, and the keys to the
various codes used in the print-out.

Print only asterisks denoting deviations and not values.  Answering Y to
this option shows only the asterisks on the listing.  No values are
printed.

Print only highlighted residues. Allows you to print only those residues
that have at least one asterisked parameter.

Min. deviation from ideal for parameter to be printed. Defines the minimum
deviation required in a parameter before it is printed.  The default value
is 0.0, meaning that all parameters are printed. If the value is set to,
say, 2.0, only those parameters that deviate by 2.0 or more standard
deviations from their ideal value will be shown.
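
The filtering described above can be sketched in Python as follows. This
is an illustration only and not the program's own code: the function
names and the example residue are assumptions, and the sketch makes no
attempt to handle angles that wrap round at +/-180 degrees. The ideal
values and standard deviations are taken from Table A.1 of Appendix A.

```python
# Sketch only, not PROCHECK code.
# (mean, standard deviation) pairs copied from Table A.1 of Appendix A.
IDEALS = {
    "omega": (180.0, 5.8),
    "zeta": (33.9, 3.5),
    "helix_phi": (-65.3, 11.9),
}

def deviation_in_sd(parameter, value):
    """Number of standard deviations by which value differs from its ideal."""
    mean, sd = IDEALS[parameter]
    return abs(value - mean) / sd   # no wrap-around handling (assumption)

def parameters_to_print(values, min_deviation=0.0):
    """Keep only the parameters deviating by min_deviation or more SDs."""
    return {name: v for name, v in values.items()
            if deviation_in_sd(name, v) >= min_deviation}

# Made-up values for a single residue.
residue = {"omega": 170.0, "zeta": 35.0, "helix_phi": -60.0}
print(parameters_to_print(residue, min_deviation=0.0))   # everything is kept
print(parameters_to_print(residue, min_deviation=1.5))   # only omega remains
```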

Number of lines per page. Defines the number of lines to be printed before
a page is thrown. It tells the program when to start a new page and hence
when to print a new set of column headings. Normally, the standard is 66
lines per page, but your printer may print more or fewer lines per page, so
amend this figure accordingly.

List the bad contacts. Determines whether the bad contacts are to be listed
at the end of the print-out or not.


5.4. Colours
------------

The colour table allows you to modify any of the default colour definitions
and to set up new colours of your own (up to 20 colours are allowed). Each
entry contains 3 numbers, between 0.0 and 1.0, giving the ratios of red,
green and blue, respectively, making up the given colour. Each colour also
has a 'name', in single quotes, by which the colour is referred to in the
plot options above. The default colours are as shown below.

+---------------------------------------------------------------------------+
| Colours                                                                   |
| -------                                                                   |
| 0.0000 0.0000 0.0000 'BLACK'        <- Colour 1                           |
| 1.0000 1.0000 1.0000 'WHITE'        <- Colour 2                           |
| 1.0000 0.0000 0.0000 'RED'          <- Colour 3                           |
| 0.0000 1.0000 0.0000 'GREEN'        <- Colour 4                           |
| 0.0000 0.0000 1.0000 'BLUE'         <- Colour 5                           |
| 1.0000 1.0000 0.0000 'YELLOW'       <- Colour 6                           |
| 0.8000 0.5000 0.0000 'ORANGE'       <- Colour 7                           |
| 0.5000 1.0000 0.0000 'LIME GREEN'   <- Colour 8                           |
| 0.5000 0.0000 1.0000 'PURPLE'       <- Colour 9                           |
| 0.5000 1.0000 1.0000 'CYAN'         <- Colour 10                          |
| 1.0000 0.5000 1.0000 'PINK'         <- Colour 11                          |
| 0.3000 0.8000 1.0000 'SKY BLUE'     <- Colour 12                          |
| 1.0000 1.0000 0.7000 'CREAM'        <- Colour 13                          |
| 0.0000 1.0000 1.0000 'TURQUOISE'    <- Colour 14                          |
| 1.0000 0.0000 1.0000 'LILAC'        <- Colour 15                          |
| 0.8000 0.0000 0.0000 'BRICK RED'    <- Colour 16                          |
| 0.5000 0.0000 0.0000 'BROWN'        <- Colour 17                          |
| 0.9700 0.9700 0.9700 'LIGHT GREY'   <- Colour 18                          |
| 1.0000 1.0000 1.0000 'WHITE'        <- Colour 19                          |
| 1.0000 1.0000 1.0000 'WHITE'        <- Colour 20                          |
+---------------------------------------------------------------------------+
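
When adding a colour of your own it can help to check the entry before
running the programs. The Python sketch below reads one entry in the
layout shown above and converts the three ratios to a hexadecimal value
for previewing in other software. The function names are assumptions made
for this sketch; PROCHECK itself reads the table directly.

```python
import re

def parse_colour_line(line):
    """Split an entry such as "0.8000 0.5000 0.0000 'ORANGE'" into (name, (r, g, b))."""
    match = re.match(r"\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+'([^']+)'", line)
    if match is None:
        raise ValueError("not a colour entry: %r" % line)
    rgb = tuple(float(match.group(i)) for i in (1, 2, 3))
    if not all(0.0 <= c <= 1.0 for c in rgb):
        raise ValueError("colour ratios must lie between 0.0 and 1.0")
    return match.group(4), rgb

def to_hex(rgb):
    """Convert ratios in the range 0.0-1.0 to a #RRGGBB string."""
    return "#" + "".join("%02X" % round(c * 255) for c in rgb)

name, rgb = parse_colour_line("1.0000 1.0000 0.7000 'CREAM'        <- Colour 13")
print(name, rgb, to_hex(rgb))
```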


5.5. G-factors
--------------

+---------------------------------------------------------------------------+
| G-factors                                                                 |
| ---------                                                                 |
| Y   <- Use Engh & Huber means for bond length/angle G-factors(Y/N)?       |
+---------------------------------------------------------------------------+

The final option determines whether the Engh & Huber mean values are to be
used for the main-chain bond lengths and bond angles when calculating the
G-factors. The alternative is to use the structure's own mean values.


Acknowledgements
----------------

We would like to thank Eleanor Dodson, Andrej Sali, Keith Wilson, Victor
Lamzin, Mike Sutcliffe, George Sheldrick, Paula Kuser, Helen Stirk, V Dhanaraj
and Geoff Barton for immensely helpful comments, criticisms and suggestions.



---------------------------------------------------------------------------


APPENDIX A  -  Stereochemical parameters

The two tables in this appendix list the stereochemical parameters used in the
PROCHECK programs.


   TABLE A.1. Stereochemical parameters of Morris et al. (1992), derived
		 from high-resolution protein structures.
---------------------------------------------------------------------------
                               |                    |
Stereochemical parameter       |     Mean value     | Standard deviation
---------------------------------------------------------------------------
                               |                    |                  
phi-psi in most favoured       |                    |                  
regions of Ramachandran plot   |        >90%        |                  
                               |                    |                  
chi1 dihedral angle:           |                    |                  
   gauche minus                |     64.1 degrees   |   15.7 degrees
   trans                       |    183.6 degrees   |   16.8 degrees
   gauche plus                 |    -66.7 degrees   |   15.0 degrees
                               |                    |                  
chi2 dihedral angle            |    177.4 degrees   |   18.5 degrees
                               |                    |                  
Proline phi torsion angle      |    -65.4 degrees   |   11.2 degrees
                               |                    |                  
Helix phi torsion angle        |    -65.3 degrees   |   11.9 degrees
Helix psi torsion angle        |    -39.4 degrees   |   11.3 degrees
                               |                    |                  
chi3 (S-S bridge):             |                    |                  
   right-handed                |    96.8 degrees    |   14.8 degrees
   left-handed                 |   -85.8 degrees    |   10.7 degrees
                               |                    |                  
Disulphide bond separation     |      2.0A          |    0.1A
                               |                    |                  
omega dihedral angle           |   180.0 degrees    |    5.8 degrees
                               |                    |                  
Main-chain hydrogen            |                    |                  
bond energy (kcal/mol)*        |     -2.03          |    0.75            
                               |                    |                  
Calpha chirality: zeta         |                    |                  
"virtual" torsion angle        |    33.9 degrees    |    3.5 degrees
(Calpha-N-C-Cbeta)             |                    |                  
---------------------------------------------------------------------------
* Evaluated using the Kabsch & Sander (1983) method.



  TABLE A.2. Main-chain bond lengths and bond angles, and their standard
   deviations, as observed in small molecules (Engh & Huber, 1991). The
   atom-labelling follows that used in the X-PLOR dictionary, with some
       additional atom types (marked with an asterisk) as defined by
			   Engh & Huber (1991).
---------------------------------------------------------------------------
			      a. Bond lengths
---------------------------------------------------------------------------
Bond                    | X-PLOR labelling                  | Value | sigma 
---------------------------------------------------------------------------
C-N                     | C-NH1          | (except Pro)     | 1.329 | 0.014
                        | C-N            | (Pro)            | 1.341 | 0.016
                        |                |                  |       |      
C-O                     | C-O            |                  | 1.231 | 0.020
                        |                |                  |       |      
Calpha-C                | CH1E-C         | (except Gly)     | 1.525 | 0.021
                        | CH2G*-C        | (Gly)            | 1.516 | 0.018
                        |                |                  |       |      
Calpha-Cbeta            | CH1E-CH3E      | (Ala)            | 1.521 | 0.033
                        | CH1E-CH1E      | (Ile,Thr,Val)    | 1.540 | 0.027
                        | CH1E-CH2E      | (the rest)       | 1.530 | 0.020
                        |                |                  |       |      
N-Calpha                | NH1-CH1E       | (except Gly,Pro) | 1.458 | 0.019
                        | NH1-CH2G*      | (Gly)            | 1.451 | 0.016
                        | N-CH1E         | (Pro)            | 1.466 | 0.015
---------------------------------------------------------------------------

---------------------------------------------------------------------------
			      b. Bond angles
---------------------------------------------------------------------------
Angle                   | X-PLOR labelling                  | Value | sigma 
---------------------------------------------------------------------------
C-N-Calpha              | C-NH1-CH1E     | (except Gly,Pro) | 121.7 | 1.8
                        | C-NH1-CH2G*    | (Gly)            | 120.6 | 1.7
                        | C-N-CH1E       | (Pro)            | 122.6 | 5.0
                        |                |                  |       |    
Calpha-C-N              | CH1E-C-NH1     | (except Gly,Pro) | 116.2 | 2.0
                        | CH2G*-C-NH1    | (Gly)            | 116.4 | 2.1
                        | CH1E-C-N       | (Pro)            | 116.9 | 1.5
                        |                |                  |       |    
Calpha-C-O              | CH1E-C-O       | (except Gly)     | 120.8 | 1.7
                        | CH2G*-C-O      | (Gly)            | 120.8 | 2.1
                        |                |                  |       |    
Cbeta-Calpha-C          | CH3E-CH1E-C    | (Ala)            | 110.5 | 1.5
                        | CH1E-CH1E-C    | (Ile,Thr,Val)    | 109.1 | 2.2
                        | CH2E-CH1E-C    | (the rest)       | 110.1 | 1.9
                        |                |                  |       |    
N-Calpha-C              | NH1-CH1E-C     | (except Gly,Pro) | 111.2 | 2.8
                        | NH1-CH2G*-C    | (Gly)            | 112.5 | 2.9
                        | N-CH1E-C       | (Pro)            | 111.8 | 2.5
                        |                |                  |       |    
N-Calpha-Cbeta          | NH1-CH1E-CH3E  | (Ala)            | 110.4 | 1.5
                        | NH1-CH1E-CH1E  | (Ile,Thr,Val)    | 111.5 | 1.7
                        | N-CH1E-CH2E    | (Pro)            | 103.0 | 1.1
                        | NH1-CH1E-CH2E  | (the rest)       | 110.5 | 1.7
                        |                |                  |       |    
O-C-N                   | O-C-NH1        | (except Pro)     | 123.0 | 1.6
                        | O-C-N          | (Pro)            | 122.0 | 1.4
---------------------------------------------------------------------------



APPENDIX B  -  Brookhaven file format

The table below shows the Brookhaven file format for the coordinate records
(ie ATOM and HETATM) of your .pdb file. Each record holds the coordinates
and other details of a single atom.

---------------------------------------------------------------------------
Field |    Column    | FORTRAN |                                         
  No. |     range    | format  | Description                                   
---------------------------------------------------------------------------
   1. |    1 -  6    |   A6    | Record ID (eg ATOM, HETATM)       
   2. |    7 - 11    |   I5    | Atom serial number                            
   -  |   12 - 12    |   1X    | Blank                                         
   3. |   13 - 16    |   A4    | Atom name (eg " CA ", " ND1")
   4. |   17 - 17    |   A1    | Alternative location code (if any)            
   5. |   18 - 20    |   A3    | Standard 3-letter amino acid code for residue 
   -  |   21 - 21    |   1X    | Blank                                         
   6. |   22 - 22    |   A1    | Chain identifier code                         
   7. |   23 - 26    |   I4    | Residue sequence number                       
   8. |   27 - 27    |   A1    | Insertion code (if any)                       
   -  |   28 - 30    |   3X    | Blank                                         
   9. |   31 - 38    |  F8.3   | Atom's x-coordinate                         
  10. |   39 - 46    |  F8.3   | Atom's y-coordinate                         
  11. |   47 - 54    |  F8.3   | Atom's z-coordinate                         
  12. |   55 - 60    |  F6.2   | Occupancy value for atom                      
  13. |   61 - 66    |  F6.2   | B-value (thermal factor)                    
   -  |   67 - 67    |   1X    | Blank                                         
  14. |   68 - 70    |   I3    | Footnote number                               
---------------------------------------------------------------------------
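
Because the format is column-based, a record is read by position and not
by splitting on spaces. The Python sketch below slices one record using
the column ranges in the table; the function name and the sample record
are assumptions made for illustration (the sample is built with a format
string so that each field is certain to fall in the right columns).

```python
def parse_atom_record(line):
    """Slice one ATOM/HETATM record into fields using the column ranges above.

    Python slices are 0-based and end-exclusive, so columns 7-11 become [6:11].
    """
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "atom_name": line[12:16],        # kept unstripped: " CA " is not "CA  "
        "alt_loc": line[16:17].strip(),
        "residue": line[17:20],
        "chain": line[21:22].strip(),
        "res_seq": int(line[22:26]),
        "ins_code": line[26:27].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_value": float(line[60:66]),
    }

# Hypothetical record, laid out as A6, I5, 1X, A4, A1, A3, 1X, A1, I4, A1, 3X,
# 3F8.3, 2F6.2 to mirror the FORTRAN formats in the table.
sample = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    "ATOM", 1, " CA ", " ", "ALA", "A", 12, " ", 1.234, -5.678, 9.012, 1.00, 23.45)

atom = parse_atom_record(sample)
print(atom)
```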



APPENDIX C  -  Program descriptions

This appendix describes the 7 programs making up the PROCHECK suite.


Program 1. CLEAN.F
------------------
Written by: D K Smith & E G Hutchinson. Modified: R A Laskowski

The first of the programs is called CLEAN.  Its purpose is to produce a
cleaned-up version of your coordinates file.  The new file is given a .new
extension and is used by the other programs in the suite for the
calculation of the stereochemical parameters.

The cleaning-up process involves a number of checks. The first one ensures
that the atoms in your structure have been correctly labelled in accordance
with the IUPAC naming conventions (IUPAC-IUB Commission on Biochemical
Nomenclature, 1970).

A typical error is that the Neta1 and Neta2 atoms of arginine are labelled
the wrong way round. This error is corrected by the program. Also corrected
are atom labels for Phe, Tyr, Asp and Glu residues. Thus, for example, the
carbon atom in tyrosine labelled Cdelta1 is the one that gives the lowest
chi2 torsion angle.

The program switches atom labels when they are in error and shows the old
atom names in a column at the right of the .new file.

The program also checks that the correct L/D stereochemical assignments
have been made on individual residues, and corrects any errors found.

All other errors are merely flagged in the .new file and described in the
clean.log file. Also listed in clean.log are: chain breaks, unknown
residues, and missing atoms. Chain breaks are defined as locations where
the distance between two adjacent Calpha atoms is greater than
5.0A. Residue atoms on one side of such a break are marked with a `>' in
the .new file while the atoms of the residue on the other side are marked
with a `<'.
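
The chain-break test can be sketched in a few lines of Python. This is an
illustration of the 5.0A criterion only, not the code used by CLEAN: the
function name, the data layout and the coordinates are assumptions, and
the sketch does not write the `>' and `<' markers.

```python
import math

def find_chain_breaks(calphas, max_distance=5.0):
    """Return pairs of consecutive residue labels whose Calpha atoms are too far apart.

    calphas: list of (residue_label, (x, y, z)) in chain order (hypothetical layout).
    """
    breaks = []
    for (label_a, xyz_a), (label_b, xyz_b) in zip(calphas, calphas[1:]):
        if math.dist(xyz_a, xyz_b) > max_distance:
            breaks.append((label_a, label_b))
    return breaks

# Made-up coordinates for three consecutive residues.
chain = [
    ("A10", (0.0, 0.0, 0.0)),
    ("A11", (3.8, 0.0, 0.0)),    # 3.8A from A10: no break
    ("A12", (12.0, 0.0, 0.0)),   # 8.2A from A11: reported as a break
]
print(find_chain_breaks(chain))
```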

Where an atom has alternate locations, only the location with the highest
occupancy is kept as a normal record in the .new file. The other positions
are still written to the .new file, but are flagged: ATOM records are
marked as ATALT and HETATM records as HEALT. Atoms with zero occupancy are
flagged as ATZERO and HEZERO. None of the flagged atoms are included in the
stereochemical checks.

Finally, residues with unusual values of the "notional" zeta dihedral angle
(defined by the four atoms: Calpha-N-C-Cbeta) are marked with an asterisk
in the .new file. The value of this angle should be > 23 degrees and < 45
degrees.
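
For readers who wish to reproduce the zeta check, the Python sketch below
computes a signed torsion angle from four points and applies the range
given above. It is a generic dihedral calculation written for this
illustration, not the routine used by CLEAN; the function names are
assumptions. The atoms are passed in the order Calpha, N, C, Cbeta.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p1, p2, p3, p4):
    """Signed torsion angle, in degrees, defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def zeta_is_unusual(zeta, low=23.0, high=45.0):
    """True if zeta lies outside the range quoted in the text above."""
    return not (low < zeta < high)

# Simple geometric check of the sign convention (not protein coordinates).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(zeta_is_unusual(33.9), zeta_is_unusual(-33.9))
```

Note that a mirror-image arrangement of the four atoms changes the sign of
the angle, so such a residue falls outside the expected range.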


Program 2. SECSTR.F
-------------------
Written by: D K Smith. Modified: R A Laskowski

The second program in the suite is SECSTR which calculates all the required
torsion angles, main-chain hydrogen-bond energies, and secondary structure
assignments. This information is written out to the .rin file for use by
the plotting programs.

The hydrogen-bond energies and the secondary structure assignments are
calculated using the method of Kabsch & Sander (1983).


Program 3. NB.C
---------------
Written by: D T Jones & D Naylor

The third program in the suite is called NB. It identifies all non-bonded
interactions between different pairs of residues in the protein
structure. These are defined where the closest atom-atom contact between
two residues is less than 4.0A and the atoms concerned are 4 or more bonds
apart.

All such closest contacts are written out to the .nb file for use by the
plotting programs. Note that only intra-chain contacts are stored;
inter-chain contacts are not considered.


Program 4. ANGLEN.F
-------------------
Written by: R A Laskowski

The fourth program in the suite is called ANGLEN. It calculates all the
main-chain bond lengths and bond angles in the structure. These are written
out to the .lan file. Also calculated are the RMS distances from a best-fit
plane for all planar sidechain groups. These comprise the aromatic rings of
Phe, Tyr, Trp and His, and the end-groups of Arg, Asn, Asp, Gln and
Glu. The coordinates of the atoms in the planar groups are written out to
the .pln file.


Programs 5-7. TPLOT.F, PPLOT.F and BPLOT.F
------------------------------------------
Written by: R A Laskowski & M W MacArthur

The last 3 programs in the suite are the plotting programs which create the
plots and the residue-by-residue listing for your structure. The plots and
listing have a number of user-definable options which are held in the
parameter file procheck.prm and which can be changed simply by editing the
file using any text editor (see section 5 in the main body of these
Operating Instructions). The plots are described in Appendix D, and the
print-out is described in Appendix E.


---------------------------------------------------------------------------



APPENDIX D  -  Plot descriptions

This appendix describes the 10 plots produced by PROCHECK.


Plot 1. Ramachandran plot
-------------------------

The first of the plots is a Ramachandran plot of the phi-psi torsion angles
for all residues in the structure, except those at the chain termini.
Glycine residues are separately identified by triangles as these are not
restricted to any particular region of the plot.

The shading represents the different regions of the plot: the darker the
area the more favourable the phi-psi combination.  The regions are labelled
as follows:

     A   - Core alpha            L   - Core left-handed alpha   
     a   - Allowed alpha         l   - Allowed left-handed alpha
    ~a   - Generous alpha       ~l   - Generous left-handed alpha
     B   - Core beta             p   - Allowed epsilon          
     b   - Allowed beta         ~p   - Generous epsilon         
    ~b   - Generous beta

The lightest of the regions is the "disallowed" region, and any residues
found in this region may need careful investigation.

The different regions are those described in Morris et al. (1992) and were
taken from the observed phi-psi distribution for 121,870 residues from 463
known protein structures. The two most favoured regions are the "core" and
"allowed" regions which correspond to 10 degrees x 10 degrees pixels having
more than 100 and 8 residues in them, respectively. The "generous" regions
were defined by Morris et al. (1992) by extending out by 20 degrees (two
pixels) all round the "allowed" regions. In fact, the authors found very
few residues in these "generous" regions, so they can probably be treated
much like the "disallowed" region and any residues in them investigated
more closely.
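
The way the "core" and "allowed" regions follow from the pixel counts can
be sketched in Python as below. This only illustrates the thresholds
quoted above; it is not the procedure of Morris et al. (1992) in full.
The function names and the toy data are assumptions, and the two-pixel
extension that gives the "generous" regions is not implemented.

```python
from collections import Counter

def pixel_of(phi, psi, size=10):
    """Index of the size x size degree pixel containing (phi, psi).

    Angles are assumed to lie in the range -180 <= angle < 180.
    """
    return (int((phi + 180) // size), int((psi + 180) // size))

def classify_pixels(phi_psi_pairs, core_min=100, allowed_min=8):
    """Label each occupied pixel 'core', 'allowed' or 'other' from its residue count."""
    counts = Counter(pixel_of(phi, psi) for phi, psi in phi_psi_pairs)
    labels = {}
    for pixel, n in counts.items():
        if n > core_min:
            labels[pixel] = "core"
        elif n > allowed_min:
            labels[pixel] = "allowed"
        else:
            labels[pixel] = "other"
    return labels

# Toy data set: 101, 9 and 1 residues falling in three different pixels.
data = [(-65.0, -40.0)] * 101 + [(-95.0, 5.0)] * 9 + [(60.0, -120.0)]
labels = classify_pixels(data)
print(labels)
```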

Ideally, one would hope to have over 90% of the residues in the "core"
regions. The percentage of residues in the "core" regions is one of the
better guides to stereochemical quality.

The appearance of the plot itself can be modified to some extent by
amending some of the parameters in the file procheck.prm (see section 5.2.1
of the main part of these Operating Instructions). For example, you can
choose in which regions residues are labelled, whether the plot's regions
are shaded and lettered, and whether only the "core" regions are
highlighted.


Plot 2. Gly & Pro Ramachandran plot
-----------------------------------

The second plot gives separate phi-psi plots for the Gly and Pro residues
only. Both these residue types have very different favoured and
unfavoured regions from the other residues. The darker the shaded area on
the plots, the more favourable the region. The data on which the shading is
based has come from a data set of 163 non-homologous, high-resolution
protein chains chosen from structures solved by X-ray crystallography to a
resolution of 2.0A or better and an R-factor no greater than 20%.

It is also possible to show all 20 individual Ramachandran plots, one for
each residue type, by amending the appropriate parameter in the
procheck.prm file (see section 5.2.2 in the main section of these Operating
Instructions).


Plot 3. Plot of chi1 vs chi2
----------------------------

The third of the plots shows the chi1-chi2 plots for all residue types that
have both these torsion angles. The shading on each plot indicates how
favourable each region on the plot is. The data for the shading has come
from a data set of high-resolution crystal structures (see Plot 2 above).


Plot 4. Main-chain properties
-----------------------------

The six graphs on the main-chain properties plot show how the structure
(represented by the solid square) compares with well-refined structures at
a similar resolution. The dark band in each graph represents the results
from the well-refined structures; the central line is a least-squares fit
to the mean trend as a function of resolution, while the width of the band
on either side of it corresponds to a variation of one standard deviation
about the mean. In some cases, the trend is dependent on the resolution,
and in other cases it is not.

The 6 properties plotted are:

a. Ramachandran plot quality. This property is measured by the
percentage of the protein's residues that are in the most favoured, or
"core", regions of the Ramachandran plot (Plot 1 above). For a good model
structure, obtained at high resolution, one would expect this percentage to
be over 90%. However, as the resolution gets poorer, so this figure
decreases - as might be expected. The shaded region reflects this expected
decrease with worsening resolution.

b. Peptide bond planarity. This property is measured by calculating the
standard deviation of the protein structure's omega torsion angles. The
smaller the value the tighter the clustering around the ideal of 180
degrees (which represents a perfectly planar peptide bond).

c. Bad non-bonded interactions. This property is measured by the number of
bad contacts per 100 residues. Bad contacts are selected from the list of
non-bonded interactions found by program NB (see Appendix C).  They are
defined as contacts where the distance of closest approach is <= 2.6A.
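
As a worked illustration of this measure, the Python sketch below counts
the contacts at or below the 2.6A cut-off and scales the count to 100
residues. The function name and the distances are assumptions made for
the example; in PROCHECK the distances come from the .nb file.

```python
def bad_contacts_per_100(contact_distances, n_residues, cutoff=2.6):
    """Count contacts with closest approach <= cutoff, scaled per 100 residues."""
    n_bad = sum(1 for d in contact_distances if d <= cutoff)
    return 100.0 * n_bad / n_residues

# Made-up closest-approach distances (in A) for a 150-residue structure.
distances = [2.4, 2.6, 3.1, 3.8, 2.9]
result = bad_contacts_per_100(distances, n_residues=150)
print(round(result, 2))
```

Here two of the five contacts are bad, giving 2 x 100 / 150, or about 1.3
bad contacts per 100 residues.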

d. Calpha tetrahedral distortion. This property is measured by calculating
the standard deviation of the "zeta torsion angle".  This is a notional
torsion angle in that it is not defined about any actual bond in the
structure. Rather, it is defined by the following four atoms within a given
residue: Calpha, N, C, and Cbeta.

e. Main-chain hydrogen bond energy. This property is measured by the
standard deviation of the hydrogen bond energies for main-chain hydrogen
bonds. The energies are calculated using the method of Kabsch & Sander
(1983).

f. Overall G-factor. The overall G-factor is a measure of the overall
`normality' of the structure (see Plot 6 below for a description of the
various G-factors). The overall G is obtained from an average of all the
different G-factors for each residue in the structure.


Plot 5. Side-chain properties
-----------------------------

These five graphs show five different side-chain properties. Like the
graphs in Plot 4, dealing with main-chain properties, they also show how
the structure (represented by the solid square) compares with well-refined
structures at a similar resolution. Again, the dark band in each graph
represents the results from the well-refined structures, giving one
standard deviation about a mean trend.

The 5 properties plotted are:

   a. Standard deviation of the chi1 gauche minus torsion angles.

   b. Standard deviation of the chi1 trans torsion angles.

   c. Standard deviation of the chi1 gauche plus torsion angles.

   d. Pooled standard deviation of all chi1 torsion angles.

   e. Standard deviation of the chi2 trans torsion angles.


Plot 6. Residue properties
--------------------------

The three main histographs
--------------------------
The various graphs and diagrams on this plot illustrate different
properties of the residues in the structure. The three main histographs at
the top can be selected by the user from 11 possibilities. The three default
graphs, which are plotted when you first run PROCHECK, are the first three
of the full list of:

   1. Absolute deviation of chi1 torsion angle from the "ideal".

   2. Absolute deviation of omega torsion angle from the "ideal".

   3. Absolute deviation of zeta "virtual" torsion angle (defined by the
atoms Calpha-N-C-Cbeta) from the "ideal".

   4. Absolute deviation of main-chain hydrogen bond energy from the
"ideal".

   5. B-value of gamma atom (O, C, or S - being whichever is used in
definition of the chi1 torsion angle).

   6. Average B-value of main-chain atoms.

   7. Average B-value of side-chain atoms.

   8. G-factors for phi-psi distributions.

   9. G-factors for chi1-chi2 distributions.

   10. Overall residue-by-residue G-factor.

   11. Approximate accessibility, as estimated by each residue's Ooi
number.

Which three of these 11 are plotted can be amended by editing the parameter
file procheck.prm (see section 5.2.6 in the main body of these Operating
Instructions).

Secondary structure & estimated accessibility
---------------------------------------------
Below the three main graphs is a schematic picture of the protein's
secondary structure, as defined using the Kabsch & Sander (1983)
assignments. The key just below the picture shows which structure is
which. Beta strands are taken to include all residues with a Kabsch &
Sander assignment of E, helices correspond to both H and G assignments,
while everything else is taken to be random coil.
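
The grouping just described amounts to a simple mapping, sketched here in
Python. The function name, the class names and the example string are
assumptions made for illustration.

```python
def schematic_class(ks_code):
    """Map a Kabsch & Sander assignment to the class drawn on the schematic."""
    if ks_code == "E":
        return "strand"
    if ks_code in ("H", "G"):
        return "helix"
    return "coil"   # every other code, including a blank

# Hypothetical assignment string, one character per residue.
assignments = "  EEEE TT HHHHGGG S "
print([schematic_class(code) for code in assignments])
```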

The shading behind the schematic picture gives an approximation to the
residue accessibilities. The approximation is a fairly crude one, being
based on each residue's Ooi number (Nishikawa & Ooi, 1986).  An Ooi number
is a count of the number of other Calpha atoms within a radius of, in this
case, 14A of the given residue's own Calpha. Although crude, this does give
a good impression of which parts of the structure are buried and which are
exposed on the surface. Future versions of PROCHECK will include an
accurate calculation of residue accessibility.
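
The Ooi number is straightforward to compute, as the following Python
sketch shows. It is an illustration only: the function name and the
coordinates are assumptions, and whether a Calpha lying exactly on the
14A boundary is counted is a choice made here, not one stated in these
instructions.

```python
import math

def ooi_numbers(calphas, radius=14.0):
    """For each Calpha, count the other Calpha atoms lying within radius."""
    numbers = []
    for i, a in enumerate(calphas):
        count = sum(1 for j, b in enumerate(calphas)
                    if j != i and math.dist(a, b) <= radius)
        numbers.append(count)
    return numbers

# Made-up Calpha positions spaced along a line, for illustration.
positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
print(ooi_numbers(positions))
```

A high count indicates a residue surrounded by many others, and hence
probably buried; a low count suggests a residue near the surface.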

Sequence & Ramachandran regions
-------------------------------
The next section shows the sequence of the structure (using the 20 standard
amino-acid codes) and a set of markers that identify the region of the
Ramachandran plot in which each residue is located in Plot 1 above. There
are four marker types, one for each of the four different types of region,
and the key explains which is which.

Max. deviation
--------------
The small histogram of asterisks and plus-signs shows each residue's
"maximum deviation" from one of the ideal values. The asterisk scores are
the same as those on the .out listing, and in fact correspond to the final
column of that listing. Refer to the .out file to see which is the
parameter that deviates by the amount shown here. (See also Part 1 of
Appendix E).

G-factors
---------
The final part of the plot shows a shaded chequer-board of the G-factors
for various residue properties (the PROCHECKer board). The darker the
shading the more unusual the value of that property. Where several of a
residue's properties are unusual, the overall G-factor for that residue
will reflect this and identify residues that may need closer scrutiny.

Each G-factor is a measure of the 'normality' of a particular property.
For the dihedral angle G-factors, G(dih), the standard distribution of each
property, for each residue type, has been obtained from a non-homologous,
high-resolution data set. For the main-chain bond lengths and bond angles,
the Engh & Huber (1991) small-molecule means and standard deviations are
used.


Plot 7. Main-chain bond length distributions
--------------------------------------------
The histograms on this plot show the distributions of each of the different
main-chain bond lengths in the structure. The solid line corresponds to the
small-molecule mean value, while the dashed lines either side show the
small-molecule standard deviation, the data coming from Engh & Huber
(1991). Highlighted bars correspond to values more than 2.0 standard
deviations from the mean, though the value of 2.0 can be changed by editing
the procheck.prm file (see section 5.2.7 of the main part of these
Operating Instructions).

If any of the histogram bars lie off the graph, to the left or to the
right, a large arrow indicates the number of these outliers.


Plot 8. Main-chain bond angle distributions
-------------------------------------------
Like Plot 7, but for the main-chain bond angles.


Plot 9. RMS distances from planarity
------------------------------------
These histograms show the RMS distances from planarity for the different
planar groups in the structure. The dashed lines indicate different ideal
values for aromatic rings (Phe, Tyr, Trp, His) and for planar end-groups
(Arg, Asn, Asp, Gln, Glu). The default values are 0.03A and 0.02A,
respectively, but these values can be altered by editing the procheck.prm
file (see section 5.2.9 of the main part of these Operating Instructions).

Histogram bars beyond the dashed lines are shown as highlighted.
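
One way to obtain an RMS distance from planarity is sketched below. It
assumes that the best-fit plane is the least-squares plane through the
atoms of the group, in which case the sum of squared distances equals the
smallest eigenvalue of the scatter matrix of the centred coordinates. The
function names are hypothetical, and the exact procedure used by PROCHECK
may differ. The limits are the defaults quoted above.

```python
import math

# Default limits quoted in the text, in Angstroms.
RING_LIMIT = 0.03       # aromatic rings: Phe, Tyr, Trp, His
END_GROUP_LIMIT = 0.02  # planar end-groups: Arg, Asn, Asp, Gln, Glu
RING_RESIDUES = {"PHE", "TYR", "TRP", "HIS"}


def smallest_eigenvalue(a11, a22, a33, a12, a13, a23):
    """Smallest eigenvalue of a symmetric 3x3 matrix (trigonometric method)."""
    p1 = a12 * a12 + a13 * a13 + a23 * a23
    if p1 == 0.0:
        return min(a11, a22, a33)  # matrix is already diagonal
    q = (a11 + a22 + a33) / 3.0
    p2 = (a11 - q) ** 2 + (a22 - q) ** 2 + (a33 - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b11, b22, b33 = (a11 - q) / p, (a22 - q) / p, (a33 - q) / p
    b12, b13, b23 = a12 / p, a13 / p, a23 / p
    det = (b11 * (b22 * b33 - b23 * b23)
           - b12 * (b12 * b33 - b23 * b13)
           + b13 * (b12 * b23 - b22 * b13))
    r = max(-1.0, min(1.0, det / 2.0))  # guard against rounding error
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)


def rms_from_plane(points):
    """RMS distance of 3D points from their least-squares plane."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    d = [(x - cx, y - cy, z - cz) for x, y, z in points]
    lam = smallest_eigenvalue(
        sum(u * u for u, v, w in d),
        sum(v * v for u, v, w in d),
        sum(w * w for u, v, w in d),
        sum(u * v for u, v, w in d),
        sum(u * w for u, v, w in d),
        sum(v * w for u, v, w in d),
    )
    return math.sqrt(max(lam, 0.0) / n)


def is_distorted(residue_name, points):
    """Apply the ring limit or the end-group limit, depending on residue."""
    ring = residue_name.upper() in RING_RESIDUES
    limit = RING_LIMIT if ring else END_GROUP_LIMIT
    return rms_from_plane(points) > limit
```

For example, four atoms lying alternately 0.05A above and below a plane
give an RMS distance of 0.05A, which exceeds both default limits.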


Plot 10. Distorted geometry plots
---------------------------------
The final set of plots shows all distorted main-chain bond lengths,
main-chain bond angles, and planar groups.

The parameters defining the degree of distortion for plotting here are
given in the procheck.prm file which can be edited as described in
section 5.2.10 of the main part of these Operating Instructions.

For each main-chain bond length and angle plotted, the plot shows the
'ideal' value (as defined by the Engh & Huber small-molecule data), the
actual value, and the difference between the two.

For each distorted planar group, three orthogonal projections are plotted
and the value shown is the RMS distance of the atoms from the best-fit
plane.


---------------------------------------------------------------------------



APPENDIX E  -  Residue-by-residue listing


Part 1. Residue information
---------------------------

The first part of the residue-by-residue listing (the .out file)
deals with a number of stereochemical parameters, as will be described
below.

Explanatory notes
-----------------
The first page gives some explanatory notes about the stereochemical
parameters used. These notes include the "ideal" values, and corresponding
standard deviations, against which the values calculated for your structure
are compared.  The "ideals" used here come from an analysis of 118
high-resolution structures performed by Morris et al.  (1992), and are
listed in Table 1 of Appendix A.

Note that the printing of this explanatory text can be suppressed, if
required, by amending the parameter file procheck.prm (see section 5.3 in
the main body of these Operating Instructions).

Residue-by-residue information
------------------------------
The explanatory text is followed by an analysis of each of the
stereochemical parameters for each residue in the structure. Each value is
highlighted by asterisks and plus-signs if it deviates from the "ideal" by
more than 1 standard deviation. An asterisk represents one standard
deviation, and a plus-sign represents half a standard deviation. So, a
highlight such as +*** indicates that the value of the parameter is between
3.5 and 4.0 standard deviations from the ideal. Where the deviation is more
than 4.5 standard deviations, its numerical value is shown instead: for
example, *5.5* represents 5.5 standard deviations.
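
The highlighting scheme just described can be sketched in a few lines.
This is an illustration only, not the PROCHECK code. The function name is
hypothetical, and two details are assumptions because the text does not
spell them out: deviations of 1 standard deviation or less are left
unmarked, and deviations between 4.0 and 4.5 are shown as four asterisks.

```python
import math


def highlight(deviation_sd, min_sd=1.0, max_symbolic_sd=4.5):
    """Build the asterisk/plus-sign marker for a deviation given in SDs."""
    dev = abs(deviation_sd)
    if dev <= min_sd:
        return ""                      # assumption: not highlighted
    if dev > max_symbolic_sd:
        return f"*{dev:.1f}*"          # numerical value shown instead
    half_steps = math.floor(dev / 0.5)  # number of whole half-SD steps
    stars, plus = divmod(half_steps, 2)
    return "+" * plus + "*" * stars


for dev in (0.8, 1.2, 2.6, 3.7, 5.5):
    print(f"{dev:4.1f} SD -> '{highlight(dev)}'")
```

With these assumptions a deviation of 3.7 standard deviations yields +***,
and one of 5.5 yields *5.5*, matching the examples in the text.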

The appearance of the listing can be altered to some extent by editing the
parameter file procheck.prm. This allows you to show, say, only the
asterisks and not have the values themselves printed. You can also include
only those values that are more than a given number of standard deviations
from the ideal. For more information, see section 5.3 in the main body of
these Operating Instructions.

The information shown for each residue is as follows:

1. Residue number - as given in the original coordinates file.

2. Chain identifier - where relevant, picked up from the original
coordinates file.

3. Sequential number - starting at 1 for the first residue and numbering
the residues sequentially from then on. This may differ from the residue
numbering given in the original coordinates file.

4. Kabsch & Sander secondary structure assignment - assignment of secondary
structure according to the method of Kabsch & Sander (1983). The codes used
are as follows:

       B - residue in isolated beta-bridge    S - bend                   
       E - extended strand, participates      T - hydrogen-bonded turn   
           in beta-ladder                     e - extension of beta-strand
       G - 3-helix (3/10 helix)               g - extension of 3/10 helix
       H - 4-helix (alpha-helix)              h - extension of alpha-helix
       I - 5-helix (pi-helix)             

The lower-case assignments are our extensions of the Kabsch & Sander
definition and are obtained by slightly relaxing their criteria.

5. Region of Ramachandran plot - a single letter code identifies which
region of the Ramachandran plot the residue is in. For end residues and
glycines this assignment does not apply, so is shown by a hyphen, `-'.  The
other codes are as follows:

             A - Core alpha              L - Core left-handed alpha  
             a - Allowed alpha           l - Allowed left-handed alpha
            ~a - Generous alpha         ~l - Generous left-handed alpha
             B - Core beta               p - Allowed epsilon         
             b - Allowed beta           ~p - Generous epsilon        
            ~b - Generous beta          XX - Outside major areas     
                                            i.e. disallowed

6. Chi-1 dihedral angle - three separate columns are given for the three
possible conformations of chi1: gauche minus, trans, and gauche plus.

7. Chi-2 dihedral angle - only the values for the chi2 dihedral angles in
the trans conformation are shown.

8. Proline phi - the phi torsion angle for proline residues only.

9. Phi helix - the phi torsion angle for all residues identified as being
in an alpha-helix by the H of the Kabsch & Sander secondary structure
assignment code.

10. Helix psi - as above, but for the psi torsion angle.

11. Chi-3 dihedral angle - the torsion angle about the S-S bridge of a
disulphide bond, with separate columns for the right- and left-handed
conformations.

12. Disulph bond - sulphur-sulphur distance, in A, between paired cysteine
residues.

13. H-bond en. - estimated strength of the main-chain hydrogen bond (in
kcal/mol), where applicable, calculated using the method of Kabsch & Sander
(1983).

14. Chirality C-alpha - value of the zeta "virtual" torsion angle, defined
by the atoms Calpha, N, C, and Cbeta. This is a "virtual" torsion angle as
it is not defined along an actual bond.

15. Bad contacts - number of bad contacts for this residue, as defined by
non-bonded atoms at a distance of <= 2.6A. The bad contacts are listed at
the end of the print-out (see Part 3 below).

16. Max dev - this shows the maximum deviation (in terms of asterisks, etc)
of all the columns in the current row.

At the end of this print-out, the column totals show the maximum deviation
in each column, the column's mean value and standard deviation, and the
number of values it contains. If the mean values themselves deviate
significantly from the "ideals", they too are highlighted by asterisks.
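
The zeta "virtual" torsion angle of item 14 above can be computed like any
other torsion angle, using the four atoms in the order given there (Calpha,
N, C, Cbeta). The sketch below is a generic torsion-angle routine in which
a clockwise rotation, viewed along the central pair of atoms, is taken as
positive. The function names are hypothetical and this is not the PROCHECK
implementation.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central axis b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def zeta(ca, n, c, cb):
    """Virtual torsion angle Calpha-N-C-Cbeta, as described in item 14."""
    return torsion(ca, n, c, cb)
```

Because the sign of a torsion angle is reversed in the mirror image, the
sign of zeta distinguishes the two possible chiralities at the C-alpha
atom.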


Part 2. Main-chain bond lengths and bond angles
-----------------------------------------------

The second part of the listing analyses the main-chain bond lengths and
bond angles of your protein structure. As before, any deviations in the
actual bond-lengths and bond angles from the "ideal" values are highlighted
with asterisks and plus signs.

At the end of this print-out, the different bond lengths and bond angles
are summarised in two tables giving the minimum, maximum, and mean values
of each type, together with their standard deviations.

The "ideal" values used are given at the head of the listing (though the
printing of these can be suppressed by amending the parameter file
procheck.prm). The ideals are as determined from the analysis of
small-molecule data by Engh & Huber (1991) and are shown in Table 2 of
Appendix A.


Part 3. Bad contacts listing
----------------------------

The bad contacts listing shows the atom-pairs involved, the type of
contact, and the separation between the two atoms. As already mentioned,
bad contacts are defined here as any pair of non-bonded atoms that are at a
distance of <= 2.6A from one another.
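
The definition above amounts to a simple distance test over all pairs of
non-bonded atoms, as in the sketch below. The function name, the atom
labels and the way bonded pairs are supplied are all hypothetical. How
PROCHECK itself decides which atoms count as bonded, and how it assigns
the type of contact, is not covered here.

```python
import math
from itertools import combinations

BAD_CONTACT_LIMIT = 2.6  # Angstroms, as quoted in the text


def bad_contacts(atoms, bonded_pairs, limit=BAD_CONTACT_LIMIT):
    """List non-bonded atom pairs separated by no more than limit.

    atoms        : dict mapping an atom label to its (x, y, z) coordinates
    bonded_pairs : iterable of (label, label) pairs to be excluded
    """
    bonded = {frozenset(pair) for pair in bonded_pairs}
    found = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(atoms.items(), 2):
        if frozenset((name_a, name_b)) in bonded:
            continue  # covalently bonded, so not a bad contact
        distance = math.dist(xyz_a, xyz_b)
        if distance <= limit:
            found.append((name_a, name_b, round(distance, 2)))
    return found


# Made-up coordinates, for illustration only.
atoms = {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0),
         "C": (3.9, 0.0, 0.0), "D": (0.0, 2.0, 0.0)}
for contact in bad_contacts(atoms, bonded_pairs=[("A", "B")]):
    print(contact)
```

This all-pairs loop is adequate for a sketch. For a whole protein a
neighbour search that only compares nearby atoms would be much faster.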


Part 4. Summary statistics and quality assessment
-------------------------------------------------

The final part of the print-out reproduces the statistics printed on Plots
1, 4 and 5 (Appendix D). It also gives an overall assessment of the
structure's quality using the Morris et al. (1992) stereochemical
classification scheme. Here a number from 1 to 4 is assigned to the
structure for each of three separate stereochemical parameters (1 being the
best and 4 the worst score). Finally, it prints an analysis of the various
overall G-factors calculated for the structure. Any G-factors below -1.0
may indicate properties that need to be investigated more closely.



---------------------------------------------------------------------------


REFERENCES


Adobe Systems Inc. (1985). PostScript Language Reference Manual.
Addison-Wesley, Reading, MA.

Engh R A & Huber R (1991). Accurate bond and angle parameters for X-ray
protein structure refinement. Acta Cryst., A47, 392-400.

IUPAC-IUB Commission on Biochemical Nomenclature (1970). Abbreviations and
symbols for the description of the conformation of polypeptide chains. J.
Mol. Biol., 52, 1-17.

Kabsch W & Sander C (1983). Dictionary of protein secondary structure:
pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers, 22, 2577-2637.

Laskowski R A, MacArthur M W, Moss D S & Thornton J M (1993). PROCHECK: a
program to check the stereochemical quality of protein structures. J. Appl.
Cryst., 26, 283-291.

Morris A L, MacArthur M W, Hutchinson E G & Thornton J M (1992).
Stereochemical quality of protein structure coordinates. Proteins, 12,
345-364.

Nishikawa K & Ooi T (1986). Radial locations of amino-acid residues in a
globular protein - correlation with the sequence. J. Biochem., 100,
1043-1047.