PAT, PATTERN
NAME
PAT, PATTERN - define sequence pattern.
SYNOPSIS
PAT = name_11 name_12 ... name_1m / ... / name_k1 name_k2 ... name_kn
PAT TOL number_of_allowed_errors
PATTERN = name_11 name_12 ... name_1m / ... / name_k1 name_k2 ... name_kn
PATTERN TOLERANCE number_of_allowed_errors
DESCRIPTION
The command PATTERN (garlic version 1.4 or later) may be used to define a
sequence pattern. It is more general than the command SEQUENCE because more
than one residue name may be allowed at each position. Each pattern is a set
of name lists separated by slashes (/). Each list gives the residue names
allowed at one position. In the following example, a simple pattern of three
residues is defined:
PAT = ASP ASN / PHE TYR TRP / ARG LYS
The first residue may be matched by either ASP or ASN, the second residue
may be matched by PHE, TYR or TRP, while the third residue may be matched
by ARG or LYS.
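The structure of a pattern can be sketched outside garlic. The Python helper below is hypothetical and is not garlic's own parser. It assumes the text to the right of "PAT =" is passed in, that "/" separates positions, and that whitespace separates the names allowed at one position.

```python
def parse_pattern(text):
    """Hypothetical helper: split the right-hand side of PAT = ... into
    one list of allowed residue names per position."""
    positions = []
    for chunk in text.split("/"):   # one chunk per position
        names = chunk.split()       # names allowed at this position
        if not names:
            raise ValueError("empty position in pattern")
        positions.append(names)
    return positions


pattern = parse_pattern("ASP ASN / PHE TYR TRP / ARG LYS")
print(pattern)  # [['ASP', 'ASN'], ['PHE', 'TYR', 'TRP'], ['ARG', 'LYS']]
```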
The macromolecular structure may be searched for the given pattern using
the command SELECT PATTERN (short form: SEL PAT). The commands RESTRICT
and ADD may also be used to search for the given pattern. For the pattern
defined above, the following fragments will be selected:
ASP PHE ARG
ASP PHE LYS
ASP TYR ARG
ASP TYR LYS
ASP TRP ARG
ASP TRP LYS
ASN PHE ARG
ASN PHE LYS
ASN TYR ARG
ASN TYR LYS
ASN TRP ARG
ASN TRP LYS
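The twelve fragments above are the Cartesian product of the three name lists (2 x 3 x 2 = 12). The Python sketch below reproduces that list with itertools.product and scans a short, made-up chain for matches. It only illustrates the matching rule; it is not garlic's search code. It also assumes every start position is reported, including overlapping matches, which this page does not specify.

```python
from itertools import product

pattern = [["ASP", "ASN"], ["PHE", "TYR", "TRP"], ["ARG", "LYS"]]

# Every fragment the pattern accepts: one name from each list, in order.
fragments = [" ".join(combo) for combo in product(*pattern)]
print(len(fragments))  # 12
print(fragments[:3])   # ['ASP PHE ARG', 'ASP PHE LYS', 'ASP TYR ARG']


def find_matches(sequence, pattern):
    """Start indices where each residue is in the list for its position."""
    n = len(pattern)
    return [i for i in range(len(sequence) - n + 1)
            if all(sequence[i + k] in pattern[k] for k in range(n))]


chain = ["GLY", "ASN", "TRP", "LYS", "ASP", "PHE", "ALA"]  # hypothetical chain
print(find_matches(chain, pattern))  # [1]  (ASN TRP LYS)
```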
WILDCARDS
The command PATTERN may be combined with wildcards. The character * (asterisk)
matches any residue. It may be placed at any position in the pattern string,
though it does not make much sense at the first or the last position, where it
does not restrict the search. Example:
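PAT = ARG LYS HIS / * / ARG LYS HIS / * / ARG LYS HIS
This pattern matches three basic residues (ARG, LYS or HIS), separated from
each other by a single residue of any type.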
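A wildcard position can be modelled as a list that accepts any name. The Python sketch below assumes that a position whose list contains "*" matches every residue. It applies the example pattern to a made-up chain. It is an illustration of the rule, not garlic's implementation.

```python
def position_matches(residue, allowed):
    # Assumed wildcard semantics: "*" at a position accepts any residue name.
    return "*" in allowed or residue in allowed


def find_matches(sequence, pattern):
    n = len(pattern)
    return [i for i in range(len(sequence) - n + 1)
            if all(position_matches(sequence[i + k], pattern[k]) for k in range(n))]


basic = ["ARG", "LYS", "HIS"]
pattern = [basic, ["*"], basic, ["*"], basic]  # PAT = ARG LYS HIS / * / ... / ARG LYS HIS
chain = ["LYS", "GLY", "ARG", "SER", "HIS", "ALA", "LYS"]  # hypothetical chain
print(find_matches(chain, pattern))  # [0, 2]
```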
PATTERN TOLERANCE
TOLERANCE (short form: TOL) is the only keyword which may be combined with
the command PATTERN. It defines the pattern tolerance, i.e. the maximal
number of errors tolerated while searching for the specified pattern.
Example:
PAT TOL 1
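The tolerance can be pictured as a mismatch counter. The Python sketch below assumes that an "error" is a position whose residue is not in the allowed list, and that a fragment is accepted when the number of errors does not exceed the tolerance. This page does not define "error" more precisely, so treat the counting rule as an assumption. The function and its tolerance parameter are hypothetical.

```python
def find_matches(sequence, pattern, tolerance=0):
    """Start indices where at most `tolerance` positions fail to match."""
    n = len(pattern)
    hits = []
    for i in range(len(sequence) - n + 1):
        # Assumed definition of an error: residue not allowed at this position
        # ("*" is treated as a wildcard that never produces an error).
        errors = sum(1 for k in range(n)
                     if "*" not in pattern[k] and sequence[i + k] not in pattern[k])
        if errors <= tolerance:
            hits.append(i)
    return hits


pattern = [["ASP", "ASN"], ["PHE", "TYR", "TRP"], ["ARG", "LYS"]]
chain = ["ASP", "ALA", "LYS", "GLY"]  # hypothetical chain
print(find_matches(chain, pattern))               # []
print(find_matches(chain, pattern, tolerance=1))  # [0]  (ALA counted as one error)
```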
DELETIONS AND INSERTIONS
It is not possible to handle deletions and insertions directly. However,
a number of different patterns may be combined into a single script,
which may then be used to handle deletions and insertions. One such script,
loaded into the vi editor, is shown in the accompanying screenshot. The
script contains three patterns of different lengths. Each pattern was written
as a single line, but the editor wrapped it because of its length. Note that
many consecutive wildcards are used in that example.
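The idea behind such a script can be sketched as searching several variants of one motif, where an insertion adds a wildcard position and a deletion removes one. The Python sketch below emulates that outside garlic. The motif, the variant names and the chain are all hypothetical, and "*" is assumed to match any residue.

```python
def find_matches(sequence, pattern):
    n = len(pattern)
    return [i for i in range(len(sequence) - n + 1)
            if all("*" in pattern[k] or sequence[i + k] in pattern[k] for k in range(n))]


# Hypothetical variants of one motif: 0, 1 or 2 residues between ASP and LYS.
variants = {
    "deletion":  [["ASP"], ["LYS"]],
    "reference": [["ASP"], ["*"], ["LYS"]],
    "insertion": [["ASP"], ["*"], ["*"], ["LYS"]],
}

chain = ["ASP", "LYS", "GLY", "ASP", "SER", "ALA", "LYS"]  # hypothetical chain
for name, pattern in variants.items():
    print(name, find_matches(chain, pattern))
# deletion [0]
# reference []
# insertion [3]
```

Each variant is searched on its own, which mirrors a script that defines and searches several patterns one after another.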
NOTES
(1) The maximal number of residue names in each list is 30, which is more than
enough for the 20 standard residue names. The maximal number of lists in a
pattern is 100. If 100 lists are not enough, change MAX_PATT_LENGTH in the
file defines.h and recompile garlic.
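Long patterns, such as those with many consecutive wildcards, can be checked against these limits before they are typed into garlic. The Python sketch below uses the two limits stated above. The constant names and the limit_problems helper are hypothetical. They are not garlic's identifiers, apart from the documented values 30 and 100.

```python
MAX_NAMES_PER_LIST = 30  # limit stated in NOTES (hypothetical constant name)
MAX_LISTS = 100          # limit stated in NOTES (hypothetical constant name)


def limit_problems(pattern):
    """Return a list of messages; an empty list means the pattern fits."""
    problems = []
    if len(pattern) > MAX_LISTS:
        problems.append(f"{len(pattern)} lists exceed the limit of {MAX_LISTS}")
    for k, names in enumerate(pattern, start=1):
        if len(names) > MAX_NAMES_PER_LIST:
            problems.append(f"list {k} has {len(names)} names (limit {MAX_NAMES_PER_LIST})")
    return problems


print(limit_problems([["ASP", "ASN"], ["PHE", "TYR", "TRP"], ["ARG", "LYS"]]))  # []
```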
RELATED COMMANDS
The command SEQUENCE, combined with the keyword = (equals sign), may be
used to define a fixed sequence pattern. With the command SEQUENCE, only one
residue name may be specified at each position. The commands SELECT, ADD and
RESTRICT may be used to search the macromolecular structure for the specified
pattern.
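A fixed sequence is the special case of a pattern in which every list holds a single name. The Python sketch below rewrites a whitespace-separated residue list as an equivalent PAT line. The helper is hypothetical, and the input format is assumed rather than taken from the SEQUENCE documentation.

```python
def sequence_to_pattern(sequence):
    """Hypothetical helper: one residue name per list, lists joined by ' / '."""
    return " / ".join(sequence.split())


print("PAT = " + sequence_to_pattern("ASP PHE ARG"))  # PAT = ASP / PHE / ARG
```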