                           HETPARSE documentation



CONTENTS

   1.0 SUMMARY
   2.0 INPUTS & OUTPUTS
   3.0 INPUT FILE FORMAT
   4.0 OUTPUT FILE FORMAT
   5.0 DATA FILES
   6.0 USAGE
   7.0 KNOWN BUGS & WARNINGS
   8.0 NOTES
   9.0 DESCRIPTION
   10.0 ALGORITHM
   11.0 RELATED APPLICATIONS
   12.0 DIAGNOSTIC ERROR MESSAGES
   13.0 AUTHORS
   14.0 REFERENCES

1.0 SUMMARY

   Converts a raw dictionary of heterogen groups to EMBL-like format.

2.0 INPUTS & OUTPUTS

   HETPARSE parses the dictionary of heterogen groups available at
   http://pdb.rutgers.edu/het_dictionary.txt and writes a file containing
   the group names, synonyms and 3-letter codes in EMBL-like format.
   Optionally, HETPARSE will search a directory of PDB files and will
   count the number of files that each heterogen appears in. The path and
   extension of the PDB files and the names of the input and output files
   are user-specified (the file extension is set in the ACD file).
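
   The optional file count can be illustrated with a short Python
   sketch. It is not HETPARSE's own code: the text above says only that
   the directory is searched with keywords, so matching on PDB "HET"
   records (heterogen code in columns 8-10) is an assumption, as are
   the function name count_heterogen_files and the ".ent" default
   extension.

```python
from collections import Counter
from pathlib import Path


def count_heterogen_files(directory, extension=".ent"):
    """Count, for each heterogen code, the files that mention it.

    Assumption: a file "contains" a heterogen if it has a PDB HET
    record whose columns 8-10 hold that code. HETPARSE itself may
    match differently.
    """
    counts = Counter()
    for path in sorted(Path(directory).glob(f"*{extension}")):
        codes_in_file = set()  # each code counts once per file
        with open(path, errors="replace") as handle:
            for line in handle:
                # "HET" plus three spaces excludes HETNAM/HETSYN lines.
                if line.startswith("HET   "):
                    code = line[7:10].strip()
                    if code:
                        codes_in_file.add(code)
        counts.update(codes_in_file)
    return counts
```

   A code that occurs several times in one file still adds only one to
   its count, which matches the "number of files" wording above.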

3.0 INPUT FILE FORMAT

   An excerpt from the raw dictionary of heterogen groups is shown below
   (Figure 1).

  Input files for usage example

  File: het.txt

RESIDUE   061     58
CONECT      N1     2 N2   C5
CONECT      N2     2 N1   N3
CONECT      N3     2 N2   N4
CONECT      N4     3 N3   C5   HN4
CONECT      C5     3 N1   N4   C6
CONECT      C6     3 C5   C7   C11
CONECT      C7     3 C6   C8   C12
CONECT      C8     3 C7   C9   H8
CONECT      C9     3 C8   C10  H9
CONECT      C10    3 C9   C11  H10
CONECT      C11    3 C6   C10  H11
CONECT      C12    3 C7   C13  C17
CONECT      C13    3 C12  C14  H13
CONECT      C14    3 C13  C15  H14
CONECT      C15    3 C14  C16  C18
CONECT      C16    3 C15  C17  H16
CONECT      C17    3 C12  C16  H17
CONECT      C18    4 C15  N19 1H18 2H18
CONECT      N19    3 C18  C20  C33
CONECT      C20    3 N19  C21  N25
CONECT      C21    4 C20  C22 1H21 2H21
CONECT      C22    4 C21  C23 1H22 2H22
CONECT      C23    4 C22  C24 1H23 2H23
CONECT      C24    4 C23 1H24 2H24 3H24
CONECT      N25    2 C20  C26
CONECT      C26    3 N25  C27  C32
CONECT      C27    3 C26  C28  H27
CONECT      C28    3 C27  C29  H28
CONECT      C29    3 C28  O30  C31
CONECT      O30    2 C29  HOU
CONECT      C31    3 C29  C32  H31
CONECT      C32    3 C26  C31  C33
CONECT      C33    3 N19  C32  O34
CONECT      O34    1 C33
CONECT      HN4    1 N4
CONECT      H8     1 C8
CONECT      H9     1 C9
CONECT      H10    1 C10
CONECT      H11    1 C11
CONECT      H13    1 C13
CONECT      H14    1 C14
CONECT      H16    1 C16
CONECT      H17    1 C17
CONECT     1H18    1 C18
CONECT     2H18    1 C18
CONECT     1H21    1 C21
CONECT     2H21    1 C21
CONECT     1H22    1 C22
CONECT     2H22    1 C22


  [Part of this file has been deleted for brevity]

CONECT     2H6     1 C6
CONECT     1H8     1 C8
CONECT     2H8     1 C8
CONECT     1H9     1 C9
CONECT     2H9     1 C9
END
HET    104             28
HETSYN     104 TRIENTINE
HETNAM     104 N,N'-BIS(2-AMINOETHYL)-1,2-ETHANEDIAMINE
FORMUL      104    C6 H18 N4

RESIDUE   105     32
CONECT      B      3 O1   O2   C3
CONECT      O1     2 B    H1
CONECT      O2     2 B    H2
CONECT      C3     4 B    N4  1H3  2H3
CONECT      N4     3 C3   C5   H4
CONECT      C5     3 N4   O6   C7
CONECT      O6     1 C5
CONECT      C7     3 C5   C8   C12
CONECT      N11    2 O10  C12
CONECT      O10    2 N11  C8
CONECT      C8     3 C7   O10  C9
CONECT      C12    3 C7   N11  C13
CONECT      C9     4 C8  1H9  2H9  3H9
CONECT      C13    3 C12  C14  C18
CONECT      C14    3 C13  C15 CL1
CONECT     CL1     1 C14
CONECT      C15    3 C14  C16  H15
CONECT      C16    3 C15  C17  H16
CONECT      C17    3 C16  C18  H17
CONECT      C18    3 C13  C17  H18
CONECT      H1     1 O1
CONECT      H2     1 O2
CONECT     1H3     1 C3
CONECT     2H3     1 C3
CONECT      H4     1 N4
CONECT     1H9     1 C9
CONECT     2H9     1 C9
CONECT     3H9     1 C9
CONECT      H15    1 C15
CONECT      H16    1 C16
CONECT      H17    1 C17
CONECT      H18    1 C18
END
HET    105             32
HETSYN     105 CLOXACILLIN DERIVATIVE
HETNAM     105 N-[5-METHYL-3-O-TOLYL-ISOXAZOLE-4-CARBOXYLIC ACID
HETNAM   2 105 AMIDE] BORONIC ACID
FORMUL      105    C12 H12 N2 O4 B1 CL1
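
   As a minimal sketch of how the name and synonym records in the
   excerpt above might be read, the Python function below collects the
   HETNAM and HETSYN text for each 3-letter code. The column positions
   (code in columns 12-14, text from column 16) are inferred from the
   excerpt, and the function name parse_het_dictionary is hypothetical.
   Continuation lines are appended without a separating space, which
   reproduces the DE line for 105 in the sample output; whether
   HETPARSE does exactly this is not confirmed.

```python
def parse_het_dictionary(lines):
    """Collect HETNAM/HETSYN text per heterogen code.

    Returns a dict mapping code -> {"name": str, "synonym": str}.
    All other record types (RESIDUE, CONECT, HET, FORMUL, END) are
    skipped.
    """
    entries = {}
    for line in lines:
        keyword = line[:6]
        if keyword not in ("HETNAM", "HETSYN"):
            continue
        code = line[11:14].strip()   # columns 12-14 (assumed layout)
        text = line[15:].rstrip()    # column 16 onwards
        entry = entries.setdefault(code, {"name": "", "synonym": ""})
        field = "name" if keyword == "HETNAM" else "synonym"
        # Continuation lines carry the same code; join them directly.
        entry[field] += text
    return entries
```

   Called on the lines of het.txt, this would return one entry per
   code, for example the name and synonym TRIENTINE for code 104.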


4.0 OUTPUT FILE FORMAT

   The records used in the output file (Figure 2) are as follows:
     * ID - 3-character abbreviation of heterogen
     * DE - full description
     * SY - synonym
     * NN - number of files in which this heterogen appears
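
   The record layout can be sketched in Python as follows. This is an
   illustration, not the HETPARSE writer: the function name
   format_record is hypothetical, the "." placeholder for a missing
   synonym and the "//" terminator are taken from the sample output,
   and the splitting of long descriptions or multiple synonyms over
   several lines is not handled.

```python
def format_record(code, name, synonym="", count=0):
    """Format one heterogen as an EMBL-like record (simplified)."""
    lines = [
        f"ID   {code}",
        f"DE   {name}",
        # The sample output shows "." when no synonym is available.
        f"SY   {synonym or '.'}",
        f"NN   {count}",
        "//",
    ]
    return "\n".join(lines) + "\n"
```

   Each field is a two-letter tag followed by three spaces and the
   value, so the records are easy to read back line by line.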

  Output files for usage example

  File: Ehet.dat

ID   105
DE   N-[5-METHYL-3-O-TOLYL-ISOXAZOLE-4-CARBOXYLIC ACIDAMIDE] BORONIC ACID
SY   CLOXACILLIN DERIVATIVE
NN   0
//
ID   104
DE   N,N'-BIS(2-AMINOETHYL)-1,2-ETHANEDIAMINE
SY   TRIENTINE
NN   0
//
ID   103
DE   2',5'-DIDEOXY-ADENOSINE 3'-MONOPHOSPHATE
SY   .
NN   0
//
ID   102
DE   GAMMA-DEOXY-GAMMA-SULFO-GUANOSINE-5'-TRIPHOSPHATE
SY   .
NN   0
//
ID   101
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
SY   .
NN   0
//
ID   100
DE   1-(5-CHLOROINDOL-3-YL)-3-HYDROXY-3-(2H-TETRAZOL-5-YL)-PROPENONE
SY   .
NN   0
//
ID   074
DE   [PROPYLAMINO-3-HYDROXY-BUTAN-1,4-DIONYL]-ISOLEUCYL-PROLINE
SY   CA-074;
SY   [N-(L-3-TRANS-PROPYLCARBAMOYL-OXIRANE-2-CARBONYL)-L-ISOLEUCYL-L-PROLINE]
NN   0
//
ID   072
DE
DE   (+/-)(2S,5S)-3-(4-(4-CARBOXYPHENYL)BUTYL)-2-HEPTYL-4-OXO-5-THIAZOLIDINE
SY   THIAZOLIDINONE; GW0072
NN   0
//
ID   061
DE
DE   2-BUTYL-6-HYDROXY-3-[2'-(1H-TETRAZOL-5-YL)-BIPHENYL-4-YLMETHYL]-3H-QUINAZOLIN-4-ONE
SY   L-159,061
NN   0
//

5.0 DATA FILES

   HETPARSE does not use a data file.

6.0 USAGE

Convert heterogen group dictionary to EMBL-like format.
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-infile]            infile     This option specifies the name of input file
                                  (raw dictionary of heterogen groups) to
                                  parse, which should be of the format
                                  specified at
                                  http://pdb.rutgers.edu/het_dictionary.txt
   -dogrep             toggle     [N] This option specifies whether to search
                                  a directory of files (typically PDB files)
                                  with keywords. If set, HETPARSE will search
                                  the directory and will count the number of
                                  files that each heterogen appears in.
*  -dirlistpath        dirlist    [./] This option specifies the directory to
                                  search with keywords.
  [-outfile]           outfile    [Ehet.dat] This option specifies the name of
                                  EMBL-like format dictionary of heterogen
                                  groups.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-dirlistpath" associated qualifiers
   -extension          string     Default file extension

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
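
   As an illustration of how the qualifiers above combine, the Python
   sketch below builds an argument list for a non-interactive run. Only
   qualifiers from the listing are used. The executable name "hetparse"
   and the helper name build_hetparse_command are assumptions, and
   -dogrep is given without a value on the assumption that this
   switches the toggle on.

```python
def build_hetparse_command(infile, outfile="Ehet.dat", pdb_dir=None,
                           program="hetparse"):
    """Return an argv list for running HETPARSE without prompts.

    program is an assumed executable name; adjust it to the local
    installation.
    """
    cmd = [program, "-infile", str(infile), "-outfile", str(outfile)]
    if pdb_dir is not None:
        # Ask for the file count and say which directory to search.
        cmd += ["-dogrep", "-dirlistpath", str(pdb_dir)]
    cmd.append("-auto")  # turn off prompts
    return cmd
```

   The list could then be passed to subprocess.run(cmd, check=True);
   without pdb_dir the count step is skipped and every NN value should
   stay at 0, as in the sample output.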


  6.1 COMMAND LINE ARGUMENTS

