docs/userguide/t_parse.py
author rgbecker
Wed, 25 Oct 2000 08:57:46 +0000
changeset 494 54257447cfe9
parent 301 5ad57f31ae75
child 1122 fff6f306a99e
permissions -rw-r--r--
Changed to indirect copyright

#copyright ReportLab Inc. 2000
#see license.txt for license details
#history http://cvs.sourceforge.net/cgi-bin/cvsweb.cgi/docs/userguide/t_parse.py?cvsroot=reportlab
#$Header: /tmp/reportlab/docs/userguide/Attic/t_parse.py,v 1.2 2000/10/25 08:57:45 rgbecker Exp $
"""
Template parsing module inspired by REXX (with thanks to Donn Cave for discussion).

Template initialization has the form:
   T = Template(template_string, wild_card_marker, single_char_marker,
             x = regex_x, y = regex_y, ...)
Parsing has the form
   ([match1, match2, ..., matchn], lastindex) = T.PARSE(string)

Only the first argument is mandatory.

The resultant object efficiently parses strings that match the template_string,
giving a list of substrings that correspond to each "directive" of the template.

Template directives:

  Wildcard:
    The template may be initialized with a wildcard that matches any string
    up to the string matching the next directive (which may not be a wild
    card or single character marker) or the next literal sequence of characters
    of the template.  The character that represents a wildcard is specified
    by the wild_card_marker parameter, which has no default.

    For example, using X as the wildcard:

    >>> T = Template("prefixXinteriorX", "X")
    >>> T.PARSE("prefix this is before interior and this is after")
    ([' this is before ', ' and this is after'], 48)
    >>> T = Template("<X>X<X>", "X")
    >>> T.PARSE('<A HREF="index.html">go to index</A>')
    (['A HREF="index.html"', 'go to index', '/A'], 36)

    Obviously the character used to represent the wildcard must be distinct
    from the characters used to represent literals or other directives.
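
As a rough illustration of the wildcard rule above, a wildcard that is
followed by a literal can be resolved by searching for that literal from the
current position, much as PARSE does with find. This is only a sketch in
present-day Python, not this module's code; the helper name match_wildcard is
hypothetical.

```python
def match_wildcard(text, start, next_literal=None):
    # Hypothetical helper: return (matched_substring, index_after_match).
    # A trailing wildcard (no following literal) takes the rest of the text.
    if next_literal is None:
        return text[start:], len(text)
    end = text.find(next_literal, start)
    if end < 0:
        raise ValueError("couldn't terminate wildcard with literal at %d" % start)
    return text[start:end], end


# Walk the template "<X>X<X>" by hand: literal, wildcard, literal, ...
text = '<A HREF="index.html">go to index</A>'
tag, pos = match_wildcard(text, 1, ">")           # after the leading "<"
body, pos = match_wildcard(text, pos + 1, "<")    # skip the literal ">"
close, pos = match_wildcard(text, pos + 1, ">")   # skip the literal "<"
```

Each call stops at the first occurrence of the following literal, which is
why the wildcard character must not collide with literal text.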

  Fixed length character sequences:
    The template may have a marker character which indicates a fixed
    length field.  All adjacent instances of this marker will be matched
    by a substring of the same length in the parsed string.  For example:

      >>> T = Template("NNN-NN-NNNN", single_char_marker="N")
      >>> T.PARSE("1-2-34-5-12")
      (['1-2', '34', '5-12'], 11)
      >>> T.PARSE("111-22-3333")
      (['111', '22', '3333'], 11)
      >>> T.PARSE("1111-22-3333")
      ValueError: literal not found at (3, '-')

    A template may have multiple fixed length markers, which allows fixed
    length fields to be adjacent, but recognized separately.  For example:

      >>> T = Template("MMDDYYX", "X", "MDY")
      >>> T.PARSE("112489 Somebody's birthday!")
      (['11', '24', '89', " Somebody's birthday!"], 27)
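
The adjacent-field behaviour above can be pictured as slicing the input by
runs of identical marker characters. The following is a minimal sketch in
present-day Python under the assumption that the template holds only marker
characters (no literals or wildcards); split_fixed is a hypothetical name and
not part of this module.

```python
def split_fixed(text, template, markers):
    # Hypothetical helper: cut `text` into one field per run of a repeated
    # marker character in `template`. Because the template is assumed to
    # contain only markers, template positions line up with text positions.
    fields = []
    pos = 0
    while pos < len(template):
        ch = template[pos]
        if ch not in markers:
            raise ValueError("not a fixed-length marker: %r" % ch)
        run_end = pos
        while run_end < len(template) and template[run_end] == ch:
            run_end += 1
        fields.append(text[pos:run_end])
        pos = run_end
    return fields


month, day, year = split_fixed("112489", "MMDDYY", "MDY")
```

Using a different marker character for each field is what lets the three
two-character fields be separated without any literal between them.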

  Regular expression markers:
    The template may have markers associated with regular expressions.
    The regular expressions may be given either as strings or in compiled form.
    For example:
      >>> T = Template("v: s i", v=id, s=str, i=int)
      >>> T.PARSE("this_is_an_identifier: 'a string' 12344")
      (['this_is_an_identifier', "'a string'", '12344'], 39)
      >>>
    Here id, str, and int are regular expression conveniences provided by
    this module.
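
The definitions of id, str and int are not shown here, so the sketch below
uses hypothetical stand-in patterns written with the present-day re module.
It assumes a regex directive must match starting exactly at the current
position, and it accepts either a pattern string or a compiled pattern, as
described above. The name match_at is also hypothetical.

```python
import re

# Hypothetical stand-ins for the module's id, str and int conveniences.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
QUOTED = re.compile(r"'[^']*'")
INTEGER = "[0-9]+"   # left as a plain string to show on-demand compiling


def match_at(pattern, text, pos):
    # Compile string patterns on demand, then require a match at `pos`.
    if isinstance(pattern, str):
        pattern = re.compile(pattern)
    m = pattern.match(text, pos)
    if m is None:
        raise ValueError("regex did not match at %d" % pos)
    return m.group(0), m.end()


text = "this_is_an_identifier: 'a string' 12344"
name, pos = match_at(IDENT, text, 0)
quoted, pos = match_at(QUOTED, text, pos + 2)    # skip the literal ": "
number, pos = match_at(INTEGER, text, pos + 1)   # skip the literal " "
```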

  Directive markers may be mixed and matched, except that wildcards cannot precede
  wildcards or single character markers.
  Example:
>>> T = Template("ssnum: NNN-NN-NNNN, fn=X, ln=X, age=I, quote=Q", "X", "N", I=int, Q=str)
>>> T.PARSE("ssnum: 123-45-6789, fn=Aaron, ln=Watters, age=13, quote='do be do be do'")
(['123', '45', '6789', 'Aaron', 'Watters', '13', "'do be do be do'"], 72)
>>>
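
The mixing rules above amount to splitting the template into a sequence of
literal and directive steps and rejecting a wildcard that is directly
followed by another wildcard or by a fixed length marker. The sketch below
shows one way to express that in present-day Python. It is a simplified
illustration rather than this module's implementation: build_parse_seq is a
hypothetical name, and regex markers are passed as a plain string of marker
characters instead of keyword arguments.

```python
def build_parse_seq(template, wild=None, single_chars="", regex_marks=""):
    # Hypothetical helper: split a template into (kind, data) steps.
    markers = set(single_chars) | set(regex_marks)
    if wild:
        markers.add(wild)
    seq = []
    index = 0
    prev_kind = None
    while index < len(template):
        start = index
        ch = template[index]
        if wild and ch == wild:
            if prev_kind == "wild":
                raise ValueError("two wild cards in sequence is not allowed")
            seq.append(("wild", None))
            index += 1
        elif ch in single_chars:
            if prev_kind == "wild":
                raise ValueError("wild card cannot precede single char marker")
            # a run of the same marker character is one fixed length field
            while index < len(template) and template[index] == ch:
                index += 1
            seq.append(("fixed", index - start))
        elif ch in regex_marks:
            seq.append(("regex", ch))
            index += 1
        else:
            # everything up to the next marker is one literal
            while index < len(template) and template[index] not in markers:
                index += 1
            seq.append(("literal", template[start:index]))
        prev_kind = seq[-1][0]
    return seq


seq = build_parse_seq("NNN-NN-NNNN, fn=X, age=I", "X", "N", "I")
```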

"""

import regex, string
from types import StringType
from string import find

#
# template parsing
#
# EG: T = Template("(NNN)NNN-NNNN X X", "X", "N")
#     ([area, exch, ext, fn, ln], index) = T.PARSE("(908)949-2726 Aaron Watters")
#
class Template:

   def __init__(self,
                template,
                wild_card_marker=None,
                single_char_marker=None,
                **marker_to_regex_dict):
       self.template = template
       self.wild_card = wild_card_marker
       self.char = single_char_marker
       # determine the set of markers for this template
       markers = marker_to_regex_dict.keys()
       if wild_card_marker:
          markers.append(wild_card_marker)
       if single_char_marker:
          for ch in single_char_marker: # allow multiple scm's
              markers.append(ch)
          self.char = single_char_primary = single_char_marker[0]
       self.markers = markers
       for mark in markers:
           if len(mark)>1:
              raise ValueError, "Marks must be single characters: "+`mark`
       # compile the regular expressions if needed
       self.marker_dict = marker_dict = {}
       for (mark, rgex) in marker_to_regex_dict.items():
           if type(rgex) == StringType:
              rgex = regex.compile(rgex)
           marker_dict[mark] = rgex
       # determine the parse sequence
       parse_seq = []
       # dummy last char
       lastchar = None
       index = 0
       last = len(template)
       # count the number of directives encountered
       ndirectives = 0
       while index<last:
          start = index
          thischar = template[index]
          # is it a wildcard?
          if thischar == wild_card_marker:
             if lastchar == wild_card_marker:
                raise ValueError, "two wild cards in sequence is not allowed"
             parse_seq.append( (wild_card_marker, None) )
             index = index+1
             ndirectives = ndirectives+1
          # is it a sequence of single character markers?
          elif single_char_marker and thischar in single_char_marker:
             if lastchar == wild_card_marker:
                raise ValueError, "wild card cannot precede single char marker"
             while index<last and template[index] == thischar:
                index = index+1
             parse_seq.append( (single_char_primary, index-start) )
             ndirectives = ndirectives+1
          # is it a literal sequence?
          elif not thischar in markers:
             while index<last and not template[index] in markers:
                index = index+1
             parse_seq.append( (None, template[start:index]) )
          # otherwise it must be a regex marker
          else:
             rgex = marker_dict[thischar]
             parse_seq.append( (thischar, rgex) )
             ndirectives = ndirectives+1
             index = index+1
          lastchar = template[index-1]
       self.parse_seq = parse_seq
       self.ndirectives = ndirectives
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   160
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   161
   def PARSE(self, str, start=0):
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   162
       ndirectives = self.ndirectives
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   163
       wild_card = self.wild_card
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   164
       single_char = self.char
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   165
       parse_seq = self.parse_seq
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   166
       lparse_seq = len(parse_seq) - 1
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   167
       # make a list long enough for substitutions for directives
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   168
       result = [None] * ndirectives
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   169
       current_directive_index = 0
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   170
       currentindex = start
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   171
       # scan through the parse sequence, recognizing
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   172
       for parse_index in xrange(lparse_seq + 1):
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   173
           (indicator, data) = parse_seq[parse_index]
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   174
           # is it a literal indicator?
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   175
           if indicator is None:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   176
              if find(str, data, currentindex) != currentindex:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   177
                 raise ValueError, "literal not found at "+`(currentindex,data)`
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   178
              currentindex = currentindex + len(data)
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   179
           else:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   180
              # anything else is a directive
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   181
              # is it a wildcard?
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   182
              if indicator == wild_card:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   183
                 # if it is the last directive then it matches the rest of the string
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   184
                 if parse_index == lparse_seq:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   185
                    last = len(str)
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   186
                 # otherwise must look at next directive to find end of wildcard
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   187
                 else:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   188
                    # next directive must be regex or literal
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   189
                    (nextindicator, nextdata) = parse_seq[parse_index+1]
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   190
                    if nextindicator is None:
5ad57f31ae75 added quickhack for font changes in paragraphs and lots of new text
aaron_watters
parents:
diff changeset
   191
                       # search for literal
                       last = find(str, nextdata, currentindex)
                       if last<currentindex:
                          raise ValueError, \
                           "couldn't terminate wild with lit "+`currentindex`
                    else:
                       # data is a regex, search for it
                       last = nextdata.search(str, currentindex)
                       if last<currentindex:
                          raise ValueError, \
                           "couldn't terminate wild with regex "+`currentindex`
              elif indicator == single_char:
                 # data is length to eat
                 last = currentindex + data
              else:
                 # other directives are always regular expressions
                 last = data.match(str, currentindex) + currentindex
                 if last<currentindex:
                    raise ValueError, "couldn't match regex at "+`currentindex`
              #print "accepting", str[currentindex:last]
              result[current_directive_index] = str[currentindex:last]
              current_directive_index = current_directive_index+1
              currentindex = last 
       # sanity check
       if current_directive_index != ndirectives:
          raise SystemError, "not enough directives found?"
       return (result, currentindex)

# some useful regular expressions
USERNAMEREGEX = \
  "["+string.letters+"]["+string.letters+string.digits+"_]*"
STRINGLITREGEX = "'[^\n']*'"
SIMPLEINTREGEX = "["+string.digits+"]+"
id = regex.compile(USERNAMEREGEX)
str = regex.compile(STRINGLITREGEX)
int = regex.compile(SIMPLEINTREGEX)

def test():
    global T, T1, T2, T3, T4

    T = Template("(NNN)NNN-NNNN X X", "X", "N")
    print T.PARSE("(908)949-2726 Aaron Watters")

    T1 = Template("s --> s blah", s=str)
    s = "' <-- a string --> ' --> 'blah blah another string blah' blah"
    print T1.PARSE(s)

    T2 = Template("s --> NNNiX", "X", "N", s=str, i=int)
    print T2.PARSE("'A STRING' --> 15964653alpha beta gamma")

    T3 = Template("XsXi", "X", "N", s=str, i=int)
    print T3.PARSE("prefix'string'interior1234junk not parsed")

    T4 = Template("MMDDYYX", "X", "MDY")
    print T4.PARSE("122961 Somebody's birthday!")

if __name__=="__main__": test()