tools/docco/t_parse.py
author rptlab
Tue, 30 Apr 2013 14:28:14 +0100
branchpy33
changeset 3723 99aa837b6703
parent 3721 0c93dd8ff567
child 3789 5bc95e1f3dd4
permissions -rw-r--r--
second stage of port to Python 3.3; working hello world
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
3617
ae5744e97c42 reportlab: copyright date changes
robin
parents: 3326
diff changeset
     1
#Copyright ReportLab Europe Ltd. 2000-2012
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     2
#see license.txt for license details
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     3
#history http://www.reportlab.co.uk/cgi-bin/viewcvs.cgi/public/reportlab/trunk/reportlab/tools/docco/t_parse.py
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     4
"""
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     5
Template parsing module inspired by REXX (with thanks to Donn Cave for discussion).
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     6
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     7
Template initialization has the form:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     8
   T = Template(template_string, wild_card_marker, single_char_marker,
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
     9
             x = regex_x, y = regex_y, ...)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    10
Parsing has the form
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    11
   ([match1, match2, ..., matchn], lastindex) = T.PARSE(string)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    12
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    13
Only the first argument is mandatory.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    14
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    15
The resultant object efficiently parses strings that match the template_string,
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    16
giving a list of substrings that correspond to each "directive" of the template.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    17
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    18
Template directives:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    19
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    20
  Wildcard:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    21
    The template may be initialized with a wildcard that matches any string
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    22
    up to the string matching the next directive (which may not be a wild
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    23
    card or single character marker) or the next literal sequence of characters
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    24
    of the template.  The character that represents a wildcard is specified
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    25
    by the wild_card_marker parameter, which has no default.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    26
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    27
    For example, using X as the wildcard:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    28
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    29
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    30
    >>> T = Template("prefixXinteriorX", "X")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    31
    >>> T.PARSE("prefix this is before interior and this is after")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    32
    ([' this is before ', ' and this is after'], 47)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    33
    >>> T = Template("<X>X<X>", "X")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    34
    >>> T.PARSE('<A HREF="index.html">go to index</A>')
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    35
    (['A HREF="index.html"', 'go to index', '/A'], 36)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    36
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    37
    Obviously the character used to represent the wildcard must be distinct
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    38
    from the characters used to represent literals or other directives.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    39
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    40
  Fixed length character sequences:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    41
    The template may have a marker character which indicates a fixed
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    42
    length field.  All adjacent instances of this marker will be matched
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    43
    by a substring of the same length in the parsed string.  For example:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    44
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    45
      >>> T = Template("NNN-NN-NNNN", single_char_marker="N")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    46
      >>> T.PARSE("1-2-34-5-12")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    47
      (['1-2', '34', '5-12'], 11)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    48
      >>> T.PARSE("111-22-3333")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    49
      (['111', '22', '3333'], 11)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    50
      >>> T.PARSE("1111-22-3333")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    51
      ValueError: literal not found at (3, '-')
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    52
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    53
    A template may have multiple fixed length markers, which allows fixed
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    54
    length fields to be adjacent, but recognized separately.  For example:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    55
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    56
      >>> T = Template("MMDDYYX", "X", "MDY")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    57
      >>> T.PARSE("112489 Somebody's birthday!")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    58
      (['11', '24', '89', " Somebody's birthday!"], 27)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    59
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    60
  Regular expression markers:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    61
    The template may have markers associated with regular expressions.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    62
    the regular expressions may be either string represenations of compiled.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    63
    For example:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    64
      >>> T = Template("v: s i", v=id, s=str, i=int)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    65
      >>> T.PARSE("this_is_an_identifier: 'a string' 12344")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    66
      (['this_is_an_identifier', "'a string'", '12344'], 39)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    67
      >>>
3023
47087db79506 Extra blank-line to fix doc string for cheesecake index.
damian
parents: 2963
diff changeset
    68
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    69
    Here id, str, and int are regular expression conveniences provided by
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    70
    this module.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    71
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    72
  Directive markers may be mixed and matched, except that wildcards cannot precede
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    73
  wildcards or single character markers.
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    74
  Example:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    75
>>> T = Template("ssnum: NNN-NN-NNNN, fn=X, ln=X, age=I, quote=Q", "X", "N", I=int, Q=str)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    76
>>> T.PARSE("ssnum: 123-45-6789, fn=Aaron, ln=Watters, age=13, quote='do be do be do'")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    77
(['123', '45', '6789', 'Aaron', 'Watters', '13', "'do be do be do'"], 72)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    78
>>>
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    79
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    80
"""
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    81
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    82
import re, string
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    83
from types import StringType
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    84
from string import find
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    85
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    86
#
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    87
# template parsing
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    88
#
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    89
# EG: T = Template("(NNN)NNN-NNNN X X", "X", "N")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    90
#     ([area, exch, ext, fn, ln], index) = T.PARSE("(908)949-2726 Aaron Watters")
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    91
#
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    92
class Template:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    93
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    94
   def __init__(self,
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    95
                template,
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    96
                wild_card_marker=None,
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    97
                single_char_marker=None,
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    98
                **marker_to_regex_dict):
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
    99
       self.template = template
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   100
       self.wild_card = wild_card_marker
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   101
       self.char = single_char_marker
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   102
       # determine the set of markers for this template
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   103
       markers = list(marker_to_regex_dict.keys())
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   104
       if wild_card_marker:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   105
          markers.append(wild_card_marker)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   106
       if single_char_marker:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   107
          for ch in single_char_marker: # allow multiple scm's
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   108
              markers.append(ch)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   109
          self.char = single_char_primary = single_char_marker[0]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   110
       self.markers = markers
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   111
       for mark in markers:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   112
           if len(mark)>1:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   113
              raise ValueError("Marks must be single characters: "+repr(mark))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   114
       # compile the regular expressions if needed
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   115
       self.marker_dict = marker_dict = {}
3723
99aa837b6703 second stage of port to Python 3.3; working hello world
rptlab
parents: 3721
diff changeset
   116
       for mark, rgex in marker_to_regex_dict.items():
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   117
           if type(rgex) == StringType:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   118
              rgex = re.compile(rgex)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   119
           marker_dict[mark] = rgex
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   120
       # determine the parse sequence
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   121
       parse_seq = []
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   122
       # dummy last char
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   123
       lastchar = None
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   124
       index = 0
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   125
       last = len(template)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   126
       # count the number of directives encountered
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   127
       ndirectives = 0
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   128
       while index<last:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   129
          start = index
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   130
          thischar = template[index]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   131
          # is it a wildcard?
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   132
          if thischar == wild_card_marker:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   133
             if lastchar == wild_card_marker:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   134
                raise ValueError("two wild cards in sequence is not allowed")
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   135
             parse_seq.append( (wild_card_marker, None) )
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   136
             index = index+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   137
             ndirectives = ndirectives+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   138
          # is it a sequence of single character markers?
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   139
          elif single_char_marker and thischar in single_char_marker:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   140
             if lastchar == wild_card_marker:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   141
                raise ValueError("wild card cannot precede single char marker")
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   142
             while index<last and template[index] == thischar:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   143
                index = index+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   144
             parse_seq.append( (single_char_primary, index-start) )
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   145
             ndirectives = ndirectives+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   146
          # is it a literal sequence?
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   147
          elif not thischar in markers:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   148
             while index<last and not template[index] in markers:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   149
                index = index+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   150
             parse_seq.append( (None, template[start:index]) )
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   151
          # otherwise it must be a re marker
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   152
          else:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   153
             rgex = marker_dict[thischar]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   154
             parse_seq.append( (thischar, rgex) )
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   155
             ndirectives = ndirectives+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   156
             index = index+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   157
          lastchar = template[index-1]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   158
       self.parse_seq = parse_seq
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   159
       self.ndirectives = ndirectives
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   160
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   161
   def PARSE(self, str, start=0):
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   162
       ndirectives = self.ndirectives
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   163
       wild_card = self.wild_card
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   164
       single_char = self.char
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   165
       parse_seq = self.parse_seq
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   166
       lparse_seq = len(parse_seq) - 1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   167
       # make a list long enough for substitutions for directives
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   168
       result = [None] * ndirectives
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   169
       current_directive_index = 0
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   170
       currentindex = start
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   171
       # scan through the parse sequence, recognizing
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   172
       for parse_index in range(lparse_seq + 1):
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   173
           (indicator, data) = parse_seq[parse_index]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   174
           # is it a literal indicator?
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   175
           if indicator is None:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   176
              if find(str, data, currentindex) != currentindex:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   177
                 raise ValueError("literal not found at "+repr((currentindex,data)))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   178
              currentindex = currentindex + len(data)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   179
           else:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   180
              # anything else is a directive
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   181
              # is it a wildcard?
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   182
              if indicator == wild_card:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   183
                 # if it is the last directive then it matches the rest of the string
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   184
                 if parse_index == lparse_seq:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   185
                    last = len(str)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   186
                 # otherwise must look at next directive to find end of wildcard
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   187
                 else:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   188
                    # next directive must be re or literal
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   189
                    (nextindicator, nextdata) = parse_seq[parse_index+1]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   190
                    if nextindicator is None:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   191
                       # search for literal
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   192
                       last = find(str, nextdata, currentindex)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   193
                       if last<currentindex:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   194
                          raise ValueError("couldn't terminate wild with lit "+repr(currentindex))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   195
                    else:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   196
                       # data is a re, search for it
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   197
                       last = nextdata.search(str, currentindex)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   198
                       if last<currentindex:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   199
                          raise ValueError("couldn't terminate wild with re "+repr(currentindex))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   200
              elif indicator == single_char:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   201
                 # data is length to eat
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   202
                 last = currentindex + data
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   203
              else:
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   204
                 # other directives are always regular expressions
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   205
                 last = data.match(str, currentindex) + currentindex
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   206
                 if last<currentindex:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   207
                    raise ValueError("couldn't match re at "+repr(currentindex))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   208
              #print "accepting", str[currentindex:last]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   209
              result[current_directive_index] = str[currentindex:last]
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   210
              current_directive_index = current_directive_index+1
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   211
              currentindex = last
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   212
       # sanity check
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   213
       if current_directive_index != ndirectives:
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   214
          raise SystemError("not enough directives found?")
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   215
       return (result, currentindex)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   216
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   217
# some useful regular expressions
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   218
USERNAMEREGEX = \
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   219
  "["+string.letters+"]["+string.letters+string.digits+"_]*"
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   220
STRINGLITREGEX = "'[^\n']*'"
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   221
SIMPLEINTREGEX = "["+string.digits+"]+"
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   222
id = re.compile(USERNAMEREGEX)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   223
str = re.compile(STRINGLITREGEX)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   224
int = re.compile(SIMPLEINTREGEX)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   225
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   226
def test():
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   227
    global T, T1, T2, T3
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   228
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   229
    T = Template("(NNN)NNN-NNNN X X", "X", "N")
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   230
    print(T.PARSE("(908)949-2726 Aaron Watters"))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   231
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   232
    T1 = Template("s --> s blah", s=str)
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   233
    s = "' <-- a string --> ' --> 'blah blah another string blah' blah"
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   234
    print(T1.PARSE(s))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   235
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   236
    T2 = Template("s --> NNNiX", "X", "N", s=str, i=int)
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   237
    print(T2.PARSE("'A STRING' --> 15964653alpha beta gamma"))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   238
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   239
    T3 = Template("XsXi", "X", "N", s=str, i=int)
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   240
    print(T3.PARSE("prefix'string'interior1234junk not parsed"))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   241
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   242
    T4 = Template("MMDDYYX", "X", "MDY")
3721
0c93dd8ff567 initial changes from 2to3-3.3
rptlab
parents: 3617
diff changeset
   243
    print(T4.PARSE("122961 Somebody's birthday!"))
2963
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   244
c414c0ab69e7 reportlab-2.2: first stage of major re-org
rgbecker
parents:
diff changeset
   245
3023
47087db79506 Extra blank-line to fix doc string for cheesecake index.
damian
parents: 2963
diff changeset
   246
if __name__=="__main__": test()