# [Twisted-Python] Re: IMAP fixes

Thu Jul 10 01:53:39 EDT 2003

```On Thu, 10 Jul 2003, "Tony Meyer" <ta-meyer at ihug.co.nz> wrote:

> Well, it started life as a python RE, so there was no translation.  I did
> very carefully analyse (and test) it, and you're welcome to as well.
> It's easy enough to do without it, or split it into several smaller re's, but
> this seems the best way to do it.

Why?
It's much better to split into several REs. When you need to comment your
REs, they're obviously useless -- it is impossible to keep comments in sync
with code, so when bugs crop up in your RE, it will be difficult to fix.
Not to mention you can't step with PDB through REs, or insert prints, to
see what exactly is going wrong.

Here's a proof of concept (untested) factorization into several REs
each of which is easy enough to describe:

import re

sectionRe = re.compile(r'([\d.]*\.)') # digits and dots, ends with dot
partRe = re.compile(r'([A-Z.]+)') # upper case letters and dots
fielditemsRe = re.compile(r' (\([^()]*\))') # space, open-paren, non-parens, close-paren
partialRe = re.compile(r'<(\d+\.[1-9]\d*)>') # digits, dot, non-zero, digits
body = re.compile(r'BODY') # BODY
peek = re.compile(r'(\.?PEEK)') # optional dot, PEEK
bra = re.compile(r'\[') # literal [
cket = re.compile(r'\]') # literal ]

# The comments are not necessary -- my point is that it is easy to see
# what those REs capture just by looking at them

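def trymatch(r, s):
    # Optional token: return (group 1, rest of s), or (None, s) if no match.
    # r must define a capturing group.
    m = r.match(s)
    if m:
        return m.group(1), s[m.end():]
    else:
        return None, s

def forcematch(r, s):
    # Required token: return the rest of s, or raise if it is missing.
    m = r.match(s)
    if not m:
        raise IllegalIdentifierError()
    return s[m.end():]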

# Note how the parsing is now done with code -- only the low-level
# tokenizing is done with REs. The code is relatively easy to read.
def parse(s):
    s = forcematch(body, s)
    # peeked, not peek: rebinding peek here would hide the module-level RE
    peeked, s = trymatch(peek, s)
    s = forcematch(bra, s)
    section, s = trymatch(sectionRe, s)
    part, s = trymatch(partRe, s)
    fielditems, s = trymatch(fielditemsRe, s)
    s = forcematch(cket, s)
    partial, s = trymatch(partialRe, s)
    # And when you write it like that, it is easy to find
    # false positives. For example, should we check s is empty?
    return peeked, section, part, fielditems, partial
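
# On the "should we check s is empty?" question: a minimal, self-contained
# sketch of what such a check might look like. ParseError, bodyRe, peekRe and
# parseStrict are hypothetical names used only for illustration; they are not
# part of the proposal above or of Twisted, and the grammar is cut down to
# BODY plus an optional .PEEK to keep the example short.
import re

class ParseError(ValueError):
    pass

bodyRe = re.compile(r'BODY')      # required token
peekRe = re.compile(r'(\.PEEK)')  # optional token

def parseStrict(s):
    m = bodyRe.match(s)
    if not m:
        raise ParseError('expected BODY: %r' % (s,))
    s = s[m.end():]
    peeked = None
    m = peekRe.match(s)
    if m:
        peeked, s = m.group(1), s[m.end():]
    if s:
        # Leftover input means the string matched only as a prefix,
        # which is exactly the kind of false positive mentioned above.
        raise ParseError('trailing input: %r' % (s,))
    return peeked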

--