Regular Python IRC

[ rakaur on Fri Oct 01 at 08:12 PM // category: programming, technology ]

Definitely owning in IRPG:

  1. rakaur, the level 52 destroyer of worlds. Next level in 7 days, 15:00:56.
  2. sycobuny, the level 51 CockGrabber. Next level in 12 days, 23:52:30.
  3. rintaun, the level 50 ultimate lego warrior. Next level in 0 days, 01:03:37.
  4. madragoran, the level 50 gaidin. Next level in 10 days, 01:27:34.

In other news, I wrote a twelve-line IRC protocol parser in Python earlier. Actually, I wrote several. First I wrote one a la C, via string manipulation, concatenation, etc. Then I timed it parsing two million lines of IRC protocol (not realistic: it was the same two lines a million times). Then I wrote one using regular expressions. My first thought was that the RE one would be slower. However, after I wrote it and looked at it, I realized it took less than half the code to implement. When I timed it, it was only three seconds faster. So I moved the re.compile() call to a global so it’s only compiled once per program run, and that shaved off about 30% of the time. Then I fixed the argument vector counting, and that shaved off about 10% of the time. All in all, the RE one was 60% faster than the string manipulation one (maybe because strings are immutable in Python and thus the interpreter has to build a new string for every operation).
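For anyone who wants to repeat the compile-once experiment, here is a rough sketch of a timing harness using the standard `timeit` module. The pattern, the sample line, and the function names are stand-ins made up for this sketch, not the code that was actually timed. Also note that the `re` module keeps a cache of compiled patterns, so the gap you measure may not look anything like the 30% mentioned above.

```python
import re
import timeit

# Stand-in pattern and input line, both made up for this sketch.
SOURCE = r"^(\w+)\s(:.*)$"
LINE = "PING :irc.example.net"

# Compiled once at import time, like a module-level global.
COMPILED = re.compile(SOURCE)


def match_compile_each_call(line):
    """Ask `re` for the compiled pattern on every call."""
    return re.compile(SOURCE).match(line).groups()


def match_compile_once(line):
    """Reuse the module-level compiled pattern."""
    return COMPILED.match(line).groups()


def compare(number=100_000):
    """Return (seconds_each_call, seconds_once) for `number` matches each."""
    each = timeit.timeit(lambda: match_compile_each_call(LINE), number=number)
    once = timeit.timeit(lambda: match_compile_once(LINE), number=number)
    return each, once


if __name__ == "__main__":
    each, once = compare()
    print(f"compile per call: {each:.3f}s, compile once: {once:.3f}s")
```

Both functions return the same groups; only where the compile happens differs, which is the one variable the experiment is meant to isolate.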

I can tell you’re all quivering in anticipation, so here’s the code. I’ll paste the RE first (as it stands now; when my friend gets home I’ll ask him if there’s anything to optimize, as he knows far more about REs than I do):

import re

# A regular expression to match and dissect IRC protocol messages.
# This is actually 60.501% faster than not using RE.
pattern = r"""
           ^             # beginning of string
           (:\w+)?       # if we have an origin it starts with a ':'
           \s?           # space between words
           (\w+)         # there has to be a command
           \s            # space between words
           (\W?\w+)?     # if the command has a target it might be a #channel
           \s?           # space between words
           (:.*)         # anything after a ':' is one string
           $             # end of string
           """

pattern = re.compile(pattern, re.VERBOSE)
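To see what the four groups come out as, here is a small sketch that runs the pattern over a few sample lines. The lines are made up for illustration (not captured from a real server), and the pattern is repeated so the snippet runs on its own. The third sample shows a limit of `\w`: an origin in `nick!user@host` form contains `!` and `@`, so the pattern does not match it at all.

```python
import re

# Same pattern as in the post, repeated so this snippet runs on its own.
pattern = re.compile(r"""
    ^             # beginning of string
    (:\w+)?       # optional origin, starts with ':'
    \s?           # space between words
    (\w+)         # command
    \s            # space between words
    (\W?\w+)?     # optional target, might be a #channel
    \s?           # space between words
    (:.*)         # trailing message
    $             # end of string
    """, re.VERBOSE)

# Made-up sample lines, not taken from a real server.
samples = [
    ":rakaur PRIVMSG #channel :hello world",
    "PING :irc.example.net",
    ":nick!user@host PRIVMSG #channel :hi",   # '!' and '@' are not \w
]

results = []
for line in samples:
    m = pattern.match(line)
    # match() returns None when the line does not fit the pattern.
    results.append(m.groups() if m else None)
    print(repr(line), "->", results[-1])
```

The first line yields all four groups, the second yields `None` for both origin and target, and the third yields no match.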

And here’s the parser:

def parse(self):
    """Parse IRC protocol and call methods based on the results."""

    global pattern

    # Go through every line in the recvq.
    for line in self.recvq:
        parv = []
        parc = 0

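        # Split this crap up with the help of RE.
        match = pattern.match(line)
        if match is None:
            # Skip lines the pattern can't handle instead of crashing.
            continue

        origin, command, target, message = match.groups()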

        # Chop off the leading ':' on `origin` and `message`.
        if origin:
            origin = origin[1:]

        message = message[1:]

        # Make an IRC parameter argument vector.
        if target:
            parv.append(target)
            parc += 1

        parv.append(message)
        parc += 1
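The same steps can also be written as a standalone function, which is easier to poke at from a prompt. This is only a sketch: `parse_line` is a hypothetical helper, not part of the original class, and stripping the trailing CR/LF and returning `None` on a non-match are assumptions made here. The pattern is the compact, comment-free form of the one above.

```python
import re

# Compact form of the post's pattern (same groups, no comments).
pattern = re.compile(r"^(:\w+)?\s?(\w+)\s(\W?\w+)?\s?(:.*)$")


def parse_line(line):
    """Return (origin, command, parv), or None if the line does not match.

    `parse_line` is a hypothetical helper, not part of the original class.
    """
    # Assumption: drop a trailing CR/LF so it does not end up in the message.
    match = pattern.match(line.rstrip("\r\n"))
    if match is None:
        return None

    origin, command, target, message = match.groups()

    # Chop off the leading ':' on `origin` and `message`.
    if origin:
        origin = origin[1:]
    message = message[1:]

    # Build the argument vector; len(parv) stands in for a separate parc.
    parv = [target, message] if target else [message]
    return origin, command, parv


if __name__ == "__main__":
    print(parse_line(":rakaur PRIVMSG #channel :hello world\r\n"))
    print(parse_line("PING :irc.example.net"))
```

Returning a tuple instead of filling in locals keeps the parsing separate from whatever dispatching happens afterwards.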

… and that’s about all I have for you today.

-- rakaur // 2004.10.01 @ 08:12 PM

