Regular Python IRC
[ rakaur on Fri Oct 01 at 08:12 PM // category: programming, technology ]
Definitely owning in IRPG:
- rakaur, the level 52 destroyer of worlds. Next level in 7 days, 15:00:56.
- sycobuny, the level 51 CockGrabber. Next level in 12 days, 23:52:30.
- rintaun, the level 50 ultimate lego warrior. Next level in 0 days, 01:03:37.
- madragoran, the level 50 gaidin. Next level in 10 days, 01:27:34.
In other news, I wrote a twelve-line IRC protocol parser in Python earlier. Actually, I wrote several. The first was à la C: string manipulation, concatenation, and so on. I timed it parsing two million lines of IRC protocol (not a realistic test: it was the same two lines, a million times over). Then I wrote one using regular expressions. I assumed the RE version would be slower, but once it was written I realized it was less than half as much code.

When I timed it, it was only three seconds faster. So I moved the re.compile() call out to a global, so the pattern is compiled only once per program run, and that shaved off about 30% of the time. Then I fixed the argument vector counting, which shaved off about another 10%. All in all, the RE version ended up 60% faster than the string manipulation one (maybe because strings are immutable in Python, so the interpreter has to build a new string for every operation).
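A rough sketch of this kind of timing comparison, for anyone who wants to try it. It is a stand-in, not the original test: the string-method parser `parse_with_str` and the two sample lines are hypothetical (neither the string version nor the test lines appear in this post), and the regex is the post's pattern condensed onto one line. `parse_compiling_each_call` calls `re.compile()` inside the function; since CPython's `re` module caches recently compiled patterns, that mostly costs an extra call and cache lookup per line rather than a full recompile. The script prints timings instead of asserting a particular speedup, because those depend on the machine and Python version.

```python
import re
import timeit

# Two made-up sample lines (hypothetical; the post's real test lines aren't shown).
LINES = [
    ":server PRIVMSG #python :hello there",
    "NOTICE AUTH :*** Looking up your hostname",
]

# The post's pattern, condensed to one line (same regex as the VERBOSE form).
PATTERN = r"^(:\w+)?\s?(\w+)\s(\W?\w+)?\s?(:.*)$"
COMPILED = re.compile(PATTERN)  # compiled once, at module level


def parse_with_str(line):
    """Hypothetical string-method parser; assumes a well-formed line with ' :'."""
    origin = None
    if line.startswith(":"):
        origin, line = line[1:].split(" ", 1)
    head, _, message = line.partition(" :")
    words = head.split()
    target = words[1] if len(words) > 1 else None
    return origin, words[0], target, message


def parse_with_re(line):
    """Regex parser using the module-level compiled pattern."""
    origin, command, target, message = COMPILED.match(line).groups()
    if origin:
        origin = origin[1:]
    return origin, command, target, message[1:]


def parse_compiling_each_call(line):
    """Same as parse_with_re, but calls re.compile() on every call.

    The re module's internal cache means this mostly pays for the extra
    call and lookup, not for compiling the pattern from scratch.
    """
    origin, command, target, message = re.compile(PATTERN).match(line).groups()
    if origin:
        origin = origin[1:]
    return origin, command, target, message[1:]


def bench(parser, repeat=50_000):
    """Seconds taken to run `parser` over LINES `repeat` times."""
    return timeit.timeit(lambda: [parser(ln) for ln in LINES], number=repeat)


if __name__ == "__main__":
    # The three parsers should agree before timing them.
    for line in LINES:
        assert parse_with_str(line) == parse_with_re(line) == parse_compiling_each_call(line)
    for fn in (parse_with_str, parse_compiling_each_call, parse_with_re):
        print(f"{fn.__name__}: {bench(fn):.3f}s")
```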
I can tell you’re all quivering in anticipation, so here’s the code. I’ll paste the RE first (as it stands now; when my friend gets home I’ll ask him whether there’s any way to optimize it, since he knows far more about regular expressions than I do):
```python
import re

# A regular expression to match and dissect IRC protocol messages.
# In my benchmark this was 60.501% faster than the string-manipulation parser.
pattern = r"""
    ^           # beginning of string
    (:\w+)?     # if we have an origin, it starts with a ':'
    \s?         # space between words
    (\w+)       # there has to be a command
    \s          # space between words
    (\W?\w+)?   # if the command has a target, it might be a #channel
    \s?         # space between words
    (:.*)       # anything after a ':' is one string
    $           # end of string
"""
pattern = re.compile(pattern, re.VERBOSE)
```
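One caveat about this pattern: `\w` only matches letters, digits, and underscores, so a prefix such as `:irc.example.com` or `:nick!user@host` can't satisfy `(:\w+)?`, and since the trailing `(:.*)` group is mandatory, a line like `MODE nick +i` doesn't match either. In both cases `pattern.match()` returns None. Below is a sketch of a looser variant, assuming the RFC 1459 message layout (optional `:prefix`, a command word or three-digit numeric, space-separated middle parameters, an optional trailing parameter after ` :`). `LOOSER` and `split_line` are hypothetical names, and this is not the pattern behind the timings in this post.

```python
import re

# The post's pattern, condensed to one line (same regex as the VERBOSE form).
POSTED = re.compile(r"^(:\w+)?\s?(\w+)\s(\W?\w+)?\s?(:.*)$")

# Hypothetical looser variant. In VERBOSE mode, literal spaces must be escaped.
LOOSER = re.compile(r"""
    ^
    (?::(?P<origin>\S+)\ )?       # optional ':prefix' followed by one space
    (?P<command>[A-Za-z]+|\d{3})  # a word command or a three-digit numeric
    (?P<params>(?:\ [^:\s]\S*)*)  # zero or more middle params
    (?:\ :(?P<trailing>.*))?      # optional trailing param after ' :'
    $
""", re.VERBOSE)


def split_line(line):
    """Return (origin, command, params) or None if the line doesn't match."""
    m = LOOSER.match(line)
    if m is None:
        return None
    params = m.group("params").split()
    if m.group("trailing") is not None:
        params.append(m.group("trailing"))
    return m.group("origin"), m.group("command"), params


print(POSTED.match(":nick!user@host PRIVMSG #python :hi"))  # None
print(split_line(":nick!user@host PRIVMSG #python :hi"))
# ('nick!user@host', 'PRIVMSG', ['#python', 'hi'])
```

Returning a list of parameters instead of a fixed `target`/`message` pair also sidesteps the question of how many middle parameters a command has.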
And here’s the parser:
```python
def parse(self):
    """Parse IRC protocol and call methods based on the results."""
    # Go through every line in the recvq.
    for line in self.recvq:
        parv = []
        parc = 0

        # Split this crap up with the help of RE. Skip lines the pattern
        # can't match: match() returns None, and None has no .groups().
        match = pattern.match(line)
        if match is None:
            continue
        origin, command, target, message = match.groups()

        # Chop off the leading ':' on `origin` and `message`.
        if origin:
            origin = origin[1:]
        message = message[1:]

        # Make an IRC parameter argument vector.
        if target:
            parv.append(target)
            parc += 1
        parv.append(message)
        parc += 1
```
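The excerpt stops once `parv` and `parc` are built, so the "call methods based on the results" part of the docstring isn't shown. Here is a minimal, self-contained sketch of one way that could look, assuming a hypothetical `Client` class that holds raw lines in `recvq` and handler methods named `on_<command>` looked up with `getattr()`. That naming scheme and the `log` list are illustrations only, not part of the original code.

```python
import re

# The post's pattern, condensed to one line (same regex as the VERBOSE form).
pattern = re.compile(r"^(:\w+)?\s?(\w+)\s(\W?\w+)?\s?(:.*)$")


class Client:
    """Hypothetical client holding a receive queue of raw IRC lines."""

    def __init__(self, lines):
        self.recvq = list(lines)
        self.log = []  # records handler calls, for demonstration only

    def parse(self):
        """Parse each queued line and dispatch to an on_<command> method."""
        for line in self.recvq:
            match = pattern.match(line)
            if match is None:
                continue  # the posted pattern doesn't match every line
            origin, command, target, message = match.groups()
            if origin:
                origin = origin[1:]
            parv = [target] if target else []
            parv.append(message[1:])
            parc = len(parv)
            # Hypothetical dispatch: PRIVMSG -> on_privmsg, PING -> on_ping.
            handler = getattr(self, "on_" + command.lower(), None)
            if handler is not None:
                handler(origin, parc, parv)

    def on_privmsg(self, origin, parc, parv):
        self.log.append(("privmsg", origin, parc, parv))

    def on_ping(self, origin, parc, parv):
        self.log.append(("ping", origin, parc, parv))


client = Client([
    ":server PRIVMSG #python :hello there",
    "PING :irc.example.com",
    ":nick!user@host PRIVMSG #python :skipped",  # no match with this pattern
])
client.parse()
print(client.log)
# [('privmsg', 'server', 2, ['#python', 'hello there']), ('ping', None, 1, ['irc.example.com'])]
```

Looking handlers up by name keeps the parser itself ignorant of which commands exist; commands without a matching method are simply ignored.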
… and that’s about all I have for you today.
-- rakaur // 2004.10.01 @ 08:12 PM
