Wednesday, April 16, 2008

XML escape

Friday afternoon is a prime time for blitz-tasks and a rich opportunity for your hacking one-liner 5K1LLz skills.

This Friday the finish-up task came from a colleague who had to leave in half an hour to catch a plane. There was a big, several-megabyte XML file, and all the markup characters in it had to be escaped, presumably in preparation for being sent over the wire (read: HTTP).

Problem is, a certain Windows editor (does it really qualify for that definition?) hangs when opening files bigger than 1234KB, and writing a Java program would take a fairly long time compared to the alternatives. Not to mention that many programmers could write the Java program even less memory-efficiently than the joke of an editor that Notepad is (there, I said it). And Java is not very forgiving of memory problems.

But what are the alternatives? As the Perl manual page says, "The three principal virtues of a programmer are Laziness, Impatience, and Hubris." Being a lazy programmer, I tried to see whether somebody else had already written a utility to do this (there was zero chance that there wasn't one), and whether it was available. First I found this Eclipse plugin. However, pasting megabytes into a text box didn't make me confident that it would work.

There was also the xmlstarlet package, which would have done a wonderful job had it been installed on the old servers where the file could be easily transferred. But it wasn't, and it would take too long to copy the file to my machine and back just to convert it. It would also be hard to find an appropriate package for that old Linux version. No, that's not for impatient programmers.

The next thought I had was: why spend effort on trying to install a package when I could do this in a Ruby one-liner? Of course, I have nothing against Python, but if there's one thing nobody would argue with, it's that Python doesn't fare well against Ruby when it comes to writing one-liners. Anyway, Ruby wasn't installed there either (note to self: this must be amended).

The clock was ticking. So Python it is, and instead of an obfuscated one-liner I convinced myself to write a few short, readable lines. I hadn't done serious XML processing in Python for a while, but the answer was only a Google search away. It was really insultingly simple, but given enough hubris one could turn even this meager piece of code into a rambling blog post:

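#!/usr/bin/env python

import sys
from xml.sax.saxutils import escape

# Escape &, < and > one line at a time, so the whole file never sits in memory.
for line in sys.stdin:
    # Each line still carries its own newline; write() avoids doubling it.
    sys.stdout.write(escape(line))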
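
For the curious, here is a small sketch of what escape actually does to a line. The sample string and the extra quote mapping are my own illustrations, not part of the original task: by default only &, < and > are replaced, and the optional second argument is a dictionary of additional replacements, which might be handy if the result ends up inside an attribute value.

```python
import sys
from xml.sax.saxutils import escape, unescape

# Illustrative input only; any line of the real file would do.
sample = '<a href="x">Tom & Jerry</a>\n'

# Default behaviour: &, < and > are replaced, quotes pass through untouched.
escaped = escape(sample)

# Assumed extra requirement: also replace double quotes via the
# optional dictionary of additional replacements.
escaped_quotes = escape(sample, {'"': "&quot;"})

# unescape() reverses the default replacements.
restored = unescape(escaped)

sys.stdout.write(escaped)
sys.stdout.write(escaped_quotes)
```

The script itself is used as a plain filter, reading the original file on standard input and redirecting standard output to the escaped copy.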

1 comment:

vlb said...

Nicely done. I especially like the final sentence.