Friday, April 18, 2008

Twitter: do you follow me?

This week's hacking task was to implement a "follow all" function for Twitter.

Even for Twitter users, this needs some explanation: the follow functionality now means "enable notifications". However, the command interface in IM/SMS wasn't changed, so the command name remains "follow". For brevity, I will use the word "follow" instead of "enable notifications".

The reason for writing this command is that Twitter used to have a "follow all" function, which instantly turned on notifications for all your friends (the users you're following, in the new terminology). Now there's a user called "all", and the function no longer works (OK, maybe that's not the real reason). This put an end to a very useful feature for users who rely heavily on Twitter's IM integration.

A quick look at the Twitter API suggested it would be straightforward to fetch all friends and enable notifications for each of them, one by one. It would be fairly slow, though: the user list doesn't say whether notifications are already enabled for a user. That information would have let me skip the users who already have notifications on. Ah well...

The first tool I reach for in my toolbox is Ruby. I tried using the JSON format, but had to give up: I simply couldn't get past the Unicode issues:

/usr/lib/ruby/1.8/json.rb:288:in `chr': 1090 out of char range (RangeError)

It turned out to be much smoother with XML and REXML, which really is a superior library for XML processing (Python's XML libraries are either easy or full-featured; REXML seems to be both).

I initially took the path of using open-uri to fetch the data over HTTP. After all, it even handled HTTP basic authentication and abstracted away the nitty-gritty details, so it was easy to use.

But it isn't meant for fine-grained control, and I soon ran into problems that required special treatment. First, I quickly exhausted the rate limit of the Twitter API: it's only 70 requests per hour, and with one request per user... you get the picture. The web interface wasn't subject to such restrictions, so I checked how it does it. It uses a slightly different URL, which worked like a charm, and the rate limit was no longer a problem!

This time, though, the script ran much longer: 80 seconds, compared to about 30 before the change. I analyzed the requests and found that each one received a 302 response redirecting back to the home page. That meant open-uri followed every redirect and downloaded the whole home page for each user!

At that point open-uri had to make way for Net::HTTP. The rewrite took more lines, but now I could choose not to follow redirects. I only needed to toggle notifications and didn't care what came back (as long as it wasn't an error code). In addition, I could reuse the same Net::HTTP object, and with it the same HTTP keep-alive connection (I'm not sure whether open-uri can do this).
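
Net::HTTP never follows redirects on its own, and that is exactly what avoids the home-page download here. For comparison, here is a Java sketch of the same request pattern using the JDK's HttpURLConnection; it is not part of the original script. Redirects are switched off per connection, and basic authentication is sent as a header. A local com.sun.net.httpserver.HttpServer stands in for twitter.com and answers with a 302. The path and credentials are placeholders modeled on the script's settings.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class NoRedirect {
    // GET with HTTP basic authentication; returns the status code without following redirects
    static int getWithoutRedirect(String url, String user, String pass) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // a 302 is returned as-is; its Location is never fetched
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        int status = conn.getResponseCode();
        // Read the body to the end and close it, so the connection can be kept alive and reused
        try (InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream()) {
            if (body != null) {
                body.readAllBytes();
            }
        }
        return status;
    }

    // Local stand-in for the web interface: every /friends/... request gets a 302 to /home
    static int demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/friends/", exchange -> {
            exchange.getResponseHeaders().add("Location", "/home");
            exchange.sendResponseHeaders(302, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            return getWithoutRedirect(base + "/friends/follow/12345", "lazyuser", "notmypassword");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // 302
    }
}
```

The JDK keeps connections alive on its own, as long as each response body is read to the end and closed. That is why the sketch drains the (empty) 302 body instead of ignoring it.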

And here's the result: dirty, but quick. Set action to "follow", or to "leave" to disable all notifications, and fill in your user name and password. Turning these configuration options into command-line arguments is left as an exercise for the reader.

#!/usr/bin/env ruby

require 'uri'
require 'net/http'
require 'rexml/document'
include REXML

user = "lazyuser"
pass = "notmypassword"
action = "follow"
PAGE_USERS = 100

Net::HTTP.start("twitter.com") do |http|
  page = 0
  begin
    page += 1
    # Fetch the next page of friends (up to PAGE_USERS per page)
    req = Net::HTTP::Get.new("/statuses/friends.xml?lite=true&page=#{page}")
    req.basic_auth(user, pass)

    doc = Document.new(http.request(req).body)
    ids = doc.elements.to_a("/users/user/id")
    ids.each do |entry|
      # Web interface URL (not subject to the API rate limit); Net::HTTP does not follow the 302 it returns
      req_follow = Net::HTTP::Get.new("/friends/#{action}/" + entry.text)
      req_follow.basic_auth(user, pass)
      http.request(req_follow)
    end
    # A short page means it was the last one
  end while ids.size == PAGE_USERS
end
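
The loop stops as soon as a page comes back with fewer than PAGE_USERS ids. Below is a Java sketch of just that paging logic. fetchPage is a hypothetical stand-in for the friends.xml request, and simulate fakes a friend list of a given size so the page requests can be counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class Paging {
    static final int PAGE_USERS = 100; // page size the script assumes for friends.xml

    // Requests pages 1, 2, 3, ... and stops after the first page with fewer than PAGE_USERS ids,
    // the same condition as the script's `end while ids.size == PAGE_USERS`
    static List<String> allIds(IntFunction<List<String>> fetchPage) {
        List<String> all = new ArrayList<>();
        int page = 0;
        List<String> ids;
        do {
            page++;
            ids = fetchPage.apply(page);
            all.addAll(ids);
        } while (ids.size() == PAGE_USERS);
        return all;
    }

    // Fake server with `total` friends; returns {ids collected, pages requested}
    static int[] simulate(int total) {
        int[] requests = {0};
        List<String> ids = allIds(page -> {
            requests[0]++;
            List<String> result = new ArrayList<>();
            for (int i = (page - 1) * PAGE_USERS; i < Math.min(page * PAGE_USERS, total); i++) {
                result.add("id" + i);
            }
            return result;
        });
        return new int[] {ids.size(), requests[0]};
    }

    public static void main(String[] args) {
        int[] r = simulate(250);
        System.out.println(r[0] + " ids in " + r[1] + " page requests"); // 250 ids in 3 page requests
        r = simulate(200);
        System.out.println(r[0] + " ids in " + r[1] + " page requests"); // 200 ids in 3 page requests
    }
}
```

When the friend count is an exact multiple of 100, the stop condition makes the loop ask for one extra page, which comes back empty. That costs a single request, which is cheap next to the one-request-per-friend follow calls.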

Wednesday, April 16, 2008

XML escape

Friday afternoon is prime time for blitz tasks and a rich opportunity to show off your one-liner hacking 5K1LLz.

This Friday the finish-up task came from a colleague who had to leave in half an hour to catch a plane. There was a big XML file of several megabytes, and its entire content had to be XML-escaped, presumably in preparation for sending it over the wire (read: HTTP).

The problem is, a certain Windows editor (does it really qualify as one?) hangs when opening files bigger than 1234KB, and writing a Java program would take fairly long compared to the alternatives. Not to mention that many programmers would write a Java program that is even less memory-efficient than that joke of an editor, Notepad (there, I said it). And Java is not very forgiving of memory problems.

But what are the alternatives? As the Perl manual page says, "The three principal virtues of a programmer are Laziness, Impatience, and Hubris." Being a lazy programmer, I checked whether somebody had already written a utility for this (the chance that nobody had was zero) and whether it was available. First I found an Eclipse plugin. However, pasting megabytes into a text box didn't make me confident that it would work.

There was also the xmlstarlet package, which would have done a wonderful job had it been installed on the old servers, where the file was easy to transfer. But it wasn't, and copying the file to my machine and back just to convert it would have taken too long. It would also have been hard to find a suitable package for such an old Linux version. No, that's not for impatient programmers.

The next thought I had was: why spend effort installing a package when I could do this with a Ruby one-liner? Of course, I have nothing against Python, but if there's one thing nobody would argue with, it's that Python doesn't fare well against Ruby when it comes to one-liners. Anyway, Ruby wasn't installed there either (note to self: this must be amended).

The clock was ticking. So Python it is, and instead of an obfuscated one-liner I convinced myself to write several short, readable lines. I hadn't done serious XML processing in Python for a while, but the answer was just a Google search away. It was insultingly simple, but given enough hubris one can turn even this meager piece of code into a rambling blog post:


#!/usr/bin/env python

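from xml.sax.saxutils import escape
from sys import stdin, stdout

# Escape &, < and > line by line, so the whole file never has to sit in memory.
# Each line already ends with its newline, so write it as-is (print would add a second one).
for line in stdin:
    stdout.write(escape(line))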
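
escape does very little. It replaces &, < and > with entity references and leaves quotes alone, unless you pass it an extra mapping. The order matters: & has to go first, or the & in an already produced &lt; would be escaped again. For comparison with the Java option dismissed above, here is a sketch of the same logic in Java. It streams the input line by line, so memory use stays flat however big the file is. It assumes UTF-8 input, and readLine() normalizes line endings to \n.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class XmlEscape {
    // The same three replacements xml.sax.saxutils.escape makes by default.
    // '&' goes first, otherwise the '&' of an already produced "&lt;" would become "&amp;lt;".
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Streams stdin to stdout one line at a time, so memory use does not depend on the file size.
    // readLine() drops the line terminator, so every line is written back with '\n'.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        Writer out = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            out.write(escape(line));
            out.write('\n');
        }
        out.flush();
    }
}
```

With Java 11 or later it can run straight from source, e.g. `java XmlEscape.java < big.xml > escaped.xml` (file names are placeholders).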

Thursday, April 10, 2008

Instant evaluation of Java code in OpenOffice

At the end of last year I agreed to update a 4-day Java training course to reflect the changes in Java 5 and 6. The problems with that were:


  • The course material was long: over 300 pages already.
    I know what you're thinking: that's 75 pages per freakin' day (not counting the exercises). All right, next time I'll publish a book.
  • There was a deadline...
    ...tight as it usually is, considering it had to be done in off-work hours.
  • My first son was going to be born before the deadline.
    Uh-oh, now that should have made me worry.
  • The document was in Word .doc format.
    A big no-no. Maybe you have your reasons, but I have mine: I will never, ever again write or maintain a significant piece of documentation like that (or an insignificant one, for that matter). And I mean it.


I seriously considered moving the content to another format, be it LaTeX, DocBook or a lightweight markup language like reStructuredText or AsciiDoc (a topic for another post). But being out of time, I couldn't.

I also had to update each and every example for the new language syntax (where relevant) and test that it still worked. Last I counted, that was 254 code snippets. OK, one example could be split across several snippets, but that's nonetheless tons of work: copy the code, paste it in a text editor, save it, compile it, run it, and check that the results are the expected ones. If any step fails, rinse and repeat. Dirty work.

I miss simple text formats because they make automation so much easier. For instance, the document could include all the source files, which would be kept separately and tested automatically in one go.

But I didn't have this option, so I needed some other way to automate the work. And the answer came from the very material I was going to present: the Java Compiler API.

A frequently neglected feature of Java 6, the Compiler API lets you control compilation right from your Java code. Until then you typically had to save the source to a file and invoke a separate process, which is really a workaround. If you've ever used eval in a scripting language, you'll sorely miss instant evaluation. With the new Compiler API, well, you're still gonna miss it, but at least compiling is no longer a hack. Besides, controlling the compiler directly can perform much better, because you can even compile from a string in memory.

The idea began to form: I could write a BeanShell macro in OpenOffice. If OpenOffice runs on Java 6, BeanShell has access to the Compiler API as well. That holds only if the Java in question is a JDK: on a plain JRE, ToolProvider.getSystemJavaCompiler() returns null. The macro would compile a class from the selected text in the document (in memory), load it (still in memory), run it and display the results. This would certainly make testing the examples faster.

I'm usually lazy enough to search for an existing solution first (even if finding it takes more effort than writing it). The first source on in-memory compilation I came across was the API documentation of JavaCompiler. It was a good start: it used SimpleJavaFileObject to read the source from a String, but the class file was still written to disk.

Along the same lines was a detailed article on JavaBeat, which showed how to compile from a String using a SimpleJavaFileObject.

I knew there had to be more to it: the class file always appeared on disk, and I was not keen on writing files to disk (implicitly or explicitly). It's slower, less secure and often more effort. I found what I was looking for in the velocityreviews forums, but I really struck gold with a very detailed document that described almost exactly what I wanted to do (it's about visualizing Java bytecode, by the way).

One more point: I was wondering which graphical widget to use, and it turns out I could show a popup box using either the OpenOffice APIs or plain Java AWT/Swing. I chose OpenOffice because its look and feel was better integrated, and because it was different from the Java libraries, which I already knew. I had to read a bit of the OpenOffice Developer's Guide, but it finally worked out.

So my final macro began to take shape. First I had to create the familiar SimpleJavaFileObject subclass, constructed with the source String as an argument. The crucial point is overriding getCharContent so that it returns the field holding that String.


class JavaObjectFromString extends SimpleJavaFileObject {
    private String contents = null;

    public JavaObjectFromString(String className, String contents) throws Exception {
        super(new URI(className + Kind.SOURCE.extension), Kind.SOURCE);
        this.contents = contents;
    }

    // The compiler reads the source through this method, so hand it the String
    public CharSequence getCharContent(boolean ignoreEncodingErrors) throws IOException {
        return contents;
    }
}


What I learned from the last two sources was that I would need to implement a file manager, preferably by extending ForwardingJavaFileManager. That in turn needs another SimpleJavaFileObject subclass, this time for the compiled class data rather than the source. The important thing here is to override openInputStream and openOutputStream so that the class data is read from and written to memory instead of a file:


static class RAMJavaFileObject extends SimpleJavaFileObject {

    RAMJavaFileObject(String name, Kind kind) {
        // Use the extension of the actual kind (.class for compiler output)
        super(URI.create("string:///" + name.replace('.', '/') + kind.extension), kind);
    }

    ByteArrayOutputStream baos;

    // Compiled class data has no source text
    public CharSequence getCharContent(boolean ignoreEncodingErrors) throws IOException, IllegalStateException, UnsupportedOperationException {
        throw new UnsupportedOperationException();
    }

    // Read the class bytes back from memory
    public InputStream openInputStream() throws IOException, IllegalStateException, UnsupportedOperationException {
        return new ByteArrayInputStream(baos.toByteArray());
    }

    // The compiler writes the class bytes here instead of to a .class file
    public OutputStream openOutputStream() throws IOException, IllegalStateException, UnsupportedOperationException {
        return baos = new ByteArrayOutputStream();
    }

}


Now we have everything we need to extend ForwardingJavaFileManager so that it uses our in-memory class data. Note the HashMap, which stores the compiled classes for later use:


output = new HashMap();

class RAMFileManager extends ForwardingJavaFileManager {
    RAMFileManager(JavaCompiler compiler) {
        super(compiler.getStandardFileManager(null, null, null));
    }

    // Hand the compiler an in-memory object for every class file it writes, and remember it by class name
    public JavaFileObject getJavaFileForOutput(Location location, String name, Kind kind, FileObject sibling) throws java.io.IOException {
        JavaFileObject jfo = new RAMJavaFileObject(name, kind);
        output.put(name, jfo);
        return jfo;
    }
}


The last thing I define is a small utility method that wraps the generated source of the test class in a JavaObjectFromString. As you can see, I made it convenient to select code fragments. The macro creates the necessary framework around them, namely a container class and a test method, which also calls main if the fragments define one:


SimpleJavaFileObject getJavaFileContentsAsString(String outside, String inClass, String inMethod) {
    // Wrap the fragments in TestClass; testMethod also calls main, if the fragments define one
    StringBuilder javaFileContents = new StringBuilder(outside +
        "\npublic class TestClass{\n" +
        inClass +
        "\n public void testMethod() throws Throwable {\n" +
        inMethod +
        "try{this.getClass().getMethod(\"main\", String[].class).invoke(null, new Object[] {new String[]{}});} catch (NoSuchMethodException nsme) {}" +
        "\n}\n" +
        "}");
    JavaObjectFromString javaFileObject = null;
    try {
        javaFileObject = new JavaObjectFromString("TestClass", javaFileContents.toString());
    } catch (Exception exception) {
        exception.printStackTrace();
    }
    return javaFileObject;
}


Now that the main infrastructure is in place, it is time for the action to unfold: creating the compiler object, creating a task and invoking it. I do not really want to parse the Java fragments in the document to find out where they belong, so I use a trick. I assume a fragment belongs in one of three places:

  • outside the class I'm generating (import statements, other classes);
  • inside the class definition (fields, methods);
  • inside the test method (statements).

I try each selection in these three places in turn (you haven't forgotten that you can have multiple selections in OpenOffice, right?) and compile after each attempt. As soon as it compiles without errors, I move on to the next selection:


JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
JavaFileManager jfm = new RAMFileManager(compiler);

outside = inClass = inMethod = "";
for (i = 0; i < count; i++) {
    xTextRange = (XTextRange) UnoRuntime.queryInterface(XTextRange.class, xIndexAccess.getByIndex(i));
    newText = xTextRange.getString();

    // 1st attempt: outside the class (imports, other classes)
    JavaFileObject javaObjectFromString = getJavaFileContentsAsString(outside + newText, inClass, inMethod);
    Iterable fileObjects = Arrays.asList(new Object[] {javaObjectFromString});
    out = new StringWriter();
    CompilationTask task = compiler.getTask(out, jfm, null, null, null, fileObjects);
    Boolean result = task.call();

    if (result) {
        outside += newText;
    } else {
        // 2nd attempt: inside the class (fields, methods)
        javaObjectFromString = getJavaFileContentsAsString(outside, inClass + newText, inMethod);
        fileObjects = Arrays.asList(new Object[] {javaObjectFromString});
        task = compiler.getTask(out, jfm, null, null, null, fileObjects);
        result = task.call();
        if (result) {
            inClass += newText;
        } else {
            // 3rd attempt: inside the test method (statements)
            javaObjectFromString = getJavaFileContentsAsString(outside, inClass, inMethod + newText);
            fileObjects = Arrays.asList(new Object[] {javaObjectFromString});
            task = compiler.getTask(out, jfm, null, null, null, fileObjects);
            result = task.call();
            if (result) {
                inMethod += newText;
            } else {
                // Fits nowhere: report the compiler output of all three attempts
                message = "Compilation fails:\n" + out.toString();
                msgtype = "errorbox";
                title = "Compilation error";
                showMessage();
                return 0;
            }
        }
    }
}
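
The three nested attempts follow a single pattern, so they could also be written as a loop over the possible areas. Here is a Java sketch of that control flow, separated from the compiler. Parts, Area and place are illustrative names. toyCompiles is a toy stand-in for the real check (task.call() in the macro) that knows only three rules. The record needs Java 16 or later.

```java
import java.util.function.Predicate;

public class Placement {
    // The three places a fragment can go, tried in this order (like the macro's nested ifs)
    enum Area { OUTSIDE_CLASS, IN_CLASS, IN_METHOD }

    // Accumulated fragments, mirroring the macro's outside / inClass / inMethod strings
    record Parts(String outside, String inClass, String inMethod) {
        Parts add(Area area, String fragment) {
            switch (area) {
                case OUTSIDE_CLASS: return new Parts(outside + fragment, inClass, inMethod);
                case IN_CLASS:      return new Parts(outside, inClass + fragment, inMethod);
                default:            return new Parts(outside, inClass, inMethod + fragment);
            }
        }
    }

    // Keeps the first placement that compiles; returns null if none does
    // (the case where the macro shows its "Compilation error" box)
    static Parts place(Parts current, String fragment, Predicate<Parts> compiles) {
        for (Area area : Area.values()) {
            Parts candidate = current.add(area, fragment);
            if (compiles.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    // Toy stand-in for the real compile check: imports only at top level,
    // "private" members only inside the class, println statements only inside the method
    static boolean toyCompiles(Parts p) {
        return !p.inClass().contains("import") && !p.inMethod().contains("import")
                && !p.outside().contains("private") && !p.inMethod().contains("private")
                && !p.outside().contains("println") && !p.inClass().contains("println");
    }

    public static void main(String[] args) {
        Parts parts = new Parts("", "", "");
        for (String fragment : new String[] {"import java.util.*;", "private int n = 3;", "System.out.println(n);"}) {
            parts = place(parts, fragment, Placement::toyCompiles);
        }
        System.out.println(parts);
    }
}
```

Because the first area that compiles wins, a fragment that is legal in more than one place always ends up outside the generated class. One example is a class declaration without the public modifier, which is valid both at top level and as a member class.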


At this stage the class is already compiled, but how do we execute it? We need a class loader that knows where to find the class bytes and how to define the class from them. This is what the HashMap storing the classes is for:


ClassLoader cl = new ClassLoader() {
    // Classes compiled through RAMFileManager are defined straight from their in-memory bytes
    protected Class findClass(String name) throws ClassNotFoundException {
        JavaFileObject jfo = output.get(name);
        if (jfo != null) {
            byte[] bytes = ((RAMJavaFileObject) jfo).baos.toByteArray();
            return defineClass(name, bytes, 0, bytes.length);
        }
        return super.findClass(name);
    }

};


Now that we have the class, we can get the method and run it. Here I'm also doing some hocus-pocus to kidnap the standard output and error streams and show them in the message box. I'm really interested in what actually gets printed and whether there are any exceptions:


Class c = Class.forName("TestClass", true, cl);
constructor = c.getConstructor(new Class[]{});
object = constructor.newInstance(new Object[]{});
method = c.getMethod("testMethod", new Class[]{});

// Redirect standard output and error into pipes so they can be read back afterwards
sysOut = System.out;
sysErr = System.err;
PipedOutputStream pout = new PipedOutputStream();
BufferedReader brout = new BufferedReader(new InputStreamReader(new PipedInputStream(pout)));
System.setOut(new PrintStream(pout));

PipedOutputStream perr = new PipedOutputStream();
BufferedReader brerr = new BufferedReader(new InputStreamReader(new PipedInputStream(perr)));
System.setErr(new PrintStream(perr));

message = "Compiled and ran successfully:\n";
msgtype = "infobox";
title = "Success";

try {
    method.invoke(object, new Object[]{});
} catch (Throwable t) {
    message = "Runtime error:\n" + t;
    msgtype = "errorbox";
    title = "Runtime error";
}

// Collect whatever was printed
sb = new StringBuilder();
while (brout.ready()) {
    sb.append((char) brout.read());
}
message += (sb.length() == 0 ? "" : "\nOutput:\n") + sb.toString();

sb = new StringBuilder();
while (brerr.ready()) {
    sb.append((char) brerr.read());
}
message += (sb.length() == 0 ? "" : "\nError:\n") + sb.toString();

// Restore the original streams
System.setOut(sysOut);
System.setErr(sysErr);

showMessage();
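
One caveat with the pipes: PipedInputStream has a 1024-byte buffer by default. Here the same thread both writes (while the snippet runs) and reads (afterwards), so a snippet that prints more than the buffer holds could hang the macro. Below is a sketch of an alternative, written as plain Java rather than BeanShell; Capture and capture are illustrative names. It collects output in ByteArrayOutputStreams, which simply grow, and it restores the real streams in a finally block.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class Capture {
    // Runs `snippet` with System.out and System.err redirected into memory
    // and returns what was printed as {stdout, stderr}
    static String[] capture(Runnable snippet) {
        PrintStream realOut = System.out;
        PrintStream realErr = System.err;
        ByteArrayOutputStream outBytes = new ByteArrayOutputStream(); // grows as needed, never blocks
        ByteArrayOutputStream errBytes = new ByteArrayOutputStream();
        PrintStream fakeOut = new PrintStream(outBytes, true);
        PrintStream fakeErr = new PrintStream(errBytes, true);
        System.setOut(fakeOut);
        System.setErr(fakeErr);
        try {
            snippet.run();
        } finally {
            // Flush and restore the real streams, even if the snippet throws
            fakeOut.flush();
            fakeErr.flush();
            System.setOut(realOut);
            System.setErr(realErr);
        }
        return new String[] {outBytes.toString(), errBytes.toString()};
    }

    public static void main(String[] args) {
        String[] printed = capture(() -> {
            for (int i = 0; i < 500; i++) {
                System.out.print("0123456789"); // 5000 characters in total, well past 1024 bytes
            }
            System.err.print("oops");
        });
        // prints "5000 chars on stdout, stderr: oops"
        System.out.println(printed[0].length() + " chars on stdout, stderr: " + printed[1]);
    }
}
```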


And that's it! What's that, you ask? We know nothing about the showMessage method? You're really curious, aren't you? OK, here it is, mostly based on the OpenOffice UNO API:


showMessage() {
    import com.sun.star.awt.XToolkit;
    import com.sun.star.awt.XMessageBoxFactory;
    import com.sun.star.awt.XMessageBox;
    import com.sun.star.awt.XWindowPeer;

    m_xContext = XSCRIPTCONTEXT.getComponentContext();
    m_xMCF = m_xContext.getServiceManager();

    Object oToolkit = m_xMCF.createInstanceWithContext("com.sun.star.awt.Toolkit", m_xContext);
    XToolkit xToolkit = (XToolkit) UnoRuntime.queryInterface(XToolkit.class, oToolkit);

    windowPeer = xTextDoc.currentController.frame.containerWindow;
    XWindowPeer xWindowPeer = (XWindowPeer) UnoRuntime.queryInterface(XWindowPeer.class, windowPeer);

    com.sun.star.awt.Rectangle aRectangle = new com.sun.star.awt.Rectangle();

    XMessageBoxFactory xMessageBoxFactory = (XMessageBoxFactory) UnoRuntime.queryInterface(XMessageBoxFactory.class, oToolkit);
    XMessageBox xMessageBox = xMessageBoxFactory.createMessageBox(xWindowPeer, aRectangle, msgtype, com.sun.star.awt.MessageBoxButtons.BUTTONS_OK, title, message);
    if (xMessageBox != null) {
        short nResult = xMessageBox.execute();
    }
}


What's missing are a couple of import statements and the standard initialization lines found in every OpenOffice macro template, but this blog post is already too long. Better that you ask if you need them than get bored and never reach the end.
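
To try the same technique outside OpenOffice, here is a sketch that combines the pieces in one plain Java class rather than BeanShell:

  • source read from a String;
  • class bytes collected in memory by a ForwardingJavaFileManager;
  • a ClassLoader that defines the class from those bytes;
  • standard output captured while main runs.

StringSource, ByteCode, InMemoryRunner and compileAndRun are illustrative names. The sketch assumes a JDK rather than a JRE, and Java 9 or later for List.of.

```java
import java.io.*;
import java.lang.reflect.Method;
import java.net.URI;
import java.util.*;
import javax.tools.*;

public class InMemoryRunner {

    // Source code held in a String (what JavaObjectFromString does in the macro)
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiled class bytes kept in memory (what RAMJavaFileObject does in the macro)
    static class ByteCode extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        ByteCode(String className, Kind kind) {
            super(URI.create("bytes:///" + className.replace('.', '/') + kind.extension), kind);
        }

        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    // Compiles `code`, runs main() of `className` and returns what it printed,
    // or the compiler messages if compilation fails. Nothing is written to disk.
    static String compileAndRun(String className, String code) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler: running on a JRE instead of a JDK?");
        }

        Map<String, ByteCode> output = new HashMap<>();
        JavaFileManager fileManager = new ForwardingJavaFileManager<StandardJavaFileManager>(
                compiler.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                                                       JavaFileObject.Kind kind, FileObject sibling) {
                ByteCode file = new ByteCode(name, kind); // keep the class file in memory
                output.put(name, file);
                return file;
            }
        };

        StringWriter messages = new StringWriter();
        boolean ok = compiler.getTask(messages, fileManager, null, null, null,
                List.of(new StringSource(className, code))).call();
        if (!ok) {
            return "Compilation error:\n" + messages;
        }

        // Defines our classes from the collected bytes; the parent loader is asked first for everything
        ClassLoader loader = new ClassLoader(InMemoryRunner.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ByteCode file = output.get(name);
                if (file == null) {
                    return super.findClass(name);
                }
                byte[] b = file.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };

        PrintStream realOut = System.out;
        ByteArrayOutputStream printed = new ByteArrayOutputStream();
        PrintStream fakeOut = new PrintStream(printed, true);
        System.setOut(fakeOut);
        try {
            Method main = loader.loadClass(className).getMethod("main", String[].class);
            main.invoke(null, (Object) new String[0]);
        } finally {
            fakeOut.flush();
            System.setOut(realOut);
        }
        return printed.toString();
    }

    public static void main(String[] args) throws Exception {
        String hello = "public class Hello { public static void main(String[] a) { System.out.print(6 * 7); } }";
        System.out.println(compileAndRun("Hello", hello)); // 42
        System.out.println(compileAndRun("Broken", "public class Broken { int x = ; }"));
    }
}
```

Unlike the macro, it expects a complete class with a main method, so the fragment-placement trick and the message box are left out.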

Wednesday, April 2, 2008

Playing with Javascript or what binds Greasemonkey, Twitter and Ambient Avatars together

It's been a while since I did any JavaScript hacking (almost 2 years). This time I had the haunting idea of creating a Greasemonkey mashup, so I could see my Twitter page with the avatar next to each tweet exactly as it looked when the tweet was posted.

To do this, the avatar history must be stored somewhere. That's where chinposin.com comes in. It started out as a Friday avatar refresh and evolved into the Ambient Avatar Platform (TM) (credit goes to @monkchips and @yellowpark: you're great). In simple words: you follow @chinposin on Twitter, and when you change your avatar, the old one is saved. So you get a gallery of all your previous avatars, along with the dates they were changed, for your viewing pleasure.
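
The lookup the mashup needs boils down to this: given the dates an avatar was changed, find the newest one that is older than the tweet. Here is a Java sketch with made-up dates and file names. A TreeMap keyed by change date answers the question with lowerEntry, which returns the greatest key strictly less than the given one. That matches the strict < used by the Greasemonkey script, which does the same thing with a linear scan over the gallery.

```java
import java.time.LocalDateTime;
import java.util.Map;
import java.util.TreeMap;

public class AvatarAtTime {
    // Returns the avatar in use when the tweet was posted: the newest one changed strictly
    // before `tweetTime`, or null if every known avatar is newer than the tweet
    static String avatarAt(TreeMap<LocalDateTime, String> history, LocalDateTime tweetTime) {
        Map.Entry<LocalDateTime, String> entry = history.lowerEntry(tweetTime);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up history: change date -> avatar image
        TreeMap<LocalDateTime, String> history = new TreeMap<>();
        history.put(LocalDateTime.of(2008, 1, 5, 10, 0), "avatar-jan.png");
        history.put(LocalDateTime.of(2008, 3, 1, 9, 30), "avatar-mar.png");

        System.out.println(avatarAt(history, LocalDateTime.of(2008, 2, 14, 12, 0)));  // avatar-jan.png
        System.out.println(avatarAt(history, LocalDateTime.of(2008, 4, 2, 8, 0)));    // avatar-mar.png
        System.out.println(avatarAt(history, LocalDateTime.of(2007, 12, 31, 23, 0))); // null
    }
}
```

When every known avatar is newer than the tweet there is no answer (null here). In that case the Greasemonkey script likewise leaves the current picture in place.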

For those of you wondering what Twitter is: that's a topic for an entire blog post... or a whole blog, so start at Wikipedia and let's continue with the interesting stuff, shall we?

So there we are: we want to include information from one site in another, a task where Greasemonkey excels. Normally, the same-origin policy stops JavaScript from fetching data from other sites at whim, but Greasemonkey's GM_xmlhttpRequest can.

I've obviously lost some of my JavaScript knowledge, since it took me an obscene amount of time to get this tiny piece of code working. To start with, I had forgotten that Greasemonkey also has some restrictions, not only enhancements. For security purposes, many objects are wrapped in XPCNativeWrapper, and I had to use wrappedJSObject all over the place as a workaround. Yes, I know it's not secure, and you should know this too.

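Another problem was passing arguments to a closure. Simply capturing the loop variables wouldn't do: variables declared with var are shared by all loop iterations, and the callbacks only run after the loop has finished, so every callback would see the values of the last tweet. I eventually remembered that a function is an object, and since every object is also an associative array, you can assign any property to it. Accessing the function object from inside itself also took some googling: arguments.callee did the trick.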

So, is there anything that can be improved in this shoddy script? You bet. For starters, it hits the chinposin site hard, sending 20 simultaneous requests right off the bat, even for duplicate user pages. I could cache the avatar history, but that would require synchronizing the requests. The script could also be turned into a Firefox extension, which has fewer restrictions than Greasemonkey. And I really should use a prototype for those twenty closures I create, but I gotta have something to do next time, right?
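
On the duplicate requests: caching doesn't necessarily require serializing the requests. If the cache stores the pending request rather than its result, all tweets by the same user can share one in-flight fetch. Here is a Java sketch of that idea; the Greasemonkey script does not do this. fetch is a hypothetical stand-in for downloading a user's chinposin.com page.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class AvatarHistoryCache {
    // One entry per user: the future holding that user's (possibly still loading) gallery page
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetch; // hypothetical: downloads a user's gallery page

    AvatarHistoryCache(Function<String, String> fetch) {
        this.fetch = fetch;
    }

    // Starts a request on first use; later calls for the same user get the same future back
    CompletableFuture<String> history(String user) {
        return cache.computeIfAbsent(user, u -> CompletableFuture.supplyAsync(() -> fetch.apply(u)));
    }

    // Looks up the history for every tweet author and returns how many fetches actually ran
    static int requestsFor(String... tweetAuthors) {
        AtomicInteger requests = new AtomicInteger();
        AvatarHistoryCache histories = new AvatarHistoryCache(user -> {
            requests.incrementAndGet();
            return "<gallery of " + user + ">";
        });
        CompletableFuture<?>[] pending = new CompletableFuture<?>[tweetAuthors.length];
        for (int i = 0; i < tweetAuthors.length; i++) {
            pending[i] = histories.history(tweetAuthors[i]);
        }
        CompletableFuture.allOf(pending).join(); // wait until every page has "arrived"
        return requests.get();
    }

    public static void main(String[] args) {
        System.out.println(requestsFor("alice", "bob", "alice", "alice", "bob")); // 2
    }
}
```

ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per key, so even callers on different threads get the same future.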

Without further ado, here's the script. Paste it into a file named twitteravatarhistory.user.js (OK, you can come up with a longer name if you're so inclined). Then open it with Firefox: if Greasemonkey is installed, you will be presented with a dialog prompting you to install it. It's tested with Firefox 2.0.0.13, 3a9 and 3b4, and Greasemonkey 0.6.6.20061017 and 0.7.20080121.0. Considering the rate of change, I would be surprised if it still works a year from now.

// ==UserScript==
// @name TwitterAvatarHistory
// @description Shows tweets with the avatar at time of posting
// @include http://twitter.com/*
// ==/UserScript==

// Assumptions:
// -chinposin.com has a special date string under the pic
// -avatars are listed in reverse chronological order (newest first)
// -many others regarding DOM position

const avatar_home = "http://www.chinposin.com/home/";
var twitter_images = document.evaluate('//*[contains(@class, "hentry")]', document, null, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);
var message;
while ((message = twitter_images.iterateNext())) {
    message = message.wrappedJSObject;

    // Read user name
    var url = message.getElementsByClassName("url")[0];
    if (!url) continue;
    var username = url.getAttribute("href").match("[^/]*$");

    // Read date of message and extract fields with a regexp
    var date_string = message.getElementsByClassName("published")[0].getAttribute("title");
    var match = date_string.match(/(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\+(\d{2}):(\d{2})/);
    // JavaScript months are zero-based, hence the "- 1"
    var date = new Date(match[1], match[2] - 1, match[3], match[4], match[5], match[6]);

    var http = function(responseDetails) {
        // add dummy element so we can operate on its DOM
        var elem = document.createElement("html");
        document.body.appendChild(elem);
        elem.innerHTML = responseDetails.responseText;

        // getElementById is only found in document object, will use XPath relative to the dummy element
        var gallery = document.evaluate('.//*[@id="gallery"]', elem.wrappedJSObject, null,
            XPathResult.ANY_UNORDERED_NODE_TYPE, null).singleNodeValue.wrappedJSObject;

        // Might be better to couple these more tightly than creating two separate arrays
        var images = gallery.getElementsByTagName("img");
        var dates = gallery.getElementsByClassName("mainText");

        // Find the newest avatar that is older than the message (the gallery lists newest first)
        for (var i = 0; i < dates.length; i++) {
            var match = dates[i].textContent.match(/(\d{4})-(\d{2})-(\d{2}) +(\d{2}):(\d{2}):(\d{2})/);
            var avatar_date = new Date(match[1], match[2] - 1, match[3], match[4], match[5], match[6]);

            if (avatar_date < arguments.callee.date) {
                // Replace message pic with avatar corresponding to date
                arguments.callee.img.firstChild.setAttribute("src", images[i].getAttribute("src"));
                // TMTOWTDI:
                //~ arguments.callee.img.replaceChild(images[i].cloneNode(false), arguments.callee.img.firstChild);
                break;
            }
        }

        // clean up temp structure
        document.body.removeChild(elem);
    };

    // Trick to pass data to the closure
    http.date = date;
    http.img = message.getElementsByClassName("url")[0];

    // Reach list of pix from user page
    GM_xmlhttpRequest({method : "GET", url : avatar_home + username, onload : http});
}


Update: the code formatting had munched part of the Greasemonkey header; that should be fixed now.

Update 10 April 2008: The new code has been in the Greasemonkey repository since last week; today a fix was released that adapts it to Twitter's interface changes.