Monday, November 23, 2009

Error detection with pattern matching

Scala's pattern matching is one of the most powerful features in the language. It not only helps write concise and very readable code, but also helps prevent trivial errors.

Let's say you want to save a line or two of code when checking whether a data structure such as a List is empty. You decide to compare it for equality with Nil:


val items = Nil
if (items == Nil) println("No items")


Then you decide to refactor and turn the items collection into a Set. The Scala compiler is clever and should give you a warning, right? Well, not quite:


val items = Set()
if (items == Nil) println("No items")


This prints nothing, because the expression evaluates to false. Of course, given the types, it's perfectly clear at compile time that it can never be true. Can't the compiler give you a hint? Indeed, it will if you use pattern matching:


items match {
  case Nil => println("No items")
  case _ =>
}


This will result in the following error message:


error: pattern type is incompatible with expected type;
found : object Nil
required: scala.collection.immutable.Set[Nothing]
case Nil => println("No items")
^


The examples are contrived (an unparameterized Nil and Set?), but you get the point: Nil can never be a value of a Set type. You might think this is fairly obvious, but pattern matching can be a lifesaver when implicit conversions are involved. Consider this:


"heh".reverse == "heh"


What does this evaluate to? This should be obvious, right? But the value is false! If you used pattern matching, you would easily see why:


"heh".reverse match {
case "heh" => "obvious?"
}


This will make the compiler very nervous, and this is the reason why:


error: type mismatch;
found : java.lang.String("heh")
required: scala.runtime.RichString
case "heh" => "obvious?"
^


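So reverse converts the String into the wrapper type RichString, which is not the same type as String, and the comparison is false. (In Scala 2.8 and later, reverse on a String returns a String, so this particular comparison evaluates to true.)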

I once had similar trouble tracking down a bug where I was comparing a variable of type net.liftweb.common.Box (a type conceptually very similar to Scala's built-in Option) for equality with None.

This made me adopt a general rule: prefer pattern matching over equality comparison. The bugs it catches are sometimes subtle and hard to spot, and those are exactly what Scala's rich static type checker tries to prevent. Use it to your advantage.

Since we're talking about bugs caught by pattern matching, here's a subtle one that is often (though not always) caught by the compiler (another contrived example follows):


val items = Set()
Set() match {
  case items => println("empty")
  case _ => println("full")
}


This results in an error that may look a little unusual to a newcomer:


error: unreachable code
case _ => println("full")
^


This error is usually shouting at you: hey, you're inadvertently using variable binding instead of constant matching! The newly bound variable items shadows the existing variable of the same name and matches anything, so none of the cases after it can ever match.

One way to fix it is to use backticks, which prevent the name from being bound to a new variable:


Set() match {
  case `items` => println("empty")
  case _ => println("full")
}


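As a rule of thumb, give case classes and constants you intend to pattern match against names starting with a capital letter (CapitalLetters), because Scala treats capitalized names in patterns as constants rather than as new variables.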

This error wouldn't have occurred if you had used equality comparison in the first place, but even in mildly complex cases pattern matching beats plain equality checks in readability and error detection. Admittedly, there are cases where pattern matching falls short (for instance, matching structural types), so there's still no reason to deprecate good old "==". But pattern matching catches many more errors, for example by checking whether a match is exhaustive. So there's no point in saving a couple of characters only to lose the type safety you expect from Scala.
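Exhaustiveness checking works for sealed types. The following minimal sketch uses a hypothetical Shape hierarchy (not from the original post); because the trait is sealed, the compiler knows all of its subclasses and can warn when a match over them leaves one out:

```scala
object Shapes {
  // Sealed: every subtype must be defined in this file, so the compiler
  // knows the complete set of cases a match over Shape has to cover.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Square(side: Double) extends Shape

  def area(s: Shape): Double = s match {
    case Circle(r) => math.Pi * r * r
    case Square(a) => a * a
    // Deleting either case above should make the compiler warn
    // that the match may not be exhaustive.
  }
}
import Shapes._

val squareArea = area(Square(2))  // 4.0
```

With an equality-based chain of if/else checks, the same omission would go unnoticed until run time.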

Wednesday, November 4, 2009

Embedded Scala interpreter

The Scala interpreter is proof that a statically typed language can have features most people only expect from a dynamically typed language. One of the cool uses of an interpreter is embedding it in your application. This lets you conveniently experiment with Scala and perhaps even interact with object instances in your running system. I won't explain how the interpreter works here, but I will try to show you a simple way of embedding it.

As usually happens, someone beat me to it. Josh Suereth explains in great detail how to embed an interpreter, but he has made so many customizations that his solution would probably span several pages.

I wanted a simpler solution that one could understand at a glance. The code for Lord of the REPLs is much shorter, although it doesn't customize much of what the standard interpreter offers.

I tried to come up with the shortest working version you could possibly get away with. Provided I create the settings properly, this is what I could muster:


val out = new java.io.StringWriter()
val interpreter = new scala.tools.nsc.Interpreter(settings, new java.io.PrintWriter(out))
interpreter.interpret(code)


Not too much code, is it? (Half of it is probably due to the full package names.) Now you can collect the interpreter's output from the "out" writer and, if needed, convert it to a String with "out.toString".

Not so fast, though. I said this works only with the appropriate settings. The obvious candidate is the default:


val settings = new scala.tools.nsc.Settings()


The problem here is that the interpreter doesn't find two jars crucial to its proper functioning: scala-compiler.jar and scala-library.jar. When it can't find them, it fails with the following error:


scala.tools.nsc.FatalError: object scala not found.


Thanks to a discussion by Eric Torreborre (author of Specs), I found out that these jars need to be added to the bootclasspath of the settings object:


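val origBootclasspath = settings.bootclasspath.value
// compilerPath and libPath hold the jar paths (see below)
val pathList = List(compilerPath, libPath)
// Classpath entries are joined with File.pathSeparator (":" or ";"), not File.separator ("/" or "\")
settings.bootclasspath.value = (origBootclasspath :: pathList).mkString(java.io.File.pathSeparator)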


One could hardcode the paths to these two jars, but that's not very flexible. To do it right, we can write a function that discovers the path to a jar from the name of a class it contains:


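def jarPathOfClass(className: String) = {
  // The code source location is a URL (e.g. file:/path/to/scala-library.jar);
  // convert it to a plain filesystem path for use on the bootclasspath.
  val location = Class.forName(className).getProtectionDomain.getCodeSource.getLocation
  new java.io.File(location.toURI).getPath
}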


Now you could find the paths to these jars like this:


val compilerPath = jarPathOfClass("scala.tools.nsc.Interpreter")
val libPath = jarPathOfClass("scala.ScalaObject")


I've read that getProtectionDomain.getCodeSource returns null under some classloaders, and that this approach might have problems with OSGi in particular. In that case, one might need to resort to the following hack:


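def jarPathOfClass(className: String) = {
  // e.g. "/scala/ScalaObject.class"
  val resource = className.split('.').mkString("/", "/", ".class")
  // For a class inside a jar this looks like "file:/path/to/lib.jar!/scala/ScalaObject.class"
  val path = getClass.getResource(resource).getPath
  val indexOfFile = path.indexOf("file:")
  val indexOfSeparator = path.lastIndexOf('!')
  // Keep only the jar's path, without the "file:" prefix
  path.substring(indexOfFile + "file:".length, indexOfSeparator)
}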


With this last ugly piece of code, creating an interpreter is no longer so concise, but sometimes you can't be both robust and concise.
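The null case mentioned above can also be handled explicitly instead of being worked around. The following is a minimal sketch, not the code used in ESME: codeSourcePath and appendToClasspath are hypothetical helpers, and converting the location with toURI assumes it is a file: URL. Note that the entries of a classpath string are joined with File.pathSeparator.

```scala
import java.io.File

// Hypothetical helper: the filesystem path of the jar (or directory) a class was
// loaded from, or None when no code source is reported, as is the case for
// classes loaded by the JDK bootstrap class loader.
// Assumes the code source location is a file: URL.
def codeSourcePath(cls: Class[_]): Option[String] =
  for {
    domain   <- Option(cls.getProtectionDomain)
    source   <- Option(domain.getCodeSource)
    location <- Option(source.getLocation)
  } yield new File(location.toURI).getPath

// Hypothetical helper: appends entries to a classpath string, joining them with
// the platform's path separator (":" on Unix, ";" on Windows) and skipping empty parts.
def appendToClasspath(original: String, entries: Seq[String]): String =
  (original +: entries).filter(_.nonEmpty).mkString(File.pathSeparator)
```

Under these assumptions, you would pass codeSourcePath(Class.forName("scala.tools.nsc.Interpreter")).toList (and likewise for the library jar) to appendToClasspath together with settings.bootclasspath.value, falling back to the resource-based hack when it returns None.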

If you want to see the above snippets assembled in one piece, you can check out Apache ESME's source code for the ScalaInterpreter action.

Warning: interpreting code directly in your application is a huge security risk and might not always be a good idea.

Tuesday, October 13, 2009

Scala closures as poor man's mocks

Groovy has a feature that lets you use a closure wherever an interface with only one method is expected. A class implementing the interface is created automatically, and the closure provides the implementation of that single method. This is called closure coercion, and it is particularly convenient for making tests readable and concise.

I'm not yet sure about the merits of using such code everywhere, since it might introduce ambiguities or obscure the intent. Tests usually contain a lot of boilerplate, though, so I'm all for making them more concise. Besides readability, people are more likely to write a test if it doesn't take much effort to create yet another trivial mock.

Who tests your tests though? You don't need to answer. Still, the page about implementing interfaces in Groovy warns of this trap: "Be careful that you don't accidentally define the map with { }". Type safety could help sometimes when writing tests. Or at least that's a very good excuse to try to emulate this Groovy feature in Scala.

Scala has implicit conversions, which provide a controlled way to coerce a value to a different type. We cannot automatically convert all closures to all possible single-method interfaces in scope, but we can pick a few that we use especially often (as happens in tests). For instance, if we use a mock of java.lang.Readable often enough, we could define the following implicit conversion:


import java.nio.CharBuffer

implicit def fun2readable(f: Function1[CharBuffer,Int]): Readable =
new Readable { def read(cb: CharBuffer) = f(cb) }


So now everywhere we need a Readable, we might just drop in the appropriate closure instead (technically, this is not a mock, but a stub):


val readable: Readable = {cb: CharBuffer => cb.put("12 34"); 5}
val s = new java.util.Scanner(readable)
assert (s.nextInt == 12)


Compare this with the Groovy version:


def readable = { it.put("12 34"); 5 } as Readable
def s = new Scanner(readable)
assert s.nextInt() == 12


In the Scala version, we explicitly declare the type of the variable in order to trigger the implicit conversion. When passing the closure to a method that expects a Readable, naming the type is not necessary, whereas in Groovy you always need to coerce with "as Readable":


val s = new java.util.Scanner({cb: CharBuffer => cb.put("12 34"); 5})


So in Scala you're trading some verbosity up front for some conciseness in all of the code where the implicit conversion is visible. Furthermore, the type checker guarantees that only closures with the correct signature are converted to the interface in question (i.e. with an input of CharBuffer and output of Int) and will flag a compile-time error otherwise.

The reader might be tempted to emulate map coercion as well, but beware of an unexpected difficulty: it's not easy to define implicit conversions for both closure coercion (with one argument) and map coercion. Since a Map in Scala is also a function, and due to type erasure, the compiler won't know whether to substitute the Map or the Function!

Can you generate these implicits automatically without defining them? Maybe with a Scala compiler plugin, but then you lose some of the transparency of knowing which implicits you've defined.
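A later development worth noting: since Scala 2.12, a function literal can be used directly wherever a single-abstract-method (SAM) interface such as java.lang.Readable is expected, so this particular conversion no longer needs an implicit. The sketch below assumes Scala 2.12 or later; readableOf is a hypothetical helper that, unlike the stub above, hands out its input once and then reports end of input (-1).

```scala
import java.nio.CharBuffer
import java.util.Scanner

// Hypothetical stub: a Readable built from a plain function literal via SAM conversion.
// It writes `input` into the buffer on the first call (assuming the buffer has room
// for all of it) and returns -1 on every later call to signal end of input.
def readableOf(input: String): Readable = {
  var consumed = false
  val stub: Readable = (cb: CharBuffer) => {
    if (consumed) -1
    else {
      consumed = true
      cb.put(input)
      input.length
    }
  }
  stub
}

val scanner = new Scanner(readableOf("12 34"))
assert(scanner.nextInt() == 12)
```

The compiler still checks the signature: a function literal whose parameter is not a CharBuffer, or whose result is not an Int, does not convert to Readable.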

Saturday, September 26, 2009

Scala interpreter in Firefox's location bar

SimplyScala is awesome for trying out some quick Scala tricks. However, you can go one step further, and execute Scala one-liners in your Firefox location bar!

1. Create a bookmark to URL http://www.simplyscala.com/interp?code=%s
2. Associate a keyword to the bookmark. Mine is "scala", you can try something shorter like "s" or ">".
3. Now when you type "scala <expression>", the expression will be sent to SimplyScala and evaluated. For instance: "scala List(1,2,3).reverse".

Why does this work? When a bookmark has a keyword, Firefox replaces "%s" in the bookmark's URL with the text you type after the keyword. Check this Lifehacker article for details.
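The substitution can be mimicked in a few lines. This is a rough sketch under the assumption that the typed text is percent-encoded the way java.net.URLEncoder does it; Firefox's exact encoding rules may differ (for example, for spaces), and expandKeywordBookmark is a hypothetical helper.

```scala
import java.net.URLEncoder

// Hypothetical helper: replace "%s" in a keyword bookmark's URL with the
// percent-encoded text typed after the keyword.
def expandKeywordBookmark(template: String, typed: String): String =
  template.replace("%s", URLEncoder.encode(typed, "UTF-8"))

val url = expandKeywordBookmark("http://www.simplyscala.com/interp?code=%s", "List(1,2,3).reverse")
// url == "http://www.simplyscala.com/interp?code=List%281%2C2%2C3%29.reverse"
```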

The nice thing is that your bindings are kept in your session, so you can use variable names defined in previous invocations. There are inconveniences, though: for instance, you cannot easily browse your history, and it's hard to review inputs that are more than a couple of lines long.

I also find that in combination with URL-shortening services it's an acceptable alternative to using code snippet sharing sites like gist.github.com or paste.pocoo.org. The code is actually part of the URL, which makes sense for shorter code snippets. You don't have syntax highlighting, but you have the evaluated result readily available.

Wednesday, September 16, 2009

3 things you didn't know Scala pattern matching can(not) do

Since I started learning Scala, pattern matching has become one of my favorite features. But powerful as it is, pattern matching has some limitations. On the other hand, there are some unexpected ways to use it.

Fake multiple assignment of tuples



You probably already know that you can initialize multiple variables using the tuple syntax:

val (a, b) = (1, 2)


I want to do the same with closure parameters, though. For example, to swap the elements of 2-tuples in a list:


List(1 -> "one") map {
  t => (t._2, t._1)
}


Still, accessing tuple elements by position (_1, _2) isn't very readable. It would be great if we could easily assign variables with meaningful names to document the purpose of each element of the tuple. The parameter is already a tuple, so why can't I do this:


List(1 -> "one") map {
  val (num, str) = _; (str, num)
}


Unfortunately this is not legal syntax. Of course, I can explicitly bind the tuple to a temporary variable and then decompose it:


List(1 -> "one") map {
  t => val (num, str) = t; (str, num)
}


But this seems too verbose for Scala. Fortunately, there's a trick that helps. A block of case clauses is actually an anonymous function (it can even be typed as a partial function)! The closures used for map, filter, etc. are functions, too. This means we can use a pattern match to bind variables to the closure parameters:


List(1 -> "one") map {
  case (num, str) => (str, num)
}
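Since a block of case clauses can also be a PartialFunction, the same trick works with methods that expect one, such as collect, which keeps only the elements a case matches. A small sketch with made-up data:

```scala
val pairs = List(1 -> "one", 2 -> "two", 3 -> "three")

// A block of case clauses used as a plain function: the tuple is
// destructured into named values instead of t._1 and t._2.
val swapped = pairs map { case (num, str) => (str, num) }

// The same syntax as a PartialFunction: collect keeps only the elements
// for which a case matches (here, the guard rejects odd numbers).
val evenNames = pairs collect { case (num, str) if num % 2 == 0 => str }
```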


Match the last element of a list



Decomposing lists in pattern matching is another powerful feature of Scala. You can match the first element of a list:


List(1, 2, 3) match {
  case List(1, _*) => "yeah!"
}


The underscore-star (_*) symbol serves as a placeholder for the rest of the elements of the list. You can also do this:


List(1, 2, 3) match {
  case 1 :: _ => "hurray!"
}


This mimics the construction of a list with the cons operator (::). So what if I want to match on the last element instead of the first?


(1 to 9).toList match {
  case List(_*, 8, 9) => "sure"
}


Hey, this seems like it's working! Not so fast, though:


(1 to 9).toList match {
  case List(_*, 18) => "yeah, right"
}


In fact, in Scala 2.8 this syntax will trigger the error message "_* may only come last".

How about using the alternative notation?


(1 to 9).toList match {
  case _ :: 9 :: Nil => "no way"
}


Unfortunately, the placeholder before :: stands for a single element, not a list, so this pattern only matches two-element lists. The triple colon (:::) operates on lists; will it work?


(1 to 9).toList match {
  case _ ::: 9 :: Nil => "absolutely not"
}


Nope, the compiler doesn't even recognize this. If you're wondering why the ::: operator exists but is unknown in pattern matching, remember that the :: used in pattern matching is actually a class, whereas ::: is just a method on List. Furthermore, the pattern "a :: b" is in fact a more readable way of writing "::(a, b)", which in turn is shorthand for "call unapply on the value being matched and check whether it decomposes into a and b".

And this, in short, is how the Scala black magic called "extractors" works. But can we also use it to match on the last element of a list? Sure! Just define an object (let's call it "::>") with an unapply method. The method takes a list and returns, wrapped in an Option, a tuple of the "init" part of the list and its "last" element; an empty list has neither, so in that case it returns None.


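object ::> {
  // Splits a non-empty list into (all elements but the last, the last element);
  // an empty list has no last element, so it simply doesn't match.
  def unapply[A](l: List[A]): Option[(List[A], A)] =
    if (l.isEmpty) None else Some((l.init, l.last))
}

List(1, 2, 3) match {
  case _ ::> last => println(last)
}

(1 to 9).toList match {
  case List(1, 2, 3, 4, 5, 6, 7, 8) ::> 9 => "woah!"
}
// ::> ends in '>', so it is left-associative: (List(1, ..., 7) ::> 8) ::> 9
(1 to 9).toList match {
  case List(1, 2, 3, 4, 5, 6, 7) ::> 8 ::> 9 => "w00t!"
}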
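Because unapply returns None for an empty list, the extractor can sit in a match with a fallback case without throwing. A small sketch (describeEnding is a hypothetical name; the extractor is repeated so the block is self-contained):

```scala
object ::> {
  // (init, last) for a non-empty list; None for Nil.
  def unapply[A](l: List[A]): Option[(List[A], A)] =
    if (l.isEmpty) None else Some((l.init, l.last))
}

def describeEnding(xs: List[Int]): String = xs match {
  case _ ::> 9    => "ends with 9"
  case _ ::> last => s"ends with $last"
  case _          => "empty" // Nil: unapply returned None, so neither case above matched
}
```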


Match structural types



If you've heard of structural types, haven't you wished you could do this to find out whether an object has a certain method?


List(1) match {
  case t: {def length: Int} => "cool!"
}


Unfortunately, here again we are fooled into thinking that the method is checked when matching against the type. It isn't:


List(1) match {
  case t: {def aoeu: Int} => "really?"
}


The reason is that the structural type is erased to AnyRef, so any object matches. Unfortunately, there's no clean workaround for this yet. This one looks positively ugly:


List(1) match {
  case t: {def aoeu: Int}
    if t.getClass.getMethods.exists{_.getName == "aoeu"} => "not really!"
}


Do not despair, though: there's a chance that this feature will land in Scala sooner or later.
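If you need the check today, one slightly less ugly variant drops the structural type from the pattern altogether and does the runtime check in a guard using Java reflection. This is a sketch under that assumption, not a real fix for the erasure problem; hasMethod and describeLength are hypothetical helpers.

```scala
// Hypothetical helper: true if x's runtime class has a public method with the
// given name that takes no parameters (checked via Java reflection at run time).
def hasMethod(x: Any, name: String): Boolean =
  x.getClass.getMethods.exists(m => m.getName == name && m.getParameterCount == 0)

// No structural type in the pattern: the guard performs the check
// that the erased type test cannot.
def describeLength(x: Any): String = x match {
  case _ if hasMethod(x, "length") => "has a length method"
  case _                           => "no length method"
}
```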

Tuesday, June 23, 2009

The ultimate Twitter client

...is a feed reader!

OK, not quite, but after some thinking about the top features the ideal Twitter client should have, I found that feed readers have most of them. For the minor drawback of not being able to post messages, you get:


  • Marking messages as read
    Marking mails as read is absolutely critical for email clients, since it lets you see at any time which messages are new. There's enough cognitive load on your brain already; it doesn't need to rescan messages you've already seen again and again. Paradoxically, many Twitter clients don't have this feature, even though there are far more messages in a typical Twitter timeline than in a typical email inbox.

  • Fixed replies
    If you know what #fixreplies means, you know that some folks prefer to see their friends' replies to people who are not also their friends. This is an excellent way to discover new people to follow, and it captures a lot more interesting conversations. Well, if you track the individual feeds of your friends (more on that later), all of their replies are there.

  • Favorites of friends
    Yeah, most Twitter clients let you see your own favorites, but that's backwards. Of course you already know which tweets you've marked as favorites! What you really want to know is what others, mostly your friends, have marked as favorites. You could look at the favorites timelines of individual users, but who would want to check all of their friends' favorites manually? There's a real need to track an aggregated list of the messages your friends recommend. The whole flood of retweet (aka "RT") messages is a response to this need. Retweeting is not a solution, though, as it pollutes Twitter, and especially search results, with duplicate tweets. Ever searched for Scala on the day The Register published the article claiming that Twitter is "dumping" Ruby for Scala (never mind that Twitter was not really dumping Ruby)?

  • Tracking
    Remember the days of old when Twitter had instant messaging integration, and you could track messages from any user by keyword? It couldn't scale, of course. Then there was Twitterspy, which provided identical functionality, but it collapsed under the weight of its popularity too. Well, you can use Twitter search, and you can create an Atom feed from a Twitter search. It doesn't have the real-time responsiveness of IM, but I'd take that over nothing.

  • No follow/unfollow counting
    This is actually a feature. People have argued before that the follower count should go, since it serves no purpose other than trophy collecting and is no measure of the usefulness of someone's tweets.

    There's also Qwitter, a service that tells you when someone stops following you, and which of your messages they quit after. If you thought for a moment this is good, think again. Rarely does someone decide you're not worth following after a single tweet. The likely reactions are either becoming too careful about what you tweet (so you become boring) or demanding an explanation from the qwitter (lame, but I've heard some people do it). No one will chastise me if I unsubscribe from their blog because I have no time to read it, so why take Twitter personally?



But some of these "features" require you to import the feed of every single one of your friends. No one in their right mind would do that manually, but there's a way to generate a list of feeds in the form of an OPML file (which many readers can import). First of all, get a list of your friends. Then apply the following XSL stylesheet:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <opml>
      <body>
        <outline title="twitter">
          <xsl:for-each select="/ids/id">
            <outline>
              <xsl:attribute name="title"><xsl:value-of select="."/></xsl:attribute>
              <xsl:attribute name="xmlUrl">http://twitter.com/statuses/user_timeline/<xsl:value-of select="."/>.atom</xsl:attribute>
            </outline>
          </xsl:for-each>
        </outline>
      </body>
    </opml>
  </xsl:template>

</xsl:stylesheet>


This contains only the most essential elements of an OPML file, and each feed is titled with the user's ID rather than their name, but Google Reader imports it successfully, and some readers can rename the feeds using the information provided in the RSS/Atom feed itself.

For an OPML of your friends' favorites, replace "statuses/user_timeline" with "favorites".

Monday, February 9, 2009

3 Scala gotchas

It's been over a year since I bought my copy of "Programming in Scala" by Martin Odersky, Lex Spoon and Bill Venners. A lot has changed since then. There are now three more Scala books in the works: "Programming Scala" by Venkat Subramaniam (Pragmatic Bookshelf), "Beginning Scala" by David Pollak (Apress), and another "Programming Scala" by Alex Payne (of Twitter fame) and Dean Wampler (O'Reilly). Scala is the key actor in a couple of new software projects (pun intended), including one at Apache (ESME, on which yours truly is a committer; OK, I have no shame).

There's no perfect language, though, and you can't say you know a language until you know its warts. So here's my share of traps, which wasted a lot of my time; I certainly hope you discover them before they waste yours. Of course, this doesn't mean the language is ill-designed: if only minor corner cases like these can be found, that suggests a lot of thought has been put into the language.

Main method in companion objects



object Test {
  def main(args: Array[String]): Unit = {
    println("Running!")
  }
}


You copy & paste a simple example and compile it. Great, everything works and you're happy. Then you read about companion objects and decide to use your main method to test the companion class. You start by adding an empty class with the same name and (if you're cautious) try to compile right away:


class Test {
}


Good, it compiles. Running it, though, results in a NoSuchMethodException for main. Huh? Well, only standalone singleton objects get a runnable main method, not companion objects.

Variables and constants in pattern matching




val Extension = "xml"
"json" match {
  case Extension => "Wrong!"
}


Here, you want to match on the contents of a variable. It works as expected: since "json" is not "xml", the example above throws a MatchError. Then you decide to rename the variable...


val fileExtension = "xml"
"json" match {
  case fileExtension => "Wrong!"
}


Boom! Now the pattern matches things it shouldn't! In fact, it matches everything you throw at it. What happened? Scala looks at the first letter of the name to decide whether it refers to an existing constant (upper-case first letter) or is a new variable it should bind (lower-case first letter). So in the last example a new variable called "fileExtension" was created, shadowing the existing variable of the same name.
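Besides capitalizing the name, there are two other ways to make Scala treat a name in a pattern as a constant: enclosing it in backticks, or qualifying it with an object name (anything containing a dot). A minimal sketch; matchesBackticked, matchesQualified and the Config object are hypothetical names for illustration:

```scala
val fileExtension = "xml"

// Backticks turn a lower-case name into a stable identifier pattern:
// the input is compared with the existing value instead of being bound to a new variable.
def matchesBackticked(ext: String): Boolean = ext match {
  case `fileExtension` => true
  case _               => false
}

// A qualified name (one containing a dot) is also treated as a constant.
object Config { val extension = "xml" }

def matchesQualified(ext: String): Boolean = ext match {
  case Config.extension => true
  case _                => false
}
```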

Always returning Unit from methods




def test1 {
  println("test1")
}

def test2 = {
  "test"
}


No surprises here: one method prints a String, the other returns one. Then you decide that you want a single method to do both (contrived, I know) and merge the two:


def test {
  println("test1")
  "test2"
}

println(test)


...and the output is not what you expected: "test1" is printed, followed by "()" rather than "test2". What's wrong here? In Scala, if you omit the equals sign between the method signature and the method body, the result type is always Unit (the equivalent of void in Java), and the value of the last expression is discarded.
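The fix is to add the equals sign. A minimal sketch of the merged method, which also declares the result type explicitly so that the compiler complains if the body ever stops producing a String:

```scala
// With "=", the last expression of the body is the method's result.
def test: String = {
  println("test1")
  "test2"
}

println(test) // prints "test1", then "test2"
```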