Speaking my (programming) language?

Monday, January 4, 2010

Is Scala more complicated than Java?

One Scala-related thread on Artima drew a lot of attention: "Is Scala really more complicated than Java?". This post really struck a nerve. Whoever claims that Scala is much more complicated than Java has clearly not seen a Java Programmer Certification in a while and is probably not using many new features since Java 5 came out.

I won't try to prove in this post that Scala is not a complicated language; there are certainly many languages which are simpler. What I will argue is that the core features which are used reasonably often are indeed a simplification over Java. Scala also has features which are more complicated than what Java has. However, the complicated Scala features are specialized for extending the language, while the complexity of Java is usually imposed on everyone, including the beginner.

This post will also not try to describe the language features in exhaustive detail; that's what the language specification tries to achieve, and the blog post is already long enough. I will assume that you know about the core language rules or can easily look them up.

What is complexity? Many conflate it with unreadability, some say it's the opposite to ease of use. Let's start with the following definition of complexity: it's the many special exceptions (pun not intended) to the rules, making the whole system difficult to understand and remember.

Based on that definition, let's run a comparison of language features of Java and Scala.

Keywords

Java has more reserved words than Scala. Even when we remove the words of primitive types (boolean, int, float, etc.) Scala still has fewer keywords!

Scala misses some of Java's control structures

continue

break

if/then/else

for loop

for

Scala keywords not in Java

override

@Override

override

Access modifiers

Java has four access modifiers: default (package), private, public and protected. Scala has three, but it can combine them with a scope for protection. This flexibility allows you to define access control, for which Java has no analogs. They are also readable because they look self explanatory. For instance, if you have the class org.apache.esme.actor.UserActor, these are the Scala equivalents for Java's access modifiers:

private[actor]: Same as package access in Java
private[UserActor]: Same as private in Java

Scala's default access is public (yay, one less keyword!).

On the other hand, by defining the scope, Scala allows visibility types, which Java doesn't have:

private[this]: the member is accessible only from this instance. Not even other instances from the same class can access it
private[esme]: access for a package and subpackages. How many times did you have to use clumsy workarounds in Java because subpackages couldn't access the parent package?
protected: only subclasses can access the field. This is more consistent than Java's protected keyword. In Java, you have to remember that both subclasses and classes in the same package can access the field. How's that for remembering arbitrary rules?
private: the enclosing class cannot access the private members of its inner classes and objects. This is also more consistent than Java's private access, which is perhaps indicative of the fact that inner classes in Java were bolted on after the first version of the language was created
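A minimal sketch of the scoped modifiers listed above. To keep it in a single file, nested objects stand in for the packages esme and actor; the members nick, unread, deliver, audit and reset are made up for illustration and are not taken from the ESME code base.

```scala
// Nested objects stand in for packages so the sketch fits in one file.
object esme {
  object actor {
    class UserActor(val nick: String) {
      // Only this very instance may touch the counter.
      private[this] var unread = 0

      // Like Java's package access: visible anywhere inside `actor`.
      private[actor] def deliver(): Int = { unread += 1; unread }

      // Visible in `esme` and in everything nested inside it.
      private[esme] def audit: String = nick + " has " + unread + " unread"

      // Subclasses only; other code inside `actor` cannot call this.
      protected def reset(): Unit = { unread = 0 }
    }

    // Allowed: we are inside `actor`.
    def ping(user: UserActor): Int = user.deliver()
  }

  // Allowed: we are inside `esme`. Calling user.deliver() here would not compile.
  def report(user: actor.UserActor): String = user.audit
}

val user = new esme.actor.UserActor("demo")
val delivered = esme.actor.ping(user)
val summary = esme.report(user)
```

Moving the call to deliver into report, or calling reset from ping, should be rejected by the compiler, which is the point of the scoping.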

Namespaces

Java has four namespaces (fields, methods, packages, types), Scala has two (one for types, one for everything else). This has important implications when overriding. In Scala, you can start with a parameterless def (method definition) and then override with a val (immutable variable). This helps enforce the uniform access principle: whether you access a field or a method, this doesn't restrict you later on, because the syntax is the same everywhere you access it.

One thing which you cannot do is define a variable and a method with the same name. The "Programming in Scala" book explicitly mentions that allowing this would be a code smell because of possible code duplication and the ambiguity which could arise.
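As a small illustration of the shared namespace, here is a sketch in which a parameterless def is overridden by a val. The classes Sensor and CachedSensor are hypothetical names chosen for this example.

```scala
class Sensor {
  // A parameterless method: recomputed on every access.
  def reading: Double = 20.0 + 1.5
}

class CachedSensor extends Sensor {
  // The same member, now a value that is computed once.
  override val reading: Double = 42.0
}

// Client code cannot tell (and does not need to know) which one it is using.
def show(sensor: Sensor): String = "reading = " + sensor.reading

val fromDef = show(new Sensor)
val fromVal = show(new CachedSensor)
```

The call site s.reading looks the same in both cases, so switching between a stored and a computed member does not force changes on the callers.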

Types and the type system

Scala's type system has been criticized for being too complex, but let's have a look at Java's type system.

Primitive types

There are many exceptions in Java's type system, which make the language hard to use and evolve. Java has primitive types, while in Scala everything is an object, making this valid code:


(1 to 3) foreach println

Java's primitive types make a lot of things harder: before Java 5, manual wrapping in collections, e.g. int in Integer, boolean in Boolean, etc. was a pain. After auto-boxing and unboxing came out, the syntax is cleaner, but the overhead remains, as well as some very subtle bugs. For example, auto-unboxing a null wrapper type will cause a NullPointerException.

There is a lot of code duplication because of primitive types: you always have to specify special cases and can't be truly generic.

Generics

Scala has generics done right. For instance, you can define covariance at the declaration site, whereas Java requires you to do this at every use site (with wildcards). This, combined with Scala's type inference, allows one to use generified libraries without having to know or define the complete type signature.
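A short sketch of declaration-site covariance. Animal, Cat, Box and describe are names invented for this example; the rough Java counterpart would have to spell out Box<? extends Animal> in every method signature that wants the same flexibility.

```scala
sealed trait Animal { def name: String }
final case class Cat(name: String) extends Animal

// Covariance is declared once, here, with the + annotation.
final class Box[+A](val value: A)

def describe(box: Box[Animal]): String = "a box holding " + box.value.name

// The type Box[Cat] is inferred, and it is accepted where Box[Animal] is expected.
val catBox = new Box(Cat("Tom"))
val description = describe(catBox)
```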

Java has yet another "special" type: arrays. As a result the rules for the underlying array type and the generified ArrayList are quite different and inconsistent. The type of arrays is checked at runtime, while the genericity of ArrayList is checked at compile-time. As a result, inappropriate assignment to an array element results in an ArrayStoreException only at runtime.
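The runtime check can be observed from Scala as well. Scala arrays are invariant, so the sketch below needs an explicit cast to imitate what Java accepts silently (treating a String[] as an Object[]); the variable names are only illustrative.

```scala
import scala.util.Try

val strings: Array[String] = Array("a", "b")

// Scala would reject a plain assignment here; the cast mimics Java's covariant arrays.
val objects = strings.asInstanceOf[Array[AnyRef]]

// This compiles, but the JVM checks the element type when the store happens.
val attempt = Try { objects(0) = Integer.valueOf(1) }

val failedAtRuntime =
  attempt.failed.toOption.exists(_.isInstanceOf[ArrayStoreException])
```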

Constructors

Java initialization order in object construction is a pain to get right. The puzzlers on the certification exam use the most bizarre mix of static and instance initializer blocks and constructors calling or inheriting other constructors.

In Scala, any code which isn't part of a method or member variable declaration is executed as the primary constructor. You can define auxiliary constructors which call either the primary one or another auxiliary one defined in the same class before it. Can you come up with anything simpler?
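To make the rule concrete, here is a sketch with one primary and two auxiliary constructors. The class Connection and its log are hypothetical; the log only records the order in which the constructor bodies run.

```scala
import scala.collection.mutable.ListBuffer

class Connection(val host: String, val port: Int) {
  // Everything in the class body that is not a method is the primary constructor.
  val log = ListBuffer.empty[String]
  log += "primary constructor for " + host + ":" + port

  // An auxiliary constructor must start by calling a constructor defined before it.
  def this(host: String) = {
    this(host, 80)
    log += "auxiliary constructor applied default port"
  }

  def this() = {
    this("localhost")
    log += "auxiliary constructor applied default host"
  }
}

val steps = new Connection().log.toList
```

Every chain of auxiliary constructors ends in the primary one, so there is a single place where the object is actually initialized.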

Uniform syntax

Scala is sometimes accused of using too many symbols. Whatever you've seen, it's mostly not part of the language, but of the libraries. You can override them and even disable them. What does Java have to say about special symbols?

Arrays

Arrays in Java are accessed via square brackets. In Scala, parentheses are used, because accessing an array index is a method call (called apply). Don't worry though, the compiler optimizes this away.
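A tiny sketch of what the parentheses stand for; the array contents are arbitrary.

```scala
val primes = Array(2, 3, 5, 7)

val viaSugar = primes(2)         // reads like indexing...
val viaMethod = primes.apply(2)  // ...and is just this method call

primes(0) = 11                   // assignment form, same as primes.update(0, 11)
primes.update(1, 13)
```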

Collection literals

Java has a lot of special syntax for array instantiation and will soon have ones for instantiating lists, sets and maps. Scala, again, creates these special collections using factory methods in the companion objects: Array(1,2,3), List(1,2,3), Map((1,2)). Lists can also be created using the cons "operator", but here's the trick: it's not actually an operator. It's a method, prepending to the list. You can also create a map using the arrow tuple syntax: Map(1->2). And again, this is not "special" syntax, which is part of the language: it's a method, constructing a tuple!
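The claim that these are plain methods can be checked by writing the calls out in full. A minimal sketch, with value names chosen only for this example:

```scala
// The cons "operator" is a method on the list to its right.
val viaOperator = 1 :: 2 :: Nil
val viaMethodCall = Nil.::(2).::(1)

// The arrow is a method that builds a tuple.
val pair = 1 -> 2

val viaArrow = Map(1 -> 2)
val viaTuple = Map((1, 2))
```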

Now someone might smirk and think: "Ha, gotcha! Do you mean to say that simply because you've pushed the complexity out of the language and into the libraries you don't need to deal with it?". True, but let's have a look at Java and its ever growing standard libraries. It has AWT and Swing. It has old I/O and new I/O (gosh, which one do I use?). It has RMI (do you remember RMI?) and OMG, it even has CORBA. These libraries will never die. Methods in Thread have been deprecated for ages. There's also no sign that the ill-conceived Date/Calendar classes will ever be removed, but you still must know JodaTime if you hope to get any job with dates done.

More importantly, extending the language easily helps abstract away the details and evolve the language without creating tons of special cases. As per our definition, special cases add up to increase complexity. We'll explore the topic of extending the language in the followup post.

Monday, December 28, 2009

Bridge-crossing with Clojure

Solving puzzles has become popular among colleagues at the end of this year. One of these was the popular Bridge and torch problem. To summarize: four men have to cross a bridge in the dark with a torch, which can only light the way for two persons. These guys can cross the bridge in 1, 2, 5 and 10 minutes, respectively. When two people cross the bridge, it takes the time needed for the slower one to pass.

I'm not going to explain the optimal solution, but I wanted to show how this is a graph-traversal problem, which is amenable to solving using Dijkstra's algorithm. I also noticed that some clever guy has solved the puzzle using Prolog. So an idea was formed: generate the tree of solutions using a language, which I'm not really comfortable with (so I can practice and learn it better). I settled on Clojure, because I wanted to learn it this year (and the year is almost over).

The solution has a lot to do with sets. The state of each of the banks (which I'll call left and right) is a set of people, which are identified by their crossing speed. If we generate the list of possible combinations between the persons for each next move and recurse into them, then we'll have the tree of all solutions. In order to avoid indefinitely long (and stupid) solutions, we assume that going forward always takes two men, and going back takes only one.

The necessary libraries are set, combinatorics, str-utils and pprint. That's where I start:


(use '[clojure.contrib.combinatorics])
(use '[clojure.set])
(use '[clojure.contrib.str-utils])
(use '[clojure.contrib.pprint])

Here are the initial data structures: the two river banks are sets, the right one is empty:


(def left #{1, 2, 5, 10})
(def right #{})

And here's a first version of the solution:


(declare back) ; forward calls back before back is defined

(defn forward [left right steps minutes]
        (map #(back
                (difference left %)
                (union right (set %))
                (cons % steps)
                (+ minutes (reduce max %)) )
            (combinations left 2)) )

(defn back [left right steps minutes]
    (if (empty? left)
        (list steps minutes)
        (map #(forward
                (union left (set %))
                (difference right %)
                (cons % steps)
                (+ minutes (reduce max %)) )
            (combinations right 1)) ) )

(pprint (forward left right nil 0))

This prints some semblance of a tree, constructed by nested lists (hey, that's what Lisp is good for, remember?). What I didn't like about this is that there is too much repetition. There are many parts of these functions, which look the same. In order to evolve the solution, I wanted to extract the common parts in a single function:


(defn solve [from to steps minutes group]
    (if (and (empty? to) (not group))
        (list steps minutes)
        (mapcat #(solve
                (union to (set %))
                (difference from %)
                (cons % steps)
                (+ minutes (reduce max %))
                (not group) )
            (combinations from (if group 2 1))) ) )

(pprint (solve left right nil 0 true))

The function behaves differently depending on whether the group of people crossing consists of two or just one person. Also, the tree is now flattened, so the optimal solution can be easily found like this:


(print (reduce #(if (< (second %1) (second %2)) %1 %2) (apply hash-map (solve left right nil 0 true))))
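For readers more at home in Scala, the same flattened search can be sketched as follows. This is a translation made for illustration, not part of the original Clojure solution; crossingTimes and its parameter names are invented here, and it collects only the total time of each complete crossing sequence rather than the steps.

```scala
// Returns the total time of every complete crossing sequence.
// `from` is the bank where the torch is, `to` is the opposite bank.
// `group` is true when two people cross, false when one walks back.
def crossingTimes(from: Set[Int], to: Set[Int], minutes: Int, group: Boolean): List[Int] =
  if (to.isEmpty && !group) List(minutes) // the last pair has just arrived
  else
    from.subsets(if (group) 2 else 1).toList.flatMap { movers =>
      // The movers switch banks, and so does the torch.
      crossingTimes(to ++ movers, from -- movers, minutes + movers.max, !group)
    }

val allTimes = crossingTimes(Set(1, 2, 5, 10), Set.empty[Int], 0, true)
val fastest = allTimes.min
```

With two people always going forward and one coming back there are 6 * 2 * 3 * 3 * 1 = 108 complete sequences, and the fastest of them takes 17 minutes.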

Next step, I want to be able to collect a lot more data about each state, so I need to stop messing around with lists of lists and use something more organized like a hashmap. There's a reason Clojure includes the function second, but not third or, heaven forbid, fourth. Naming each part of your ad-hoc data structure plays a major role in readability.


(defn solve [from to steps minutes group]
    (if (and (empty? to) (not group))
        {:steps steps, :from from, :time minutes}
        {:steps steps, :from from, :time minutes,
            :solutions (map #(solve
                    (union to (set %))
                    (difference from %)
                    (cons % steps)
                    (+ minutes (reduce max %))
                    (not group) )
                (combinations from (if group 2 1)))} ) )

So now the interesting part can begin: rendering the tree in a nice graphical form using GraphViz. Watch this:


(defn printgraph [solution_tree]
    (let [steps (solution_tree :steps)
        nodename (if (nil? steps) "root" steps)
        from (solution_tree :from)
        label (if (= (count from) (count left)) (solution_tree :time) from)]
        (str "\"" nodename "\" [label=\"" label "\"];\n"
            ; contains? takes the map first and the key second
            (if (contains? solution_tree :solutions)
                (str-join ""
                    (map #(str "\"" nodename "\" -> "
                            "\"" (:steps %) "\""
                            " [label=\"" (intersection from (:from %)) "\"];\n"
                            (printgraph %))
                        (solution_tree :solutions)))
                "" ) ) ) )

And now you can output the recursive stuff wrapped with the header and footer like this:


(println "digraph G {")
(print (printgraph (solve left right nil 0 true)))
(println "}")

This graph shows each node as the state of alternating river banks, and the edges show the people crossing the bridge. The leaves display the number of minutes which it took for this particular combination of crossings.

I also wanted to generate a mind map for FreeMind, but since it doesn't support edge labels, I used the persons crossing for the nodes:


(defn mindmap [solution_tree]
    (str-join "\n"
        (map #(str
                "<node text=\"" (intersection (:from solution_tree) (:from %)) "\">"
                (mindmap %)
                "</node>")
            (solution_tree :solutions)) ) )

Now generating the complete file according to the FreeMind format:


(println "<map><node text=\"root\">\n")
(print (mindmap (solve left right nil 0 true)))
(println "</node></map>\n")

Finally, finding the optimal solution is just a wee bit more involved, because it requires traversing the tree recursively, but that's no problem for us anymore:


(defn optimal [tree]
    (if (contains? tree :solutions)
        (reduce #(if (< (second %1) (second %2)) %1 %2) (map optimal (tree :solutions)))
        (list (tree :steps) (tree :time)) ) )

(print (optimal (solve left right nil 0 true)))

In conclusion, I knew that one can write very compact code in Clojure (and any Lisp, in general), and this was certainly true here. I didn't use the Clojure-specific concurrency primitives or the lazy sequences, so my example would probably look similar in another Lisp. I did try to make it as functional as possible and Clojure does encourage that.

Some of the difficulties I had could be solved by static type checking, but I guess I'm spoiled by Scala. I've run into small problems like renaming a variable and forgetting to rename (or misspelling) some instance of it in my code. I also tend to mix up which of the arguments to contains? comes first, and which second. This is probably because you can swap a keyword used as a key and the hashmap itself without changing the meaning, e.g. (:from solution_tree) and (solution_tree :from). Of course, this is a problem with dynamic typing, not of Clojure in particular (and could probably be interpreted as a problem of using the language inappropriately).

Whatever the small hurdles, the problem was solved. Having fun? Check. Generating nice graphs and mind maps to impress colleagues? Check. Learning a bit of Lispy Clojure in the process? Check. Mission accomplished.

Thursday, December 3, 2009

String interpolation in Scala

When I tried Scala for the first time, one of the first missing language features I noticed was string interpolation (you guessed it, I was using Ruby at the time). Of course, this is a small convenience rather than a major language feature and usually String.format and java.util.Formatter are pretty good substitutes. Still, string interpolation comes up now and then in Scala discussions and one has to wonder if Scala's powerful language extension facilities can emulate a feature like this.

It turns out you can get reasonably close:

trait InterpolationContext {
  class InterpolatedString(val s: String) {

    def i = interpolate(s)
  }

  implicit def str2interp(s: String) = new InterpolatedString(s)

  def interpolate(s: String) = {
    val sb = new StringBuilder
    var prev = 0
    for (m <- "\\$\\{.+?\\}".r.findAllIn(s).matchData) {
      sb.append(s.substring(prev, m.start))
      val matchString = m.matched
      // strip the leading "${" and the trailing "}"
      val identifier = matchString.substring(2, matchString.length - 1)
      try {
        val method = this.getClass.getMethod(identifier)
        sb.append(method.invoke(this))
      } catch {
        case _: NoSuchMethodException =>
          sb.append(matchString)
      }
      prev = m.end
    }
    sb.append(s.substring(prev, s.length))
    sb.toString
  }
}

object Test extends InterpolationContext {
  val a = "3"
  def main(args: Array[String]): Unit = {
    println("expanded: ${a}; not expanded: ${b}; more text".i)
  }
}

How does this work? If you call the i method on a String, it will force implicit conversion to an InterpolatedString, similar to how the r method works for converting to a Regex. The interpolate method uses reflection to find a method with the name equal to the identifier delimited by "${" and "}". This works both for vals and defs due to Scala's adhering to the uniform access principle. If the delimited string is not a valid identifier or a parameterless method with this name doesn't exist, it is not substituted, but stays as it is.

How do you use it? Just extend or mix in the InterpolationContext trait in the class where you want this to work, and then call i on the Strings where you want interpolation to work. The InterpolationContext trait serves two purposes: first of all, it imports the implicit conversion to the interpolated string, and second it provides access to the methods of the current object via reflection.

The limitation of this method is that interpolation works only on member vals and defs of the current object. I rather like this, because you know you can't accidentally construct a String from user input in an html form like "Name: ${System.exec(\"echo echo pwned >> .bashrc\")}". Also, interpolation doesn't work for private members or for local variables. Finally, you have to both mix in or extend the trait and call a method on every String (even though it's a one-letter method). This is not too bad, because you can control the scope where interpolation works and therefore avoid and debug nasty surprises more easily.

I don't see this as a pattern which can be used in real-life projects, and I don't see the use of implicits here as worthy of emulation. Nevertheless, trying to copy features from other languages is instructive about the limitations of your current language of choice and the tradeoffs you get for these features. Last but not least, even if it's a gimmick feature, implementing it was fun.

Monday, November 23, 2009

Error detection with pattern matching

Scala's pattern matching is one of the most powerful features in the language. It not only helps write concise and very readable code, but also helps prevent trivial errors.

Let's say you want to save a line or two of code when comparing for some empty data structure like a List. You decide to use comparison for equality:


val items = Nil
if (items == Nil) println("No items")

Then you decide to refactor and turn the items collection into a Set. The Scala compiler is clever and should give you a warning, right? Well, not quite:


val items = Set()
if (items == Nil) println("No items")

This results in nothing printed, as the expression evaluates to false. Of course, given the types it's perfectly clear at compile-time that this will never be true. Can't the compiler give you a hint? Indeed, it will if you use pattern matching:


items match {
  case Nil => println("No items")
  case _ =>
}

This will result in the following error message:


error: pattern type is incompatible with expected type;
 found   : object Nil
 required: scala.collection.immutable.Set[Nothing]
         case Nil => println("No items")
              ^

The examples are contrived (unparameterized Nil and Set?), but you get the point. Yes, Nil can never be a value of the Set type. You might think this is fairly obvious, but pattern matching can be a life saver when implicit conversions are involved. Consider this:


"heh".reverse == "heh"

What does this evaluate to? This should be obvious, right? But the value is false! If you used pattern matching, you would easily see why:


"heh".reverse match {
  case "heh" => "obvious?"
}

This will make the compiler very nervous, and this is the reason why:


error: type mismatch;
 found   : java.lang.String("heh")
 required: scala.runtime.RichString
         case "heh" => "obvious?"
              ^

So reverse converts the String to the wrapper RichString (this is the Scala 2.7 behavior), which is not the same type as String.

I have had a similar problem detecting a bug where I was checking a variable of type net.liftweb.common.Box (a type conceptually very similar to Scala's built-in Option) for equality with None.

This made me adopt a general rule to prefer pattern matching rather than equality comparison. The bugs it catches are sometimes subtle and hard to see, and that's exactly what Scala's rich static type checker tries to avoid. Use it to your advantage.

Since we're talking about bugs caught by pattern matching, there's one subtle bug, which is often (though not always) caught by the compiler (another contrived example follows):


val items = Set()
Set() match {
  case items => println("empty")
  case _ => println("full")
}

This will result in an error, which looks a little bit unusual to the newbie:


error: unreachable code
         case _ => println("full")
                   ^

This error is usually crying out loud: hey, you're inadvertently using variable binding instead of constant matching! The newly bound variable items shadows the existing variable with the same name and all other cases after it will never match.

One way to fix it is to use backticks to prevent the name from being bound to a new variable:


Set() match {
  case `items` => println("empty")
  case _ => println("full")
}

As a rule of thumb it is advised to use CapitalLetters for case classes and constants which you intend to pattern match.
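The three spellings can be put side by side in a runnable sketch. The names expected and Answer and the three helper functions are hypothetical, chosen only to show the difference.

```scala
val expected = 42
val Answer = 42

// Lowercase name in a pattern: a fresh variable is bound, so this matches anything.
def bindsAnything(x: Int): String = x match {
  case expected => "matched, bound to " + expected
}

// Backticks: compare against the existing value instead of binding a new one.
def comparesWithBackticks(x: Int): String = x match {
  case `expected` => "equal"
  case _ => "different"
}

// A capitalized name is treated as a constant without any backticks.
def comparesWithCapital(x: Int): String = x match {
  case Answer => "equal"
  case _ => "different"
}
```

Only the first function ignores the outer value; depending on the compiler version and flags it may also produce a warning about the shadowed or unused name.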

This error wouldn't have occurred if you used equality comparison in the first place, but even in mildly complex cases pattern matching trumps plain equality checking in readability and detecting errors. Apparently, there are cases where pattern matching fails (for instance, matching structural types), so there's still no reason to deprecate good old "==". But there are many more errors which pattern matching catches, like checking if the match is exhaustive. So there's no point in saving a couple of characters only to lose the type safety you expect from Scala.

Wednesday, November 4, 2009

Embedded Scala interpreter

The Scala interpreter is proof that a statically typed language can have features most people only expect from a dynamically-typed language. One of the cool uses of an interpreter is embedding it within your application. This allows you to conveniently experiment with Scala and probably even interact with object instances in your running system. I won't explain how the interpreter works here, but I will try to show you a simple way of embedding the interpreter.

As it usually happens, someone beat me to it. Josh Suereth explains in great detail how to embed an interpreter, but he has done so many customizations that his solution would probably fit on several pages.

I wanted a simpler solution which one could understand at a glance. The code for Lord of the REPLs is much shorter although it doesn't customize much of what the standard interpreter offers.

I tried to come up with the shortest working version you could possibly get away with. Provided I create the settings properly, this is what I could muster:


val out = new java.io.StringWriter()
val interpreter = new scala.tools.nsc.Interpreter(settings, new java.io.PrintWriter(out))
interpreter.interpret(code)

Not too much code, is it? (Half of it is probably due to the full package names). Now you could collect your output from the "out" stream and probably convert to String if you need using "out.toString".

Not so fast, though. I said this works if I have the appropriate settings:


val settings = new scala.tools.nsc.Settings()

The problem here is that the interpreter doesn't find two of the crucial jars needed for its proper functioning: scala-compiler.jar and scala-library.jar. When it doesn't, it spits out the following error:


scala.tools.nsc.FatalError: object scala not found.

Thanks to the following discussion by Eric Torreborre (author of Specs) I managed to find out that one needs to add to the bootclasspath of the settings object:


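val origBootclasspath = settings.bootclasspath.value
// compilerPath and libPath are the jar locations found with jarPathOfClass (see below)
val pathList = List(compilerPath, libPath)
// classpath entries are joined with the path separator (":" or ";"), not the file separator
settings.bootclasspath.value = (origBootclasspath :: pathList).mkString(java.io.File.pathSeparator)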
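The joining step can be tried in isolation. The sketch below is independent of the interpreter API; appendToClasspath is a hypothetical helper, and the jar names are placeholders rather than real paths. It relies on java.io.File.pathSeparator, which separates classpath entries (":" on Unix, ";" on Windows), whereas File.separator separates directories within a single path.

```scala
import java.io.File

// Joins an existing classpath string with extra entries, skipping empty parts
// so that an empty original does not leave a dangling separator.
def appendToClasspath(original: String, extra: Seq[String]): String =
  (original +: extra).filter(_.nonEmpty).mkString(File.pathSeparator)

val joined = appendToClasspath("", Seq("scala-compiler.jar", "scala-library.jar"))
```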

One could hardcode the path to these two jars, but that's not too flexible. If we want to do it right, we might create a function which discovers the path to the jar from the name of a class that's in it:


def jarPathOfClass(className: String) = {
  Class.forName(className).getProtectionDomain.getCodeSource.getLocation
}

Now you could find the paths to these jars like this:


val compilerPath = jarPathOfClass("scala.tools.nsc.Interpreter")
val libPath = jarPathOfClass("scala.ScalaObject")

I've read that getProtectionDomain.getCodeSource returns null in some classloaders and might have problems specifically with OSGi. In that case, one might need to resort to the following hack:


def jarPathOfClass(className: String) = {
  val resource = className.split('.').mkString("/", "/", ".class")
  val path = getClass.getResource(resource).getPath
  val indexOfFile = path.indexOf("file:")
  val indexOfSeparator = path.lastIndexOf('!')
  path.substring(indexOfFile, indexOfSeparator)
}

With the last ugly piece of code creating an interpreter is not so concise anymore, but sometimes you can't be both robust and concise.
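If the null case is the main worry, one possible middle ground is to wrap each step of the first variant in an Option, so that the caller can decide when to fall back to the resource-based hack. This is only a sketch: codeSourceOf is a hypothetical name, and it has not been tried under OSGi.

```scala
import java.net.URL

// Each step may legitimately return null, so every one is wrapped in Option.
def codeSourceOf(className: String): Option[URL] =
  for {
    domain   <- Option(Class.forName(className).getProtectionDomain)
    source   <- Option(domain.getCodeSource)
    location <- Option(source.getLocation)
  } yield location

// Classes loaded by the JVM's bootstrap loader have no code source.
val jdkClass = codeSourceOf("java.lang.String")
```

For a class that lives in an ordinary jar on the classpath the result should be a Some with the jar's URL, while None signals that another lookup strategy is needed.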

If you want to see the above snippets assembled in one piece you can check out Apache ESME's source code for the ScalaInterpreter action.

Warning: interpreting code directly in your application is a huge security risk and might not always be a good idea.

Tuesday, October 13, 2009

Scala closures as poor man's mocks

Groovy has this feature that you can use a closure whenever you need an interface with one method only. A class implementing the interface is automatically created, and the closure provides the implementation of this single method. This process is called closure coercion and is particularly convenient to make tests readable and concise.

I'm not yet sure about the relative advantages of such code everywhere, since there might be ambiguities or the intent might be obscured. Tests usually contain a lot of boilerplate, though, so I'm all for making them more concise. Apart from readability, people would be more likely to write a test if it doesn't take much effort to create yet another trivial mock.

Who tests your tests though? You don't need to answer. Still, the page about implementing interfaces in Groovy warns of this trap: "Be careful that you don't accidentally define the map with { }". Type safety could help sometimes when writing tests. Or at least that's a very good excuse to try to emulate this Groovy feature in Scala.

In Scala there are implicit conversions, which help define a controlled way to coerce a value to a different type. We cannot automatically convert all closures to all possible single-method interfaces in scope, but we can choose several we are using especially often (as it happens in tests). For instance, if we want to use a mock of java.lang.Readable often enough, we could define the following implicit conversion:


import java.nio.CharBuffer

implicit def fun2readable(f: Function1[CharBuffer,Int]): Readable =
  new Readable { def read(cb: CharBuffer) = f(cb) }

So now everywhere we need a Readable, we might just drop in the appropriate closure instead (technically, this is not a mock, but a stub):


val readable: Readable = {cb: CharBuffer => cb.put("12 34"); 5}
val s = new java.util.Scanner(readable)
assert (s.nextInt == 12)

Compare this with the Groovy version:


def readable = { it.put("12 34"); 5 } as Readable
def s = new Scanner(readable)
assert s.nextInt() == 12

In the Scala version, we are explicitly defining the type of the variable in order to force the implicit conversion. In methods, where Readable is expected as an argument, explicitly naming the type will not be necessary, whereas in Groovy you always need to coerce using "as Readable":


val s = new java.util.Scanner({cb: CharBuffer => cb.put("12 34"); 5})

So in Scala you're trading some verbosity up front for some conciseness in all of the code where the implicit conversion is visible. Furthermore, the type checker guarantees that only closures with the correct signature are converted to the interface in question (i.e. with an input of CharBuffer and output of Int) and will flag a compile-time error otherwise.
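The same recipe carries over to other single-method interfaces. As a sketch, here it is for java.util.concurrent.Callable; fun2callable and runTwice are hypothetical names, and the scala.language import is only there to keep newer compilers from warning about implicit conversions.

```scala
import java.util.concurrent.Callable
import scala.language.implicitConversions

// One conversion per interface we care about, visible wherever it is imported.
implicit def fun2callable[A](f: () => A): Callable[A] =
  new Callable[A] { def call(): A = f() }

def runTwice(task: Callable[String]): String = task.call() + task.call()

// A plain function value is passed where a Callable is expected.
val greet = () => "ab"
val result = runTwice(greet)
```

A function with the wrong result type, say () => 1, would not be converted to Callable[String], so the mistake shows up at compile time.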

The reader might be tempted to also emulate map coercion, but beware of an unexpected difficulty: it's not easy to define both a closure (with one argument) and map coercion. Since a Map in Scala is a function and due to type erasure, the compiler won't know whether to substitute the Map or the Function!

Can you generate these implicits automatically without defining them? Maybe with a Scala compiler plugin, but then you lose some of the transparency of knowing which implicits you've defined.

Saturday, September 26, 2009

Scala interpreter in Firefox' location bar

SimplyScala is awesome for trying out some quick Scala tricks. However, you can go one step further, and execute Scala one-liners in your Firefox location bar!

1. Create a bookmark to URL http://www.simplyscala.com/interp?code=%s
2. Associate a keyword to the bookmark. Mine is "scala", you can try something shorter like "s" or ">".
3. Now when you type "scala <expression>", the expression will be sent to SimplyScala and evaluated. For instance, "scala List(1,2,3).reverse"

Why does this work? Whenever you have a keyword associated with a bookmark Firefox expands "%s" in the URL to the text following the keyword. Check this Lifehacker article for details.
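The substitution itself is easy to imitate. The sketch below is an approximation, not Firefox's actual implementation: expandKeyword is a hypothetical helper, and the browser's escaping rules may differ in detail from java.net.URLEncoder.

```scala
import java.net.URLEncoder

// Replaces the %s placeholder with the URL-encoded text typed after the keyword.
def expandKeyword(template: String, query: String): String =
  template.replace("%s", URLEncoder.encode(query, "UTF-8"))

val url = expandKeyword(
  "http://www.simplyscala.com/interp?code=%s",
  "List(1,2,3).reverse")
```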

The nice thing is that your bindings are kept in your session, so you can use variable names defined in previous invocations. There are inconveniences, for instance you cannot easily preview your history and it's hard to review inputs which are more than a couple of lines long.

I also find that in combination with URL-shortening services it's an acceptable alternative to using code snippet sharing sites like gist.github.com or paste.pocoo.org. The code is actually part of the URL, which makes sense for shorter code snippets. You don't have syntax highlighting, but you have the evaluated result readily available.