
Sunday, July 31, 2011

Partially unexpected effects of chaining partial functions in Scala

People who learn Scala usually agree that pattern matching is a great feature which helps make your code more expressive. Some time later they also discover that the partial functions used in a match statement can be used separately as well. And since partial functions are full-blown functions, you can combine them in a couple of useful ways:



f1 orElse f2

Combines two partial functions into a new one; if f1 is not defined at its argument, tries f2.

f1 andThen f2

Applies f2 to the result of f1. This means that the output of f1 must be of a type compatible with the input of f2.
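
For illustration, here is a minimal sketch of andThen (the function names are mine, not from the library):

val inc: PartialFunction[Int, Int] = { case i if i < 10 => i + 1 }
val show = (n: Int) => "got " + n

(inc andThen show)(1) // "got 2" - inc fires first, show transforms its result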



orElse


As others have discovered, you can combine a list (TraversableOnce really) of partial functions into one with reduce. What's not so obvious, though, is that the way you combine them can lead to unexpected performance consequences.


In order to easily create a lot of partial functions to test with, we will define a higher-order function to generate them (if you're used to Java, you can call it a factory). Each produced partial function prints a short message whenever its guard is evaluated, which is how we can observe calls to its isDefinedAt method:



type PF = PartialFunction[Int, Int]

def genPF(defined: Int): PF = { case i if { println(defined); defined == i } => i }
val f123 = 1 to 3 map genPF reduceLeft(_ orElse _)


Let's try it:



> f123(1)
1
1
1
> f123(2)
1
2
1
2
1
2
> f123(3)
1
2
3

Wait, what? The isDefinedAt method is called up to 6 times. It gets even worse with a bigger number of composed functions.



val f1to5 = 1 to 5 map genPF reduceLeft(_ orElse _)



> f1to5(2)
1
2
1
2
1
2
1
2
> f1to5(3)
1
2
3
1
2
3
1
2
3
> f1to5(4)
1
2
3
4
1
2
3
4

Let's take a closer look at the definition of isDefinedAt and apply of the function created with orElse:



def orElse[A1 <: A, B1 >: B](that: PartialFunction[A1, B1]): PartialFunction[A1, B1] =
  new PartialFunction[A1, B1] {
    def isDefinedAt(x: A1): Boolean =
      PartialFunction.this.isDefinedAt(x) || that.isDefinedAt(x)
    def apply(x: A1): B1 =
      if (PartialFunction.this.isDefinedAt(x)) PartialFunction.this.apply(x)
      else that.apply(x)
  }


Before applying a composed partial function, you typically check whether it's defined, which tests f1 and possibly f2; apply then tests f1 again so that it knows which of the two to call. This means that in the worst case, isDefinedAt of f1 is called twice.


Given this, we can explain what happens above. The composed function's isDefinedAt delegates to its components' isDefinedAt methods, and when some of those components are themselves compositions... you know what happens when we do this again and again. With a bit of counting we find that isDefinedAt is called k * (n - k + 1) times, where n is the number of composed functions and k is the position of the first function that matches.


Luckily, there is an easy way to combine partial functions more efficiently: reduceRight, with which isDefinedAt of each composed function is invoked at most twice. Verifying this and finding out why is left as an exercise for the curious reader (as you undoubtedly are, since you're reading this).
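
As a starting point for that exercise, only the reduction direction changes (a sketch reusing the genPF factory from above):

val f1to5r = 1 to 5 map genPF reduceRight(_ orElse _)
// builds f1 orElse (f2 orElse (f3 orElse ...)) instead of ((f1 orElse f2) orElse f3) ...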




andThen


You would think that f1 andThen f2 should be defined only when f2 is defined at the result of f1:



val doubler: PartialFunction[Int,Int] = { case i if Set(1,2) contains i => i * 2 }
val f = doubler andThen doubler


Of course, that's not how it works. In order to find out whether the output of f1 is a valid input for f2, we would have to execute f1, and it's better not to do that in case it has side effects. This means that we cannot rely on calling isDefinedAt on the combined function to avoid MatchErrors:



> f isDefinedAt 2
res3: Boolean = true
> f(2)
scala.MatchError: 4
...

Conclusion: when you're looking for performance, it always helps to understand how the abstractions you're using decompose into simpler building blocks.


Friday, June 17, 2011

Testing actors in Scala

Probably the most frequent question people asked at Scala eXchange 2011 was how to test actors. Since I had long planned to write a blog post on this topic, that was an indication it's high time to get it done.



It seems that the main problem people have is that actors are asynchronous and this introduces non-determinism in their tests. How do you know when to check if the actor has received and processed the message? Do you just wait a certain number of seconds before checking? This would make tests unnecessarily slow.


Another problem I think folks have is that they don't know how to verify that an actor has sent a message to another actor. Developers are familiar with mocking objects to verify that a method has been called, but how do you mock an actor to verify it has received a certain message?


Finally, many find it difficult to handle the fact that it's not easy (or at least not idiomatic) to check an actor's internal state. This is especially true of Akka actors, which are created using a factory method, so you can't define methods for mucking with the actor's internals.


Let me first address the last one. How do you check that the internal state of an actor has changed? You don't! I find that actors are even better at following the object-oriented principle of encapsulation than objects are. Relying on a certain internal state couples your tests unnecessarily to the implementation and makes them brittle. As Viktor Klang has pointed out, if the internal state of an actor cannot be observed outside the actor, does it really matter what it is?


So how do you know when an actor has processed a message which was sent asynchronously? An easy way to eliminate non-determinism is to define a method where the functionality is located and call that upon receiving a message:



object MyActor {
  def computation(arg1: String, arg2: Int) = {
    ...
    result
  }
}

class MyActor extends Actor {
  import MyActor._

  def act {
    loop {
      react {
        case Message(arg1, arg2) =>
          anotherActor ! computation(arg1, arg2)
      }
    }
  }
}


Then you can test the method the way you are familiar with. Putting the method in the companion object means that you can't test internal state- but it also means this approach works with Akka actors as well. We also avoid the problem of checking whether the next actor in the chain has received the result.


Sometimes you cannot test a helper method and sometimes testing the method is not enough. In these cases you want to verify explicitly that an actor has sent a message to another actor as a result of receiving a certain trigger message. Here's another approach we use at Apache ESME, which works very well:



case object Wait

class ConductorActor extends Actor {
  def act {
    react {
      case Wait => reply {
        receive {
          case MessageReceived(msg, reason) => msg
        }
      }
    }
  }
}


What's going on here? We define a helper actor (the recipient), which is the one supposed to receive the message from the actor we want to test (the sender). Usually the sender we want to test doesn't send messages to a hardcoded recipient- it's a good idea either to inject the recipient as a constructor parameter at instantiation time or to register it via a message representing a subscription request.


This actor uses a rarely used nested syntax, which is only available with the actors in the standard Scala library. The recipient replies synchronously (which is what we want) with the received message, but only after we give it the signal that we've already sent a message to the sender. The implementation also relies on the fact that messages stay in the inbox when there's no handler for them. While this can lead to memory leaks if such messages never get handled, it is a nice way to process out-of-order messages, and that's exactly what we take advantage of here. This is similar to selective receive in Erlang and is a fairly painless way to handle race conditions- it doesn't matter which message is received first.



These features are not present in Akka, but you could emulate nested handlers using become and buffer unhandled messages yourself. An even better idea is to use the built-in TestKit or the akka-expect project, which apply the same technique in a less ad-hoc manner (but AFAIK don't work for non-Akka actors).
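
For reference, this is roughly what the TestKit style looks like. This is only a sketch- the imports and names below follow the classic Akka TestKit API and may differ in the Akka version you use:

import akka.actor.{Actor, ActorSystem, Props}
import akka.testkit.{ImplicitSender, TestKit}

class EchoSpec extends TestKit(ActorSystem("test")) with ImplicitSender {
  // testActor plays the role our conductor actor plays above
  val echo = system.actorOf(Props(new Actor {
    def receive = { case m => sender() ! m }
  }))

  echo ! "ping"
  expectMsg("ping") // fails the test if nothing arrives within the default timeout
}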


So now the only thing we need to do is send the trigger message to the tested sender and then ask the recipient if the resulting message has been sent by the sender:



// wait till the message appears in the timeline
// or fail after 5 seconds
val msgReceived = conductor !? (5000L, Wait)

if (msgReceived.isEmpty) fail("no message received")


If the recipient gets the message within a certain timeframe, the test is successful; otherwise we time out and fail the test. The nice thing about this approach is that in the happy-path case the test can continue immediately, without slowing down the test suite. The test is delayed by the designated timeout only when it is going to fail, and that should be an exceptional event.



A minor inconvenience arises if the sender expects the recipient to be not a Scala library actor but, e.g., a Lift actor. This is easily overcome by using a bridge actor, which acts only as an intermediary and just forwards the request to the designated recipient actor:



class BridgeActor(receiver: Actor) extends LiftActor {
  protected def messageHandler = {
    case nm @ MessageReceived(_, _) => receiver ! nm
  }
}

val liftActor = new BridgeActor(conductor)

Distributor ! Listen(theUser.id.is, liftActor)
Distributor ! Listen(followerUser.id.is, liftActor)


Here we're injecting one actor as a construction parameter and registering another via the Listen message.



Further research


If you're using Akka, your best bet is the recommended TestKit- here's an article on how to use it:


http://roestenburg.agilesquad.com/2011/02/unit-testing-akka-actors-with-testkit_12.html


Another solution would be to use the akka-expect framework:


https://github.com/joda/akka-expect


A more universal library is Awaitility, which uses a similar solution, but with more general applicability to Java threads and Scala actors:



http://code.google.com/p/awaitility/


ScalaTest and Specs also have a Conductor, which implements a similar idea, using a CountDownLatch to make actors deterministic:


http://www.scalatest.org/scaladoc/doc-1.0/org/scalatest/concurrent/Conductor.html


Thursday, October 21, 2010

Why Scala's Option won't save you from lack of experience

Some time ago Cedric Beust caused some excitement in the Scala community by declaring that he doesn't see the advantages of using Scala's Option type, which is similar to Haskell's Maybe type.

There were a lot of insightful comments which outlined the benefits of using Option. James Iry has written a well-reasoned post called Why Scala's "Option" and Haskell's "Maybe" types will save you from null.

I wanted to approach things differently. I wanted to show people some patterns which usually come with experience and let them decide which is better. That's why this turned into a very long post which looks more like a tutorial. Of course, I'm not adding anything new to the discussion, but just summarizing some of the common wisdom accumulated by the community. I'm sure this is not the last typical blog post showing the wonders of Option, but I hope it can clear up some (Some?) misunderstandings. Or Maybe not.

But more importantly, I wanted to explain why having a solution as a language feature is a premature optimization. It's neither as flexible, nor as powerful as having it as just a type in the library.

NullPointerException


Cedric is definitely not alone- programmers who decide to give Scala a try (and more so Haskell or the ML family) face conceptual differences from the popular imperative languages. Just reading that Option is a type wrapper doesn't make it easy to wrap one's head around it.

One advantage of using Scala is that if you are convinced that NullPointerExceptions solve your problem better, you are free to keep using null. Option is just one option. And of course, you can come back later any time if you decide that Option has Some advantages (e.g. composability). Then again, some might view having too many choices as a disadvantage.

But both Scala and Clojure must live with the design decisions on the JVM which were taken before them, and with interoperating with a wealth of existing libraries. So allowing null is first of all a practical decision.


Getting started


First of all, let's define a simplistic employee type, a list of employees, and a map where the "builder" occupation points to the list of employees.

case class Person(name: String, email: String)
val employees = List(Person("bob", "bob@builder.com"))
val occupations = Map("builder" -> employees)



Compile-time safety and backward compatibility


One of the examples given by Cedric is pattern matching on an Option type. His main point of contention is that this looks very much like testing for null. Except for one minor point: where is the example corresponding to the case when you don't test for null?

Exactly. No such valid example exists if you want to use the value inside the Option, at least not unless you're explicit about it. As Paul Snively mentions, the compiler will stop you. Cedric noticed that "the worst part about this example is that it forces me to deal with the null case right here". But this is not the worst part- it's maybe the best part.

Do you remember a feature which was added to Java 5, which was intended to save you from another type of exception, ClassCastException? Of course, that would be generics! The problem is, it gives you type-safety, but only as long as you use classes compiled with the Java 5 compiler and you don't use the escape hatch of raw types. As soon as you start using legacy code, you leave the safety of compiler checked code. You can ignore the warnings at your own risk, because then there's no guarantee you won't get a ClassCastException.

Does this remind you of something? Compile-time safety as long as you don't use legacy code or the escape hatch? These restrictions sound exactly like the ones Option has.

And of course, there is an escape hatch. You can use Option.get, or, instead of pattern matching, you can even use your old friend the if statement (Scala veterans, please close your eyes now):


val evangelists = occupations.get("evangelist")
// ugly, ugly, ugly
if (evangelists == None)
  println("No such occupation here!")
else
  println("Found occupation " + evangelists.get)

But, as you'll see later, this doesn't mean that you have to deal with the "no value" case right here. Pattern matching is not the only option.


Simplification through a generalization


Instead of solving the most obvious problem, it always pays off to check whether it isn't a manifestation of a bigger class of problems. Having a related class of problems solved by a common pattern simplifies things: there are fewer rules to remember. Not only that, but the specific applications of the general solution begin to interact in ways you couldn't have anticipated. Eventually problems will appear which the general solution handles, problems you didn't even know about when you implemented it.

As it turns out, the problem addressed by the safe-dereference operator can be solved in Scala without any dedicated syntax. I would say that precisely because it needs no explicit syntax, it's a fairly elegant solution, but that is a subjective opinion.


Handle both value and lack of value, then stop processing


This is handled by our well-known pattern match. It seems easy to use and obvious in what it does.

The advantage of pattern matching is that it uses the type system in such a way that forgetting to handle one of the cases explicitly will result in a compile time warning.

occupations.get("builder") match {
  case Some(_) => println("builder occupation exists")
  // oops, forgot to check for None or the catch-all _
}
// warning: match is not exhaustive!
// missing combination None


The disadvantage is that pattern matching doesn't compose very elegantly. If the result of the pattern match is just an intermediate step, you'll need to add another one, and another, and pattern matching does take some screen real estate.

Pattern matching is a bit like exception handling with try/catch blocks- you usually use it when you're interested in both the normal behaviour and the exceptional behaviour, and that's fairly verbose. On a related note, did you know that you can use pattern matching in Scala's exception handlers?
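
In fact, a catch block in Scala is a pattern match over the thrown exception- a tiny illustration:

try "42x".toInt catch {
  case e: NumberFormatException => println("not a number: " + e.getMessage)
}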

So let's see what we can do to get more composable data processing.


Transform value


When we're interested in creating a series of steps for processing a value, we can use map. It will transform the value if it's there, but will leave it inside the Option. And map won't change the Option if it's empty (None.map will result in None).


val employee = employees find ( _.name == "bob" )
// Some(Person(bob,bob@builder.com))
employee map ( _.email )
// Some(bob@builder.com)



If you need to "flatten" the result, you can use flatMap. This means that instead of an Option nested inside an Option, you get just one Option. It results in Some (a "full" Option) only if it's called on a Some and the function also returns a Some:


val builders = occupations.get("builder")
// Some(List(Person(bob,bob@builder.com)))
val bobTheBuilder = builders flatMap { _ find ( _.name == "bob" ) }
// Some(Person(bob,bob@builder.com))


If you were using just map, you would get Some(Some(Person(bob,bob@builder.com))), which is probably a bit too nested for your taste.

Some of you are probably familiar with other languages which have map (like Ruby or Python) and are scratching their heads: "Wait, wasn't map defined only for lists/Enumerables?". Please be patient.


Only get the value if it satisfies a test


If you find only some of the possible values useful, you can weed out what you have by using filter. It will only result in Some for values which satisfy a certain condition (called a predicate).

bobTheBuilder filter { _.email endsWith "builder.com"}
// Some(Person(bob,bob@builder.com))


I'm sure at this point the folks who have used Google Collections have also joined the folks with past Ruby or Python experience screaming: "Hey, but filter is only used for Collections!"


Transform lack of value


That's fine, but eventually you want to get the value out. If there's no value, just assume some default value. We have to use pattern matching again, right?

But there is a shorter solution. getOrElse extracts the value or puts a default value of the same type if there's nothing in the Option container:


val larryWho = employees find ( _.name == "larry" )
// None
val emptyEmail = larryWho map ( _.email )

// None
emptyEmail.getOrElse("nobody@nowhere.com")
// nobody@nowhere.com


Groovy has this in the form of the Elvis operator. The trouble is, you can't ever get rid of the Elvis operator- even though it's useful, it permanently adds cruft to the language. It's also somewhat restrictive that this is all the operator can do.


Chain, chain, chain


The reason map, flatMap, filter and getOrElse are so useful is that they can be chained together, intermixed and the results can be passed around to other methods.

Let's shift to high gear and put it all together:

occupations.get("builder").
flatMap { _ find ( _.name == "bob" ) }.
map (_.email).
filter { _ endsWith "builder.com"}.
getOrElse("nobody@nowhere.com")
// bob@builder.com


If we're not yet interested in which step of the processing has failed, this is a clear way to express the flow. It's also similar to the Fantom example Cedric described.

There is an even shorter syntax for this using for expressions (or for comprehensions).


{for (builders <- occupations.get("builder");
bobTheBuilder <- builders find (_.name == "bob");
email = bobTheBuilder.email if email endsWith "builder.com"
) yield email
} getOrElse "nobody@nowhere.com"
// bob@builder.com


But wait, weren't for expressions a way to loop over stuff? Well, yes, this too. More generally, for expressions work with collections. And the beauty of it all is that we can use collections together with Option and do nested invocations. Let's modify the example a bit and suppose that there might be more than one person named Bob and we want them all.

for (builders <- occupations.get("builder") toList;
bobTheBuilder <- builders if bobTheBuilder.name == "bob";
email = bobTheBuilder.email if email endsWith "builder.com"
) yield email
// List(bob@builder.com)


Because for all practical purposes, Option behaves like a specialized collection. By viewing it as one, you reuse the experience of all the programmers using Groovy, Ruby, Python, Google Collections and whatnot, and flatten the learning curve.

Now imagine that the only way to work with a collection is to pattern match it. Would you use it? Yeah, me neither.


Safe invoke and composability


Now let's see how filter works in Fantom:

fansh> list := [1, 2, null]
fansh> list.findAll |v| { v.isEven }
sys::NullErr: java.lang.NullPointerException

Oh crap, then I need to use the safe invoke operator:


fansh> list.findAll |v| { v?.isEven }
ERROR(20): Cannot return 'sys::Bool?' as 'sys::Bool'

But that just results in a compile-time error. It's the same story with the reduce higher-order function:

fansh> list.reduce(0) |r, v| { v + r }
sys::NullErr: java.lang.NullPointerException

So I can't practically use the safe invoke operator with filter/reduce on nullable collections- something the Fantom documentation conveniently omits. We're back to checking for null the old way. This means that unlike flatMap, safe invoke works fine when you chain, but not when you compose.


Iterator


Let's now see some other advantages of Option behaving like a collection. For instance, it lets you use Iterable's API in some elegant ways:

val noVal: Option[Int] = None
val someVal = Some(4)
List(1,2,3) ++ someVal ++ noVal
// List(1, 2, 3, 4)


Guess what happens here? Only the numbers contained in Some are added to the list. I think there's no operator for this in Fantom or Groovy, and it would be overkill to include one, too.


One size doesn't fit all


Wait, if Option is like a collection, does this mean that there are many types of Option? Does this mean I can create my own Option?

Yes, and yes. Just as there isn't only one type of List, or one type of Map, there can also be several types of Option. For instance, Lift defines its own, which it currently calls Box (I think it's a great metaphor). One of the things Box has in addition to Option is a type to collect Failures. It's no longer just "dunno what happened, something failed along the way"- it's a list of error messages which can pinpoint exactly what went wrong. This is invaluable for a web framework, because when a user expects a complex form to be validated, a simple "some of your input is wrong" just won't cut it.


for {
  id <- S.param("id") ?~ "id param missing" ~> 401
  u <- User.find(id) ?~ "User not found"
} yield u.toXml


And guess what, you can also define your own operators, which also work in for comprehensions.

for {
  login <- get("/account/verify_credentials.xml", httpClient, Nil) !@ "Failed to log in"
  message <- login.post("/statuses/update.xml", "status" -> "test_msg1") !@ "Couldn't post message"
  xml <- message.xml
} yield xml


Except that they're not operators. When conventions and patterns evolve, Lift folks can always change the "operator". Or another framework can do it.

Another good example of an enhanced Option is Josh Suereth's (jsuereth) Scala ARM library. It collects a list of errors, and the safe resource blocks proposed for Java 7 look primitive in comparison.


Option is not only Scala's to have



Actually, there's nothing about Option that ties it specifically to Scala. The only Scala-specific part is the syntactic sugar of for comprehensions. You can use Option in Java if you want, although some of the examples above wouldn't be as concise, so it might be a bit of a pain. But this doesn't stop people from trying to recreate Maybe in Java.

Many folks have even tried to cheat and use Java's enhanced for expression as syntax sugar. So if Java can afford some syntax sugar over Iterator, and according to Joshua Bloch the for expression is a clear win, why shouldn't Scala do it? The difference is only that Scala's is applicable to a wider set of problems.


Why language syntax won't save you from the future


One advantage which some people don't realize Scala has is that it's a relatively minimal language with a relatively rich library. Apart from Option, there are other examples where having a library instead of a language feature has brought huge benefits to Scala. One such example is actors.

Let's compare Scala's actors to Erlang's. Undoubtedly Erlang is the daddy of practical actor implementations. Actors in the Erlang virtual machine have some superb characteristics which other runtime implementations will have a hard time catching up with. They're scalable and lightweight. They work across hosts and in the same virtual machine. They can be hot swapped and you can create millions of them in a single virtual machine.

But there's only one type of actor. This means it must deal with all possible cases, and, as usually happens, it deals better with some and worse with others. I have no doubt that when actors are part of the language, choosing only one type of actor is a very sensible decision, but it can still be restrictive sometimes.

Scala deals with this differently. Scala's actors are not part of the language, and the actor message-send syntax (which is borrowed from Erlang) is just a method invocation in a library. This means that Scala is free to evolve different actor implementations, and you're free to choose the one which suits your case better. Some are more full-featured, some are lightweight and performant; some are remote, some are local; some use thread pools, some use a single scheduler; some use managed hierarchies, some don't. Where actors are concerned, Scala the language is smaller than Erlang, but the Scala libraries are richer.

Which actor library will win? I don't know. And probably neither do you. There might not be one best answer. That's why hardcoding stuff in the language is not a good way to prepare for the future. Only experience will tell- either the collective experience of the community or the extensive experience of a genius Benevolent Dictator For Life.

Sunday, January 17, 2010

Is Scala more complicated than what Java tries to become?

Is Scala more complicated than Java? My last post did not tell the whole truth: I only listed Scala features which have a Java analog. There is a glaring omission of advanced Scala features like implicit conversions, operator overloading, call-by-name parameters and pattern matching. These Scala features are more complicated than what Java has. There, I said it. But then Scala is more complicated in the way a calculator is more complicated than an abacus- sure, you can do some of the same stuff with an abacus, but calculating the square root of a number is much more cumbersome.

However, this complexity pays off, because it lets us simplify many day-to-day features. This post will try a different angle by comparing where Java wants to be and where Scala is right now. I hope after reading it you will at least question your assumptions whether this trade-off is worth it.

Upon its creation, Java was a fairly simple language. A major reason it took over C++ is because it was specifically designed to steer away from multiple inheritance, automatic memory management and pointer arithmetic. But it's not a simple language anymore, and it's getting more and more complicated.

Why? Java wasn't designed to be very extensible. Scala, on the other hand, was designed to be scalable, in the sense of a flexible syntax. The very creators of Java knew well that a "main goal in designing a language should be to plan for growth" (Guy Steele's famous words from Growing a Language).

We need to expand our definition of language complexity. A language needs to be able to abstract away accidental complexity, or using it will be difficult. Examples of accidental complexity: jumping to a position in your program with goto and then remembering to go back (before procedural programming); allocating memory and then remembering to deallocate it (before garbage collectors); or using a counter to access collections and remembering to initialize and increment it correctly, not to mention checking when we're done.

Creating extensions of the language in order to hide these complexities doesn't happen often. When it does, it offers huge rewards. On the other hand, if a language is rigid, even though it looks simple, this forces you to invent your own arcane workarounds. When the language leaves you to deal with complexity on your own, the resulting code will necessarily be complicated.

Let's see what special new language features Java tries to add to the language, which Scala can do because of its flexibility and extensibility.

Pattern matching



Pattern matching is often compared with Java's switch/case statement. I have listed pattern matching as something which doesn't have an analog in Java, because comparing it to "switch" really doesn't do it justice. Pattern matching can be used for arbitrary types, it can be used to assign variables and check preconditions; the compiler will check if the match is exhaustive and if the types make sense. Meanwhile Java has only recently accepted Strings in switch statements, which is only scratching the surface of Scala's pattern matching.

Furthermore, Scala is using pattern matching all through the language- from variable assignment to exception handling. To compare, the proposal for handling multiple exceptions in Java is postponed yet again.

Case classes



In order to get rid of Java's verbose getters, setters, hashCode and equals, one solution is to muck with the javac compiler, like the folks from Project Lombok have done. Is going to the guts of javac complicated? I'm sure it is.

In Scala, you can do it if you just define your classes as case classes.
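
One line suffices (a toy example of mine, not from Lombok):

// equals, hashCode, toString, pattern-matching support and copy - all generated
case class Point(x: Int, y: Int)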

Implicit conversions



In short, implicit conversions help transparently convert one type to another if the original type doesn't support the operations requested.

There are many examples where this is useful.

What Java hardcodes in the language as conversions and promotions, Scala defines using implicit conversions. This is another example where Java can get quite complicated. In most cases where you need to work out how a method argument is converted, for instance, you must keep in mind narrowing and widening conversions, promotions, autoboxing, varargs and overriding (whew!). In Scala, the advantage of implicit conversions is that you can inspect what's going on, so no ambiguity need remain: you can analyze the conversions taking place in the interpreter by supplying the "-Xprint:typer" parameter, and you can even disable implicits you don't like by shadowing the import.

Another example of what implicits can do is adding methods and functionality to existing classes. Dynamic languages already do that easily using open classes and "missing method" handlers. In Java, one way to do this is bytecode manipulation trickery via libraries like cglib, bcel, asm or javassist.
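
In Scala it takes a few lines. A minimal "pimp my library" sketch (the names here are mine):

class RichCount(n: Int) {
  def stars: String = "*" * n
}
implicit def intToRichCount(n: Int): RichCount = new RichCount(n)

3.stars // "***" - the compiler rewrites this to intToRichCount(3).stars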

Bytecode manipulation in Java is required for popular libraries like Hibernate, Spring and AspectJ. Few "enterprise" Java developers can imagine development without Hibernate and Spring. Although there are many more things you can do with AspectJ, it can be used to emulate implicits with inter-type declarations. However, even though using AspectJ is a more high-level way to solve the problem, it adds even more complexity, as it defines additional keywords and constructs.

If you're new to Scala, you don't lose much if you don't know how implicit conversions work, just like you don't need to know about the magic that happens behind the scenes when Hibernate persists objects or when Spring creates its proxies. Just as with bytecode generation, you're not advised to use this feature often, as it is difficult to use. Still, you'll be glad it exists, because someone will create a library which will make your life and the life of many developers so much easier.

Operator overloading



The line between operators and methods in Scala is blurred- you can use the symbols +, -, /, *, etc. as method names. In fact, that's exactly how arithmetic operators work in Scala- they are method invocations (relax, everything is optimized by the compiler).

Some people object that operator overloading adds unnecessary complexity, because operators can be abused. Still, you can abuse method naming in much the same way. For instance, some hapless folk might define methods with visually similar names, like method1, methodl and methodI; use inconsistent capitalization, like addJar and addJAR; or pick meaningless identifiers like ahgsl. Why should operator best practices be any different from method naming best practices?

What is complicated is treating numeric types like Int and BigInteger differently. Not only that, but operations with BigInteger are very verbose and barely readable even for simple expressions. For comparison, this is what a recursive definition of factorial looks like in Scala with BigInt:


def factorial(x: BigInt): BigInt =
  if (x == 0) 1 else x * factorial(x - 1)


This is how it would look if Scala didn't support operator overloading:


def factorial(x: BigInteger): BigInteger =
  if (x == BigInteger.ZERO)
    BigInteger.ONE
  else
    x.multiply(factorial(x.subtract(BigInteger.ONE)))


Call by name



One of the proposals for extending the Java 7 language was automatic resource management. That's one more rule added to the language which you need to remember. Yet without the feature, code is unnecessarily complicated, because you have to remember to always close resources after using them- if you slip up, subtle bugs with leaking files or connections result.

In Scala, it's easy to add language constructs like this. Using by-name parameters and function blocks, which are evaluated only when they are invoked, one can emulate almost any language construct, including while, if, etc.
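
Here's a minimal sketch of such a construct (the using helper below is illustrative, not a standard library method):

// closes the resource no matter what happens in the body
def using[A <: java.io.Closeable, B](resource: A)(body: A => B): B =
  try body(resource) finally resource.close()

using(new java.io.FileReader("data.txt")) { reader =>
  reader.read()
}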

Existential types



Existential types are roughly an alternative to Java wildcards, only more powerful.

Martin Odersky: If Java had reified types and no raw types or wildcards, I don't think we would have that much use for existential types and I doubt they would be in Scala.


If Martin Odersky says that existential types wouldn't be in the language if it wasn't for Java compatibility, why would you even need to know about them? Mostly if you need to interoperate with Java generics.

Conclusion



Scala tries to define fewer language rules which are, however, more universal. Many of these advanced features are not used often, but they pay off by allowing you to create constructs which in Java would require specific hardcoded additions to the language. In Scala, they can be defined simply as libraries.

Why does it matter that it's in the libraries, and not hardcoded in the language? You can more easily evolve and adapt these features, you can add your own extensions, and you can even disable some of the library parts or replace them.

The conclusion is that if a language is not designed to be extended, it will eventually develop features which are not well integrated, and the language will collapse under the weight of its own complexity.

Finally, learning something that spares you a lot of routine error-prone operations reduces effort by raising the level of abstraction, at the cost of additional complexity. When you were in school, it was complicated to learn multiplication, but once you got over it, it saved you quite a bit of repetition compared with just using addition.

P.S. I realize it's not possible to resolve the issue once and for all which language is more complicated- Java or Scala- in a post or two. First of all, have in mind that simple is not the same as easy to use. There are also many topics which are open for discussion. I haven't touched on Scala traits; I haven't mentioned functions as first-class constructs compared to the Java 7 closure proposal; and there's a lot that can be said about how Scala obviates many Java design patterns. Extending the Scala syntax via compiler plugins is another interesting advanced topic.

I suppose someone could even write a blog post about these topics some day.

Monday, January 4, 2010

Is Scala more complicated than Java?

One Scala-related thread on Artima drew a lot of attention: "Is Scala really more complicated than Java?". The topic clearly struck a nerve. Whoever claims that Scala is much more complicated than Java has clearly not seen a Java Programmer Certification in a while and is probably not using many of the features added since Java 5 came out.

What I'll try to prove in this post is not that Scala isn't a complicated language- there are certainly many languages which are simpler. But the core features, the ones used reasonably often, are indeed a simplification over Java. Scala also has features which are more complicated than anything Java has; however, the complicated Scala features are specialized towards extending the language, while the complexity of Java is usually imposed on everyone, including the beginner.

This post will also not try to describe the language features in exhaustive detail- that's what the language specification tries to achieve, and the blog post is already long enough. I will assume that you know about the core language rules or can easily look them up.

What is complexity? Many conflate it with unreadability; some say it's the opposite of ease of use. Let's start with the following definition of complexity: it's the many special exceptions (pun not intended) to the rules, which make the whole system difficult to understand and remember.

Based on that definition, let's run a comparison of language features of Java and Scala.

Keywords



Java has more reserved words than Scala. Even when we remove the words for the primitive types (boolean, int, float, etc.), Scala still has fewer keywords!


  • Scala misses some of Java's control structures: continue and break (well, at least until Scala 2.8) are not part of the language, as they are not deemed a best-practice way to solve programming problems.
  • if/then/else in Scala returns a value, thus eliminating the need for the only ternary operator in Java, "?:", the use of which has long been discouraged.
  • for loop: Java folks discovered late in the game that the enhanced for loop is much less complicated to use in cases when you don't need the counter index. And they were right- it's one more detail which you (or at least newbies) can get wrong. But why stop there- Scala has a much more universal for loop, and there aren't two different syntaxes as in Java.
  • Scala keywords not in Java: one might argue that the override keyword in Scala complicates things, as you might do the same thing in Java with an @Override annotation. That's not quite the case: you might still override a method by accident and forget to put the annotation (it's not mandatory), and then the compiler won't give so much as a warning! So that's one more special case you need to worry about and keep in your head. When you start using traits, you definitely start to appreciate that override is a keyword.


Access modifiers



Java has four access levels: default (package), private, public and protected. Scala has three, but it can combine them with a scope for protection. This flexibility allows you to define access control for which Java has no analogs, and the modifiers remain readable because they are self-explanatory. For instance, if you have the class org.apache.esme.actor.UserActor, these are the Scala equivalents of Java's access modifiers:


private[actor]
Same as package access in Java

private[UserActor]
Same as private in Java



Scala's default access is public (yay, one less keyword!).

On the other hand, by defining the scope, Scala allows visibility types, which Java doesn't have:


private[this]
the member is accessible only from this instance. Not even other instances from the same class can access it

private[esme]
access for a package and subpackages. How many times did you have to use clumsy workarounds in Java because subpackages couldn't access the parent package?

protected
only subclasses can access the field. This is more consistent than Java's protected keyword. In Java, you have to remember that both subclasses and classes in the same package can access the field. How's that for remembering arbitrary rules?

private
This will not allow you to access inner objects. This is also more consistent than Java's private access, which is perhaps indicative of the fact that inner objects in Java were bolted on after the first version of the language was created



Namespaces



Java has four namespaces (fields, methods, packages, types); Scala has two (one for types, one for everything else). This has important implications for overriding: in Scala, you can start with a parameterless def (method definition) and later override it with a val (immutable variable). This helps enforce the uniform access principle: whether you access a field or a method, the syntax is the same everywhere, so your initial choice doesn't restrict you later on.
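
A small illustration (the class names are mine):

abstract class Shape {
  def area: Double // starts life as a parameterless def
}

class Square(side: Double) extends Shape {
  override val area = side * side // overridden with a val - callers can't tell the difference
}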

One thing which you cannot do is define a variable and a method with the same name. The "Programming in Scala" book explicitly mentions that allowing this would be a code smell, because of possible code duplication and the ambiguity which could arise.

Types and the type system



Scala's type system has been criticized for being too complex, but let's have a look at Java's type system.

Primitive types


There are many exceptions in Java's type system, which make the language hard to use and evolve. Java has primitive types, while in Scala everything is an object, making this valid code:


(1 to 3) foreach println


Java's primitive types make a lot of things harder. Before Java 5, manually wrapping values for collections- int in Integer, boolean in Boolean, etc.- was a pain. After auto-boxing and unboxing came out, the syntax is cleaner, but the overhead remains, as do some very subtle bugs. For example, auto-unboxing a null wrapper type will cause a NullPointerException.

There is a lot of code duplication because of primitive types- you always have to specify special cases and can't be truly generic.

Generics


Scala has generics done right. For instance, you can define covariance at the declaration site, whereas Java requires you to do this at the use site with wildcards. This, combined with Scala's type inference, allows one to use generified libraries without having to know or spell out the complete type signature.
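
A quick sketch of declaration-site variance:

// the +A annotation makes Box covariant: a Box[String] is a Box[Any]
class Box[+A](val value: A)

val b: Box[Any] = new Box[String]("apple") // compiles without any wildcards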

Java has yet another "special" type: arrays. As a result, the rules for the underlying array type and the generified ArrayList are quite different and inconsistent. The element type of an array is checked at runtime, while the genericity of ArrayList is checked at compile time. Consequently, an inappropriate assignment to an array element results in an ArrayStoreException only at runtime.

Constructors



Java initialization order in object construction is a pain to get right. The puzzlers on the certification exam use the most bizarre mix of static and instance initializer blocks and constructors calling or inheriting other constructors.

In Scala, any code which isn't part of a method or member variable declaration is executed as part of the primary constructor. You can define auxiliary constructors, each of which must call either the primary constructor or another auxiliary one defined before it in the same class. Can you come up with anything simpler?
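
A short sketch (the class is hypothetical):

class Person(val name: String, val email: String) {
  println("created " + name) // class body code runs as the primary constructor

  // an auxiliary constructor delegates to the primary one
  def this(name: String) = this(name, "unknown@example.com")
}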

Uniform syntax



Scala is sometimes accused of using too many symbols. Whatever you've seen, it's mostly not part of the language but of the libraries- you can override the symbolic methods and even disable them. What does Java have to say about special symbols?

Arrays


Arrays in Java are accessed via square brackets. In Scala, parentheses are used, because accessing an array index is a method call (called apply). Don't worry though, the compiler optimizes this away.

Collection literals


Java has a lot of special syntax for array instantiation and will soon have more for instantiating lists, sets and maps. Scala, again, creates these collections using factory methods in the companion objects: Array(1,2,3), List(1,2,3), Map((1,2)). Lists can also be created using the cons "operator", but here's the trick: it's not actually an operator. It's a method, prepending to the list. You can also create a map using the arrow tuple syntax: Map(1->2). And again, this is not "special" syntax which is part of the language- it's a method, constructing a tuple!
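
A few one-liners showing that these are plain method calls:

1 :: List(2, 3)   // List(1, 2, 3): :: is a method on List, prepending its left operand
Map("a" -> 1)     // -> is a method (added by an implicit) building the tuple ("a", 1)
Array(1, 2, 3)(0) // element access is just the apply method, hence the parentheses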

Now someone might smirk and think: "Ha, gotcha! Do you mean to say that simply because you've pushed the complexity out of the language and into the libraries you don't need to deal with it?". True, but let's have a look at Java and its ever growing standard libraries. It has AWT and Swing. It has old I/O and new I/O (gosh, which one do I use?). It has RMI (do you remember RMI?) and OMG, it even has CORBA. These libraries will never die. Methods in Thread have been deprecated for ages. There's also no sign that the ill-conceived Date/Calendar classes will ever be removed, but you still must know JodaTime if you hope to get any job with dates done.

More importantly, extending the language easily helps abstract away the details and evolve the language without creating tons of special cases. As per our definition, special cases add up to increase complexity. We'll explore the topic of extending the language in the followup post.

Thursday, December 3, 2009

String interpolation in Scala

When I tried Scala for the first time, one of the first missing language features I noticed was string interpolation (you guessed it- I was coming from Ruby at the time). Of course, this is a small convenience rather than a major language feature, and usually String.format and java.util.Formatter are pretty good substitutes. Still, string interpolation comes up now and then in Scala discussions, and one has to wonder whether Scala's powerful language extension facilities can emulate a feature like this.

It turns out you can get reasonably close:

trait InterpolationContext {
  class InterpolatedString(val s: String) {
    def i = interpolate(s)
  }

  implicit def str2interp(s: String) = new InterpolatedString(s)

  def interpolate(s: String) = {
    val sb = new StringBuilder
    var prev = 0
    for (m <- "\\$\\{.+?\\}".r.findAllIn(s).matchData) {
      sb.append(s.substring(prev, m.start))
      val matchString = m.matched
      val identifier = matchString.substring(2, matchString.length - 1)
      try {
        val method = this.getClass.getMethod(identifier)
        sb.append(method.invoke(this))
      } catch {
        case _: NoSuchMethodException =>
          sb.append(matchString)
      }
      prev = m.end
    }
    sb.append(s.substring(prev, s.length))
    sb.toString
  }
}

object Test extends InterpolationContext {
  val a = "3"
  def main(args: Array[String]) {
    println("expanded: ${a}; not expanded: ${b}; more text".i)
  }
}


How does this work? Calling the i method on a String forces an implicit conversion to an InterpolatedString, similar to how the r method converts a String to a Regex. The interpolate method uses reflection to find a method whose name equals the identifier delimited by "${" and "}". This works for both vals and defs thanks to Scala's adherence to the uniform access principle. If the delimited string is not a valid identifier, or no parameterless method with this name exists, it is not substituted but stays as it is.

How do you use it? Just extend or mix in the InterpolationContext trait in the class where you want this to work, and then call i on the Strings you want interpolated. The InterpolationContext trait serves two purposes: first, it brings the implicit conversion to InterpolatedString into scope; second, it provides access to the methods of the current object via reflection.

The limitations of this method are that interpolation works only on member vals and defs of the current object. I rather like this, because you know you can't accidentally construct a String from user input in an html form like "Name: ${System.exec(\"echo echo pwned >> .bashrc\")}". Interpolation also doesn't work for private members or local variables. Finally, you have to both mix in (or extend) the trait and call a method on every String (even though it's a one-letter method). This is not too bad, because you can control the scope where interpolation works, and therefore avoid nasty surprises or at least debug them more easily.

I don't see this as a pattern, which can be used in real-life projects, and I don't see the use of implicits here worthy of emulation. Nevertheless, trying to copy features from other languages is instructive about the limitations of your current language of choice and the tradeoffs you get for these features. Last but not least- even if it's a gimmick feature, implementing it was fun.

Monday, November 23, 2009

Error detection with pattern matching

Scala's pattern matching is one of the most powerful features in the language. It not only helps write concise and very readable code, but also helps prevent trivial errors.

Let's say you want to save a line or two of code when checking for an empty data structure like a List. You decide to use comparison for equality:


val items = Nil
if (items == Nil) println("No items")


Then you decide to refactor and turn the items collection into a Set. The Scala compiler is clever and should give you a warning, right? Well, not quite:


val items = Set()
if (items == Nil) println("No items")


This results in nothing being printed, as the expression evaluates to false. Of course, given the types, it's perfectly clear at compile time that this can never be true. Can't the compiler give you a hint? Indeed it will, if you use pattern matching:


items match {
  case Nil => println("No items")
  case _ =>
}


This will result in the following error message:


error: pattern type is incompatible with expected type;
found : object Nil
required: scala.collection.immutable.Set[Nothing]
case Nil => println("No items")
^


The examples are contrived (unparameterized Nil and Set?), but you get the point. Yes, Nil can never be a value of the Set type. You might think this is fairly obvious, but pattern matching can be a life saver when implicit conversions are involved. Consider this:


"heh".reverse == "heh"


What does this evaluate to? This should be obvious, right? But the value is false! If you used pattern matching, you would easily see why:


"heh".reverse match {
  case "heh" => "obvious?"
}


This will make the compiler very nervous, and this is the reason why:


error: type mismatch;
found : java.lang.String("heh")
required: scala.runtime.RichString
case "heh" => "obvious?"
^


So reverse converts the String to the wrapper RichString, which is not the same type as String.

I have had similar problems detecting a bug where I was checking a variable of type net.liftweb.common.Box (a type conceptually very similar to Scala's built-in Option) for equality with None.

This made me adopt a general rule to prefer pattern matching over equality comparison. The bugs it catches are sometimes subtle and hard to see, and that's exactly what Scala's rich static type checking tries to prevent. Use it to your advantage.

Since we're talking about bugs caught by pattern matching, there's one subtle bug, which is often (though not always) caught by the compiler (another contrived example follows):


val items = Set()
Set() match {
  case items => println("empty")
  case _ => println("full")
}


This will result in an error, which looks a little bit unusual to the newbie:


error: unreachable code
case _ => println("full")
^


This error is usually crying out loud: hey, you're inadvertently using variable binding instead of constant matching! The newly bound variable items shadows the existing variable with the same name, and all the cases after it will never match.

One way to fix it is to use backticks to prevent the name from being bound to a new variable:


Set() match {
  case `items` => println("empty")
  case _ => println("full")
}


As a rule of thumb, it is advisable to use CapitalLetters for case classes and constants which you intend to pattern match.
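
A sketch of this convention in action (the identifier name is mine):

val Expected = Set.empty[Int] // capitalized, so patterns treat it as a constant

Set.empty[Int] match {
  case Expected => println("empty") // matches against the existing value, no backticks needed
  case _ => println("full")
}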

This error wouldn't have occurred if you had used equality comparison in the first place, but even in mildly complex cases pattern matching trumps plain equality checking in readability and error detection. Apparently there are cases where pattern matching fails (for instance, matching structural types), so there's still no reason to deprecate good old "==". But pattern matching catches many more errors, like checking whether the match is exhaustive. So there's no point in saving a couple of characters only to lose the type safety you expect from Scala.

Wednesday, November 4, 2009

Embedded Scala interpreter

The Scala interpreter is proof that a statically typed language can have features most people only expect from a dynamically-typed language. One of the cool uses of an interpreter is embedding it within your application. This allows you to conveniently experiment with Scala and probably even interact with object instances in your running system. I won't explain how the interpreter works here, but I will try to show you a simple way of embedding the interpreter.

As it usually happens, someone beat me to it. Josh Suereth explains in great detail how to embed an interpreter, but he has done so many customizations that his solution would probably fit on several pages.

I wanted a simpler solution which one could understand at a glance. The code for Lord of the REPLs is much shorter although it doesn't customize much of what the standard interpreter offers.

I tried to come up with the shortest working version you could possibly get away with. Provided I create the settings properly, this is what I could muster:


val out = new java.io.StringWriter()
val interpreter = new scala.tools.nsc.Interpreter(settings, new java.io.PrintWriter(out))
interpreter.interpret(code)


Not too much code, is it? (Half of it is probably due to the fully qualified package names.) Now you can collect your output from the "out" writer and convert it to a String if needed using "out.toString".

Not so fast, though. I said this works if I have the appropriate settings:


val settings = new scala.tools.nsc.Settings()


The problem here is that the interpreter doesn't find two of the jars crucial for its proper functioning: scala-compiler.jar and scala-library.jar. When it doesn't, it spits out the following error:


scala.tools.nsc.FatalError: object scala not found.


Thanks to the following discussion by Eric Torreborre (author of Specs), I managed to find out that one needs to add them to the bootclasspath of the settings object:


val origBootclasspath = settings.bootclasspath.value
val pathList = List(compilerPath, libPath)
// classpath entries are joined with the path separator (":" or ";"), not the file separator
settings.bootclasspath.value =
  (origBootclasspath :: pathList).mkString(java.io.File.pathSeparator)


One could hardcode the path to these two jars, but that's not too flexible. If we want to do it right, we might create a function which discovers the path to the jar from the name of a class that's in it:


def jarPathOfClass(className: String) = {
  Class.forName(className).getProtectionDomain.getCodeSource.getLocation
}


Now you could find the paths to these jars like this:


val compilerPath = jarPathOfClass("scala.tools.nsc.Interpreter")
val libPath = jarPathOfClass("scala.ScalaObject")


I've read that getProtectionDomain.getCodeSource returns null in some classloaders and might have problems specifically with OSGi. In that case, one might need to resort to the following hack:


def jarPathOfClass(className: String) = {
  val resource = className.split('.').mkString("/", "/", ".class")
  val path = getClass.getResource(resource).getPath
  val indexOfFile = path.indexOf("file:")
  val indexOfSeparator = path.lastIndexOf('!')
  path.substring(indexOfFile, indexOfSeparator)
}


With this last ugly piece of code, creating an interpreter is not so concise anymore, but sometimes you can't be both robust and concise.

If you want to see the above snippets assembled in one piece you can check out Apache ESME's source code for the ScalaInterpreter action.

Warning: interpreting code directly in your application is a huge security risk and might not always be a good idea.

Tuesday, October 13, 2009

Scala closures as poor man's mocks

Groovy has this feature that you can use a closure whenever you need an interface with one method only. A class implementing the interface is automatically created, and the closure provides the implementation of this single method. This process is called closure coercion and is particularly convenient to make tests readable and concise.

I'm not yet sure about the relative advantages of such code everywhere, since there might be ambiguities or the intent might be obscured. Tests usually contain a lot of boilerplate, though, so I'm all for making them more concise. Readability aside, people are also more likely to write a test if it doesn't take much effort to create yet another trivial mock.

Who tests your tests though? You don't need to answer. Still, the page about implementing interfaces in Groovy warns of this trap: "Be careful that you don't accidentally define the map with { }". Type safety could help sometimes when writing tests. Or at least that's a very good excuse to try to emulate this Groovy feature in Scala.

In Scala there are implicit conversions, which offer a controlled way to coerce a value to a different type. We cannot automatically convert all closures to all possible single-method interfaces in scope, but we can pick a few we use especially often (as happens in tests). For instance, if we want to mock java.lang.Readable often enough, we could define the following implicit conversion:


import java.nio.CharBuffer

implicit def fun2readable(f: Function1[CharBuffer, Int]): Readable =
  new Readable { def read(cb: CharBuffer) = f(cb) }


So now everywhere we need a Readable, we might just drop in the appropriate closure instead (technically, this is not a mock, but a stub):


val readable: Readable = {cb: CharBuffer => cb.put("12 34"); 5}
val s = new java.util.Scanner(readable)
assert (s.nextInt == 12)


Compare this with the Groovy version:


def readable = { it.put("12 34"); 5 } as Readable
def s = new Scanner(readable)
assert s.nextInt() == 12


In the Scala version, we explicitly annotate the type of the variable in order to force the implicit conversion. In methods where a Readable is expected as an argument, explicitly naming the type won't be necessary, whereas in Groovy you always need to coerce using "as Readable":


val s = new java.util.Scanner({cb: CharBuffer => cb.put("12 34"); 5})


So in Scala you're trading some verbosity up front for conciseness in all of the code where the implicit conversion is visible. Furthermore, the type checker guarantees that only closures with the correct signature (i.e. with an input of CharBuffer and an output of Int) are converted to the interface in question, and it flags a compile-time error otherwise.

The reader might be tempted to emulate map coercion as well, but beware of an unexpected difficulty: it's not easy to define both closure coercion (for one-argument closures) and map coercion. Since a Map in Scala is itself a function, and due to type erasure, the compiler won't know whether to apply the Map conversion or the Function one!

Can you generate these implicits automatically without defining them? Maybe with a Scala compiler plugin, but then you lose some of the transparency of knowing which implicits you've defined.