Thursday, October 21, 2010

Why Scala's Option won't save you from lack of experience

Some time ago Cedric Beust was the cause for some excitement in the Scala community by declaring that he doesn't see the advantages of using Scala's Option type, which is also similar to Haskell's Maybe type.

There were a lot of insightful comments which outlined the benefits of using Option. James Iry has written a well-reasoned post called Why Scala's "Option" and Haskell's "Maybe" types will save you from null.

I wanted to approach things differently. I wanted to show people some patterns which usually come with experience and let them decide which is better. That's why this turned into a very long post which looks more like a tutorial. Of course, I'm not adding anything new to the discussion, but just summarizing some of the common wisdom accumulated by the community. I'm sure this is not the last typical blog post showing the wonders of Option, but I hope it can clear up some (Some?) misunderstandings. Or Maybe not.

But more importantly, I wanted to explain why having a solution as a language feature is a premature optimization. It's neither as flexible, nor as powerful as having it as just a type in the library.

NullPointerException


Cedric is definitely not alone- programmers who decide to give Scala a try (and moreso Haskell or the ML family) face conceptual differences from popular imperative languages. Just reading that Option is a type wrapper does not mean it's easy to wrap one's head around it.

One advantage of using Scala is that if you are convinced that NullPointerExceptions solve your problem better, you are free to use it. Option is just one option. And of course, you can come back later any time if you make up your mind that Option has Some advantages (e.g. composability). Of course, some might view having too many choices as a disadvantage.

But both Scala and Clojure must live with the design decisions on the JVM which were taken before them, and with interoperating with a wealth of existing libraries. So allowing null is first of all a practical decision.


Getting started


First of all, let's define a simplistic employee type, a list of employees, and a map where the "builder" occupation points to the list of employees.

case class Person(name: String, email: String)
val employees = List(Person("bob", "bob@builder.com"))
val occupations = Map("builder" -> employees)



Compile-time safety and backward compatibility


One of the examples given by Cedric is pattern matching on an Option type. His main point of contention is that this looks very much like testing for null. Except one minor point: where is the example similar to the case when you don't want to test for null?

Exactly. No such valid example exists if you want to use the value inside Option, at least not unless you're explicit about it. As Paul Snively mentions, the compiler will stop you. Cedric has noticed, "the worst part about this example is that it forces me to deal with the null case right here". But this is not the worst part, it's maybe the best part.

Do you remember a feature which was added to Java 5, which was intended to save you from another type of exception, ClassCastException? Of course, that would be generics! The problem is, it gives you type-safety, but only as long as you use classes compiled with the Java 5 compiler and you don't use the escape hatch of raw types. As soon as you start using legacy code, you leave the safety of compiler checked code. You can ignore the warnings at your own risk, because then there's no guarantee you won't get a ClassCastException.

Does this remind you of something? Compile-time safety as long as you don't use legacy code or the escape hatch? These restrictions sound exactly like the ones Option has.

And of course, there is an escape hatch. You can use Option.get or instead of pattern matching, you can even use your old friend the if statement (Scala veterans, please close your eyes now):


val evangelists = occupations.get("evangelist")
// ugly, ugly, ugly
if (evangelists == None)
println("No such occupation here!")
else
println("Found occupation " + evangelists.get)

But, as you'll see later, this doesn't mean that you have to deal with the "no value" case right here. Pattern matching is not the only option.


Simplification through a generalization


Instead of solving the most obvious problem, it always pays out to see if it isn't a manifestation of a bigger class of problems. Having a related class of problems solved by a common pattern simplifies things. There are fewer rules to remember. Not only that, but the specific applications of the general solution begin to interact in ways you couldn't have anticipated before. Eventually problems will appear which would be solved by a general solution, problems which you didn't know about when you implemented the solution.

As it turns out, the problem of syntax similar to the safe dereference operator can be solved in Scala. I would say that having no explicit syntax for this, it's a fairly elegant solution, but this is subjective opinion.


Handling both value and lack of value and stop processing


This is handled by our well-known pattern match. It seems easy to use and obvious in what it does.

The advantage of pattern matching is that it uses the type system in such a way that forgetting to handle one of the cases explicitly will result in a compile time warning.

occupations.get("builder") match {
case Some(_) => println("builder occupation exists")
// oops, forgot to check for None or the catch-all _
}
// warning: match is not exhaustive!
// missing combination None


The disadvantage is that pattern matching doesn't compose very elegantly. If the result of the pattern match is just an intermediate step, you'll need to add another one, and another, and pattern matching does take some screen real estate.

Pattern matching is a bit like exception handling with try/catch blocks- you usually do it when you're interested in both the normal behaviour and the exceptional behaviour and that's fairly verbose. On a related note, did you know that you can use pattern matching in Scala's exception handlers?

So let's see what we can do to get more composable data processing.


Transform value


When we're interested in creating a series of steps for processing a value, we can use map. It will transform the value if it's there, but will leave it inside the Option. And map won't change the Option if it's empty (None.map will result in None).


val employee = employees find ( _.name == "bob" )
// Some(Person(bob,bob@builder.com))
employee map ( _.email )
// Some(bob@builder.com)



If you need to "flatten" the result you can use flatMap. This means that instead of an Option nested inside an Option, you get just one Option. It only results in Some (a "full" Option type) if it's called on Some and also results in Some:


val builders = occupations.get("builder")
// Some(List(Person(bob,bob@builder.com)))
val bobTheBuilder = builders flatMap { _ find ( _.name == "bob" ) }
// Some(Person(bob,bob@builder.com))


If you were using just map, you would get Some(Some(Person(bob,bob@builder.com))), which is probably a bit too nested for your taste.

Some of you are probably familiar with other languages which have map (like Ruby or Python) and are scratching their heads: "Wait, wasn't map defined only for lists/Enumerables?". Please be patient.


Only get the value if it satisfies a test


If you find only some of the possible values useful, you can weed out what you have by using filter. It will only result in Some for values which satisfy a certain condition (called a predicate).

bobTheBuilder filter { _.email endsWith "builder.com"}
// Some(Person(bob,bob@builder.com))


I'm sure at this point the folks who have used Google Collections have also joined the folks with past Ruby or Python experience screaming: "Hey, but filter is only used for Collections!"


Transform lack of value


That's fine, but eventually you want to get the value out. If there's no value, just assume some default value. We have to use pattern matching again, right?

But there is a shorter solution. getOrElse extracts the value or puts a default value of the same type if there's nothing in the Option container:


val larryWho = employees find ( _.name == "larry" )
// None
val emptyEmail = larryWho map ( _.email )

// None
emptyEmail.getOrElse("nobody@nowhere.com")
// nobody@nowhere.com


Groovy has this in the form of the Elvis operator. The trouble is, you can't get rid of the elvis operator, it's just adding cruft to the language, even though it's a useful one. It's also somewhat restrictive that it's all this operator can do.


Chain, chain, chain


The reason map, flatMap, filter and getOrElse are so useful is that they can be chained together, intermixed and the results can be passed around to other methods.

Let's shift to high gear and put it all together:

occupations.get("builder").
flatMap { _ find ( _.name == "bob" ) }.
map (_.email).
filter { _ endsWith "builder.com"}.
getOrElse("nobody@nowhere.com")
// bob@builder.com


If we're not yet interested in which step processing has failed, this is a clear way to express the process flow. It's also similar to the Fantom example Cedric desribed.

There is an ever shorter syntax for this using for expressions (or for comprehensions).


{for (builders <- occupations.get("builder");
bobTheBuilder <- builders find (_.name == "bob");
email = bobTheBuilder.email if email endsWith "builder.com"
) yield email
} getOrElse "nobody@nowhere.com"
// bob@builder.com


But wait, weren't for expressions a way to loop over stuff? Well, yes, this too. More generally, for expressions work with collections. And the beauty of it all is that we can use collections together with Option and do nested invocations. Let's modify the example a bit and suppose that there might be more than one person named Bob and we want them all.

for (builders <- occupations.get("builder") toList;
bobTheBuilder <- builders if bobTheBuilder.name == "bob";
email = bobTheBuilder.email if email endsWith "builder.com"
) yield email
// List(bob@builder.com)


Because for all practical purposes, Option behaves like a specialized collection. By viewing it as one, you reuse the experience of all the programmers using Groovy, Ruby, Python, Google Collections and whatnot, and flatten the learning curve.

Now imagine that the only way to work with a collection is to pattern match it. Would you use it? Yeah, me neither.


Safe invoke and composability


Now let's see how filter works in Fantom:

fansh> list := [1, 2, null]
fansh> list.findAll |v| { v.isEven }
sys::NullErr: java.lang.NullPointerException

Oh crap, then I need to to use the safe invoke operator:


fansh> list.findAll |v| { v?.isEven }
ERROR(20): Cannot return 'sys::Bool?' as 'sys::Bool'

But it all results in a compile-time error. It's the same story with the reduce higher-order function:

fansh> list.reduce(0) |r, v| { v + r }
sys::NullErr: java.lang.NullPointerException

So I can't practically use the safe invoke operator in nullable collections with filter/reduce, which the Fantom documentation has conveniently omitted from the documentation page. So we're back to checking for null the old way. This means that unlike flatMap, the safe invoke works fine when you chain, but not when you compose.


Iterator


Let's now see some other advantages of Option behaving like a collection. For instance, it lets you use Iterable's API in some elegant ways:

val noVal: Option[Int] = None
val someVal = Some(4)
List(1,2,3) ++ someVal ++ noVal
// List(1, 2, 3, 4)


Guess what happens here? Only the numbers contained in Some are added to the list. I think there's no operator for this in Fantom and Groovy, and it would be overkill to include one, too.


One size doesn't fit all


Wait, if Option is like a collection, does this mean that there are many types of Option? Does this mean I can create my own Option?

Yes, and yes. Just as there isn't just one type of List, or one type of Map, there can also be several types of Option. For instance, Lift defines its own, which it currently calls Box (I think it's a great metaphor). One of the things Box has in addition to Option is a type to collect Failures. It's no longer just a "dunno what happened, something failed along the way". It's a list of error messages which can pinpoint exactly what went wrong. This is invaluable for a web framework, because when a user expects a complex form to be validated, a simple "some of our input is wrong" just won't cut it.


for {
id <- S.param("id") ?~ "id param missing" ~> 401
u <- User.find(id) ?~ "User not found"
} yield u.toXml


And guess what, you can also define your own operators, which also work in for comprehensions.

for {
login <- get("/account/verify_credentials.xml", httpClient, Nil)
!@ "Failed to log in"
message <- login.post("/statuses/update.xml", "status" -> "test_msg1")

!@ "Couldn't post message"
xml <- message.xml
} yield xml


Except that they're not operators. When conventions and patterns evolve, Lift folks can always change the "operator". Or another framework can do it.

Another good example of using an enhanced Option is Josh Suereth's (jsuereth) Scala ARM library. It's collecting a list of errors, and Java's safe resource blocks proposed for Java 7 looks primitive in comparison.


Option is not only Scala's to have



Actually, there's nothing specific about Option that ties it to Scala. The only thing which is Scala specific is the syntax sugar of for comprehensions. You can use Option in Java if you want, although some of the examples above wouldn't be as concise and so it might be a bit of a pain. But this doesn't stop people from trying to recreate Maybe in Java.

Many folks have even tried to cheat and use Java's enhanced for expression as syntax sugar. So if Java can afford some syntax sugar over Iterator, and according to Joshua Bloch the for expression is a clear win, why shouldn't Scala do it? The difference is only that Scala's is applicable to a wider set of problems.


Why language syntax won't save you from the future


One advantage which some people don't realize Scala has is that it's a relatively minimal language with a relatively rich library. Apart from Option, there are other examples where having a library instead of a language feature has brought huge benefits to Scala. One such example is actors.

Let's compare Scala's actors to Erlang's. Undoubtedly Erlang is the daddy of practical actor implementations. Actors in the Erlang virtual machine have some superb characteristics which other runtime implementations will have a hard time catching up with. They're scalable and lightweight. They work across hosts and in the same virtual machine. They can be hot swapped and you can create millions of them in a single virtual machine.

But there's only one type of actor. This means it must deal with all possible cases, and as it usually happens, it deals better with some and worse with others. I have no doubt that having actors as part of the language, choosing only one type of actor is a very sensible decision, but it can still be restrictive sometimes.

Scala deals with this differently. Scala's actors are not part of the language, and the actor message send syntax (which is borrowed from Erlang) is just a method invocation in a library. This means that Scala's free to evolve different actor implementation, and you're free to choose the one which suits your case better. Some are more full-featured, some are lightweight and performant; some are remote, some are local; some use a thread pools, some use a single scheduler; some use managed hierarchies, some don't. Regarding actors Scala the language is smaller than Erlang, but the Scala libraries are richer.

Which actor library will win? I don't know. And probably neither do you. There might not be one best answer. That's why hardcoding stuff in the language is not a good way to prepare for the future. Only experience will, either the collective experience of the community or the extensive experience of a genius Benevolent Dictator For Life.

6 comments:

Ittay Dror said...

Note that Option is also good for when null is not an option ( ;-) ). For example, if your method returns one of the value classes (e.g., Int). In Java, you might try to return an unreasonable value, or box the value, both are ugly

Adam Rabung said...

Great article. One thing isn't making sense to me, however. The 'employees' collection that is shown a couple of times is of type List[Employee], not List[Option[Employee]], right?
1. Doesn't that mean find/filter will NPE if your list contains nulls? And, isn't 'employees' having nulls the premise of the example?

2. Wouldn't it be much more "idiomatic" to have List[Option[Employee]]?

Vassil Dichev said...

@Ittay:
Right, see Adam's followup: http://squirrelsewer.blogspot.com/2010/10/scalas-option-will-save-you-from-most.html

@Adam:
The employees collection is a List[Person]. The point of the example was to mirror to a certain degree Cedric's example, where the list variable itself could be null or an attempt to find an employee from this list could be null.

I've read your article and it offers an interesting way of classifying NullPointerException, but note that a type system with no escape hatch like Haskell's will not allow unintended NPEs as they will be caught at compile-time.

In this sense the employees variable is not intended to have nulls (or rather, None in this example).

I would say that having a List[Option[Employee]] only makes sense if the employee position in the list matters, otherwise why not just make a list so that values would only be added if they're not None? Similar to the example "List(1, 2, 3) ++ Some(4) ++ None".

A more likely operation where the value might be missing is getting a value out of a Map or finding a value in a list.

I also think it's more common to have a List[Option[_]] as a result of mapping over some other list, where the function doesn't necessarily return a meaningful value.

I hope that answers the question, but of course it depends on the use case.

Adam Rabung said...

Good point about the collection being optional rather than the members. I think I struggle more with navigating collections of Option, which is probably why I fixated on that use case. Thanks for a great post.

Tony Morris said...

Hello Vassil,
The blog posts you mention regarding Java are for demonstration only. The Option data type for Java is in wide use, mostly by people who are forced to use Java, by others who don't know any better.

http://functionaljava.googlecode.com/svn/artifacts/3.0/javadoc/fj/data/Option.html

Good luck debunking Cedric, but it can be a time-consuming (often with no reward) task.

Vassil Dichev said...

Tony, thanks for reading my blog and thank you for your informative posts, as well as your contributions to FunctionalJava and Scalaz- I learned a lot from them, and I hope others will.

As for debunking Cedric- that was not my primary goal. I wanted to collect bits of information in something like a cookbook or reference for how Option can be used, with references to different useful blog posts. The audience is definitely not only Cedric, so if someone else finds this useful, this would be rewarding enough.

Cedric has a point that this is one of the fundamental issues which folks have to come to grips with when they start learning Scala, so it's not a bad thing that there's a lot of awareness lately about the Option/Maybe types.