Monday, January 4, 2010

Is Scala more complicated than Java?

One Scala-related thread on Artima drew a lot of attention: "Is Scala really more complicated than Java?". This post really struck a nerve. Whoever claims that Scala is much more complicated than Java has clearly not seen a Java Programmer Certification in a while and is probably not using many new features since Java 5 came out.

I won't try to prove in this post that Scala is not a complicated language. There are certainly many languages which are simpler. However, the core features which are used reasonably often are indeed a simplification over Java. Scala also has features which are more complicated than anything Java has, but those features are specialized for extending the language, while the complexity of Java is usually imposed on everyone, including the beginner.

This post will also not try to describe the language features in exhaustive detail. That's what the language specification tries to achieve, and the blog post is already long enough. I will assume that you know about the core language rules or can easily look them up.

What is complexity? Many conflate it with unreadability, some say it's the opposite of ease of use. Let's start with the following definition of complexity: it's the many special exceptions (pun not intended) to the rules, making the whole system difficult to understand and remember.

Based on that definition, let's run a comparison of language features of Java and Scala.

Keywords



Java has more reserved words than Scala. Even when we remove the words for primitive types (boolean, int, float, etc.), Scala still has fewer keywords!


  • Scala misses some of Java's control structures: yes, continue and break are not part of the language (well, at least until Scala 2.8, where break arrives as a library method), as they are not deemed a best-practice way to solve programming problems.
  • if/else: in Scala it returns a value, thus eliminating the need for the only ternary operator in Java, "?:", the use of which has long been discouraged.
  • for loop: Java folks discovered late in the game that the enhanced for loop is much less complicated to use in cases when you don't need the counter index. And they were right: it's one more detail which you (or at least newbies) can get wrong. But why stop there? Scala has a much more universal for loop, and there aren't two different syntaxes as in Java.
  • Scala keywords not in Java: one might argue that the override keyword in Scala complicates things, as you can do the same thing in Java with an @Override annotation. That's not quite the case, as you can still override a method by accident and forget to put the annotation (as it's not mandatory), and then the compiler will not give so much as a warning! So that's one more special case you need to worry about and keep in your head. When you start using traits, you definitely start to appreciate that override is a keyword.
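To make the accidental-override point concrete, here is a minimal Java sketch. The class names (Report, SalesReport, AccidentalOverrideDemo) are hypothetical and exist only for illustration; the assumption is that the author of the subclass did not know the base class already declared header().

```java
// Hypothetical classes, for illustration only.
class Report {
    // render() relies on header(), so replacing header() changes render() too.
    String header() { return "Report"; }
    String render() { return header() + ": body"; }
}

class SalesReport extends Report {
    // Written without knowing that Report already declares header().
    // No @Override annotation is present, yet this silently overrides it.
    String header() { return "Sales"; }
}

class AccidentalOverrideDemo {
    static String rendered() {
        return new SalesReport().render();
    }

    public static void main(String[] args) {
        System.out.println(rendered()); // prints "Sales: body"
    }
}
```

In Scala the same subclass method would be rejected unless it carried the override modifier, so the mistake surfaces at compile time.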


Access modifiers



Java has four access modifiers: default (package), private, public and protected. Scala has three, but it can combine them with a scope for protection. This flexibility allows you to define levels of access control for which Java has no analog. They are also readable because they look self-explanatory. For instance, if you have the class org.apache.esme.actor.UserActor, these are the Scala equivalents for Java's access modifiers:


private[actor]
Same as package access in Java

private[UserActor]
Same as private in Java



Scala's default access is public (yay, one less keyword!).

On the other hand, by defining the scope, Scala allows visibility types, which Java doesn't have:


private[this]
the member is accessible only from this instance. Not even other instances from the same class can access it

private[esme]
access for a package and subpackages. How many times did you have to use clumsy workarounds in Java because subpackages couldn't access the parent package?

protected
only subclasses can access the field. This is more consistent than Java's protected keyword. In Java, you have to remember that both subclasses and classes in the same package can access the field. How's that for remembering arbitrary rules?

private
An outer class cannot access the private members of its inner classes or objects. This is also more consistent than Java's private access, where the enclosing class can reach them, which is perhaps indicative of the fact that inner objects in Java were bolted on after the first version of the language was created
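The two Java rules criticized above can be seen in a short sketch. All class names here are hypothetical, and everything is assumed to live in the same (default) package.

```java
// Hypothetical classes, for illustration only; all in the default package.
class Outer {
    static class Inner {
        private int secret = 42; // looks private to Inner
    }

    // Java lets the enclosing class read a nested class's private member.
    static int peek() {
        return new Inner().secret;
    }
}

class Account {
    protected int balance = 100; // protected: subclasses AND the same package
}

// Not a subclass of Account, but declared in the same package.
class Auditor {
    static int read(Account account) {
        return account.balance; // compiles because of package access
    }
}

class AccessDemo {
    public static void main(String[] args) {
        System.out.println(Outer.peek());                // prints 42
        System.out.println(Auditor.read(new Account())); // prints 100
    }
}
```

Both accesses compile in Java; the Scala rules described above reject the equivalent code.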



Namespaces



Java has four namespaces (fields, methods, packages, types), Scala has two (one for types, one for everything else). This has important implications when overriding. In Scala, you can start with a parameterless def (method definition) and then override with a val (immutable variable). This helps enforce the uniform access principle: whether you access a field or a method, this doesn't restrict you later on, because the syntax is the same everywhere you access it.

One thing which you cannot do is define a variable and a method with the same name. The "Programming in Scala" book explicitly mentions that allowing this would be a code smell because of possible code duplication and the ambiguity which could arise.
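For contrast, Java's separate namespaces for fields and methods mean the following compiles without complaint. This is only a sketch; Person and NamespaceDemo are hypothetical names.

```java
// Hypothetical classes, for illustration only.
class Person {
    // A field and a method may share a name in Java,
    // because fields and methods live in separate namespaces.
    final String name = "field";

    String name() {
        return "method";
    }
}

class NamespaceDemo {
    static String viaField()  { return new Person().name; }
    static String viaMethod() { return new Person().name(); }

    public static void main(String[] args) {
        System.out.println(viaField());  // prints "field"
        System.out.println(viaMethod()); // prints "method"
    }
}
```

The caller has to know which of the two it wants, and switching between them later means touching every call site.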

Types and the type system



Scala's type system has been criticized for being too complex, but let's have a look at Java's type system.

Primitive types


There are many exceptions in Java's type system, which make the language hard to use and evolve. Java has primitive types, while in Scala everything is an object, making this valid code:


(1 to 3) foreach println


Java's primitive types make a lot of things harder. Before Java 5, manually wrapping values for collections (int in Integer, boolean in Boolean, etc.) was a pain. After auto-boxing and unboxing came out, the syntax is cleaner, but the overhead remains, as well as some very subtle bugs. For example, auto-unboxing a null wrapper type will cause a NullPointerException.
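The unboxing pitfall is easy to reproduce. The sketch below assumes a map lookup that misses; UnboxingDemo and the key "missing" are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, for illustration only.
class UnboxingDemo {
    static boolean unboxingNullThrows() {
        Map<String, Integer> counts = new HashMap<>();
        try {
            // get() returns null for an absent key; assigning it to an int
            // triggers auto-unboxing, which throws NullPointerException.
            int n = counts.get("missing");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows()); // prints true
    }
}
```

Nothing on the offending line mentions null or a method call on a reference, which is what makes this kind of bug subtle.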

There is a lot of code duplication because of primitive types: you always have to specify special cases and can't be truly generic.

Generics


Scala has generics done right. For instance, you can define covariance at the declaration site, whereas Java requires you to do this at the use site, with wildcards. This, combined with Scala's type inference, allows one to use generified libraries without having to know or define the complete type signature.

Java has yet another "special" type: arrays. As a result the rules for the underlying array type and the generified ArrayList are quite different and inconsistent. The type of arrays is checked at runtime, while the genericity of ArrayList is checked at compile-time. As a result, inappropriate assignment to an array element results in an ArrayStoreException only at runtime.
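A minimal sketch of that inconsistency follows. ArrayVarianceDemo is a hypothetical name, and the generic counterpart is left as a comment because it does not compile.

```java
// Hypothetical demo class, for illustration only.
class ArrayVarianceDemo {
    static boolean storeFailsAtRuntime() {
        // Compiles: Java arrays are covariant, so String[] is an Object[].
        Object[] objects = new String[1];
        try {
            // Also compiles, but the runtime check on the array rejects it.
            objects[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // The generic counterpart is rejected by the compiler instead:
    //   java.util.List<Object> list = new java.util.ArrayList<String>();
    //   // error: incompatible types

    public static void main(String[] args) {
        System.out.println(storeFailsAtRuntime()); // prints true
    }
}
```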

Constructors



Java initialization order in object construction is a pain to get right. The puzzlers on the certification exam use the most bizarre mix of static and instance initializer blocks and constructors calling or inheriting other constructors.

In Scala, any code which isn't part of a method or member variable declaration is executed as the primary constructor. You can define auxiliary constructors which call either the primary one or another auxiliary one defined in the same class before it. Can you come up with anything simpler?
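For comparison, here is the kind of Java initialization puzzle alluded to above, as a small sketch. Base, Derived and InitOrderDemo are hypothetical names; the trace records the order in which each piece runs.

```java
// Hypothetical classes, for illustration only.
class Base {
    static final StringBuilder LOG = new StringBuilder();

    static { LOG.append("base-static "); }

    { LOG.append("base-instance "); }

    Base() {
        LOG.append("base-ctor ");
        describe(); // dispatches to Derived.describe() before Derived is initialized
    }

    void describe() { LOG.append("base-describe "); }
}

class Derived extends Base {
    static { LOG.append("derived-static "); }

    private String label = "set"; // assigned only after Base() has finished

    { LOG.append("derived-instance "); }

    Derived() { LOG.append("derived-ctor"); }

    @Override
    void describe() { LOG.append("label=" + label + " "); }
}

class InitOrderDemo {
    static String trace() {
        new Derived();
        return Base.LOG.toString();
    }

    public static void main(String[] args) {
        // prints: base-static derived-static base-instance base-ctor label=null derived-instance derived-ctor
        System.out.println(trace());
    }
}
```

The "label=null" entry is the surprising part: the overridden method runs from the superclass constructor, before the subclass field initializer has executed.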

Uniform syntax



Scala is sometimes accused of using too many symbols. Whatever you've seen, it's mostly not part of the language, but of the libraries. You can override them and even disable them. What does Java have to say about special symbols?

Arrays


Arrays in Java are accessed via square brackets. In Scala, parentheses are used, because accessing an array index is a method call (called apply). Don't worry though, the compiler optimizes this away.

Collection literals


Java has a lot of special syntax for array instantiation and will soon have ones for instantiating lists, sets and maps. Scala, again, creates these special collections using factory methods in the companion objects: Array(1,2,3), List(1,2,3), Map((1,2)). Lists can also be created using the cons "operator", but here's the trick: it's not actually an operator. It's a method, prepending to the list. You can also create a map using the arrow tuple syntax: Map(1->2). And again, this is not "special" syntax, which is part of the language: it's a method, constructing a tuple!
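As a rough illustration of the special array syntax on the Java side, the sketch below builds the same three-element array in three ways, plus a list through a library method. ArraySyntaxDemo and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, for illustration only.
class ArraySyntaxDemo {
    // Initializer shorthand, allowed only in a declaration.
    static int[] shorthand() {
        int[] a = {1, 2, 3};
        return a;
    }

    // Explicit form, allowed anywhere an expression is.
    static int[] explicit() {
        return new int[] {1, 2, 3};
    }

    // C-style declaration with the brackets after the variable name.
    static int[] cStyle() {
        int a[] = new int[3];
        a[0] = 1;
        a[1] = 2;
        a[2] = 3;
        return a;
    }

    // Lists have no literal syntax here; a library method is used instead.
    static List<Integer> list() {
        return Arrays.asList(1, 2, 3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(shorthand(), explicit())); // prints true
        System.out.println(Arrays.equals(explicit(), cStyle()));    // prints true
        System.out.println(list());                                 // prints [1, 2, 3]
    }
}
```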

Now someone might smirk and think: "Ha, gotcha! Do you mean to say that simply because you've pushed the complexity out of the language and into the libraries you don't need to deal with it?". True, but let's have a look at Java and its ever growing standard libraries. It has AWT and Swing. It has old I/O and new I/O (gosh, which one do I use?). It has RMI (do you remember RMI?) and OMG, it even has CORBA. These libraries will never die. Methods in Thread have been deprecated for ages. There's also no sign that the ill-conceived Date/Calendar classes will ever be removed, but you still must know JodaTime if you hope to get any job with dates done.

More importantly, extending the language easily helps abstract away the details and evolve the language without creating tons of special cases. As per our definition, special cases add up to increase complexity. We'll explore the topic of extending the language in the followup post.

26 comments:

Anonymous said...

Great post, but I noticed a couple of mistakes:

- If you actually count the reserved words from both of the links you posted, you'll find that Scala has MORE keywords (52 vs. Java's 50) and that's including Java's primitives.

- (nitpicking mode) Java has 5 access modifiers, not 4 (you forgot private)

Michael Neale said...

"and OMG, it even has CORBA"

- brilliant - I assume that pun was intended ;)

Nice writeup, thanks !

Daniel said...

@villane That Java list did not include operators and literals -- such as false, true and null -- which are keywords all the same. That's why it is smaller. And, then, there are the "special characters". In Java, * has, in certain contexts, similar meaning to Scala's "_". Java's @ has the same meaning as Scala's. These special symbols weren't listed either.

That's mostly because Java considers operators to be operators, literals to be literals and special symbols to be syntax. And then there are keywords and identifiers and numbers.

On the other hand, because Scala allows those special characters to be identifiers, it has to list those that are reserved.

Vassil Dichev said...

@villane As Daniel mentioned, you are obviously counting the reserved *symbols* in Scala along with the reserved *words*. Counting =, : and @ in Scala, but not in Java isn't fair. If you count the operators in Java and Scala, the difference will be even more pronounced in Scala's favor, since arithmetic "operators" as +,-,/,* are actually methods in Scala, as well as <,>, >=, etc.

Also, Java does have 4 access modifiers, I've written "package" instead of "private" (now corrected). The default *is* package access (means I listed the same modifier twice).

Anonymous said...

Oops... I forgot that package-private is actually default.

I'm not sure I agree about the operators though: you aren't exactly comparing two very different numbers anyway. 52 is already a pretty large number of keywords IMHO. Not that I'm saying there should be less, but for one example 'match' doesn't really need to be a keyword, could be a method on Any (AFAIK)

However, if you would count all the Unicode symbols that can't be used in names in Java (I don't know what those limits are exactly, though), Scala surely has a big advantage here.

Vassil Dichev said...

@villane Maybe 52 keywords is too many- there are languages with none, or close to zero. But I'm arguing here only that Scala is (in many ways) less complicated specifically than Java, not any language.

Anyway, what Daniel and I were trying to say is that just counting the lists from the links I've posted is comparing fruits to oranges. The Java list includes only keywords (50). The Scala list includes keywords, literals and operators (52); the line between operators and keywords is a bit blurred in Scala. If you're only going to compare keywords, then Scala has 40, against 50 in Java. If you're going to count everything, then the link I've posted indicates that adding reserved words and literals makes the count 55 for Java, and that's not including operators. When we also include operators (37 of them), then it's 92 against Scala's 52. Actually, make it 93 for Java, because the @ symbol for annotations wasn't listed as an operator in the specification.

And note that the operators I mentioned are not just "all the Unicode symbols that can't be used in names in Java". They're all combinations of ASCII characters, whose meaning is hardcoded in Java the language (not the libraries). In Scala, not only can you use these symbols in identifiers- you can redefine these exact same operators as methods/classes, and most of these are a part of the library. Stay tuned for my next post, where I plan to give a detailed explanation.

By the way, there's a curious fact: if you check the changelog, Scala's match operator was indeed a method in Scala pre-2.0. I think the reason for making it an operator is described in the change notes: "flexible typing [...] over heterogeneous class hierarchies". I didn't use Scala at that time yet, but perhaps this blog post can shed some light on the topic.

Oh, and pattern matching is another thing I intend to mention in my next post (phew, seems it will be longer even than this one).

serberoth said...

Not to be a hater, what you have here is a very nice comparison between the two languages. Yet I find myself not convinced that Scala is any less complicated than Java. Perhaps the complications are of a different nature.

Artur Biesiadowski said...

Let's look at Brainfuck (http://en.wikipedia.org/wiki/Brainfuck) and Malbolge (http://esolangs.org/wiki/Malbolge)
according to your criteria

Keywords:
8 in BF, 8 in Malbolge, 50+ in Scala.

BF and Malbolge are simpler.

Access modifiers:
Not existing in BF nor Malbolge, so easier there

Namespaces:
ditto

Primitive types:
Only integers in BF and Malbolge, no distinction between primitive and non-primitive, so again a lot easier.

Generics:
Clear win here for BF and Malbolge, as they do not have any generics, so no complication.

etc.

Both BF and Malbolge are WAY simpler than Scala (and of course Java) according to your rules. I will even agree with that statement.

Maybe you should ask yourself whether creating or reading programs in a language is easier/simpler, instead of focusing on whether the language itself is simple? In such case, 90% of your arguments are misplaced.

Take a look at
http://apocalisp.wordpress.com/2009/08/28/a-scala-puzzler/

with its

trait S extends Lambda { type a[X <: Lambda] = Lambda { type a[Y <: Lambda] = Lambda { type a[Z] = X#a[Z]#a[Y#a[Z]] }} }

Most Java puzzlers are like 'ouch, there is one autoboxing I have missed', 'aaaa, finally takes precedence over that', 'so private access means that another method will be visible by default'. You may be puzzled by them, but after you get the hint, it becomes obvious. The language might be inconsistent at times, might have 'gotcha' moments, but that does not make it inherently complicated.

Scala on the other hand, has ability to be majorly complicated without any surprising parts. It is more powerful, it allows combining functional programming with OO. You need to understand both OO and functional to read some of scala programs.
On top of that to understand some performance implications, you will also need to grok the mapping of scala high-level constructs to jvm, which is WAY more complicated than java, where it is mostly 1-to-1.

Unknown said...

In my opinion, the idea of Scala being more "complicated" than Java is also due to the fact that when you first see Scala code, it is a bit unreadable (or more difficult to read) to someone used to classic C/Java syntax and style. I like to think about this as "long-term readability": once you are familiar with the Scala syntax & concepts (how steep the learning curve may be), the conciseness of Scala will dramatically improve the readability and correctness of the code, over and over again.

Vassil Dichev said...

@serberoth The complications are indeed of a different nature. I have deliberately left out Scala features which have no Java equivalents, for instance implicits and operator overloading.

@Artur A very thought-provoking and informed comment. And yes, the rules for Brainfuck and Malbolge *are* simpler than Scala. Does anyone care when they are so hard to use?

My cover's blown- most of my arguments are for a fairly superficial understanding of complexity. Here's a puzzler for my next post: I'll explore features in Scala, which are much more complicated than Java, but which will make the code much less complicated to create/understand. Just don't make me reveal all of it in a comment... :-)

Unknown said...

@artur, I think your comment is unfair: You compare that piece of complex Scala code against what? First try to create the same type in Java (wait, that's impossible), then compare it to the Scala version.

Though I agree that Scala code is typically more difficult to read than Java, with a catch: in Java you need to write many more reams of noisy extra code than in Scala, so there is more of it to comprehend, whereas Scala can be much more to the point.

Sony Mathew said...

Artur said it exactly!
Dimitris, the equivalent Java code would be simpler, using simpler constructs and forced into proper OO design! Yes, it will be wordier. That said, the next big lang will borrow Scala's implicit typing.

Unknown said...

@Mathew, you might put your money where your mouth is and actually try to express it in Java.

"Proper OO design" that is forced by Java is pure myth, that conveniently pats laziness on its back.

It's not without a reason that Gosling himself described Java as a blue-collar language. Also, check this out: http://www.adam-bien.com/roller/abien/entry/java_net_javaone_which_programming

But surely you must know better than James.

Grazer said...

Hi Vassil,

Thanks for this write-up. It's good to be reminded of some of the reasons I seem to code so slowly when writing in Java.

For what it's worth, I consider Scala as also having two for 'constructs': there is the for comprehension supported by the language and the foreach() method provided on every collection.

If we're talking about complexity, I would say Scala's for comprehension must be considered more complex than Java's simple loop due to its support of filters and yielding.

I also think it's interesting that you trashed a bunch of APIs in the Java SE. I absolutely sympathise with the bloat that is the SE API and the immeasurable pain that has been caused by assimilating immature products too early. On the other hand, possibly the greatest boost that Scala has received is that it integrates so seamlessly with Java and makes all those APIs, as silly as some of them are, available for any newbie to use in an instant. In fact, without them, Scala today may be little more than a cool little scripting language for processing XML.

We Scala crusaders might be careful not to bite the Java-flavoured hand that feeds us!

Cheers,

Graham.

Sony Mathew said...

@dimitris
I would if i actually cared to try & make sense of that piece of scala code & wanted to spare the time, the burden of proof is on u guys just as much cause C or Brainfuck can be much more terse.

You should read your own link re Gosling...& yeah blue collar is a good thing..the super geeks have python & perl garbage already..they dont need no stinkin scala or java for that matter..

Vassil Dichev said...

@Sony There's no way to "prove" a language is more complicated than another, except maybe by popularity. And certainly if one doesn't "need" yet another language, don't learn it.

I'm surprised that what people have focused on from my post is counting keywords and equating simple language rules with simple code. What I wanted people to think about is that Java's complexity in certain areas is underestimated just because people are more familiar with it. If one appreciates the attempts of Scala's authors to simplify the syntax one can also objectively compare and evaluate Java's complexity.

Saqlain said...

brilliant,,
nice written..!!

Unknown said...

Thanks for sharing. One of the great features of Scala is that it integrates features of object-oriented and functional languages, enabling Java and other programmers to be more efficient and productive. On the other hand, Java allows programmers to include complex programs within their sites.

Jules187 said...

No, Scala is not more complex than Java. And the truth is no one needs it. It is just another in a long line of programming languages designed just for the heck of it. And unfortunately people hop on the hype-wagon, because they feel the need to write code in a fresh programming language which, they think, sets them apart somehow, just like those poor deluded people who are totally hooked on Apple. There is nothing wrong with Java, or at least not any more than there was before, but it does its job. There is absolutely no need whatsoever for another language, especially not Scala!

Leif Wickland said...

The link to the Scala reserved words is broken. It's now available at http://ofps.oreilly.com/titles/9780596155957/TypeLessDoMore.html#ReservedWords

Sébastien Lorber said...

Nice article!


"Java initialization order in object construction is a pain to get right. The puzzlers on the certification exam use the most bizarre mix of static and instance initializer blocks and constructors calling or inheriting other constructors."

There is no constructor inheritance in Java.
It's true that this part is a bit complex... Even the SCJP books do not tell you everything about that (for example, in which order static or non-static initialization blocks will execute when you have a class and a subclass..., and you create a subclass instance etc).
You just have to figure it yourself :)


Note there is a scala tutorial for java programmers here:
http://www.scala-lang.org/docu/files/ScalaTutorial.pdf

Matthew Cornell said...

I'm new to Scala but I've programmed in lots of non-functional languages before (well, Python too) and my reaction after ~8 hours of absorbing tutorials is that Scala is far more complex when it comes to learning. Mainly there are so many ways to do things and syntactic sugar out the wazoo. For a typical example: http://stackoverflow.com/questions/8303817/nine-ways-to-define-a-method-in-scala . Following through the official tutorial for java programmers, I've got a running list of WTF questions, including basics like what do {} really do. So while I appreciate your defense, in my reality (and I'm not atypical), it's way more complicated than Java.

Vassil Dichev said...

@Matthew hey, I was looking for exactly this StackOverflow question and found it on my (obviously a bit neglected) blog!

Again, my point is not only that Scala is not a complicated language. My point is that Java is more complicated than most people realize because they're used to it.

The stackoverflow question is mostly based on a combination of the following:

1. You can skip the brackets for a single statement
2. In this particular example, you can skip the empty set of parentheses for method parameters
3. If you want to assign the value of the block, you use an assignment operator. If you want to execute the block for its side effects and discard the value, you may omit the equals sign. Note that the latter option was only added to comfort the Java users and there were discussions to deprecate it (https://groups.google.com/forum/#!topic/scala-debate/8G3WgfZNj9E)

What's interesting is that Java 8 was released last year and it has added lambda expressions. You'd find it interesting that for lambdas:

1. You can declare the type of the parameters, but you can skip it if it can be inferred
2. You can have parentheses around a single parameter, but you may skip them if its type is omitted
3. You can write the body as a block with braces and an explicit return, but you may skip both and write a bare expression

So if I want to be as nitpicky as the stackoverflow user in this post, I could just as well say: OMG, there are at least 6 ways to define the same lambda expression in Java!

(int x) -> x + 1
(x) -> x + 1
x -> x + 1
(int x) -> { return x + 1; }
(x) -> { return x + 1; }
x -> { return x + 1; }

Fair enough? You knew this was all valid Java, right?
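To check the claim, here is a small sketch that puts the variants which compile under Java 8 into one array and confirms they behave identically. LambdaFormsDemo is a hypothetical name, and IntUnaryOperator is just one convenient target type from java.util.function.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical demo class, for illustration only.
class LambdaFormsDemo {
    // Three parameter styles times two body styles.
    static final IntUnaryOperator[] FORMS = {
        (int x) -> x + 1,
        (x) -> x + 1,
        x -> x + 1,
        (int x) -> { return x + 1; },
        (x) -> { return x + 1; },
        x -> { return x + 1; }
        // x -> { x + 1; } would not compile: "x + 1;" is not a statement,
        // so a block body that yields a value needs an explicit return.
    };

    static boolean allAgree() {
        for (IntUnaryOperator form : FORMS) {
            if (form.applyAsInt(1) != 2) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(FORMS.length); // prints 6
        System.out.println(allAgree());   // prints true
    }
}
```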

I think some syntactic variability is inevitable even in languages which claim that they offer only one way to do it. The point is to be honest about it.