Java & Performance

Feb 8th, 2008 | Category: design, java, optimization
This entry is part 2 of 9 in the series optimization

I attended a No Fluff Just Stuff conference held in Newark, NJ in August 2006. This post is a condensation of what Brian Goetz presented in the course.

One of the topics presented there was Java and performance. That seemed odd, since Java is not usually mentioned when the topic of performance comes up. That is usually when the Java evangelist excuses himself (or herself) and takes a small break while the C/C++ advocates unleash their spiel. But these urban myths are being falsified by the new JVMs that are coming up.

The Java 1.5 VM is superior to its predecessors in many key respects, and the newer VMs promise to be even better, so much so that we should not be surprised if Java outperforms C++ in many cases.

We are going to examine Java's performance improvements over the years in the following areas:

Object Allocation & Garbage Collection

In JDK 1.0, object allocation was very slow, and so was garbage collection. Allocation required a first-fit/best-fit search with coalescing of free blocks. What this means is depicted in the picture below.

[Figure: a 100-byte heap in which long-lived objects A, B, C and D are interleaved with short-lived objects X, Y and Z; the free space is fragmented into small blocks]

Let us say that the JVM has acquired 100 bytes for the heap, some of which have been used as shown. A request for 12 bytes now requires the JVM to compact the heap and relocate objects so that the request can be satisfied. This gets very expensive over time as more objects get created and destroyed. In this case we have about 35 bytes of free heap space, yet we still cannot satisfy a request for 12 bytes without compacting the heap.

This problem tended to harden into the rule that objects should not be proliferated, which in turn led to various anti-patterns such as object pools.

In this example A, B, C and D are long-lived objects that happen to share the same heap with short-lived objects X, Y and Z. Hence when the short-lived objects got cleared, the heap had to be compacted. In actual applications, it was observed that over 95% of objects have high infant mortality (i.e., they are short-lived). This means that combining short-lived and long-lived objects in the same uniform heap is counterproductive.

Generational Garbage Collection

From Java 1.2 onwards the trend moved towards generational garbage collectors, which substantially increased performance. In generational garbage collection the heap is divided into multiple parts. A discussion of all the parts of the heap is beyond the scope of this article, but for all practical purposes we can visualize the heap as being made of two parts, viz. the young generation and the old generation.

New objects typically get their memory from the young generation. Once this heap fills up, surviving objects get moved to the old generation. Not all objects have to be moved, since most of them will already have become eligible for garbage collection; only the extant objects get promoted from the young generation to the old generation. Since most objects are short-lived, the time spent transferring the extant objects is very small. This promotion also automatically compacts the heap.
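To make the infant-mortality observation concrete, here is a minimal sketch (the class and loop count are invented for illustration) of the allocation pattern that generational collectors are designed for:

    public class InfantMortality {
        public static void main(String[] args) {
            long checksum = 0;
            for (int i = 0; i < 1000000; i++) {
                // Allocated in the young generation...
                StringBuilder sb = new StringBuilder("temp-");
                sb.append(i);
                checksum += sb.length();
                // ...and garbage by the end of the iteration, so a cheap
                // young-generation ("minor") collection reclaims it.
            }
            System.out.println(checksum);
        }
    }

Almost every object here dies young; only the loop's bookkeeping survives, which is exactly the workload a generational collector handles well.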

For more information, please see Brian Goetz's article on IBM developerWorks.

Thread Specific Heaps

In accordance with good programming practice, it is a good idea to create new _context_ objects for each thread and re-use command objects (the objects that actually perform the work) as *stateless singletons*. For instance, a Struts Action is a command object that gets reused across multiple threads, while the Struts ActionForm is a context object that gets created once for every thread. If this programming model is adhered to as a best practice, many objects will be created in the course of executing a thread, and these objects will typically be eligible for garbage collection at the end of the thread. Hence a thread-specific heap, which satisfies all "new" requests within a particular thread from one particular heap that gets garbage collected at the end of the thread, would scale very well, as the sketch below illustrates.
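Here is a minimal sketch of that model (the class names are invented for illustration and are not Struts APIs): one stateless command shared by all threads, and a fresh context object per thread:

    // Stateless command: safe to share across threads since it holds no state.
    class ReportCommand {
        public String execute(RequestContext ctx) {
            return "report for " + ctx.getUser();
        }
    }

    // Per-thread context: created for each unit of work and discarded with it.
    class RequestContext {
        private final String user;
        RequestContext(String user) { this.user = user; }
        String getUser() { return user; }
    }

    public class CommandDemo {
        private static final ReportCommand COMMAND = new ReportCommand(); // stateless singleton

        public static void main(String[] args) {
            for (int i = 0; i < 3; i++) {
                final int id = i;
                new Thread(() -> {
                    // Fresh context per thread: garbage as soon as the thread ends.
                    RequestContext ctx = new RequestContext("user-" + id);
                    System.out.println(COMMAND.execute(ctx));
                }).start();
            }
        }
    }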

Escape Analysis

Escape analysis is a powerful technique that the JVM can employ for optimization. In fact, the Sun JVM with Mustang (JDK 1.6) has already implemented this technique. To illustrate this, look at the snippet of code below:

    public class Foo {
        public void bar() {
            FooBar fb = new FooBar();
            // do something with fb, but:
            // - don't copy fb to any field,
            // - don't store fb in any array or collection,
            // - don't pass fb to any method (for instance, don't call baz(fb) here).
            return;
        }
    }

A JVM can now analyse the variable "fb" and see that it does not escape bar(): it is not copied to any field of Foo, nor is it returned or stored in any array or collection. Hence the JVM can deduce that this variable can be allocated on the stack of method bar() rather than on the heap. This way the JVM avoids garbage collection for it entirely. This is an example of escape analysis.

A comparison with C/C++

Memory allocation in Java is sophisticated, as we have seen. These techniques can in theory be replicated in a C/C++ library, which could then aspire to equal Java's memory-management performance. But the malloc implementations that actually allocate memory in an operating system are nowhere close to the JVM in their sophistication.

As a consequence, it has been observed that object allocation in Java is significantly faster than in C/C++.

Dynamic Compilation

Java is interpreted code for the most part. The source code is compiled to bytecode, which still needs to be interpreted by the JVM. But today's JVMs take this one step further. They interpret the code for a while; then hot code gets compiled in the background to machine instructions, and the compiled code is executed instead of the interpreted code. Once this happens, the Java code is at least as performant as C/C++ code that has been compiled. The compilation here is done by the JVM itself. This is an example of dynamic compilation, as opposed to static compilation using a static compiler such as cc. But it gets better.
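As a minimal sketch (the class and loop counts are invented for illustration), a hot method like the one below is the kind of code the JVM compiles in the background; running with the HotSpot flag -XX:+PrintCompilation shows the compilation happening:

    public class JitDemo {
        // After enough invocations the JVM judges this method "hot"
        // and compiles it to native machine code in the background.
        static long sum(int n) {
            long total = 0;
            for (int i = 0; i < n; i++) {
                total += i;
            }
            return total;
        }

        public static void main(String[] args) {
            long result = 0;
            for (int i = 0; i < 20000; i++) { // repeated calls make sum() hot
                result += sum(1000);
            }
            System.out.println(result);
            // Try: java -XX:+PrintCompilation JitDemo
        }
    }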

Speculative Optimization

Dynamic compilers such as the one in the JVM have more information about how a class is being used. For instance, the JVM is aware of all the classes that have been loaded in this run. This knowledge can be used for speculative optimization, i.e., optimization done with a certain amount of speculation. These speculations can get invalidated as the program executes, and at that point they can be backed out.

Here is an example:

    class Foo {
        public void baz() {
            // do something.
        }
    }

    class Bar extends Foo {
        @Override
        public void baz() { // overrides baz() in Foo
            // do something else here.
        }
    }

    public class Main {
        public static void main(String[] args) {
            Foo foo = new Foo();
            loop(foo);
            foo = new Bar();
            loop(foo);
        }

        public static void loop(Foo foo) {
            // use foo in a BIG loop.
            for (int i = 0; i < 1000000; i++) {
                foo.baz();
            }
        }
    }
In this example, Foo declares a baz() method that gets overridden in its subclass Bar. Main first uses a Foo to do some complicated computation, then uses a Bar to do the same thing.

In this case, as Main executes, Foo gets loaded by the JVM. The JVM observes that the method baz() in Foo is not final, but it also notices that no subclasses of Foo are currently loaded (since Bar has not been loaded yet). It further observes that Foo is very heavily used in the loop() method and hence is a candidate for dynamic compilation. When the JVM compiles Foo it can make a speculative optimization: it can decide that baz(), despite not being final, can be statically bound rather than dynamically bound. The speculation rests on the fact that no loaded subclass of Foo overrides baz(). This feature, called MCT (Monomorphic Call Transformation), is an important optimization technique: it enables virtual method calls to be translated into direct calls.

The first loop completes with the dynamically compiled, speculatively optimized version of class Foo. But when the line "foo = new Bar()" comes up for execution, the JVM has to load Bar, and it sees that Bar overrides baz(), invalidating the assumption behind the speculative optimization. The JVM can now recompile Foo without that optimization and use the newly compiled version for subsequent executions. This is possible because the JVM is aware of the runtime environment. This kind of speculative optimization cannot be achieved by a static compiler such as cc. On that count, Java has the scope to become faster and more optimally compiled than any C++ program.

Inlining

The MCT discussed above enables the inlining of calls. Consider the following example:

    public class Inline {
        public final void inner(String s) {
            if (s == null) {
                return;
            } else {
                // do something really complicated.
            }
        }

        public void outer() {
            String s = null;
            inner(s);
        }
    }

If the JVM can turn the virtual call to inner() into a direct call, it can inline it, evaluate the null test at compile time, and optimize away all the complicated code.

Then outer() reduces to nothing!!
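Conceptually (this is an illustration of the effect, not actual JIT output), the compiled outer() behaves as if it had been written as:

    public void outer() {
        // inner(s) inlined with s == null: the null test folds to true
        // and the whole body collapses to an immediate return.
    }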

And if outer() can itself be inlined, so much the better! So we see that MCT is the first step towards inlining, which can significantly boost performance. MCT is only possible if a determination can be made about the runtime usage of the class. This is how the JVM can perform optimizations that are impossible for static compilers.

In Java, pointers are under the control of the JVM. In C/C++, pointers are under the control of the programmer. Since the programmer can do anything with them (such as casting a pointer to an integer), it is impossible for the runtime environment to make such optimizations. In Java, they are possible.

Synchronization Issues

Synchronization is ultimately achieved by issuing lock calls to the OS. The JVM can optimize synchronizations in many ways, with significant increases in performance.

Consider the following code:

    public String foo() {
        Vector v = new Vector();
        v.add("moe");
        v.add("joe");
        return v.toString();
    }

A clever JVM can realize that the code above uses a Vector that is local to the method and hence free from synchronization issues, since the variable is not reachable from other threads. The add() method can therefore, in effect, be treated as a non-synchronized method. Thus in modern JVMs, unshared synchronized collections are (in most cases) as fast as unsynchronized collections.

Conclusion & Coding Tips

The JVM has evolved over the years and now profiles code dynamically as it runs, utilizing the profiled information to make optimizations. Some of these optimizations are speculative, and their assumptions can be invalidated as the code continues to execute. The JVM is smart enough to detect this and recompile the code.

With all these benefits, it is possible to achieve speeds that were hitherto considered impossible in an interpreted environment such as Java. Ironically, these optimizations are possible precisely because Java is interpreted rather than statically compiled. With dynamic compilation, Java can no longer be considered a purely interpreted language. It is smartly compiled code, with the compilation optimized to suit the current execution environment.

Statically compiled languages cannot achieve this. Besides, the fact that the JVM controls all the pointers means that it can optimize the code with MCT and improved garbage collection. So this is where the trend gets reversed: optimized code becomes possible by giving up control of the programming environment, as in Java, not by retaining it.

Coding Tips

To make the best use of the improvements in the JVM, code should be predictable and follow normal idioms. For instance, it is best to use normal for() loop semantics rather than weird ways of iterating through arrays. JITs (Just-In-Time compilers) are big pattern matchers. The typical rule they follow is: "look for code patterns like this, because they can be replaced by optimized code that executes faster." Clever code often does not match these known patterns, so the compilers cannot optimize it.
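For instance, here is a sketch (the class is invented for illustration) of the kind of loop a JIT recognizes, against the kind it may not:

    public class IdiomaticLoop {
        public static void main(String[] args) {
            int[] data = new int[1024];
            for (int i = 0; i < data.length; i++) {
                data[i] = i;
            }

            // A plain indexed loop is a pattern the JIT knows well: it can
            // eliminate the per-access bounds check and unroll the body.
            int sum = 0;
            for (int i = 0; i < data.length; i++) {
                sum += data[i];
            }
            System.out.println(sum);

            // By contrast, "clever" variants (walking the array backwards with
            // obscure exit conditions, or mutating i inside the body) may fall
            // outside the patterns the JIT matches, and can end up slower.
        }
    }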

So the best coding tip for enabling the JVM to generate optimized code is also the simplest: don't write clever code. Use simple patterns.
