I found a bug last week caused by a checked-exception anti-pattern, or maybe code smell, that I'll call "foo() throws Exception". The premise is that a method declares that it throws Exception, Throwable, or some other type close enough to the root of the exception hierarchy to catch anything the code underneath might throw. The software engineer happily beavers away without realising that the code is throwing all kinds of exceptions that are not being handled correctly. The handler sits higher up the call chain, detached from what the code is actually doing, so it cannot act reliably; most likely it simply logs the exception. Once handled and logged, nothing more is done, possibly leaving the thread and/or application in an invalid state.
This can be caused by the programmer wanting the exception handling to go away, perhaps because exceptions were used so liberally that there is now more exception handling than business logic. Or maybe handling the exception is too uncomfortable: there is no plan B, only failure, but the application is long-running and cannot be restarted, so a catch-all was the last resort for an error that supposedly could not happen.
The application in question simulates a GSM-R application, and calls are made to get the active call the application is currently dealing with. The other party could end the call at any time, and since the system is completely asynchronous and multi-threaded it is highly likely that no call exists the next time you check for it. The exception being thrown was part of the null-checking strategy, which has proved highly effective in reducing NPEs: instead of a null being returned from a method, a checked exception is thrown, CallDoesNotExistException. So this exception can happen in normal application flow and must be handled by application logic.
As it turned out, the fix was remarkably elegant once the functionality as a whole was understood (read: remembered). A catch for CallDoesNotExistException was added to the catch-all handler, which navigates the user out of whatever process they were in to one of the two possible application states, fixing similar issues across the entire application. The fix was helped by the fact that CallDoesNotExistException is task-specific, so other parts of the application could easily identify the issue in the code, whereas a generic exception, checked or not, would have been almost impossible to trace.
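To make the pattern concrete, here is a minimal sketch of the idea. CallDoesNotExistException is the real name from the story above; CallManager, the field names, and the "idle" state are hypothetical illustrations, not the application's actual code.

```java
// Sketch of throwing a checked, task-specific exception instead of returning
// null, and handling it by navigating to a known-good state.
public class CallFlowSketch {
    static class CallDoesNotExistException extends Exception {}
    static class Call {}

    static class CallManager {
        volatile Call activeCall; // the other party may end the call at any time

        // Instead of returning null, throw a checked exception the caller must face.
        Call getActiveCall() throws CallDoesNotExistException {
            Call call = activeCall;
            if (call == null) throw new CallDoesNotExistException();
            return call;
        }
    }

    public static void main(String[] args) {
        CallManager manager = new CallManager();
        String state;
        try {
            manager.getActiveCall(); // the call has already ended
            state = "in-call";
        } catch (CallDoesNotExistException e) {
            // Task-specific catch: navigate back to a known-good application
            // state instead of just logging and leaving the user stranded.
            state = "idle";
        }
        System.out.println(state);
    }
}
```

The point of the task-specific type is that the catch clause above can do something meaningful; a generic `catch (Exception e)` could only log and hope.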
Tuesday, 30 August 2011
Wednesday, 3 August 2011
Using Method Handles
A few weeks ago I ran some tests pitting method handles against typical reflection to see which was faster. The result was that reflection was faster, although it was a quick test and I no longer have the result data or the run configuration I used. Now that I'm tasked with looking into porting our core systems to Java 7, I thought I'd take a more critical look.
Below is the source code I used to run the test. It calls a method on a class called Incrementer a million times. I didn't notice any difference when I ran the methods several times before taking timing values, to "warm up" server mode.
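The original listing doesn't appear to have survived in this copy of the post, so here is a hedged reconstruction of the kind of benchmark described: an Incrementer whose method is invoked a million times through reflection and then through a method handle. The class layout and method names are assumptions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class MethodHandleTest {
    public static class Incrementer {
        private long value;
        public void increment() { value++; }
        public long value() { return value; }
    }

    public static void main(String[] args) throws Throwable {
        // Reflection: look the method up once, invoke it a million times.
        Incrementer reflected = new Incrementer();
        Method method = Incrementer.class.getMethod("increment");
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            method.invoke(reflected);
        }
        System.out.println("reflection took " + (System.currentTimeMillis() - start)
                + "ms, result is " + reflected.value());

        // Method handle: the same work through java.lang.invoke (Java 7+).
        Incrementer handled = new Incrementer();
        MethodHandle handle = MethodHandles.lookup().findVirtual(
                Incrementer.class, "increment", MethodType.methodType(void.class));
        start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            handle.invokeExact(handled);
        }
        System.out.println("method handles took " + (System.currentTimeMillis() - start)
                + "ms, result is " + handled.value());
    }
}
```

Note that `invokeExact` requires the call site's types to match the handle's type exactly, which is part of why the JIT can optimise it so aggressively.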
Note that this code throws an exception when run from Eclipse; I had to use the javac/java executables from the command line to make it work. When running the class in mixed mode I got the following results:
java -verbose:gc MethodHandleTest
[GC 4416K->141K(15872K), 0.0011714 secs]
[GC 4557K->141K(15872K), 0.0006216 secs]
[GC 4557K->141K(15872K), 0.0006387 secs]
reflection took 515ms, result is 1000000
method handles took 16ms, result is 1000000
There is plenty of garbage collection happening but not enough to account for the huge time difference. When running in server mode I got the following results:
java -verbose:gc -server MethodHandleTest
reflection took 47ms, result is 1000000
method handles took 15ms, result is 1000000
java -verbose:gc -server MethodHandleTest
reflection took 32ms, result is 1000000
method handles took 31ms, result is 1000000
All runs of the test gave results almost identical to the above: method handles were either twice as fast as reflection or equal to it, with no garbage collection. Interestingly, if I comment out the increment of the long value, much the same timings are produced as with the increment:
java -verbose:gc MethodHandleTest
[GC 4416K->141K(15872K), 0.0011488 secs]
[GC 4557K->141K(15872K), 0.0006595 secs]
[GC 4557K->141K(15872K), 0.0006482 secs]
reflection took 500ms, result is 0
method handles took 31ms, result is 0
This makes me think that incrementing the long accounts for a very small proportion of the reflection example's run time, and that the majority is spent calling the method through reflection. In server mode the reflection call is JIT-compiled and almost identical results are produced. This is mostly conjecture, since it is quite difficult to work out what the HotSpot compiler is doing.
In conclusion, method handles consistently perform as well as, or better than, reflection no matter what run configuration you use. Stability is still an issue with the current tools, though: Eclipse will have to support Java 7 before I could use method handles in our systems.
Saturday, 7 May 2011
An Immutable List in the JDK
It's quite strange that there is no immutable list class in the JDK; immutability is probably the biggest part of any concurrent design. It is well understood that an immutable object can be safely passed to or accessed by other threads, meaning that it is not necessary to lock.
It's not quite true that there are no immutable lists in the JDK: there is always the Collections#unmodifiableList method, which returns a wrapper around the supplied List that does not allow modification. However, it is only a wrapper, and the underlying data structure can still be modified, so there are no guarantees about immutability, only conventions, which only help if you knew about them in the first place. Or the guy before you did.
The problem with the unmodifiable list is that once you have it, there is no way to check that it is unmodifiable. The implementation class, UnmodifiableList, is package private, so it is not even possible to do an instanceof check. I have found this to be a big source of inefficiency: because there is no way to check whether a list is immutable, if you must have an immutable list you can either "ask" via documentation that lists be immutable, or defensively copy the list. Typically I use defensive copying, because there is no way to know how the code will be used in the future, or whether the next engineer will check the documentation. I'm not even sure it is reasonable to ask someone to check the small print for every method they use.
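The defensive-copy idiom mentioned above can be sketched as follows. DefensiveCopy is a hypothetical holder class, not from the post; the point is that since we cannot tell whether the caller's list is immutable, we copy it and wrap the copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DefensiveCopy {
    private final List<String> items;

    // Copy the caller's list and wrap the copy: later mutation of the
    // caller's list cannot reach us, and callers of getItems() cannot
    // mutate our copy.
    public DefensiveCopy(List<String> source) {
        this.items = Collections.unmodifiableList(new ArrayList<>(source));
    }

    public List<String> getItems() {
        return items;
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>();
        original.add("a");
        DefensiveCopy holder = new DefensiveCopy(original);
        original.add("b"); // does not leak into the copy
        System.out.println(holder.getItems()); // [a]
    }
}
```

The copy is the inefficiency the post complains about: if UnmodifiableList were a checkable public type, the `new ArrayList<>(source)` could often be skipped.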
The fix for this is relatively easy, and even doable without hacking the OpenJDK, by using the -Xbootclasspath/p command line option. This lets you prepend classes to the boot class path, which by default is your JRE's rt.jar. If you modify some JDK classes you can have them loaded first, ahead of rt.jar, so your modified classes win.
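As a concrete illustration (the directory and application names here are assumptions, and the /p prepend form of the flag was later removed in Java 9):

```shell
# Compile a patched copy of a JDK class into a local directory, then prepend
# that directory to the boot class path so it is found before rt.jar.
# "patched" and MyApp are hypothetical names.
javac -d patched java/util/Collections.java
java -Xbootclasspath/p:patched MyApp
```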
All that needs to be done as a starting point is to make the UnmodifiableList class public, and to make it the return type of Collections#unmodifiableList instead of List, so we know what we're getting and other classes can pass it around too. This doesn't fix the possibility of the underlying data structure being modified, but it gets us 80% of the way there for such a small change. Because UnmodifiableList is still a List implementation, all current code will still compile.
Unfortunately it is not legal to deploy this change to a production system; it's only for fun.
Saturday, 18 December 2010
String and CharSequence Concepts
Java String performance has been bothering me for a while. The String class is needlessly encumbered with fields supporting features that are only needed in a few cases. For example, the count and offset fields provide a range feature so that a string can reuse an existing char array without copying the required part into a new, smaller array. This is a useful saving when it is used, but I would imagine the features that use it, like subSequence, are the exception.
I believe the problem is that the class hierarchy is fundamentally backwards. There are two primary types for strings: String (concrete) and CharSequence (an interface). The concept is the wrong way round: string is the concept, and a char sequence is one implementation of a string. The collections framework seems to agree, with its List interface, ArrayList (or other) implementations, and SubList.
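The freedom being argued for here can be sketched with a custom string implementation behind the CharSequence interface. RepeatedChar is a hypothetical example of my own, not from the post: it represents n copies of one character in constant space, something a concrete String can never do.

```java
// A custom "string" that stores one char and a length instead of an array.
public class RepeatedChar implements CharSequence {
    private final char c;
    private final int length;

    public RepeatedChar(char c, int length) {
        this.c = c;
        this.length = length;
    }

    @Override public int length() { return length; }
    @Override public char charAt(int index) { return c; }
    @Override public CharSequence subSequence(int start, int end) {
        return new RepeatedChar(c, end - start);
    }
    @Override public String toString() {
        char[] chars = new char[length];
        java.util.Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        CharSequence seq = new RepeatedChar('x', 5);
        // APIs declared against CharSequence accept it without copying...
        System.out.println(new StringBuilder().append(seq)); // xxxxx
        // ...but any API that insists on String forces a full copy via toString().
        System.out.println("xxxxx".contentEquals(seq)); // true
    }
}
```

Every API that takes String rather than CharSequence shuts implementations like this out, which is the cost of the hierarchy being the wrong way round.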
A small saving of two int fields maybe, but a lot of freedom to create custom string implementations.
Update 03/08/2011
I realised why all APIs require a String and not a CharSequence: because the String class is immutable, the characters are guaranteed not to change mid-way through an operation, avoiding bugs that would wrongly be attributed to the API.
Wednesday, 10 November 2010
The cost of Java's finalize( ) method
We all know that the finalize method is generally bad and shouldn't be used. However, this is mostly for reasons of determinism and the fact that it is in no way like a C++ destructor, which means most of the uses where it could seem helpful (such as removing listeners) turn out to cause bugs.
What about the performance penalty? I was genuinely shocked at how bad performance got when using finalizable objects. The code snippet I used is below...
public class FinalizerTest
{
    public static void main(String[] args)
    {
        final long currentTimeMillis = System.currentTimeMillis();
        // Allocate a million finalizable objects and let them become garbage.
        for (int i = 0; i < 1000000; i++)
        {
            new MyObject();
        }
        System.out.println("done in " + (System.currentTimeMillis() - currentTimeMillis)
                + "ms instances of MyObject created is " + MyObject.i
                + " and disposed " + MyObject.disposed);
    }

    static class MyObject
    {
        static int i = 0, disposed = 0;

        public MyObject()
        {
            i++;
        }

        // The mere presence of this override forces the JVM to register and
        // queue every instance for finalization.
        @Override
        protected void finalize() throws Throwable
        {
            super.finalize();
        }
    }
}
With the finalize method commented out and -verbose:gc on the command line, the output was...
[GC 512K->106K(1984K), 0.0009172 secs]
[GC 618K->106K(1984K), 0.0003271 secs]
[GC 618K->106K(1984K), 0.0002747 secs]
[GC 618K->106K(1984K), 0.0000591 secs]
[GC 618K->106K(1984K), 0.0000533 secs]
[GC 618K->106K(1984K), 0.0000597 secs]
[GC 618K->106K(1984K), 0.0000590 secs]
[GC 618K->106K(1984K), 0.0000589 secs]
[GC 618K->106K(1984K), 0.0000587 secs]
[GC 618K->106K(1984K), 0.0000585 secs]
[GC 618K->106K(1984K), 0.0000586 secs]
[GC 618K->106K(1984K), 0.0000585 secs]
[GC 618K->106K(1984K), 0.0000586 secs]
[GC 618K->106K(1984K), 0.0000585 secs]
[GC 618K->106K(1984K), 0.0000584 secs]
done in 15ms instances of MyObject created is 1000000
With the finalize method implemented and -verbose:gc on the command line the output was...
[GC 512K->458K(1984K), 0.0033486 secs]
[GC 970K->968K(1984K), 0.0042475 secs]
[GC 1480K->1414K(1984K), 0.0032794 secs]
[GC 1926K->1861K(2496K), 0.0029895 secs]
[Full GC 1861K->1861K(2496K), 0.0178491 secs]
[GC 2373K->2372K(3680K), 0.0050145 secs]
[GC 2884K->2870K(3680K), 0.0036973 secs]
[GC 3382K->3369K(3936K), 0.0034726 secs]
[Full GC 3369K->3098K(3936K), 0.0227427 secs]
[GC 3610K->3609K(5744K), 0.0044828 secs]
[GC 4121K->4109K(5744K), 0.0036602 secs]
[GC 4621K->4556K(5744K), 0.0030668 secs]
[GC 5068K->5003K(5744K), 0.0032344 secs]
[GC 5515K->5450K(6000K), 0.0029319 secs]
[Full GC 5450K->4573K(6000K), 0.0283003 secs]
[GC 5085K->5084K(8200K), 0.0055667 secs]
[GC 5596K->5582K(8200K), 0.0041402 secs]
[GC 6094K->6085K(8200K), 0.0043665 secs]
[GC 6597K->6532K(8200K), 0.0028525 secs]
[GC 7044K->6979K(8200K), 0.0026690 secs]
[GC 7491K->7427K(8200K), 0.0026212 secs]
[GC 7939K->7874K(8456K), 0.0030884 secs]
[Full GC 7874K->6583K(8456K), 0.0454533 secs]
[GC 7415K->7414K(11872K), 0.0089436 secs]
[GC 8246K->8231K(11872K), 0.0063717 secs]
[GC 9063K->9054K(11872K), 0.0055485 secs]
[GC 9886K->9820K(11872K), 0.0046484 secs]
[GC 10652K->10587K(11872K), 0.0046174 secs]
[GC 11419K->11354K(12256K), 0.0046537 secs]
[Full GC 11354K->10217K(12256K), 0.0540124 secs]
[GC 11433K->11432K(18376K), 0.0124211 secs]
[GC 12648K->12623K(18376K), 0.0088227 secs]
[GC 13839K->13795K(18376K), 0.0084357 secs]
[GC 15011K->14882K(18376K), 0.0061044 secs]
[GC 16098K->15969K(18376K), 0.0063174 secs]
[GC 17185K->17056K(18376K), 0.0068424 secs]
[GC 18272K->18143K(19400K), 0.0071953 secs]
[Full GC 18143K->16065K(19400K), 0.0836026 secs]
[GC 17921K->17919K(28824K), 0.0205388 secs]
[GC 19775K->19750K(28824K), 0.0140122 secs]
[GC 21606K->21576K(28824K), 0.0124343 secs]
[GC 23432K->23239K(28824K), 0.0098796 secs]
[GC 25095K->24901K(28824K), 0.0105540 secs]
[GC 26757K->26564K(28824K), 0.0100624 secs]
[GC 28420K->28226K(30104K), 0.0102745 secs]
[Full GC 28226K->24606K(30104K), 0.1285938 secs]
[GC 27422K->27421K(44148K), 0.0293065 secs]
done in 2687ms instances of MyObject created is 1000000
Yikes, we've gone from 15ms to 2687ms! But this is Java profiling, after all, and nothing is quite as it seems. If the call to super.finalize() is commented out, the test takes zero milliseconds; I believe this is because the finalize method is then dead code, which the JIT removes. If I add a static int and increment it every time finalize is called, the time varies wildly from 1900ms to 3110ms.
Thursday, 4 November 2010
Finding if a number is a prime
Update 06/04/2012
Modified the for loop as proposed by Alan (comment #1), which makes the method roughly twice as fast.
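The method listing itself doesn't survive in this copy of the post, so here is a hedged sketch of a typical trial-division primality check. The stepping over odd divisors only is the kind of loop change that roughly halves the work, which may be what the update refers to; this is my reconstruction, not the post's original code.

```java
public class PrimeCheck {
    // Trial division: handle 2 specially, then test odd divisors up to sqrt(n).
    public static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        // i += 2 skips even divisors, halving the number of loop iterations.
        for (long i = 3; i * i <= n; i += 2) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(7919)); // 7919 is the 1000th prime
        System.out.println(isPrime(7917)); // 7917 = 3 * 7 * 13 * 29
    }
}
```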
Sunday, 12 September 2010
Definition of Estimation?
[The common definition of estimate is] "the most optimistic prediction that has a non-zero probability of coming true." … Accepting this definition leads irrevocably toward a method called what's-the-earliest-date-by-which-you-can't-prove-you-won't-be-finished estimating
—Tom DeMarco
—quoted in Software Estimation: Demystifying the Black Art, Chapter 3, opening quip