Saturday 18 December 2010

String and CharSequence Concepts

Java String performance has been bothering me for a while.  The String class is needlessly encumbered with fields supporting features that are only needed in a few cases.  For example, the count and offset fields provide a range feature so that a string can use an existing char array without copying the required part into a new, smaller array.  This is a useful saving when it is used, but I would imagine that the features that rely on it, like subSequence, are the exception.

I believe the problem is that the class hierarchy is fundamentally incorrect.  There are two primary types based around strings: String (a concrete class) and CharSequence (an interface).  The concept is the wrong way round: string is the concept and char sequence is an implementation of a string.  The collections framework seems to agree, with the List, ArrayList (or other implementation) and SubList classes.

A small saving of two int fields maybe, but a lot of freedom to create custom string implementations.
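As a rough sketch of the kind of custom implementation I mean, here is a read-only range view over another CharSequence, playing the role that a sub-list plays for List.  The class name CharRangeView is made up for illustration, and it assumes the underlying sequence does not change while the view is in use.

```java
// Hypothetical sketch: a window over part of another CharSequence.
// No characters are copied until toString() is called.
final class CharRangeView implements CharSequence {
    private final CharSequence source;
    private final int start;
    private final int end;

    CharRangeView(CharSequence source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.source = source;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return source.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // Narrow the window over the same source rather than copying.
        return new CharRangeView(source, start + from, start + to);
    }

    @Override
    public String toString() {
        // Copy only when a real String is requested.
        return new StringBuilder(length()).append(source, start, end).toString();
    }

    public static void main(String[] args) {
        CharSequence view = new CharRangeView("hello world", 6, 11);
        System.out.println(view);          // prints "world"
        System.out.println(view.length()); // prints 5
    }
}
```

Only the view carries the two range ints; a plain string implementation would not need them.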

Update 03/08/2011
I realised why all APIs require a String and not a CharSequence. Because the String class is immutable, its chars are guaranteed not to change midway through an operation, which avoids bugs that would wrongly be attributed to the API.
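A minimal sketch of the hazard, assuming Java 8 or later for the lambda syntax.  The greet method is a made-up stand-in for any API that checks its argument and then uses it, and the Runnable stands in for another thread or a callback running in between.

```java
final class MutableSequenceDemo {
    // Hypothetical API method: validates its argument, then uses it.
    static String greet(CharSequence name, Runnable betweenCheckAndUse) {
        if (name.length() == 0) {
            throw new IllegalArgumentException("name must not be empty");
        }
        betweenCheckAndUse.run(); // something else runs after the check
        return "Hello, " + name;
    }

    public static void main(String[] args) {
        // A StringBuilder is a CharSequence, and the caller can empty it
        // after the check has already passed.
        StringBuilder mutable = new StringBuilder("Andy");
        System.out.println(greet(mutable, () -> mutable.setLength(0))); // prints "Hello, "

        // A String cannot change, so the check still holds at the point of use.
        String immutable = "Andy";
        System.out.println(greet(immutable, () -> { })); // prints "Hello, Andy"
    }
}
```

In the first call the API looks broken even though the caller changed the data underneath it.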

Wednesday 10 November 2010

The cost of Java's finalize( ) method

We all know that the finalize method is generally bad and shouldn't be used. However, this is mostly for reasons of determinism and the fact that it is in no way like a C++ destructor, which means most of the reasons that it could be useful turn out to cause a bug (removing listeners).

What about the performance penalty? I was genuinely shocked at how bad performance got when using finalizable objects. The code snippet I used is below...


public class FinalizerTest
{
  public static void main(String[] args)
  {
    final long currentTimeMillis = System.currentTimeMillis();
   
    for (int i = 0; i < 1000000; i++)
    {
      new MyObject();
    }
   
    System.out.println("done in " + (System.currentTimeMillis() - currentTimeMillis) + "ms instances of MyObject created is " + MyObject.i + " and disposed " + MyObject.disposed);
  }
 
  static class MyObject
  {
    static int i = 0, disposed = 0;
   
    public MyObject()
    {
      i++;
    }
   
    @Override
    protected void finalize() throws Throwable
    {
      super.finalize();
    }
  }
}

With the finalize method commented out and -verbose:gc on the command line the output was...

[GC 512K->106K(1984K), 0.0009172 secs]
[GC 618K->106K(1984K), 0.0003271 secs]
[GC 618K->106K(1984K), 0.0002747 secs]
[GC 618K->106K(1984K), 0.0000591 secs]
[GC 618K->106K(1984K), 0.0000533 secs]
[GC 618K->106K(1984K), 0.0000597 secs]
[GC 618K->106K(1984K), 0.0000590 secs]
[GC 618K->106K(1984K), 0.0000589 secs]
[GC 618K->106K(1984K), 0.0000587 secs]
[GC 618K->106K(1984K), 0.0000585 secs]
[GC 618K->106K(1984K), 0.0000586 secs]
[GC 618K->106K(1984K), 0.0000585 secs]
[GC 618K->106K(1984K), 0.0000586 secs]
[GC 618K->106K(1984K), 0.0000585 secs]
[GC 618K->106K(1984K), 0.0000584 secs]
done in 15ms instances of MyObject created is 1000000

With the finalize method implemented and -verbose:gc on the command line the output was...

[GC 512K->458K(1984K), 0.0033486 secs]
[GC 970K->968K(1984K), 0.0042475 secs]
[GC 1480K->1414K(1984K), 0.0032794 secs]
[GC 1926K->1861K(2496K), 0.0029895 secs]
[Full GC 1861K->1861K(2496K), 0.0178491 secs]
[GC 2373K->2372K(3680K), 0.0050145 secs]
[GC 2884K->2870K(3680K), 0.0036973 secs]
[GC 3382K->3369K(3936K), 0.0034726 secs]
[Full GC 3369K->3098K(3936K), 0.0227427 secs]
[GC 3610K->3609K(5744K), 0.0044828 secs]
[GC 4121K->4109K(5744K), 0.0036602 secs]
[GC 4621K->4556K(5744K), 0.0030668 secs]
[GC 5068K->5003K(5744K), 0.0032344 secs]
[GC 5515K->5450K(6000K), 0.0029319 secs]
[Full GC 5450K->4573K(6000K), 0.0283003 secs]
[GC 5085K->5084K(8200K), 0.0055667 secs]
[GC 5596K->5582K(8200K), 0.0041402 secs]
[GC 6094K->6085K(8200K), 0.0043665 secs]
[GC 6597K->6532K(8200K), 0.0028525 secs]
[GC 7044K->6979K(8200K), 0.0026690 secs]
[GC 7491K->7427K(8200K), 0.0026212 secs]
[GC 7939K->7874K(8456K), 0.0030884 secs]
[Full GC 7874K->6583K(8456K), 0.0454533 secs]
[GC 7415K->7414K(11872K), 0.0089436 secs]
[GC 8246K->8231K(11872K), 0.0063717 secs]
[GC 9063K->9054K(11872K), 0.0055485 secs]
[GC 9886K->9820K(11872K), 0.0046484 secs]
[GC 10652K->10587K(11872K), 0.0046174 secs]
[GC 11419K->11354K(12256K), 0.0046537 secs]
[Full GC 11354K->10217K(12256K), 0.0540124 secs]
[GC 11433K->11432K(18376K), 0.0124211 secs]
[GC 12648K->12623K(18376K), 0.0088227 secs]
[GC 13839K->13795K(18376K), 0.0084357 secs]
[GC 15011K->14882K(18376K), 0.0061044 secs]
[GC 16098K->15969K(18376K), 0.0063174 secs]
[GC 17185K->17056K(18376K), 0.0068424 secs]
[GC 18272K->18143K(19400K), 0.0071953 secs]
[Full GC 18143K->16065K(19400K), 0.0836026 secs]
[GC 17921K->17919K(28824K), 0.0205388 secs]
[GC 19775K->19750K(28824K), 0.0140122 secs]
[GC 21606K->21576K(28824K), 0.0124343 secs]
[GC 23432K->23239K(28824K), 0.0098796 secs]
[GC 25095K->24901K(28824K), 0.0105540 secs]
[GC 26757K->26564K(28824K), 0.0100624 secs]
[GC 28420K->28226K(30104K), 0.0102745 secs]
[Full GC 28226K->24606K(30104K), 0.1285938 secs]
[GC 27422K->27421K(44148K), 0.0293065 secs]
done in 2687ms instances of MyObject created is 1000000

Yikes, we've gone from 15ms to 2687ms! But this is Java profiling after all, and nothing is quite as it seems. If the call to super.finalize() is commented out the test takes zero milliseconds; I believe this happens because the finalize method is now dead code, so the JIT removes it. If I add a static int and increment it every time finalize is called, the time varies wildly from 1900ms to 3110ms.

Thursday 4 November 2010

Finding if a number is a prime

Update 06/04/2012
Modified the for loop as proposed by Alan (comment #1) which makes the method roughly twice as fast.
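A minimal sketch of this kind of check, assuming plain trial division and assuming the loop change was to step over even divisors, which roughly halves the number of iterations.  The class and method names are made up for illustration.

```java
final class PrimeCheck {
    // Trial division that only tests odd divisors up to the square root of n.
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        if (n < 4) {
            return true; // 2 and 3
        }
        if (n % 2 == 0) {
            return false;
        }
        // d <= n / d is the same test as d * d <= n but cannot overflow.
        for (long d = 3; d <= n / d; d += 2) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrime(97)); // prints true
        System.out.println(isPrime(91)); // prints false (7 * 13)
    }
}
```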

Sunday 12 September 2010

Definition of Estimation?

[The common definition of estimate is] "the most optimistic prediction that has a non-zero probability of coming true." … Accepting this definition leads irrevocably toward a method called what's-the-earliest-date-by-which-you-can't-prove-you-won't-be-finished estimating
—Tom DeMarco
Software Estimation: Demystifying the Black Art, Chapter 3, opening quip

Friday 27 August 2010

Finding the Maximum Possible Value in a Set Number of Bits

I needed to find the maximum possible value for an unsigned 64-bit int. What better way than to write a script in Groovy to find it, one that also works for other numbers of bits if I need it again? One problem is that Java cannot handle unsigned values, so the only option is to use the next biggest primitive. But a long is the biggest integer primitive available, which leaves BigInteger, which can handle any value. You can force a number in Groovy to be a BigInteger by suffixing the number being assigned with G.


def bits = 64;
def maxValue = 0G;
def bitValue = 1G;

for (int i=0; i<bits; i++)
{
  maxValue += bitValue
  bitValue += bitValue
}
println maxValue

------------------------------

18446744073709551615
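The same value can be had in plain Java without the loop, since the maximum for n bits is 2^n - 1.  This is a small sketch using BigInteger; the class and method names are made up.

```java
import java.math.BigInteger;

final class MaxUnsignedValue {
    // Largest value representable in the given number of bits: 2^bits - 1.
    static BigInteger maxUnsigned(int bits) {
        return BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(maxUnsigned(64)); // prints 18446744073709551615
        System.out.println(maxUnsigned(8));  // prints 255
    }
}
```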

Monday 23 August 2010

Hungarian Notation on Type Names and Eclipse Java Browsing Perspective

For a few years I've been using notation on type names, for example IDisposable for interfaces and names like AbstractCollection for abstract classes, but recently I stopped and started naming classes using only terms from their domain.  I believed the notation described the code and made it more understandable, and in some ways it did.  It was easy to see what kind of type a class was, but at the cost of obscuring what the class is responsible for.  Classes also get sorted alphabetically, so similar classes are not next to each other in the package (having a greater number of classes in a package is useful since package scope can be enforced).

Another problem is that the class is less flexible to change, or rather its usage is.  If this coding standard is imposed and a class needs to become an interface or an abstract class, then the class name needs to change everywhere that it's used, in a whole bunch of files.  This makes the code commit really ugly, and in the worst case you don't own all of the code that uses the class so it's not possible to change it easily.  Without the notation, as long as the class doesn't have a public constructor, it's quite viable to turn it into an abstract class or an interface.

The deal breaker for me was when I started to use the Eclipse Java Browsing perspective; an excellent write-up can be found here.  After dabbling with Pharo Smalltalk I started to wonder why this style of browsing was preferred.  After a couple of awkward days I found that it was much easier to use and showed me much more useful information, more concisely, including the type information that I had been encoding in the class name.

Thursday 29 April 2010

Miglayout DSL

Here's an embryonic DSL for MigLayout in Java.  Statically imported enums turn out to be a pretty handy alternative to Smalltalk or Erlang atoms.  Unfortunately I can't find the HTML syntax highlighter for Java.


import static MigBuilder.*;

// Usage: each argument is one constraint; mig() joins them with commas.
Object mig = mig(size(BUTTON_WIDTH, BUTTON_HEIGHT), gapleft(LEFT_GAP), hidemode(zero), growx, growy);
        
// The kind of hand-written constraint string this replaces.
String migconstraint = "w 11!, h 52!, gapleft 3, growx, growy";

enum MigBuilder {
  grow, growx, growy, push, gapleft;
 
  // Joins the constraints into a single comma-separated MigLayout string.
  public static Object mig(Object... constraints)
  {
    StringBuilder builder = new StringBuilder();
    for (Object constraint : constraints)
    {
      // Every constraint after the first needs a separator, not only the enum constants.
      if (builder.length() > 0)
        builder.append(", ");
     
      builder.append(constraint);
    }
    return builder.toString();     
  }
 
  public static Object size(int width, int height)
  {
    return "w " + width + "!, h " + height + "!";
  }
 
  public static Object gapleft(int pixels)
  {
    return gapleft + " " + pixels;
  }
 
  // HideMode is another enum, not shown here.
  public static Object hidemode(HideMode hidemode)
  {
    return "hidemode " + hidemode;
  }
}

Friday 9 April 2010

UML

I was reading this article about the supposed best use of UML diagrams during the course of a project, and in particular its claim that generating code from UML is useless.  I have to agree that it seems pointless: once the generated code is modified the UML diagram is no longer useful and can no longer be used to generate the source.  Maybe this has just been used in the wrong way.  UML could be used in the same way as a unit test.  Perhaps initially it could generate code, but it could also assert that the code sticks to the design; if not, this test would fail and the architects would be notified.  This would help keep the architects up to date on what the codebase looks like without constant peer reviews, keep the documentation up to date and give warning when the design needs to change.  I have a feeling this is already done by someone.

Thursday 28 January 2010

Starting Listeners in the Correct State

A problem that I'm having with event listeners at the moment is getting a listening object to start in the correct state at an arbitrary point in time, when it is really designed to assume an initial state and then build its state from the events it receives.

For example, say you add a listener to an observable containing the state for a light switch, the listener should really know the state straight away but the event could be fired in 8 hours, or never. There are two ways I can see to fix this.

First, when the listener is added it gets the current state from the observable and updates itself.  This is fine but it does mean that the listener needs to know about the observable, which adds a dependency so that the listening object is less reusable.  The Java property change framework is weakly typed, treating sources and values as objects anyway, so you shouldn't really be expected to add this dependency.  Plus it makes the listener more difficult to test.

Second, the observable holds a copy of the last event object it fired and, when a listener is added, fires it to that listener.  This can give us even more problems.  Imagine firing an event with a complex or resource hungry value.  The complex object could have references to many other objects that would get garbage collected except that this stored event keeps them in memory.  An image is probably the best example of an object that you wouldn't like to keep in memory longer than necessary.  This really means that only primitive values, enums and other small immutables should be the values in change events.

These problems are really compounded for state machines where, to begin in the correct state, the machine has to analyse system state in much the same way a conventional solution would need to.
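A minimal sketch of the second approach for the light switch, restricted to a small primitive value so that nothing heavy is kept alive.  The LightSwitch class and its method names are made up; it uses java.beans.PropertyChangeSupport and assumes Java 8 or later for the lambda in main.  Rather than storing the last event it replays the current state as an event to the new listener only.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical observable that brings each new listener up to date on registration.
final class LightSwitch {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private boolean on;

    void addListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
        // Replay the current state to the new listener only, so it neither waits
        // for the next real change nor needs a reference to this class.
        listener.propertyChange(new PropertyChangeEvent(this, "on", null, on));
    }

    void setOn(boolean newValue) {
        boolean oldValue = on;
        on = newValue;
        // PropertyChangeSupport fires nothing when old and new values are equal.
        support.firePropertyChange("on", oldValue, newValue);
    }

    public static void main(String[] args) {
        LightSwitch lightSwitch = new LightSwitch();
        lightSwitch.setOn(true);

        // The listener hears "on = true" immediately, then "on = false".
        lightSwitch.addListener(e ->
            System.out.println(e.getPropertyName() + " = " + e.getNewValue()));
        lightSwitch.setOn(false);
    }
}
```

The listener stays weakly typed and easy to test, at the cost of the observable firing an event that does not correspond to a real change.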

Friday 22 January 2010

Statics

Statics: the natural nemesis of discoverability -_-

Thursday 21 January 2010

European Java Roadshow

My registration for the European Java Roadshow has been accepted, WOOT!  There are some real nuts and bolts talks about Java and the JVM at the London Roadshow which should be interesting.  Here are my views on the talks...
  • Firstly, Java Hotspot Optimisations.  I'm always fascinated by the work going on in this area, although I know that at the moment it would go way over my head if it got down to the nitty gritty, and I think it's the same for anyone who hasn't wholeheartedly delved into the OpenJDK source.
  • Secondly, Java Garbage Collection.  This should go way beyond the generational garbage collection diagram that we've all seen implemented in every half arsed VM.  I really want to know what the G1 garbage collector is going to give software in the future, and more importantly the trade-offs involved.
  • The More about Java SE embedded talk is interesting simply because I have almost no knowledge of what's going on in that area.  I've worked with software engineers doing embedded programming and don't think they would give up their beloved C lightly.  It's going to be good to hear about the inroads the JVM is making in this area, especially in memory usage!
  • Java Realtime VMs promises the same as above, although it's more pertinent to my area of work: simulation.  There are certain cases where realtime is important and it's certainly something to strive for.
  • When I read Java Futures I instantly thought java.util.concurrent.Future, great, a concurrency talk!  This is just my naive programmer's mind.  Hopefully it will be something like the previous Java Futures talk at JavaOne.
  • Not sure about Java for Business: Business Cases, is it a sales talk?  I'm guessing there's something more.

If you're going, or just commuting to London... See you on Feb 4th

Tuesday 19 January 2010

Implementing Application UI in XML

I just noticed the new release of Pivot on DZone, and the whole idea just leaves me confused.  Why would anyone want to do complex coding in XML?

Java isn't the best for the kind of declarative style that you need when coding components that need a lot of information to be passed in, such as colours and fonts.  The rows of set calls really become ugly, but at least you can have named constants for things like points and rectangles, which is really difficult in XML.  Ant is a good example of that: properties soon become complex, which doesn't happen in a Java project.

I don't think JavaFX is the answer either, it's just as confusing.  It's a high level language without an advertised purpose.  Yes, it can do nice graphics easily, but this must account for a single figure percentage (if that) of the total amount of software currently being developed.  If it hasn't been made to scale up to more complex tasks then what's the point of it?  That's more of an advertising blunder than a language problem, since it has all of the features that I would expect of a non-innovative modern language.  Maybe it's just not for me; it's created for designers and I'm not the target market.  Concurrency features are a hard sell to anyone who hasn't seen their UI mysteriously lock up.  Overall I'm just not impressed enough to make the switch when I could spend my time learning much more interesting tech.

The Groovy builders are impressive but aren't so much better than Java as to justify the massive leap of incorporating a new language into your system.


Back to Java, what does it really need to catch up in productivity to the main players?
  • Closures: this would just be really nice to have.  Anonymous classes achieve the same thing but Swing would look so much nicer with closures.
  • String interpolation and multiline strings: so simple and so useful.  Over Christmas I read up on Ruby; the language is a good manual on how to make our lives easier for day to day tasks.
  • Clear up the cruddy Swing API: an average custom component has a multitude of inherited methods.  Just think how many different size properties there are; do you know which layout managers rely on which properties?  This can be improved.
  • Inbuilt listener framework: the property change listener framework is really underpowered.  The default framework for property listeners should have weak listeners, annotated methods that can generate handlers, and threading strategies as standard.  Think EventBus or something similar.
  • Configurable style: style should really be configurable and maybe even injectable as in Fuse.  The Swing Application Framework has this as well.
Using frameworks like Guice and MigLayout really improves design and coding as well, and they should be more standard in Java desktop development.
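To illustrate the closures point from the list above: Java did later gain lambda expressions in Java 8, and the difference for a Swing-style listener looks roughly like this.  It is a sketch only; the ListenerStyles class and the "clicked" command are made up, and the event is built by hand rather than coming from a real button.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

final class ListenerStyles {
    // The anonymous class version: five lines of ceremony around one statement.
    static ActionListener anonymous(StringBuilder log) {
        return new ActionListener() {
            @Override
            public void actionPerformed(ActionEvent e) {
                log.append(e.getActionCommand());
            }
        };
    }

    // The same listener written as a lambda expression (Java 8 or later).
    static ActionListener lambda(StringBuilder log) {
        return e -> log.append(e.getActionCommand());
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // Hand-built event standing in for a button click.
        ActionEvent click = new ActionEvent(new Object(), ActionEvent.ACTION_PERFORMED, "clicked");

        anonymous(log).actionPerformed(click);
        lambda(log).actionPerformed(click);
        System.out.println(log); // prints "clickedclicked"
    }
}
```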

Sunday 10 January 2010

Twitter

The blogosphere isn't what it used to be; I've realised that a lot of the people that I keep up with have moved on to Twitter.  To keep up with the interesting tech news I've also got round to following people on my previously empty account.  I'm on as @andy_till, surprisingly.

Thursday 7 January 2010

UI Code Don'ts

It's important to keep code quality in check constantly in UI classes probably more than others as it typically seems to get out of hand more quickly.  Here are some observations, they hopefully do not fall into the micro-optimisation category but I definitely feel they are bad to have around.

  • Images or other memory hungry objects/resources marked with static.  These objects will never be garbage collected even if they are never used.  A shared object should store these images short term or SoftReferences should be used to store the images for a greater time without the danger of memory leaks.
  • Style information spread out across the whole program.  Especially Font objects; this seems to be the first thing to change and if it is spread out across the whole project then it's going to be a mess.
  • Creating new Color objects in setters.  This causes a couple of problems, you're creating new Color objects all over the place which is only a minor problem but still pointless.  The other problem is understandability, it's hard to see what that colour is from the RGB constructor.
  • Magic numbers.  Not much need to explain why this is bad, please name them.  If there are too many then create a separate class to house them, you can even use a static import to keep code readable.
  • Too many anonymous classes.  Anonymous classes are useful but too many look ugly (I don't like the syntax noise) and can cause memory leaks as the reference isn't explicit.
  • SwingTimer.  Please no!  Even if you know the memory leak pitfall, it's still too easy to fall into it.  Just avoid.

This is just a short list off the top of my head, I'll update it if I think of any more.
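For the first point in the list, here is a minimal sketch of holding expensive resources through SoftReferences instead of static fields.  The SoftCache class is made up for illustration, it assumes Java 8 or later, and the int array stands in for a loaded image.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose values the garbage collector may reclaim
// under memory pressure; a reclaimed value is simply loaded again.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Never loaded, or cleared by the collector since the last call.
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        // The int[] is a stand-in for decoding an image from disk.
        SoftCache<String, int[]> cache = new SoftCache<>(name -> new int[1024]);

        int[] first = cache.get("logo");
        int[] second = cache.get("logo");
        // 'first' is still strongly referenced here, so the entry cannot have been cleared.
        System.out.println(first == second); // prints true
    }
}
```

The map itself still grows by one small entry per key, so this suits a bounded set of images rather than an unbounded stream of them.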

Wednesday 6 January 2010

Snowtastrophy

Your grandchildren will ask... "Where were you when it snowed a bit in January 2010?" That's the situation as I understand it from Sky News anyway.

Sunday 3 January 2010

TweetDeck and Window Title Bars

I've just downloaded TweetDeck and it looks really beautiful.  The Adobe AIR install worked seamlessly as well; working with Java it's easy to forget that's possible :P

The problem is that TweetDeck uses the standard title bar in XP and I assume for other operating systems as well.  This is fine for systems that actually look nice like Mac and some of the newer Gnome skins, but in XP this is noticeably ugly since the rest of the UI looks so nice.  Firefox also suffers from this when using the Personas plugin.  It isn't that difficult to have an undecorated window and create a custom title bar, is it?