Saturday, 18 December 2010

String and CharSequence Concepts

Java String performance has been bothering me for a while. The String class is needlessly encumbered with fields supporting features that are only needed in a few cases. For example, the count and offset fields provide a range feature, so that a String can reuse part of an existing char array without copying the required range into a new, smaller array. This is a useful saving when it is used, but I would imagine that the operations relying on it, such as substring() and subSequence(), are the exception.
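To make the range feature concrete, here is a minimal sketch of a string-like class laid out the way the post describes: every instance carries offset and count so that subSequence() can share the backing array instead of copying it. `RangeString` and its `sharesArrayWith` helper are hypothetical names for illustration, not JDK classes or methods.

```java
// Hypothetical class illustrating the offset/count range feature.
// Every instance pays for the two int fields, even if it never shares an array.
final class RangeString implements CharSequence {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of this string's first char in value
    private final int count;    // number of chars in this string

    RangeString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private RangeString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    @Override
    public int length() {
        return count;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return value[offset + index];
    }

    // No copy: the result reuses the same char[] with a new offset and count.
    @Override
    public RangeString subSequence(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new IndexOutOfBoundsException(start + ", " + end);
        }
        return new RangeString(value, offset + start, end - start);
    }

    // Hypothetical helper, only here to show that the array really is shared.
    boolean sharesArrayWith(RangeString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

With this layout, `new RangeString("hello world").subSequence(6, 11)` yields "world" backed by the original eleven-char array; the cost is that every instance, shared or not, carries the two extra ints.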

I believe the problem is that the class hierarchy is fundamentally incorrect. There are two primary types based around strings: String (a concrete class) and CharSequence (an interface). The concept is the wrong way round: a string is the concept, and a char sequence is one implementation of a string. The collections framework seems to agree, with the List interface, ArrayList (or another implementation) and the SubList view returned by subList().

The saving would be small, perhaps two int fields per string, but the change would give a lot of freedom to create custom string implementations.
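A rough sketch of the hierarchy argued for above, modelled on List, ArrayList and SubList: the string concept is an interface, the flat implementation holds only its array, and a separate view class is the only one that pays for offset and count. `Text`, `ArrayText` and `SubText` are hypothetical names; this is an illustration of the idea under those assumptions, not a proposed replacement for java.lang.String.

```java
// Hypothetical "string as a concept" interface, analogous to List.
interface Text {
    int length();
    char charAt(int index);
    Text subText(int start, int end);
}

// Analogous to ArrayList: holds only its own array, no offset or count fields.
final class ArrayText implements Text {
    private final char[] chars;

    ArrayText(String s) {
        this.chars = s.toCharArray(); // private copy keeps the instance immutable
    }

    @Override public int length() { return chars.length; }

    @Override public char charAt(int index) { return chars[index]; }

    // Returns a view rather than a copy, like List.subList.
    @Override public Text subText(int start, int end) {
        return new SubText(this, start, end - start);
    }

    @Override public String toString() { return new String(chars); }
}

// Analogous to SubList: only views carry the offset and count fields.
final class SubText implements Text {
    private final Text parent;
    private final int offset;
    private final int count;

    SubText(Text parent, int offset, int count) {
        if (offset < 0 || count < 0 || offset + count > parent.length()) {
            throw new IndexOutOfBoundsException(offset + ", " + count);
        }
        this.parent = parent;
        this.offset = offset;
        this.count = count;
    }

    @Override public int length() { return count; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return parent.charAt(offset + index);
    }

    // A view of a view points straight at the original parent with a shifted offset.
    @Override public Text subText(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new IndexOutOfBoundsException(start + ", " + end);
        }
        return new SubText(parent, offset + start, end - start);
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

In this arrangement an `ArrayText` carries a single field, and only code that actually asks for a substring pays for the extra two ints. Other implementations (a rope, a memory-mapped buffer) could implement `Text` the same way.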

Update 03/08/2011
I realised why so many APIs require a String rather than a CharSequence. Because the String class is immutable, its characters are guaranteed not to change midway through an operation, which rules out bugs that would otherwise be wrongly attributed to the API.
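A small illustration of that hazard, assuming a hypothetical API that validates a CharSequence argument once and keeps the reference. If the caller passes a StringBuilder and keeps modifying it, the stored value drifts away from what was validated. `Account` and `SafeAccount` are made-up classes; the second shows one possible defence, taking an immutable snapshot with toString() before validating.

```java
// Hypothetical API that validates a name once and keeps the reference it was given.
final class Account {
    private final CharSequence name;

    Account(CharSequence name) {
        // The check passes for the value as it is right now...
        if (name.length() == 0 || name.length() > 8) {
            throw new IllegalArgumentException("bad name: " + name);
        }
        this.name = name; // ...but a mutable argument can still change afterwards
    }

    CharSequence name() { return name; }
}

// Same API, but it copies to an immutable String first and validates the copy.
final class SafeAccount {
    private final String name;

    SafeAccount(CharSequence name) {
        this.name = name.toString(); // snapshot; later changes to the argument are not seen
        if (this.name.isEmpty() || this.name.length() > 8) {
            throw new IllegalArgumentException("bad name: " + this.name);
        }
    }

    String name() { return name; }
}
```

After `StringBuilder sb = new StringBuilder("alice")` is passed to both constructors and then appended to, `Account` silently holds a name longer than it allows, while `SafeAccount` still holds "alice". Snapshotting at the boundary is one way to accept a CharSequence and keep the guarantee; for a String argument toString() returns the same object, so only mutable inputs pay for the copy.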