Saturday, 18 December 2010

String and CharSequence Concepts

Java String performance has been bothering me for a while. The String class is encumbered with fields supporting features that are only needed in a few cases. For example, the count and offset fields provide a range feature so that a string can use part of an existing char array without copying that part into a new, smaller array. This is a useful saving when it applies, but I would imagine that the operations that rely on it, such as substring and subSequence, are the exception.
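To make the range feature concrete, here is a minimal sketch. RangedText and its sharesArrayWith method are hypothetical names made up for illustration; this is not the JDK source, only the value/offset/count idea described above.

```java
// Hypothetical class (not the JDK's String): a string that views a range of a shared array.
final class RangedText implements CharSequence {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index in value of this string's first character
    private final int count;     // number of characters in this string

    RangedText(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private RangedText(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    @Override
    public int length() {
        return count;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value[offset + index];
    }

    @Override
    public RangedText subSequence(int start, int end) {
        if (start < 0 || end > count || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        // No copying: the new instance points into the same backing array.
        return new RangedText(value, offset + start, end - start);
    }

    // Demonstration helper: true if both instances use the same backing array.
    boolean sharesArrayWith(RangedText other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

Calling new RangedText("hello world").subSequence(6, 11) gives a five-character string that reuses the original array. The price is that every instance carries the two extra int fields, whether or not it is ever a sub-range.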

I believe the problem is that the class hierarchy is fundamentally incorrect. There are two primary types based around strings: String (a concrete class) and CharSequence (an interface). The concepts are the wrong way round: string is the concept, and a char sequence is one implementation of a string. The collections framework seems to agree, with its List interface, ArrayList (or another implementation) and SubList view.

With the hierarchy arranged that way, the saving might only be two int fields per string, but there would be a lot of freedom to create custom string implementations.
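As a rough sketch of that freedom, the two classes below implement the existing CharSequence interface, standing in for the general string type argued for above. CompactText and RepeatedChar are hypothetical names invented for this example: the first is an array-backed string with no offset or count fields, the second needs no array at all.

```java
import java.util.Arrays;

// Hypothetical: the simplest array-backed string, with no offset or count fields.
final class CompactText implements CharSequence {
    private final char[] value;

    CompactText(String s) {
        this.value = s.toCharArray();
    }

    private CompactText(char[] value) {
        this.value = value;
    }

    @Override
    public int length() {
        return value.length;
    }

    @Override
    public char charAt(int index) {
        return value[index];  // the array access performs the bounds check
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > value.length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        // Copies the range, trading time for a smaller object.
        return new CompactText(Arrays.copyOfRange(value, start, end));
    }

    @Override
    public String toString() {
        return new String(value);
    }
}

// Hypothetical: a string made of one character repeated, stored as a char and a length.
final class RepeatedChar implements CharSequence {
    private final char c;
    private final int length;

    RepeatedChar(char c, int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        this.c = c;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return c;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        return new RepeatedChar(c, end - start);
    }

    @Override
    public String toString() {
        char[] chars = new char[length];
        Arrays.fill(chars, c);
        return new String(chars);
    }
}
```

Both can be handed to anything that accepts a CharSequence, for example new StringBuilder().append(new CompactText("abc")).append(new RepeatedChar('-', 3)). They cannot be passed where a String is demanded, which is exactly the restriction being complained about.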

Update 03/08/2011
I realised why so many APIs require a String and not a CharSequence. Because the String class is immutable, its characters are guaranteed not to change midway through an operation, which avoids bugs that would wrongly be attributed to the API.
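A small sketch of the hazard, using a hypothetical NameHolder class invented for this example. It keeps both the caller's CharSequence and an immutable String copy, so the difference shows up when the caller later modifies a StringBuilder.

```java
// Hypothetical class: stores a name both as the caller's object and as an immutable copy.
final class NameHolder {
    private final CharSequence live;  // the caller's object; it may be mutable
    private final String snapshot;    // immutable copy taken at construction time

    NameHolder(CharSequence name) {
        this.live = name;
        this.snapshot = name.toString();
    }

    CharSequence live() {
        return live;
    }

    String snapshot() {
        return snapshot;
    }
}

final class NameHolderDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("alice");
        NameHolder holder = new NameHolder(sb);

        // The caller reuses its buffer after handing it over.
        sb.setLength(0);
        sb.append("mallory");

        System.out.println(holder.live());      // reflects the caller's later change
        System.out.println(holder.snapshot());  // still the value at construction time
    }
}
```

Here the change happens after the constructor returns. With another thread involved it could just as easily happen in the middle of a method, which is the case a String parameter rules out.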
