Saturday, 26 December 2009

Erlang CSV Parser

At long last I've finished the CSV parser I've been working on.  Most of the technical problems were string matching problems, and most of those came down to learning that a string is a list of integers, each of which can be matched individually by prefixing a dollar sign to the char that should be matched.
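To illustrate the dollar-sign matching, here is a minimal sketch. The module name char_match and the helper split_field/1 are hypothetical names made up for this example and are not taken from the parser itself.

```erlang
-module(char_match).
-export([split_field/1]).

%% A string is a list of integers, so "a,b" =:= [$a, $,, $b] =:= [97, 44, 98].

%% split_field/1 (hypothetical helper): returns {Field, Rest}, where Field is
%% everything before the first comma and Rest is everything after it.
split_field(String) ->
    split_field(String, []).

%% $, is the character literal for a comma, matched at the head of the list.
split_field([$, | Rest], Acc) ->
    {lists:reverse(Acc), Rest};
%% Any other char is accumulated (in reverse) and matching continues.
split_field([Char | Rest], Acc) ->
    split_field(Rest, [Char | Acc]);
%% No comma found: the whole string is the field.
split_field([], Acc) ->
    {lists:reverse(Acc), []}.
```

Calling char_match:split_field("ab,cd") returns {"ab","cd"}.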

The parser is a state machine implemented as a process.  The caller messages it the next char until it reaches the end of the file/string, at which point it messages an eof atom and waits for the process to message back the parsed CSV.  In the end the parser used quite a lot of Erlang's features, including processes, funs and parametrised macros, and the end result was pretty clean.  It can take either a plain string or an IO device such as a file as the string source, which is handled in a nice way using funs to get the next char.  I found the switch from OOP to functional confusing at first, since I wanted to use an input stream, but the functional method I discovered is probably smaller than the Java stream-based approach I would have used otherwise.
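The overall shape can be sketched roughly as follows. This is not the actual parser: the module name csv_sketch, the message formats and all function names are assumptions made for illustration. It only handles commas, newlines and simple double-quoted fields (doubled quotes inside a quoted field are not handled).

```erlang
-module(csv_sketch).
-export([parse_string/1, parse/1]).

%% Parse a plain string by wrapping it in a "next char" fun.
parse_string(String) ->
    parse(string_source(String)).

%% A source is a fun that returns {Char, NextSource} or eof.
string_source([]) ->
    fun() -> eof end;
string_source([Char | Rest]) ->
    fun() -> {Char, string_source(Rest)} end.

%% Spawn the state machine, feed it chars, then send eof and await the result.
parse(Source) ->
    Caller = self(),
    Pid = spawn(fun() -> field(Caller, [], [], []) end),
    feed(Pid, Source),
    receive
        {Pid, Rows} -> Rows
    end.

feed(Pid, Source) ->
    case Source() of
        eof ->
            Pid ! eof;
        {Char, Next} ->
            Pid ! {char, Char},
            feed(Pid, Next)
    end.

%% State: outside quotes. Field and Row are accumulated in reverse.
field(Caller, Field, Row, Rows) ->
    receive
        {char, $,}  -> field(Caller, [], [lists:reverse(Field) | Row], Rows);
        {char, $\n} -> field(Caller, [], [], [end_row(Field, Row) | Rows]);
        {char, $"}  -> quoted(Caller, Field, Row, Rows);
        {char, C}   -> field(Caller, [C | Field], Row, Rows);
        eof         -> Caller ! {self(), finish(Field, Row, Rows)}
    end.

%% State: inside quotes, where commas and newlines are ordinary chars.
quoted(Caller, Field, Row, Rows) ->
    receive
        {char, $"} -> field(Caller, Field, Row, Rows);
        {char, C}  -> quoted(Caller, [C | Field], Row, Rows);
        eof        -> Caller ! {self(), finish(Field, Row, Rows)}
    end.

%% Close the current field and return the row in reading order.
end_row(Field, Row) ->
    lists:reverse([lists:reverse(Field) | Row]).

%% Emit a trailing row only if the input did not end with a newline.
finish([], [], Rows) -> lists:reverse(Rows);
finish(Field, Row, Rows) -> lists:reverse([end_row(Field, Row) | Rows]).
```

Each state is just a function that waits in receive for the next message, so a state transition is a call to another function. An IO device could presumably be wrapped in the same kind of fun, for example around io:get_chars/3, without the state machine needing to know the difference.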

Other notable CSV parsers include ppolv's and an FSM OTP behaviour from Praveen Ray of Yellowfish.  I'm really impressed by the OTP behaviour; I can imagine it would improve reuse once I'm comfortable with Erlang and OTP.
