Over the weekend I started playing with Parse::RecDescent and decided to apply it to a real-world problem. Using a tool this powerful for what I want to do may be gross overkill, but it resulted in less code, and more maintainable code, than hand-rolling a parser. Seeing how much advice in the Monastery argues that maintainable code is superior to faster code, this route seemed ideal. Unfortunately, using Parse::RecDescent is _much_ slower than I would like. So my question is threefold:

Should I even be using Parse::RecDescent at all?
Can I change the grammar to make it more efficient?
Are there any recommendations for what is in effect my first attempt at using this very cool tool?

The data (a file) that I need to parse looks like this:

HDRCOMPNAME BIG000OLD111IDENTIFIER1020301WITH1010LOTS1010OF1010CRAP
ADD,1234567890,,COMPNAME
ADDRANGE,2468,4680,COMPNAME
DELETE,987654321,,COMPNAME
DELETERANGE,13579,13599,COMPNAME
TLR000004

and the grammar I am using looks like this:

my $Grammar=<<'END_GRAMMAR';

startrule   :   file

file        :   header record(s?) trailer_t
                { $return={
                           header=>$item[1],
                           records=>$item[2],
                           count=>$item[3]
                          } 
                }

header      :   header_t data_t
                { 
                 $return={
                          company=>$item[1],
                          code=>$item[2]
                         } 
                }

record      :   valid_rec | <error>

valid_rec   :   type_t ',' number_t ',' number_t(?) ',' name_t
                { $return=[
                           $item[1],
                           $item[3],
                           @{$item[5]} ? $item[5] : undef,
                           $item[7]
                          ] 
                }

header_t    :   /HDR\w+/ { $return=substr($item[1],3) }
trailer_t   :   /TLR\d+/ { $return=substr($item[1],3) }

data_t      :   /\w+/
type_t      :   /ADD(?:RANGE)?|DELETE(?:RANGE)?/
number_t    :   /\d+/
name_t      :   /\w+/

END_GRAMMAR

If it's not obvious, I have used the suffix _t for tokens.

Any wisdom regarding this would be really appreciated, especially if there is some way to modify the grammar to improve speed. These files can contain thousands of records or more, and the speed hit is seriously making me think of hand-rolling this (which I really, really don't want to do).
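For comparison, here is a rough sketch of what the hand-rolled alternative might look like. It is only an illustration under some assumptions: one record per line, the header on the first non-blank line and the trailer on the last. The sub name parse_file is made up for this example. It returns roughly the same structure as the grammar's file rule, except that the optional second number is stored as a plain scalar (or undef) rather than the array reference that number_t(?) yields.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file contents as one string and
# returns a hash shaped like the result of the grammar's "file" rule.
sub parse_file {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;   # ignore blank lines

    # Header: "HDR" + company name, whitespace, then the identifier.
    my $first = shift @lines;
    defined $first && $first =~ /^HDR(\w+)\s+(\w+)\s*$/
        or die "Bad header\n";
    my $header = { company => $1, code => $2 };

    # Trailer: "TLR" followed by digits.
    my $last = pop @lines;
    defined $last && $last =~ /^TLR(\d+)\s*$/
        or die "Bad trailer\n";
    my $count = $1;

    # Records: type,number,[number],name
    my @records;
    for my $line (@lines) {
        $line =~ /^((?:ADD|DELETE)(?:RANGE)?),(\d+),(\d*),(\w+)\s*$/
            or die "Bad record: $line\n";
        push @records, [ $1, $2, length($3) ? $3 : undef, $4 ];
    }

    return { header => $header, records => \@records, count => $count };
}

# Usage with the sample data from above.
my $data = <<'END_DATA';
HDRCOMPNAME BIG000OLD111IDENTIFIER1020301WITH1010LOTS1010OF1010CRAP
ADD,1234567890,,COMPNAME
ADDRANGE,2468,4680,COMPNAME
DELETE,987654321,,COMPNAME
DELETERANGE,13579,13599,COMPNAME
TLR000004
END_DATA

my $parsed = parse_file($data);
print scalar @{ $parsed->{records} }, " records for $parsed->{header}{company}\n";
```

This is about the same size as the grammar, but the error reporting is much cruder and every format change means touching the regexes by hand, which is the maintenance cost I was hoping to avoid.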

Thanks in advance,

Yves / DeMerphq
--
This space for rent.

In reply to advice with Parse::RecDescent by demerphq
