in reply to Parse::RecDescent with lookahead or without ?

Does your format resemble XML closely enough to use an XML parser? That would be best. The remainder of the post assumes you can't use an XML parser.

If I follow correctly, you're using the lookahead so that the < of <div> doesn't get absorbed by the regexp. You can avoid that by breaking codetext into smaller tokens instead of treating it as a single token.

    entry : chunk(s?) EOF
            { join('', @{$item[1]}) }

    chunk : text { $item[1] }
          | code { $item[1] }

    text  : TEXT
            { join('', '<div class="text">', $item[1], '</div>') }

    code  : CODE_OP CODE_TEXT CODE_CL
            { join('', '<div class="code">', CGI::escapeHTML($item[2]), '</div>') }

    # Tokens
    EOF          : m{^\Z}
    TEXT         : m{[\w ]+}   { $item[1] }
    CODE_OP      : m{<xxx>}    { $item[1] }
    CODE_CL      : m{</xxx>}   { $item[1] }
    CODE_CHARS   : m{[\w $]+}  { $item[1] }
    CODE_SPECIAL : m{<\w+>}    { $item[1] ne '<xxx>' } { $item[1] }

    # Pseudo-token
    CODE_TEXT    : CODE_TEXT_(s?) { join('', @{$item[1]}) }
    CODE_TEXT_   : CODE_CHARS   { $item[1] }
                 | CODE_SPECIAL { $item[1] }

UNTESTED.
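To show the token-splitting idea on its own, here is a minimal hand-rolled sketch in plain Perl. It does not use Parse::RecDescent; it walks the input with `\G` and `/gc` and applies the same token classes as the grammar (TEXT, CODE_OP, CODE_CL, CODE_CHARS, CODE_SPECIAL). The names `convert` and `escape_html` are hypothetical, and `escape_html` is only a simplified stand-in for `CGI::escapeHTML`. The input format is assumed from the grammar, not from the original question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI::escapeHTML (handles only &, < and >).
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical converter: consumes the input one small token at a time,
# so no lookahead is needed to stop before the closing </xxx>.
sub convert {
    my ($in) = @_;
    my $out = '';
    pos($in) = 0;
    while (pos($in) < length($in)) {
        if ($in =~ m{\G([\w ]+)}gc) {                    # TEXT
            $out .= qq{<div class="text">$1</div>};
        }
        elsif ($in =~ m{\G<xxx>}gc) {                    # CODE_OP
            my $code = '';
            while (1) {
                if ($in =~ m{\G([\w \$]+)}gc) {          # CODE_CHARS
                    $code .= $1;
                }
                elsif ($in =~ m{\G</xxx>}gc) {           # CODE_CL
                    last;
                }
                elsif ($in =~ m{\G(<\w+>)}gc) {          # CODE_SPECIAL
                    die "Nested <xxx> is not allowed\n" if $1 eq '<xxx>';
                    $code .= $1;
                }
                else {
                    die "Unexpected input in code at offset " . pos($in) . "\n";
                }
            }
            $out .= '<div class="code">' . escape_html($code) . '</div>';
        }
        else {
            die "Unexpected input at offset " . pos($in) . "\n";
        }
    }
    return $out;
}

print convert('hello world<xxx>my $x <div> y</xxx>bye'), "\n";
```

Because `<div>` is consumed as its own CODE_SPECIAL token and `</xxx>` never matches `<\w+>`, the character-class token cannot run past the closing tag, which is the point of splitting the code text into smaller pieces.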

Re^2: Parse::RecDescent with lookahead or without ?
by szabgab (Priest) on Jan 23, 2005 at 19:23 UTC
    I might be able to turn it into XML, but I'll need to munge the input for that. I am not sure I want to start going down that slope. See my updates above regarding what the input should look like.