Does your format resemble XML closely enough to use an XML parser? That would be best. The remainder of the post assumes you can't use an XML parser.
If I follow correctly, you're using the lookahead so that the < of <div> doesn't get absorbed by the regexp. You can avoid that by breaking codetext into smaller tokens instead of treating it as a single token.
    entry : chunk(s?) EOF
            { join('', @{$item[1]}) }

    chunk : text { $item[1] }
          | code { $item[1] }

    text  : TEXT
            { join('', '<div class="text">', $item[1], '</div>') }

    code  : CODE_OP CODE_TEXT CODE_CL
            { join('', '<div class="code">', CGI::escapeHTML($item[2]), '</div>') }

    # Tokens
    EOF          : m{^\Z}
    TEXT         : m{[\w ]+}   { $item[1] }
    CODE_OP      : m{<xxx>}    { $item[1] }
    CODE_CL      : m{</xxx>}   { $item[1] }
    CODE_CHARS   : m{[\w $]+}  { $item[1] }
    CODE_SPECIAL : m{<\w+>}    { $item[1] ne '<xxx>' } { $item[1] }

    # Pseudo-token
    CODE_TEXT    : CODE_TEXT_(s?) { join('', @{$item[1]}) }
    CODE_TEXT_   : CODE_CHARS   { $item[1] }
                 | CODE_SPECIAL { $item[1] }
UNTESTED.
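To see why small tokens remove the need for a lookahead, here is a minimal sketch in plain Perl (no Parse::RecDescent) that scans the body of an <xxx> section the way CODE_CHARS and CODE_SPECIAL would. The helper name scan_code_text and the sample input are made up for illustration, and the character classes are only the ones from the grammar above, so real code text would need more of them.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): consume the body of an
# <xxx>...</xxx> section one small token at a time. Returns the code text
# collected so far and whatever input is left over.
sub scan_code_text {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;    # start scanning at the beginning

    while (1) {
        if ($input =~ m{\G([\w \$]+)}gc) {
            # Same idea as CODE_CHARS: a run of word chars, spaces and '$'.
            push @tokens, $1;
        }
        elsif ($input =~ m{\G(<\w+>)}gc) {
            # Same idea as CODE_SPECIAL: a tag-like token, except <xxx>.
            if ($1 eq '<xxx>') {
                pos($input) = pos($input) - length $1;    # give it back
                last;
            }
            push @tokens, $1;
        }
        else {
            # Neither token matches (for example at "</xxx>"), so stop here.
            last;
        }
    }

    return (join('', @tokens), substr($input, pos($input)));
}

my ($code, $rest) = scan_code_text('my $x <div> foo</xxx>tail');
print "code: [$code]\n";    # code: [my $x <div> foo]
print "rest: [$rest]\n";    # rest: [</xxx>tail]
```

The scan stops in front of </xxx> simply because neither token can match there: <\w+> needs a word character right after the <, and / isn't one. The < of <div> is consumed as part of its own token, so nothing has to peek ahead.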
In reply to Re: Parse::RecDescent with lookahead or without ? by ikegami, in thread Parse::RecDescent with lookahead or without ? by szabgab