in reply to Parse::RecDescent with lookahead or without ?
Does your format resemble XML closely enough to use an XML parser? That would be best. The remainder of this post assumes you can't use an XML parser.
If I follow correctly, you're using the lookahead so that the < of <div> doesn't get absorbed by the regexp. You can avoid that by breaking codetext into smaller tokens instead of treating it as a single token.
    entry        : chunk(s?) EOF
                   { join('', @{$item[1]}) }

    chunk        : text { $item[1] }
                 | code { $item[1] }

    text         : TEXT
                   { join('', '<div class="text">', $item[1], '</div>') }

    code         : CODE_OP CODE_TEXT CODE_CL
                   { join('', '<div class="code">', CGI::escapeHTML($item[2]), '</div>') }

    # Tokens
    EOF          : m{^\Z}
    TEXT         : m{[\w ]+}  { $item[1] }
    CODE_OP      : m{<xxx>}   { $item[1] }
    CODE_CL      : m{</xxx>}  { $item[1] }
    CODE_CHARS   : m{[\w $]+} { $item[1] }
    # An action only fails when it returns undef, so return undef
    # explicitly to reject a nested <xxx>.
    CODE_SPECIAL : m{<\w+>}   { $item[1] ne '<xxx>' ? 1 : undef } { $item[1] }

    # Pseudo-token
    CODE_TEXT    : CODE_TEXT_(s?) { join('', @{$item[1]}) }
    CODE_TEXT_   : CODE_CHARS   { $item[1] }
                 | CODE_SPECIAL { $item[1] }
UNTESTED.
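If it helps to see the "smaller tokens" idea on its own, here is a rough sketch in plain Perl that walks the input with `\G` and `/gc` instead of going through Parse::RecDescent. It is not the grammar itself, only an illustration of the same token classes. The names `convert` and `escape_html` are made up for this example, and `escape_html` is a simplified stand-in for CGI::escapeHTML.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CGI::escapeHTML (handles only &, < and >).
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical helper: consume the input one small token at a time.
# The token classes mirror TEXT, CODE_OP, CODE_CL, CODE_CHARS and
# CODE_SPECIAL; no lookahead is needed to stop at the closing tag.
sub convert {
    my ($in) = @_;
    my $out = '';
    pos($in) = 0;
    while (pos($in) < length $in) {
        if ($in =~ m{\G([\w ]+)}gc) {                  # TEXT
            $out .= qq{<div class="text">$1</div>};
        }
        elsif ($in =~ m{\G<xxx>}gc) {                  # CODE_OP
            my $code = '';
            while (1) {
                if ($in =~ m{\G([\w \$]+)}gc) {        # CODE_CHARS
                    $code .= $1;
                }
                elsif ($in =~ m{\G(<\w+>)}gc) {        # CODE_SPECIAL
                    die "nested <xxx> not allowed\n" if $1 eq '<xxx>';
                    $code .= $1;
                }
                else {
                    last;                              # neither token fits
                }
            }
            $in =~ m{\G</xxx>}gc                       # CODE_CL
                or die "expected </xxx> at offset " . pos($in) . "\n";
            $out .= '<div class="code">' . escape_html($code) . '</div>';
        }
        else {
            die "unexpected input at offset " . pos($in) . "\n";
        }
    }
    return $out;
}

print convert('hello <xxx>my $x <div></xxx>'), "\n";
```

Because `<\w+>` cannot match `</xxx>` (the `/` is not a word character), the inner loop stops by itself at the closing tag.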
Re^2: Parse::RecDescent with lookahead or without ?
by szabgab (Priest) on Jan 23, 2005 at 19:23 UTC