# takes a marked-up string, a regular expression to determine which
# tag(s) we're interested in, and a code reference which will do the
# sub-string transformation. returns the modified string.
sub parse_and_replace {
  my ( $string, $tag_match, $transform_sub ) = @_;
  my @context_stack;
  my %deferred_transforms;

  # loop matching tags of the form {word} and {/word}
  while ( $string =~ m!(\{(/)?(\w+)\})!g ) {
    my ( $tag, $tag_length ) = ( $3, length($1) );
    my $is_close = $2 ? 1 : 0;
    if ( $is_close ) {
      # pop and possibly transform on finding a matching close tag

      # syntax check: properly nested? (an empty stack means a stray close)
      my $popped = pop @context_stack
        or die "close '$tag' has no matching open tag\n";
      if ( $tag ne $popped->{tag} ) {
        die "close '$tag' mis-matched with open '$popped->{tag}'\n";
      }
      # save start index and length of tag content if we match the
      # tag_match param.
      if ( $tag =~ /$tag_match/ ) {
        my $start  = $popped->{pos};
        my $length = pos($string) - $popped->{pos};
        my $text   = substr ( $string, $start, $length );
        if ( ! $deferred_transforms{$text} ) {
          $deferred_transforms{$text} = $transform_sub->("$text");
        }
      }
    } else {
      # just push onto the context stack on finding an open tag
      push @context_stack, { tag => $tag, pos => pos($string) - $tag_length };
    }
  }

  # now do the replacements, longest text first, so that an enclosing
  # span is rewritten before the spans nested inside it change the string.
  # \Q...\E keeps braces, dots etc. in $text from acting as regex syntax.
  foreach my $text ( sort { length($b) <=> length($a) } keys %deferred_transforms ) {
    $string =~ s/\Q$text\E/$deferred_transforms{$text}/g;
  }
  return $string;
}

# and to invoke:
my $string = q(
Outside.
{tag}
  Inside level 1.
  {tag}
    Inside level 2.
  {/tag}
  Inside level 1.
{/tag}
Outside.
);

my $sub = sub {
  $_[0] =~ s/\{tag\}(.+)\{\/tag\}/--Marked--\n$1\n--EndMarked--/gis;
  return $_[0];
};

print "RESULT: " . parse_and_replace ( $string, 'tag', $sub );
I do think it's worth noting that parsing/manipulating recursively-nested markup is not trivial. (You should test the heck out of any custom-written solution -- including the one I just supplied -- before you even start to think about trusting it for your application.) I second Merlyn's advice about rolling modules into your distribution; Parse::RecDescent is powerful, flexible and de-bugged!
It isn't possible, as you're discovering, to do this kind of parsing with simple regexps. Unless you're willing to put severe limits on allowed markup structure, you'll need to parse recursively (or cheat a bit and save some context, as my code does, and as IO's code does in a much niftier way).
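To make the failure concrete, here is a small sketch of what a single non-recursive pattern does with nested and sibling tags. The `{b}` tag and both sample strings are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample inputs: one nested pair, one pair of siblings.
my $nested   = '{b}one {b}two{/b} three{/b}';
my $siblings = '{b}one{/b} and {b}two{/b}';

# Non-greedy: the first {b} gets paired with the nearest {/b}, which
# really closes the inner tag, so the capture is cut short.
my ($lazy) = $nested =~ m!\{b\}(.*?)\{/b\}!s;

# Greedy: the first {b} gets paired with the last {/b}, so the capture
# swallows the text between two unrelated sibling tags.
my ($greedy) = $siblings =~ m!\{b\}(.*)\{/b\}!s;

print "lazy:   [$lazy]\n";      # lazy:   [one {b}two]
print "greedy: [$greedy]\n";    # greedy: [one{/b} and {b}two]
```

Neither quantifier can know which close tag belongs to which open tag; that bookkeeping is exactly what the context stack (or a recursive parser) provides.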
And parsing is only half the battle -- the transformation can be tricky, too. Unless you're willing to limit the kind of transformation that's allowed, you have to build a tree, do the transformations on each tree node, then put the tree back together into a string. (My sub above sidesteps tree-ization by limiting transformations to simple, stateless, one-to-one mappings between a given "{tag}content{/tag}" string and a "result" string.)
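For comparison, a minimal sketch of the tree route might look like the following. The names `build_tree` and `render` are hypothetical helpers invented for this example, and the callback signature (tag name, already-rendered content) is an assumption rather than anything from the code above. Treat it as untested-in-production illustration, not a drop-in replacement.

```perl
use strict;
use warnings;

# Parse "{tag}...{/tag}" markup into a tree. Each node is a hash with a
# tag name and a list of children; a child is either a plain string or
# another node. The root node has an undefined tag name.
sub build_tree {
    my ($string) = @_;
    my $root  = { tag => undef, children => [] };
    my @stack = ($root);

    # split with a capturing group keeps the tags in the returned list
    foreach my $piece ( split m!(\{/?\w+\})!, $string ) {
        next if $piece eq '';
        if ( $piece =~ m!^\{/(\w+)\}$! ) {        # close tag
            die "close '$1' has no matching open tag\n" if @stack == 1;
            my $node = pop @stack;
            die "close '$1' mis-matched with open '$node->{tag}'\n"
                if $node->{tag} ne $1;
        }
        elsif ( $piece =~ m!^\{(\w+)\}$! ) {      # open tag
            my $node = { tag => $1, children => [] };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node;
        }
        else {                                     # plain text
            push @{ $stack[-1]{children} }, $piece;
        }
    }
    die "unclosed tag '$stack[-1]{tag}'\n" if @stack > 1;
    return $root;
}

# Rebuild a string bottom-up: children are rendered first, then
# $transform->($tag, $content) is called for the enclosing node.
sub render {
    my ( $node, $transform ) = @_;
    my $content = join '',
        map { ref $_ ? render( $_, $transform ) : $_ } @{ $node->{children} };
    return defined $node->{tag}
        ? $transform->( $node->{tag}, $content )
        : $content;
}

my $tree = build_tree('a {tag}b {tag}c{/tag} d{/tag} e');
my $out  = render( $tree, sub { my ( $tag, $content ) = @_; return "<$content>" } );
print "$out\n";    # a <b <c> d> e
```

Because each callback sees content whose inner tags have already been transformed, the transformation no longer has to be a fixed string-to-string mapping; the cost is holding the whole tree in memory.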
Kwin

In reply to Re: Properly transforming strings with nested markup tags by khkramer, in thread Properly transforming strings with nested markup tags by tocie