strat has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I would like to do some text replacements in a string. I want to replace each matching pair of [xyz] and [/xyz] with HTML strings:

[xyz]level 1.1[/xyz] [xyz]level 1.2[/xyz]
to:
<table><tr><td>level 1.1</td></tr></table> <table><tr><td>level 1.2</td></tr></table>

or:

[xyz]level 1.1 [xyz]level 2.1[/xyz] rest of 1.1 [/xyz]
to:
<table><tr><td>level 1.1 <table><tr><td>level 2.1</td></tr></table>rest of 1.1 </td></tr></table>

or even:

[xyz] error [xyz] level 1.1 [xyz] level 2.1 [/xyz] [/xyz] [xyz] level 1.2 [/xyz]
to:
[xyz] error <table><tr><td>level 1.1 <table><tr><td>level 2.1 </td></tr></table> </td></tr></table> <table><tr><td>level 1.2 </td></tr></table>
Here, the opening tag before error has no corresponding closing tag, so it should not be replaced.

I need something like

1 while $string =~ s/ \[xyz\] (text not containing [xyz] or [\/xyz]) \[\/xyz\] / "<table><tr><td>$1<\/td><\/tr><\/table>" /gsiex;
Is this possible with regular expressions?

I've been trying to solve this with a regex for two hours now. I even came up with code like the following ($tag contains xyz):

    while ($string =~ s/ (\[\Q$tag\E\]) (.+?) (\[\/\Q$tag\E\])
                       / my ($pre, $text, $post) = ($1, $2, $3);
                         if ($text =~ m|\[\Q$tag\E\]|) { $pre.$text.$post; }
                         else { "<table><tr><td>$text<\/td><\/tr><\/table>" }
                       /gsiex) {
        1;  # do nothing
    }
but this doesn't do the replacement in the correct order.

Could you please point me in the right direction? Or do I really have to write my own recursive descent parser?

Best regards,
perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Replies are listed 'Best First'.
Re: "Not containing something" in substitution
by Aristotle (Chancellor) on Aug 28, 2003 at 12:24 UTC
    perldoc -q balanced finds "Can I use Perl regular expressions to match balanced text?" in perlfaq6. That answers your question precisely. It also refers you to perlre, which says:

    (??{ code })

    WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice. A simplified version of the syntax may be introduced for commonly used idioms.

    This is a "postponed" regular subexpression. The code is evaluated at run time, at the moment this subexpression may match. The result of evaluation is considered as a regular expression and matched as if it were inserted instead of this construct.

    The code is not interpolated. As before, the rules to determine where the code ends are currently somewhat convoluted.

    The following pattern matches a parenthesized group:

    $re = qr{
              \(
              (?:
                 (?> [^()]+ )    # Non-parens without backtracking
               |
                 (??{ $re })     # Group with matching parens
              )*
              \)
            }x;

    Makeshifts last the longest.

      Neat RegEx. Note that it does not solve strat's problem as it stands: he is looking for a substitution, not just a match, and because the pattern is nested it will be really hard to do a substitution with it. This can be understood as a challenge ;-).
      Still, I modified the RegEx a bit to work with strat's problem, at least for matching:
      use re 'eval';
      $begin = '[xyz]';
      $end = '[/xyz]';
      $string = '[xyz]level 1.1 [xyz]level 2.1[/xyz] rest of 1.1 [/xyz]';
      $re = qr{
          \Q$begin\E
          (?:
              (?> (?:(?!\Q$begin\E|\Q$end\E).)+ )   # text containing neither tag
            |
              (??{ $re })                           # nested balanced group
          )*
          \Q$end\E
      }x;
      print 'Yeah!' if $string =~ $re;
      I just have too much free time ;-)
      Cheers, CombatSquirrel.
      Entropy is the tendency of everything going to hell.
        Why?
        use strict;
        use warnings;
        use re 'eval';

        my $begin = qr!\Q[xyz]!;
        my $end   = qr!\Q[/xyz]!;
        my @match;
        my $re;
        $re = qr{
            $begin
            (
                (?:
                    (?> (?:(?!$begin|$end).)+ )   # text containing neither tag
                  |
                    (??{ $re })                   # nested balanced group
                )*
            )
            $end
        }x;

        $_ = '[xyz][xyz]level 1.1 [xyz]level 2.1[/xyz] rest of 1.1 [/xyz]';
        1 while s!$re!<xyz>$1</xyz>!;
        print;
        __END__
        [xyz]<xyz>level 1.1 <xyz>level 2.1</xyz> rest of 1.1 </xyz>

        Makeshifts last the longest.

      Am I wrong in thinking that this will only handle two levels of nesting?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

        Yes, you are wrong about that. The (??{ }) construct is evaluated at match time, if and when the engine reaches that point, so the pattern recurses as deep as the input nests.

        Makeshifts last the longest.
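To illustrate the point about match-time evaluation, here is a minimal sketch that runs the recursive pattern from this sub-thread against input nested four levels deep. The test strings and the variable names `$deep`, `$broken`, `$matched_whole` and `$broken_matched` are made up for the illustration; the pattern itself follows the perlre example quoted above, with the lookahead written as `(?! ... )`.

```perl
use strict;
use warnings;
use re 'eval';   # harmless on newer perls, needed on older ones for (??{ })

my ($begin, $end) = (quotemeta '[xyz]', quotemeta '[/xyz]');

# Recursive pattern: (??{ $re }) is only expanded when the engine
# actually reaches it, so every nested tag triggers one more level.
my $re;
$re = qr{
    $begin
    (?:
        (?> (?: (?! $begin | $end ) . )+ )   # text that starts no tag
      |
        (??{ $re })                          # a nested balanced group
    )*
    $end
}x;

# Hypothetical test data: four levels of nesting, then an unbalanced string.
my $deep   = '[xyz]1 [xyz]2 [xyz]3 [xyz]4[/xyz] 3[/xyz] 2[/xyz] 1[/xyz]';
my $broken = '[xyz]1 [xyz]2[/xyz]';

my $matched_whole  = $deep   =~ /\A$re\z/ ? 1 : 0;
my $broken_matched = $broken =~ /\A$re\z/ ? 1 : 0;

print "four levels matched as one group: $matched_whole\n";
print "unbalanced string matched: $broken_matched\n";
```

Anchoring with `\A` and `\z` forces the whole string to be one balanced group, which makes the depth check explicit.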

Re: "Not containing something" in substitution
by gjb (Vicar) on Aug 28, 2003 at 12:24 UTC

    As far as I can see, there's no reason why the text between [xyz] and [/xyz] should not contain such a tag. Since the data is already structured like XML (or HTML), you can do a straightforward replacement of [xyz] -> <something bla="bla"> and [/xyz] -> </something>.

    Unless I miss something... Just my 2 cents, -gjb-
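A minimal sketch of what such a straightforward replacement might look like, assuming the table markup from the original question as the target. The helper name `replace_blindly` and the second test string are hypothetical. The sketch also shows the case this approach does not cover: an opening tag without a partner is converted as well, whereas the question asks for it to be left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every tag independently, ignoring pairing.
sub replace_blindly {
    my ($text) = @_;
    $text =~ s{\[xyz\]}{<table><tr><td>}gi;
    $text =~ s{\[/xyz\]}{</td></tr></table>}gi;
    return $text;
}

# Balanced input: the result is well-formed.
my $nested = replace_blindly(
    '[xyz]level 1.1 [xyz]level 2.1[/xyz] rest of 1.1 [/xyz]');

# Unbalanced input: the stray opening tag is converted too.
my $unpaired = replace_blindly('[xyz] error [xyz] level 1.1 [/xyz]');

print "$nested\n";
print "$unpaired\n";
```

For input that is known to be balanced this is the simplest option; it only falls short when unmatched tags have to survive unchanged.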

Re: "Not containing something" in substitution
by CombatSquirrel (Hermit) on Aug 28, 2003 at 12:31 UTC
    You were really close to it. I just modified your code the following way:
    $begin = '[xyz]';
    $end = '[/xyz]';
    $string = '[xyz]level 1.1 [xyz]level 2.1[/xyz] rest of 1.1 [/xyz]';
    1 while ($string =~ s@ \Q$begin\E
                           ((?:(?!\Q$begin\E|\Q$end\E).)*)   # text containing neither tag
                           \Q$end\E
                         @ "<table><tr><td>$1</td></tr></table>" @igex);
    print $string;
    Not terribly efficient, but it appears to do what you want.
    Cheers,
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.
Re: "Not containing something" in substitution
by Abigail-II (Bishop) on Aug 28, 2003 at 14:04 UTC
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Regexp::Common;

    $_ = <<'--';
    [xyz]level 1.1[/xyz] [xyz]level 1.2[/xyz]
    [xyz]level 1.1 [xyz]level 2.1[/xyz] rest of 1.1 [/xyz]
    [xyz] error [xyz] level 1.1 [xyz] level 2.1 [/xyz] [/xyz] [xyz] level 1.2 [/xyz]
    --

    1 while s!$RE{balanced}{-begin => "[xyz]"}{-end => "[/xyz]"}{-keep}!
              "<table><tr><td>" . substr ($1, 5, -6) . "</td></tr></table>"!gex;

    print;

    __END__
    <table><tr><td>level 1.1</td></tr></table> <table><tr><td>level 1.2</td></tr></table>
    <table><tr><td>level 1.1 <table><tr><td>level 2.1</td></tr></table> rest of 1.1 </td></tr></table>
    [xyz] error <table><tr><td> level 1.1 <table><tr><td> level 2.1 </td></tr></table> </td></tr></table> <table><tr><td> level 1.2 </td></tr></table>

    Abigail
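For comparison with the regex answers above, here is a rough sketch of the hand-written alternative the question mentions: a single left-to-right pass that keeps a stack of opening tags that have not yet been closed. This is not code from any of the posters; the helper name `convert_balanced` and its internals are assumptions made for the illustration, and it handles only the one hard-coded tag.

```perl
use strict;
use warnings;

# Hypothetical helper: one pass over the text with a stack of open tags.
sub convert_balanced {
    my ($text) = @_;
    my ($open, $close) = ('<table><tr><td>', '</td></tr></table>');

    # split with a capturing group keeps the tags as separate pieces
    my @pieces = split m{(\[/?xyz\])}i, $text;
    my @open_at;    # indices of opening tags that are still unmatched

    for my $i (0 .. $#pieces) {
        if (lc($pieces[$i]) eq '[xyz]') {
            push @open_at, $i;                  # decide later
        }
        elsif (lc($pieces[$i]) eq '[/xyz]' && @open_at) {
            $pieces[ pop @open_at ] = $open;    # nearest unmatched opener
            $pieces[$i] = $close;
        }
        # a closing tag with no opener is left untouched
    }

    # openers still on the stack were never rewritten and stay as text
    return join '', @pieces;
}

my $result = convert_balanced(
    '[xyz] error [xyz] level 1.1 [xyz] level 2.1 [/xyz] [/xyz] [xyz] level 1.2 [/xyz]');
print "$result\n";
```

Each closing tag is paired with the nearest unmatched opening tag, so the stray `[xyz]` before "error" is the one left on the stack and stays in the output as plain text.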