in reply to Regex help

The following regex should do the trick:

.+(?<=\])(.+?)(?=\[\/).+

Sample code:
#!/usr/bin/perl -w
use strict;

my $bold = "[b]bold text[/b]";
my $red = "[color=Red]Red text text[/color]";
my $red_bold = "[color=Red][b]Red bold text[/b][/color]";

my $regex = qr/.+(?<=\])(.+?)(?=\[\/).+/;

$bold =~ s/$regex/$1/;
$red =~ s/$regex/$1/;
$red_bold =~ s/$regex/$1/;

print "\$bold: $bold\n";
print "\$red: $red\n";
print "\$red_bold: $red_bold\n";

__END__
__OUTPUT__
$bold: bold text
$red: Red text text
$red_bold: Red bold text

Re^2: Regex help
by tachyon (Chancellor) on Jul 31, 2004 at 13:19 UTC

    The "1 while" recursive substitution trick is useful for this sort of problem. See my example above. I prefer a negated character class (i.e. [^\]] in this example) to a non-greedy .+? because it saves backtracking and can improve accuracy: it is slightly more specific, and it matches \n, for example, which . does not by default.
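As a minimal sketch of the two ideas in this reply (the "1 while" loop and a negated character class), assuming the goal is to delete empty tag pairs such as [b][/b]. The helper name strip_empty_tags and the exact pattern are illustrative assumptions, not tachyon's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete empty [tag][/tag] pairs until none remain.
sub strip_empty_tags {
    my ($text) = @_;
    # ([^\]=]+)      tag name: anything up to "]" or "=" (negated class, no .+?)
    # (?:=[^\]]*)?   optional "=value" part, e.g. "=Red"
    # \[/\1\]        closing tag that repeats the captured name
    # The loop reruns the substitution because removing an inner pair
    # can leave an outer pair newly empty.
    1 while $text =~ s{\[([^\]=]+)(?:=[^\]]*)?\]\[/\1\]}{}g;
    return $text;
}

print "'", strip_empty_tags("[color=Red][b][/b][/color]"), "'\n";  # prints ''
print "'", strip_empty_tags("[b]bold text[/b]"), "'\n";            # prints '[b]bold text[/b]'
```

A single s///g pass would only remove the inner [b][/b] here, since the scan has already moved past [color=Red] by the time the inner pair is deleted; the "1 while" wrapper is what picks up the outer pair on the next pass.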

    Lots of ways to skin the cat, provided we can make a nice tasty stew. TIMTOWTDI.

    cheers

    tachyon

      I'd guess that your regex is still going to do a fair amount of backtracking. I'd suggest (?>(\w+))[^\]]* or (\w+)(=[^\]]*)? for the tag part instead (untested).

      Update: this isn't just a backtracking issue; tachyon's original regex will match things like [color=Red][/col].
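To make the over-matching concrete, here is a small sketch comparing three patterns for an empty tag pair. All three are hypothetical reconstructions for illustration (tachyon's actual regex is not shown in this thread excerpt), and the hash and helper names are my own. The "optional" variant uses a non-capturing group where the suggestion above used a capturing one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical patterns; none of these is tachyon's actual regex.
my %pattern = (
    # (\w+) may backtrack to "col", leaving "or=Red" for [^\]]*
    loose    => qr{\[(\w+)[^\]]*\]\[/\1\]},
    # atomic group: once (\w+) has taken "color" it cannot give letters back
    atomic   => qr{\[(?>(\w+))[^\]]*\]\[/\1\]},
    # the name must be followed by "]" or by "=value]"
    optional => qr{\[(\w+)(?:=[^\]]*)?\]\[/\1\]},
);

# Hypothetical helper: does the named pattern match the input?
sub matches {
    my ($name, $input) = @_;
    return $input =~ $pattern{$name} ? 1 : 0;
}

for my $input ("[color=Red][/color]", "[color=Red][/col]") {
    for my $name (sort keys %pattern) {
        printf "%-9s %-20s %s\n", $name, $input,
            matches($name, $input) ? "match" : "no match";
    }
}
```

With these definitions, all three patterns match [color=Red][/color], but only the loose one also matches [color=Red][/col].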

        You are probably right, and as noted there are edge cases, as with all these sorts of things. Regardless of backtracking, it will hit most strings at least twice. Given that I (at least) am unfamiliar with widgets that use this formatting spec, I just put in a general suggestion. One of the great things about this site is that just about any hole or edge case will be pointed out. Everyone learns. Something like what you suggest, which accurately handles the 'blah' and 'blah=foo' forms (assuming they are the only options) and adds some \s* tokens to allow for whitespace, is a little more robust. It is a pretty ugly RE, but... you've got to hate metachars as formatting tokens.
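A rough sketch of what such a whitespace-tolerant pattern might look like, written with /x so the ugliness is at least commented. It assumes 'blah' and 'blah=foo' are the only tag forms; the names $empty_pair and remove_empty_pairs are made up for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pattern: an empty tag pair, tolerating whitespace inside the
# brackets and between the opening and closing tags.
my $empty_pair = qr{
    \[ \s* (\w+) \s*          # opening tag name, e.g. "b" or "color"
    (?: = [^\]]* )?           # optional "=value" part, e.g. "=Red"
    \]
    \s*                       # only whitespace allowed between the tags
    \[ \s* / \s* \1 \s* \]    # closing tag with the same name
}x;

# Hypothetical helper: strip empty pairs, looping so that nested pairs go too.
sub remove_empty_pairs {
    my ($text) = @_;
    1 while $text =~ s/$empty_pair//g;
    return $text;
}

print "'", remove_empty_pairs("[color = Red] [ b ]\n[ /b ] [/color]"), "'\n";  # prints ''
print "'", remove_empty_pairs("[b] text [/b]"), "'\n";                         # prints '[b] text [/b]'
```

Whether whitespace inside the brackets should count as valid at all depends on the widget's format, which nobody here has seen a spec for, so treat the \s* placement as a guess.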

Re^2: Regex help
by kiat (Vicar) on Jul 31, 2004 at 12:39 UTC
    Thanks, Dietz!

    I ran your code. It doesn't completely remove the following bad tags:

    my $empty = "[color=Red][b][/b][/color]";
      Sorry kiat, it seems I completely misunderstood the task.
      Here's another go, though tachyon's solution is excellent:
      #!/usr/bin/perl -w
      use strict;

      my $bold = "[b]bold text[/b]";
      my $red = "[color=Red]Red text text[/color]";
      my $red_bold = "[color=Red][b]Red bold text[/b][/color]";
      my $empty = "[color=Red][b][/b][/color]";

      check_tags($bold);
      check_tags($red);
      check_tags($red_bold);
      check_tags($empty);

      # Print the string only if its tags enclose some text.
      sub check_tags {
          my $tag = shift;
          print $tag, $/ if $tag =~ /(?:\[[^\]]+\])+.+?(?<!\])(?:\[\/).+/;
      }

      __END__
      __OUTPUT__
      [b]bold text[/b]
      [color=Red]Red text text[/color]
      [color=Red][b]Red bold text[/b][/color]