Re: Regex help

The following regex should do the trick:

.+(?<=\])(.+?)(?=\[\/).+
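A commented rendering of the same pattern may make the mechanics easier to follow. This is an editorial sketch, not part of the original reply. It should behave exactly like the one-liner, and the two test strings are taken from the sample code. `qr{}` delimiters are used so that the "/" characters in the comments do not end the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the one-liner, written with /x so each piece can be commented.
my $regex = qr{
    .+         # greedy: runs to the end of the string, then backtracks leftwards
    (?<=\])    # to the rightmost point just after a "]" where the rest matches
    (.+?)      # capture group 1: the shortest non-empty text
    (?=\[\/)   # that is immediately followed by a closing-tag opener "[/"
    .+         # plus the rest of the string, so s/// replaces all of it
}x;

for my $input ("[b]bold text[/b]",
               "[color=Red][b]Red bold text[/b][/color]") {
    # Work on a copy so the original string is left untouched.
    (my $copy = $input) =~ s/$regex/$1/;
    print "$copy\n";
}
```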

Sample code:

#!/usr/bin/perl -w
use strict;

my $bold = "[b]bold text[/b]";
my $red = "[color=Red]Red text text[/color]";
my $red_bold = "[color=Red][b]Red bold text[/b][/color]";

my $regex = qr/.+(?<=\])(.+?)(?=\[\/).+/;

$bold =~ s/$regex/$1/;
$red =~ s/$regex/$1/;
$red_bold =~ s/$regex/$1/;

print "\$bold: $bold\n";
print "\$red: $red\n";
print "\$red_bold: $red_bold\n";

__END__
__OUTPUT__
$bold: bold text
$red: Red text text
$red_bold: Red bold text

Re^2: Regex help by tachyon (Chancellor) on Jul 31, 2004 at 13:19 UTC
The `1 while` recursive substitution trick is useful for this sort of problem; see my example above. In this example I prefer a negated character class, i.e. `[^\]]`, to an ungreedy `.+?`: it saves backtracking, it is slightly more specific and so more or less improves accuracy, and it matches `\n`, which `.` does not by default. Lots of ways to skin the cat, provided we can make a nice tasty stew. TIMTOWTDI. cheers tachyon
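tachyon's own example is not part of this excerpt, so the following is only a minimal sketch of the `1 while` idiom. It assumes tags come only as `[name]` or `[name=value]`; the pattern and the `strip_tags` helper are hypothetical. Negated character classes stand in for `.+?`, so tag content may contain newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pattern for one innermost tag pair (assumed tag grammar):
#   \[(\w+)       opening tag name, captured as group 1
#   (?:=[^\]]*)?  optional "=value", using a negated class instead of .+?
#   \]([^\[]*)    the content, which may not contain "[" (group 2)
#   \[/\1\]       the closing tag, which must repeat the same name
my $innermost = qr{\[(\w+)(?:=[^\]]*)?\]([^\[]*)\[/\1\]};

sub strip_tags {
    my ($str) = @_;
    # Each pass replaces one innermost pair with its content;
    # "1 while" keeps substituting until the pattern no longer matches.
    1 while $str =~ s/$innermost/$2/;
    return $str;
}

for my $input ("[b]bold text[/b]",
               "[color=Red][b]Red bold text[/b][/color]",
               "[color=Red][b][/b][/color]",
               "[b]line one\nline two[/b]") {
    print "'", strip_tags($input), "'\n";
}
```

Because the content class `[^\[]*` refuses any "[", only an innermost pair can match on a given pass; the loop then peels the outer tags off one at a time, which also disposes of empty pairs such as `[b][/b]`.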
Re^3: Regex help by ysth (Canon) on Aug 01, 2004 at 05:54 UTC
I'd guess that your regex is still going to do a fair amount of backtracking. I'd say `(?>(\w+))[^\]]*` or `(\w+)(=[^\]]*)?` (untested). Update: this isn't just a backtracking issue; tachyon's original regex will match things like `[color=Red][/col]`.
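To illustrate ysth's update, here is a small comparison. tachyon's original regex is not included in this excerpt, so `$loose` is a hypothetical stand-in that, like the regex ysth describes, accepts any closing tag. `$strict` follows ysth's suggestion (an atomic tag name plus an optional `=value` part) and adds a `\1` backreference, an assumption of this sketch, so that the closing tag must repeat the opening tag's name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "loose" pattern: any opening tag, any closing tag.
my $loose  = qr{\[[^\]]+\]([^\[]*)\[/[^\]]+\]};

# Stricter sketch: the name is matched atomically (no futile backtracking
# into it), "=value" is optional, and \1 forces the closing tag to match.
my $strict = qr{\[(?>(\w+))(=[^\]]*)?\]([^\[]*)\[/\1\]};

for my $str ("[color=Red]Red text[/color]", "[color=Red][/col]") {
    printf "%-28s loose:%-3s strict:%s\n", $str,
        ($str =~ $loose  ? "yes" : "no"),
        ($str =~ $strict ? "yes" : "no");
}
```

Run as is, the loose pattern accepts both strings, while the strict one rejects `[color=Red][/col]` because `\1` demands `color` in the closing tag.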
Re^4: Regex help by tachyon (Chancellor) on Aug 01, 2004 at 08:31 UTC
You are probably right, and as noted there are edge cases, as with all these sorts of things. Regardless of backtracking, it will hit most strings at least twice. Given that I (at least) am unfamiliar with the widgets that use this formatting spec, I just put in a general suggestion. One of the great things about this site is that just about any hole or edge case will be pointed out. Everyone learns. Something like you suggest, which accurately deals with the 'blah' and 'blah=foo' forms (assuming they are the only options), with some \s* tokens to allow for whitespace issues, is a little more robust. It is a pretty ugly RE, but... gotta hate metachars as formatting tokens.
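tachyon's fuller regex is collapsed in the original thread and is not reproduced here. The sketch below only illustrates the idea he describes: ysth's name plus optional `=value` split, padded with `\s*` tokens for whitespace. It assumes `[name]` and `[name=value]` are the only opening-tag forms, and `parse_open_tag` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: opening tags come only as [name] or [name=value], with
# optional whitespace around the name, the "=" and the value.
my $open_tag = qr{
    \[ \s*
    (\w+)                       # capture 1: tag name, e.g. b or color
    \s*
    (?: = \s* ([^\]]*?) \s* )?  # capture 2: optional value, e.g. Red
    \]
}x;

# Hypothetical helper: returns (name, value) for an opening tag,
# or an empty list if the string is not one.
sub parse_open_tag {
    my ($str) = @_;
    return unless $str =~ /\A$open_tag\z/;
    return ($1, defined $2 ? $2 : '');
}

for my $tag ("[b]", "[ color = Red ]", "[color=Red]", "[/b]") {
    my ($name, $value) = parse_open_tag($tag);
    if (defined $name) {
        print "$tag => name '$name', value '$value'\n";
    } else {
        print "$tag => not an opening tag\n";
    }
}
```

The lazy `([^\]]*?)` lets the trailing `\s*` absorb spaces before the `]`, so `[ color = Red ]` yields the value `Red` without a trailing blank.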
Re^2: Regex help by kiat (Vicar) on Jul 31, 2004 at 12:39 UTC
Thanks, Dietz! I ran your code. It doesn't completely remove the following bad tags: `my $empty = "[color=Red][b][/b][/color]";`
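kiat's observation is easy to reproduce with the first regex from the top of this thread. Because there is no text between `[b]` and `[/b]`, the lazy `(.+?)` has to take `[/b]` itself before it reaches the next `[/`, so the substitution leaves `[/b]` behind rather than an empty string. The pattern and the string are both taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dietz's first regex, applied to kiat's empty-tag string.
my $regex = qr/.+(?<=\])(.+?)(?=\[\/).+/;
my $empty = "[color=Red][b][/b][/color]";

# (.+?) must consume at least one character, so with an empty [b][/b]
# pair it swallows "[/b]" and stops at the following "[/color]".
(my $result = $empty) =~ s/$regex/$1/;
print "\$empty: $result\n";    # prints: $empty: [/b]
```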
Re^3: Regex help by Dietz (Curate) on Jul 31, 2004 at 14:35 UTC
Sorry kiat, it seems I completely misunderstood the task. Here's another go, though tachyon's solution is excellent:

#!/usr/bin/perl -w
use strict;

my $bold = "[b]bold text[/b]";
my $red = "[color=Red]Red text text[/color]";
my $red_bold = "[color=Red][b]Red bold text[/b][/color]";
my $empty = "[color=Red][b][/b][/color]";

check_tags($bold);
check_tags($red);
check_tags($red_bold);
check_tags($empty);

# Print the string only if, after one or more opening tags, some closing
# tag "[/" is preceded by content rather than directly by a "]"; the
# negative lookbehind (?<!\]) rules out empty pairs such as [b][/b].
sub check_tags {
    my $tag = shift;
    print $tag, $/ if $tag =~ /(?:\[[^\]]+\])+.+?(?<!\])(?:\[\/).+/;
}

__END__
__OUTPUT__
[b]bold text[/b]
[color=Red]Red text text[/color]
[color=Red][b]Red bold text[/b][/color]