Regex look-behind problem.

the_0ne has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks, have a regex problem that I'm hoping you can help with.

First off, a disclaimer: the reason I am not using an HTML parser is that the format I am converting to does not map well onto what HTML converters produce. I'm working with a very small subset, so I'm hoping to bang this out with regexes instead of a full-blown HTML parser.

Here's the code...

$foo = "<italic>Here's a <bold>larger<normal> paragraph, <italic>where I'm<normal> going to <bold>bold some <italic>";
print "\nfoo before:\n$foo\n\n";

#foo.gsub!(/(?<=<italic>)(?<!<normal>)(.*?)<bold>/, '\1<bold-italic>')
$foo =~ s/(?<=<italic>)(?<!<normal>)(.*?)<bold>/\1<bold-italic>/g;
print "foo after:\n$foo\n";

Here's the output I am getting...

# Output is...
# <italic>Here's a <bold-italic>larger<normal> paragraph, <italic>where I'm<normal> going to <bold-italic>bold some <italic>

Notice the second <bold> is being replaced with <bold-italic>. By the regex (at least I think I have the regex right) the second bold *should not* be replaced since I perform a look-behind for <normal>. If <normal> is between the <italic> and the <bold>, then the <bold> should be left alone. At least this is what I am trying to get at.

Here's what I would like to see...

# However, should be...
# <italic>Here's a <bold-italic>larger<normal> paragraph, <italic>where I'm<normal> going to <bold>bold some <italic>

Notice the second <bold> is not replaced.

I'm confused as to what is wrong with my regex.

Thanks again Monks for all your help.


Replies are listed 'Best First'.
Re: Regex look-behind problem. by ikegami (Patriarch) on Jul 12, 2007 at 22:43 UTC
`(?!<normal>).*?` will happily match `" <normal>"`, so you need to check every `.` to make sure it's not the start of `<normal>`. Or, since you're looking backwards, you could check to make sure every `.` is not the end of `<normal>`.

s/ (?<=<italic>) ( (?: .(?<! <normal>) )* ) <bold> /$1<bold-italic>/xg

It's a lot more sane going forward instead of backwards.

s/ ( <italic> (?: (?!<normal>). )* ) <bold> /$1<bold-italic>/xg

By the way, you should use `$1` rather than `\1` in the second (non-regexp) half of the substitution operator.
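To check the two approaches against the string from the question, something like the following could be used. It is only a sketch: the variable names ($input, $backward, $forward) are made up for illustration, and each substitution is written with a `*` on the inner group so that it can span any number of characters between the tags.

```perl
use strict;
use warnings;

# Sample string from the question, joined back onto one line.
my $input = "<italic>Here's a <bold>larger<normal> paragraph, <italic>where I'm<normal> going to <bold>bold some <italic>";

# Look-behind variant: after each consumed character, make sure the text
# just consumed does not end in "<normal>".
(my $backward = $input) =~
    s/ (?<=<italic>) ( (?: . (?<!<normal>) )* ) <bold> /$1<bold-italic>/xg;

# Look-ahead variant: before each consumed character, make sure
# "<normal>" does not start at that position.
(my $forward = $input) =~
    s/ ( <italic> (?: (?!<normal>) . )* ) <bold> /$1<bold-italic>/xg;

print "backward: $backward\n";
print "forward:  $forward\n";
```

On this input both variants should leave the second <bold> alone and turn only the first one into <bold-italic>, which is the result the question asks for.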
Re^2: Regex look-behind problem. by the_0ne (Pilgrim) on Jul 12, 2007 at 23:53 UTC
Thanks ikegami, that worked perfectly. lol, thanks for the tip. However, there is a reason for going backwards. The conversions I'm doing actually matter in reverse more than forward. But believe me, if I can take your second example and convert some of my regexes, I will. Thanks again.
Re: Regex look-behind problem. by runrig (Abbot) on Jul 12, 2007 at 21:30 UTC
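The ".*?" lets the regex skip over anything, including "<normal>", on its way to the "<bold>". The negative look-behind is zero-width and is only tested once, at the spot right after "<italic>", where the preceding text is not "<normal>". So it matches "<bold>" and replaces it.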
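One way to see this is to print what the question's original pattern actually captures on each match. This is just an illustrative sketch; @captures and $input are names I picked, not anything from the original post.

```perl
use strict;
use warnings;

# Sample string from the question, joined back onto one line.
my $input = "<italic>Here's a <bold>larger<normal> paragraph, <italic>where I'm<normal> going to <bold>bold some <italic>";

# The pattern from the question, unchanged. The (?<!<normal>) assertion is
# tested only at the position right after <italic>; the (.*?) that follows
# is free to run across "<normal>" until it reaches the next <bold>.
my @captures;
while ($input =~ /(?<=<italic>)(?<!<normal>)(.*?)<bold>/g) {
    push @captures, $1;
}

print "capture: [$_]\n" for @captures;
```

The second capture contains "<normal>", which shows that the look-behind never got a chance to reject it.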

