Greediness is only one factor in how regexps find their matches, and in this case it is not what determines your result. Perl first looks for the leftmost < character; once it finds it, it scans forward (non-greedily) for the first > after it. There are a few ways around this, but I think what you might find best would be
$str = 'Some TextVenture</B Brothers</a>';
$str =~ s/<[^<]*>//;
Update: See How will my regular expression match? for more details on why greediness isn't the only factor.
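To make the difference concrete, here is a small sketch run against the string above. The non-greedy pattern s/<.*?>// is my assumption about what the original attempt looked like (it is described but not quoted here), and the variable names are only for illustration.

```perl
use strict;
use warnings;

my $str = 'Some TextVenture</B Brothers</a>';

# Non-greedy (assumed original attempt): the match still starts at the
# leftmost '<' and extends to the first '>' that follows it.
(my $lazy = $str) =~ s/<.*?>//;
print "$lazy\n";     # Some TextVenture

# Negated class: a match may not run across a second '<', so the attempt
# at the first '<' fails and the engine moves on to the next one.
(my $class = $str) =~ s/<[^<]*>//;
print "$class\n";    # Some TextVenture</B Brothers
```

The only thing that changes between the two is where a match is allowed to start successfully; the quantifier's greediness plays no part in that.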
Though the match between < and > is non-greedy, Perl's regular expressions start matching from the left. This means that the left-most < char is found first, then the .*? part matches until the first (and only) >.
It is not 100% clear what you want to match in general, but for this string
s/<[^<>]*>//;
gives the desired result.
The perlre manpage has more info about greediness vs. leftmost matching. See the section "Version 8 Regular Expressions", paragraph 7, or the section on backtracking (search for "got <d is under the bar in the >").
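For reference, the backtracking example that search phrase comes from looks roughly like this. I am reproducing it from memory and have tidied it to run under strict, so treat the variable names as mine rather than the manpage's.

```perl
use strict;
use warnings;

my $text = "The food is under the bar in the barn.";

# Greedy: .* takes as much as it can, so the match ends at the LAST "bar".
my ($greedy_cap) = $text =~ /foo(.*)bar/;
print "got <$greedy_cap>\n";    # got <d is under the bar in the >

# Non-greedy: .*? stops at the FIRST "bar" after "foo".
my ($lazy_cap) = $text =~ /foo(.*?)bar/;
print "got <$lazy_cap>\n";      # got <d is under the >
```

Both matches begin at the same leftmost "foo"; the quantifier only decides where the match ends.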
OK, first of all, what are those slashes doing before the < and > signs? They're not necessary. Did you inherit that from doing regexes on HTML? It looks like it. So get into the habit of not using // as your delimiters. Use || or ## or {}{} instead.
Second, this regex does exactly what you've told it to do: find the first < it comes across and match, non-greedily, up to the first > after it. That's why it's matching the way it is.
$str =~ s|(.*)<.*?>|$1|; does what you want: it matches, greedily, and keeps everything up to the last < that begins a tag (greediness is your friend in this kind of situation).
But then so does $str =~ s|</a>$||; if all you need is to take the closing anchor tag off the end. It's not possible to know exactly what you want from this example.
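Here are both substitutions side by side on the example string, as a quick sketch. The string is the one quoted earlier in the thread, and the variable names are just for illustration.

```perl
use strict;
use warnings;

my $str = 'Some TextVenture</B Brothers</a>';

# Greedy prefix: (.*) runs to the end of the string, then backtracks to
# the LAST '<' that is followed by a '>', so only the final tag is dropped.
(my $last_tag = $str) =~ s|(.*)<.*?>|$1|;
print "$last_tag\n";    # Some TextVenture</B Brothers

# Anchored: remove a literal closing anchor tag at the end of the string.
(my $anchored = $str) =~ s|</a>$||;
print "$anchored\n";    # Some TextVenture</B Brothers
```

Keep in mind that . does not match a newline unless you add the /s modifier, so the greedy prefix behaves differently on multi-line input.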
($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
=~y~b-v~a-z~s; print
This is a good example of why one should not attack HTML or XML code with regexes but rather use one of the more sophisticated parsing modules on CPAN! And to boot, it is not even valid HTML: ex nihilo fit nihil, as the Elder said.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law