regexp: least inclusive match?

cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy Monks. I have some text extracted from blog posts that in some cases includes unwanted footer info that I want to lop off. Thing is, in some cases this footer is marked by a word that could theoretically occur legitimately earlier in the text. So, for example, let's say I have something like this:

```
Tags are very useful to included in these posts.   Interested in growing a home based business go here.
Tags:   Vemma Builder  ·  wallstrip  ·  Vemma  ·  Vegas
//  Mar 2nd 2007 at 3:30 am      vemma killa
Leave a Comment      Name      Mail      Website
```

and I want to get rid of everything from the second "Tags" to the end. If I say `$text =~ s/Tags.+?$//s;` then the match starts at the first "Tags", so it removes all the text. Is there some way to specify that it should use the "least inclusive" match, counting from the end, to prevent this?

Many TIA...

Steve

Re: regexp: least inclusive match? by Sidhekin (Priest) on Mar 02, 2007 at 23:46 UTC
If the marker is a plain string (as in the example, with no regex specials), substr and rindex ought to do the job:

```perl
my $idx = rindex($text, "Tags");
if ($idx > -1) { substr($text, $idx) = ''; }
```

Otherwise you could, for instance, take advantage of `*` being greedy from the start:

```perl
$text =~ s/(.*)Tags.*/$1/s;
```

`print "Just another Perl ${\(trickster and hacker)},"`
The Sidhekin proves Sidhe did it!
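A self-contained sketch of both of Sidhekin's approaches, run on a shortened, made-up sample string. The sub names `strip_with_rindex` and `strip_with_greedy` are hypothetical, and the `\Q...\E` in the greedy version is an assumed addition so that a marker containing regex specials is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $text at the last occurrence of the plain string $marker.
sub strip_with_rindex {
    my ($text, $marker) = @_;
    my $idx = rindex($text, $marker);    # -1 if $marker is absent
    substr($text, $idx) = '' if $idx > -1;
    return $text;
}

# Hypothetical helper: same effect via a greedy capture. (.*) first grabs the
# whole string, then backtracks only as far as the last $marker. \Q...\E quotes
# regex specials in $marker (an assumption), and /s lets . match newlines.
sub strip_with_greedy {
    my ($text, $marker) = @_;
    $text =~ s/(.*)\Q$marker\E.*/$1/s;   # no match leaves $text unchanged
    return $text;
}

# Made-up sample for illustration only.
my $sample = "Tags are useful.\nTags: Vemma Builder\nLeave a Comment\n";

print strip_with_rindex($sample, 'Tags');   # prints "Tags are useful."
print strip_with_greedy($sample, 'Tags');   # prints "Tags are useful."
```

Both leave the text untouched when the marker does not occur at all.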
Re^2: regexp: least inclusive match? by varian (Chaplain) on Mar 03, 2007 at 11:03 UTC
Sidhekin is right: with a minor change, the regex will work. The reason is Principle 0 of regex matching: "Taken as a whole, any regexp will be matched at the earliest possible position in the string." So while your `.+?` expresses a preference for matching as few characters as possible, that preference is overruled: the engine first settles on the earliest starting position where a match can succeed (the first "Tags"), and only then applies the non-greedy preference.
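A small sketch of Principle 0 in action, using a shortened, made-up sample string. The `@-` array records where the last successful match started: the lazy pattern still starts at offset 0, while a greedy `(.*)` prefix lands on the last "Tags".

```perl
use strict;
use warnings;

# Made-up sample for illustration only.
my $text = "Tags are useful.\nTags: Vemma\nfooter\n";

# Lazy quantifier: the match still begins at the leftmost "Tags" (offset 0),
# because the earliest starting position that can match wins.
$text =~ /Tags.+?$/s or die "no lazy match";
my $lazy_start = $-[0];

# Greedy prefix: (.*) first swallows everything, then backtracks just far
# enough to find a "Tags", which is therefore the last one.
$text =~ /^(.*)Tags/s or die "no greedy match";
my $last_start = length $1;

print "lazy match starts at $lazy_start\n";   # prints 0
print "last Tags starts at $last_start\n";    # prints 17
```

The greedy offset agrees with `rindex($text, 'Tags')`.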
Re: regexp: least inclusive match? by johngg (Canon) on Mar 03, 2007 at 09:35 UTC
Another way to do it would be to use a regular expression with a negative look-ahead assertion. Replace everything from the marker word to the end of the string with nothing, provided the marker word is not followed by another occurrence of itself; that way the match can only begin at the last marker word.

```perl
use strict;
use warnings;

# Slurp the whole DATA section into one string.
my $blog;
{
    local $/;
    $blog = <DATA>;
}

my $word = q{Tags};

# Match the marker as a whole word, provided no later whole-word occurrence
# follows it, then everything to the end of the string (s lets . match newlines).
my $rxStrip = qr{(?xs)
    \b$word\b
    (?!.*\b$word\b)
    .*
};

$blog =~ s{$rxStrip}{};
print $blog;

__END__
Tags are very useful to included in these posts.   Interested in growing a home based business go here.
Tags:   Vemma Builder  ·  wallstrip  ·  Vemma  ·  Vegas
//  Mar 2nd 2007 at 3:30 am      vemma killa
Leave a Comment      Name      Mail      Website
```

I hope this is of interest.

Cheers,

JohnGG
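A sketch of johngg's look-ahead pattern wrapped in a reusable sub. The name `strip_from_last_word` is hypothetical, and the `\Q...\E` quoting is an assumed addition so that regex specials in the marker are taken literally; the `\b` anchors still assume the marker starts and ends with a word character. One practical difference from the rindex approach is that only whole-word occurrences count.

```perl
use strict;
use warnings;

# Hypothetical wrapper: remove everything from the last whole-word
# occurrence of $word to the end of $text.
sub strip_from_last_word {
    my ($text, $word) = @_;
    my $rx = qr{
        \b\Q$word\E\b            # the marker as a whole word
        (?!.*\b\Q$word\E\b)      # not followed by another whole-word occurrence
        .*                       # then everything up to the end of the string
    }xs;
    $text =~ s/$rx//;
    return $text;
}

# Made-up sample: "MyTags" is not a whole-word match for "Tags".
my $input = "Tags are useful.\nTags: Vemma\nMyTags footer\n";

print strip_from_last_word($input, 'Tags');   # prints "Tags are useful."
```

By contrast, `rindex($input, 'Tags')` would find the "Tags" inside "MyTags" and cut there instead.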