Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression

boo_radley has asked for the wisdom of the Perl Monks concerning the following question:

I mentioned that I had bought "programming the perl DBI" over "Mastering Regular Expressions" sometime this week in the CB. I now regret the choice :)

I have a very large string, $text. The string will have one or more start delimiters, foo, and an equal number of end delimiters, bar. Between the two will be an arbitrary amount of text.
Example:

foo this is one example bar
this is a line I don't care about
foo here's another keeper! bar
foo yet another bar
and a line to reject

I devised the following :

while ($text =~/foo(*.?)bar/) {
   manipulate ($1)
}

That worked OK for the first one, but it only captured the first instance. "Ah", I thought, "I'm not being greedy enough." Thus arose

while ($text =~/foo(*.?)bar/g) {
   manipulate ($1)
}

but the same thing happened -- the while loop would only process once, and then move on. I spent some Quality Time with the Bookshelf, perlre and perlop, and came up with :

while ($text =~/\Gfoo(*.?)bar/g) {
   manipulate ($1)
}

where \G matches from the last greedy regexp. That never seems to match, even once, so I thought some more. I figured "maybe I need to initialize the regexp somehow":

($text =~/foo(*.?)bar/g)
manipulate ($1);
while ($text =~/\Gfoo(*.?)bar/g) {
   manipulate ($1)
}

And that gave the same results, matched the first (non-loop) regexp, and ignored the regexp in the while loop. At this point, I'm pretty stumped. I've trawled through the regexp questions in the monastery, but most are labelled "regexp questions for the experts" or "quick regexp question" or, my favorite "available string 2463".


Re: Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression by salvadors (Pilgrim) on Feb 02, 2001 at 21:11 UTC
You need to tell Perl that your string is multiple lines of text, using the "m" modifier:

my $text = qq{foo this is one example bar
this is a line I don't care about
foo here's another keeper! bar
foo yet another bar
and a line to reject};

foreach ($text =~ /^foo(.*)bar$/gm) {
   manipulate($_);
}

sub manipulate {
   print $_, "\n";
}

(Note this assumes that foo and bar are going to be at the start and end of lines. It wasn't clear from your description whether "and a foo line to bar reject" should return nothing or "line to". Feel free to remove the ^ and $ to achieve the 2nd...)

Tony
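The two readings Tony mentions can be compared side by side. This is only a minimal sketch: the sample string is made up for illustration (it is not boo_radley's data), and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical sample: one line that starts with foo and ends with bar,
# and one line where foo and bar sit in the middle.
my $text = "foo keep me bar\nand a foo line to bar reject\n";

# With ^ and $ under /m, only whole lines of the form "foo...bar" match.
my @anchored = $text =~ /^foo(.*)bar$/gm;

# Without the anchors, any foo...bar pair on a single line matches
# (without /s, "." does not cross a newline).
my @unanchored = $text =~ /foo(.*)bar/g;

print "anchored:   [$_]\n" for @anchored;
print "unanchored: [$_]\n" for @unanchored;
```

Which of the two is right depends on whether the delimiters are guaranteed to sit at the line boundaries in the real data.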
Re: Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression by chipmunk (Parson) on Feb 02, 2001 at 22:31 UTC
This is clearly not your actual code, not only because of the syntax error in your regexes that tilly pointed out, but also because, if it were your actual code (and the syntax error were fixed), then the second snippet would work.

#!perl
undef $/;
$text = <DATA>;
while ($text =~/foo(.*?)bar/g) {
   manipulate ($1)
}
sub manipulate {
   print "@_\n";
}
__DATA__
foo this is one example bar
this is a line I don't care about
foo here's another keeper! bar
foo yet another bar
and a line to reject

and the output:

this is one example
here's another keeper!
yet another

Please provide some code, whether it's your original code or a short snippet, that actually demonstrates the problem you are having.
Re: Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression by jynx (Priest) on Feb 03, 2001 at 03:58 UTC
It's already solved, but while we're at it how about a bit of fun? ;>

This snippet pulls out the data beforehand and then processes it later:

my @data = $text =~ /foo(.*?)bar/g;
manipulate($_) foreach (@data);

This could be very memory intensive if you have a large $text, however (since it will probably generate all the entries before it builds the array). On the other hand, you only have to use the RE once, which should at least somewhat speed things up (unless you're keen on precompiling your REs anyway :)

HTH,
jynx

Update: You can also shorten this into:

manipulate($_) for (@{[$text =~ /foo(.*?)bar/g]});

But that might be a little overboard...
Re (tilly) 1: Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression by tilly (Archbishop) on Feb 02, 2001 at 21:07 UTC
`s/\*\./.*/g`
Re: Re (tilly) 1: Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression by boo_radley (Parson) on Feb 02, 2001 at 21:10 UTC
Whoops. The regexps ARE actually .*s. I was just too busy being witty to notice the typo.
Re (tilly) 3: Delving the regexp underdark -- how to understand \G's behavior, or how to loop through a regular expression by tilly (Archbishop) on Feb 02, 2001 at 21:22 UTC
use strict;
my $text = <<EOT;
foo this is one example bar
this is a line I don't care about
foo here's another keeper! bar
foo yet another bar
and a line to reject
EOT

while ($text =~/foo(.*?)bar/gs) {
   print "$1\n";
}

Works perfectly well for me. Note that the s modifier that I added to the RE allows your extracted text to extend across multiple lines. I don't know if that will matter to you.
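The effect of the s modifier is easiest to see on a string where one foo...bar pair is split across a line break. The following is a minimal sketch under that assumption; the sample text and the helper captures() are hypothetical and not part of anyone's code in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample: the second foo...bar pair spans a line break.
my $text = "foo one bar\nfoo two\nlines bar\n";

# Hypothetical helper: run a /g loop in scalar context and collect $1
# from every successful match.
sub captures {
    my ($str, $re) = @_;
    my @found;
    while ($str =~ /$re/g) {
        push @found, $1;
    }
    return @found;
}

# Without /s, "." stops at a newline, so the split pair cannot match.
my @without_s = captures($text, qr/foo(.*?)bar/);

# With /s, "." also matches a newline, so the split pair is captured.
my @with_s = captures($text, qr/foo(.*?)bar/s);

print scalar(@without_s), " match(es) without /s\n";
print scalar(@with_s),    " match(es) with /s\n";
```

If every foo...bar pair is known to sit on a single line, leaving /s off keeps a stray foo from pairing with a bar several lines later.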
Oh, the shame by boo_radley (Parson) on Feb 02, 2001 at 22:30 UTC
/i fixed it... Know your data, folks :)
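The actual data was never posted, so the following is only a guess at the kind of input that would behave this way: a minimal sketch, assuming the delimiters vary in case. The sample string is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: only the first pair of delimiters is all lowercase.
my $text = "foo first bar\nFoo second Bar\nFOO third BAR\n";

# Case-sensitive: only the lowercase foo...bar pair matches.
my @sensitive = $text =~ /foo(.*?)bar/g;

# Case-insensitive (/i): every pair matches regardless of case.
my @insensitive = $text =~ /foo(.*?)bar/gi;

print "case-sensitive:   ", scalar(@sensitive),   "\n";
print "case-insensitive: ", scalar(@insensitive), "\n";
```

With input like this, a loop over the case-sensitive pattern runs once and stops, which fits the symptom described in the original question.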
Re: Delving the regexp underdark... by petral (Curate) on Feb 03, 2001 at 02:57 UTC
Watching pos() in the debugger and trying different things from the command line can help. I found this out last night. Turns out I couldn't get pos for $_; when I assigned it to a local variable, it worked fine. (This is in 5.05.003 and I was using it in recursive anonymous subs, so it may not be a general problem.)

p
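Watching pos() also shows what \G does, which is the part of the original question nobody spelled out. \G anchors the match at the current pos() of the string, that is, where the previous successful /g match ended. It does not mean "keep searching from there". Here is a minimal sketch with a made-up sample string:

```perl
use strict;
use warnings;

# Hypothetical sample with an unwanted line between the two pairs.
my $text = "foo one bar\nskip me\nfoo two bar\n";

# Plain /g in scalar context: each match resumes the search at pos($text)
# and is free to skip over text that does not match.
my @plain;
while ($text =~ /foo(.*?)bar/g) {
    push @plain, [ $1, pos($text) ];
}

# The failed match that ended the loop above already reset pos($text);
# setting it to 0 here just makes the starting point explicit.
pos($text) = 0;

# With \G, the match must begin exactly at pos($text), so the second
# attempt fails on the newline that follows the first "bar".
my @anchored;
while ($text =~ /\Gfoo(.*?)bar/g) {
    push @anchored, [ $1, pos($text) ];
}

printf "plain:    '%s' ends at %d\n", @$_ for @plain;
printf "anchored: '%s' ends at %d\n", @$_ for @anchored;
```

This would also explain why the \G loop in the question never matched: if the string does not begin with foo, or anything other than foo follows the previous bar, the anchored match fails immediately. For pulling out every foo...bar pair, the plain /g loop is enough.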