one line regex eating CPU

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: one line regex eating CPU by GrandFather (Saint) on Jun 23, 2006 at 20:07 UTC
`while` puts the match in scalar context, and without the `/g` modifier every attempt starts again from the beginning of `$source`. The loop therefore runs until killed by some external factor: the match either succeeds every time or never succeeds at all. Try this to achieve what you are after:

`use strict;
use warnings;

my @titles;
my $source = "<title>www.perlmonks.org</title><title>somewhere else</title>";

push @titles, $source =~ m/<title>(.+?)<\/title>/ig;
print join("\n", @titles);`

Prints:

`www.perlmonks.org
somewhere else`

Note too that you need a non-greedy match, `(.+?)`, and, as you know, this will very likely come unstuck when used on real HTML. :)

DWIM is Perl's answer to Gödel
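To make the difference concrete, here is a minimal sketch that runs the loop with and without `/g` on the sample string from the reply above. The cap `$max_tries` and the array names are assumptions added only so the demo terminates; they are not part of the original question.

```perl
use strict;
use warnings;

my $source = "<title>www.perlmonks.org</title><title>somewhere else</title>";

# Without /g, every evaluation restarts at the beginning of $source,
# so the same first match succeeds again and again.
my $max_tries = 5;    # hypothetical safety cap so this demo stops
my @without_g;
while ($source =~ m/<title>(.+?)<\/title>/i) {
    push @without_g, $1;
    last if @without_g >= $max_tries;
}

# With /g in scalar context, each match resumes where the previous one
# ended, and the loop exits when no further match is found.
my @with_g;
while ($source =~ m/<title>(.+?)<\/title>/ig) {
    push @with_g, $1;
}

print scalar(@without_g), " pushes without /g, all '$without_g[0]'\n";
print scalar(@with_g), " pushes with /g: @with_g\n";
```

Without the cap, the first loop would keep pushing the first title until memory ran out, which matches the CPU-eating behaviour in the thread title.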
Re^2: one line regex eating CPU by Anonymous Monk on Jun 23, 2006 at 20:11 UTC
Thanks, that did fix it. But I swear I did this before using `push(@array, $1)` ..., because I actually like that form better than `push @array, ...`. How would this method capture `$1` AND `$2`, then (assuming we had a second capture going on)?
Re^3: one line regex eating CPU by GrandFather (Saint) on Jun 23, 2006 at 20:20 UTC
Something like this may do what you want:

`use strict;
use warnings;

my @titles;
my $source = <<TEXT;
<title id='1'>www.perlmonks.org</title>
<title id='2'>somewhere else</title>
TEXT

push @titles, $source =~ m/<title\s+id='(\d+)'>(.+?)<\/title>/igs;

while (@titles) {
    my @pair = splice @titles, 0, 2;
    print "$pair[0]: $pair[1]\n";
}`

Prints:

`1: www.perlmonks.org
2: somewhere else`

DWIM is Perl's answer to Gödel
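For readers who prefer the `push(@array, $1)` style asked about in the question above, the same data can also be walked one match at a time. This is only a sketch: storing each pair as an array reference is an assumption made for illustration, not something proposed in the thread.

```perl
use strict;
use warnings;

my $source = <<'TEXT';
<title id='1'>www.perlmonks.org</title>
<title id='2'>somewhere else</title>
TEXT

my @titles;

# /g in scalar context steps through $source one match at a time,
# so $1 and $2 always belong to the current match.
push @titles, [$1, $2]
    while $source =~ m/<title\s+id='(\d+)'>(.+?)<\/title>/igs;

# Each element is an [id, title] pair.
print "$_->[0]: $_->[1]\n" for @titles;
```

Keeping the pairs together avoids the `splice` step, at the cost of one small array reference per match.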
Re: one line regex eating CPU by ikegami (Patriarch) on Jun 23, 2006 at 20:14 UTC
Replace `push(@titles, $1) while $source =~ m/<title>(.+)<\/title>/i;` with `push(@titles, $1) while $source =~ m/<title>(.+?)<\/title>/ig;` to 1) avoid matching from the beginning of `$source` every time, and 2) avoid matching too much.

`push(@titles, $source =~ m/<title>(.+?)<\/title>/ig);` also works. It might even be faster. However, it uses more memory.

Update: Actually, there should be at most one title, so you want `push(@titles, $1) if $source =~ m/<title>(.+?)<\/title>/i;`
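The "matching too much" point can be checked with a short sketch. The two-title sample string and the variable names `$greedy` and `$lazy` are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $source = "<title>first</title><title>second</title>";

# Greedy (.+) runs to the LAST </title> in the string.
my ($greedy) = $source =~ m/<title>(.+)<\/title>/i;

# Non-greedy (.+?) stops at the FIRST </title> it can reach.
my ($lazy) = $source =~ m/<title>(.+?)<\/title>/i;

print "greedy: $greedy\n";    # greedy: first</title><title>second
print "lazy:   $lazy\n";      # lazy:   first
```

With a single title in the input both forms capture the same text, which is why the greedy version can appear to work until a second title shows up.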
Re^2: one line regex eating CPU by shmem (Chancellor) on Jun 23, 2006 at 20:18 UTC
~~this~~ ikegami's advice seems to be better than mine: don't reset the regex engine.
Re^3: one line regex eating CPU by ikegami (Patriarch) on Jun 23, 2006 at 20:19 UTC
Hum? Yours "resets the regex-engine". Mine doesn't. How can you say we gave the same advice?

`>perl -wle "$_='bacada'; print pos while /a/g"
2
4
6

>perl -wle "$_='bacada'; print pos while s/a//"
Use of uninitialized value in print at -e line 1.
Use of uninitialized value in print at -e line 1.
Use of uninitialized value in print at -e line 1.`
Re^4: one line regex eating CPU by shmem (Chancellor) on Jun 23, 2006 at 20:26 UTC
Re^5: one line regex eating CPU by ikegami (Patriarch) on Jun 23, 2006 at 20:53 UTC
Re: one line regex eating CPU by shmem (Chancellor) on Jun 23, 2006 at 20:13 UTC
Don't `m//`, do `s///`.

`push(@titles, $1) while $source =~ s/<title>(.+)<\/title>//i;`

You must weed out what you've gathered so far, or you will get the same first match forever.

_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
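One consequence of the `s///` approach is that it consumes the string it searches. A minimal sketch, assuming a scratch copy named `$work` (hypothetical) and swapping in the non-greedy `(.+?)` recommended elsewhere in the thread rather than the greedy pattern shown in the reply above:

```perl
use strict;
use warnings;

my $source = "<title>www.perlmonks.org</title><title>somewhere else</title>";

# Work on a copy, because each successful s/// deletes the matched text.
my $work = $source;    # hypothetical scratch copy

my @titles;
push @titles, $1 while $work =~ s/<title>(.+?)<\/title>//i;

print join("\n", @titles), "\n";
print "left over: '$work'\n";
```

The loop ends because each pass removes one title, so eventually nothing is left to match; `$source` itself is untouched.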