looping regex

ninja_byte has asked for the wisdom of the Perl Monks concerning the following question:

I've deployed a script that runs a number of scans depending on the kind of file uploaded. If it's HTML, it runs checks to guard against uploaded phishing webpages; if it's a tarball or an exe, it runs antivirus; if it's plain text, it runs the contents past a regular expression to see whether someone is uploading a huge list of email addresses.

$filetext =~ m/(    [\w\-._+]+   #first part of email address
                    \@[\w\-.]+   #second part
                    .{0,5}       #any char/separator
                )   {50}/sgmx

Of course, on files with 100+ email addresses, this thing loops like crazy. I understand *why* Perl is tripping out on it somewhat, but I don't know what to use to fix it, or what to use as an alternate method. Now that I think about it... I could probably use grep. Anyway, any suggestions are appreciated. mC


Re: looping regex by McDarren (Abbot) on Apr 11, 2006 at 02:02 UTC
Rather than re-inventing this particular wheel, you could try Regexp::Common::Email::Address. In the description you will find the words: "Don't worry, it's fast." Cheers, Darren :)
Re^2: looping regex by ninja_byte (Acolyte) on Apr 11, 2006 at 02:06 UTC
Thanks! Immediately after I posted, I was able to work out a solution:

my @test_text = $filetext =~ m/([\w\-._+]+\@[\w\-.]+)/gmx;
if (scalar @test_text > 50) {
    #success!
}

I think I'll play around with that module though. I sometimes forget how awesome CPAN is. ;-) mC
Re^3: looping regex by jwkrahn (Abbot) on Apr 11, 2006 at 12:45 UTC
You don't need to populate an array just to get a count of the times the pattern matches.

my $count = () = $filetext =~ /[\w._+-]+\@[\w.-]+/g;
if ( $count > 50 ) {
    #success!
}
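
Since the check only cares whether the file holds more than 50 addresses, one further option is to stop scanning as soon as the threshold is crossed. The following is a minimal sketch of that idea, not something posted in the thread: the helper name `has_too_many_addresses` and the sample data are made up for illustration, and the pattern is the same loose one used above, not a real address validator.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns true as soon as more
# than $limit address-like strings have been seen, without scanning the rest.
sub has_too_many_addresses {
    my ($text, $limit) = @_;
    my $seen = 0;
    # In scalar context, /g resumes from where the previous match ended.
    while ($text =~ /[\w._+-]+\@[\w.-]+/g) {
        return 1 if ++$seen > $limit;
    }
    return 0;
}

# Made-up sample data: 60 addresses, one per line.
my $filetext = join "\n", map { "user$_\@example.com" } 1 .. 60;

# The count-in-list-context idiom from the reply above, for comparison.
my $count = () = $filetext =~ /[\w._+-]+\@[\w.-]+/g;

print "count = $count\n";
print "flagged\n" if has_too_many_addresses($filetext, 50);
```

Both forms match one address at a time, so neither has a quantified group to backtrack over the way the original `{50}` pattern does. The early-exit loop may only be worth it on very large uploads.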