ninja_byte has asked for the wisdom of the Perl Monks concerning the following question:

I've deployed a script that runs a number of scans depending on what kind of file is uploaded: if it's HTML, it runs checks to guard against uploaded phishing pages; if it's a tarball or exe, it runs antivirus; if it's plain text, it runs the text past a regular expression to see whether someone is uploading a huge list of email addresses.
$filetext =~ m/(
    [\w\-._+]+   # first part of email address
    \@[\w\-.]+   # second part
    .{0,5}       # any char/separator
){50}/sgmx
Of course, on files with 100+ email addresses, this thing loops like crazy. I understand *why* Perl is tripping out on it somewhat, but I don't know what to use to fix it, or what alternate method to try. Now that I think about it... I could probably use grep... Anyway, any suggestions are appreciated.
mC

Replies are listed 'Best First'.
Re: looping regex
by McDarren (Abbot) on Apr 11, 2006 at 02:02 UTC
    Rather than re-inventing this particular wheel, you could try Regexp::Common::Email::Address

    In the description you will find the words: "Don't worry, it's fast."

    Cheers,
    Darren :)

      Thanks! Immediately after I posted, I was able to work out a solution:

      my @test_text = $filetext =~ m/([\w\-._+]+\@[\w\-.]+)/gmx;
      if (scalar @test_text > 50) {
          # success!
      }

      I think I'll play around with that module though. I sometimes forget how awesome CPAN is.
      ;-)
      mC
        You don't need to populate an array just to get a count of the times the pattern matches.
        my $count = () = $filetext =~ /[\w._+-]+\@[\w.-]+/g;
        if ( $count > 50 ) {
            # success!
        }
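
A minimal sketch of a variation on the counting idea above, assuming that only the "more than 50" threshold matters and not the exact total. Matching with /g in scalar context inside a while loop finds one address per iteration, so the scan can stop as soon as the threshold is crossed instead of walking the whole file. The helper name has_more_than and the sample data are hypothetical and not from the thread; the pattern is the one from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 as soon as more than $limit
# address-like strings have been found in $text, otherwise 0.
sub has_more_than {
    my ($text, $limit) = @_;
    my $count = 0;

    # In scalar context, /g resumes from the end of the previous match,
    # so each pass through the loop consumes exactly one match.
    while ($text =~ /[\w._+-]+\@[\w.-]+/g) {
        return 1 if ++$count > $limit;
    }
    return 0;
}

# Made-up sample data: 60 addresses, one per line.
my $sample = join "\n", map { "user$_\@example.com" } 1 .. 60;

print has_more_than($sample, 50)
    ? "looks like an address list\n"
    : "ok\n";
```

For small uploads the difference from the `() =` idiom is probably negligible; the early return may only matter when the text is very large.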