but I'm concerned about speed. If it's doing this for every file on a terabyte server, I'm worried about the time consumption. What do you think?

Just the fact that you hide a loop as regexp alternatives doesn't mean it's suddenly orders of magnitude faster. In fact, it may well be that splitting the regexp into smaller chunks is faster, because the optimizer kicks in.

Here's a benchmark:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /cmpthese/;
                 
our @regexes = (
    '.*\.jpg$',
    '.*\.png$',
    'Perl',
    '\.mozilla/abigail',
);
                     
our @words = `find /home/abigail`;  # 38517 files.
our ($c1, $c2);
                    
cmpthese -60 => {
    single   => 'my $regex = join "|" => @regexes;
                 $c1 = 0;
                 for my $w (@words) {
                     $c1 ++ if $w =~ /$regex/
                 }',
    many     => '$c2 = 0;
                 WORD:
                 for my $w (@words) {
                     for my $r (@regexes) {
                         $c2 ++, next WORD if $w =~ /$r/
                     }
                 }',
};
    
die "Unequal\n" unless $c1 == $c2;
                     
__END__
       s/iter single   many
single   4.86     --   -74%
many     1.28   281%     --

Now, for your particular data set, results might be different. But don't assume alternatives are necessarily slower.
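
As for the question in the thread title, looping over separate patterns has a side benefit: the pattern that matched is simply the current loop variable. What follows is only a minimal sketch under assumptions, not part of the benchmark above: the helper name first_match is hypothetical, and precompiling the patterns with qr// is an addition made here so that each pattern is compiled once rather than per file.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Assumption: patterns are precompiled once with qr// (not done in the
# benchmark above, which interpolates plain strings).
my @regexes = map { qr/$_/ } (
    '.*\.jpg$',
    '.*\.png$',
    'Perl',
    '\.mozilla/abigail',
);

# Hypothetical helper: returns the first pattern that matches $word,
# or undef (in scalar context) if none of them does.
sub first_match {
    my ($word) = @_;
    for my $r (@regexes) {
        return $r if $word =~ $r;
    }
    return;
}

for my $w ("photo.jpg", "notes.txt") {
    my $r = first_match($w);
    print defined $r ? "$w matched $r\n" : "$w matched nothing\n";
}
```

The exact text printed for a matching pattern depends on how your perl stringifies qr// objects, so compare against the original pattern list if you need the source string back.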

Abigail

In reply to Re: Returning regexp pattern that was used to match by Abigail-II
in thread Returning regexp pattern that was used to match by crabbdean
