in reply to Re^2: Pattern match array
in thread Pattern match array

Letting the regex alternation do the heavy lifting seems to be faster, by a factor of about 2.7 in this test. Tested with a 1235-line C program cat'ed together 20 times.

use strict;
use warnings;

use Benchmark q{cmpthese};

my @prims = qw{ int char long double static };

my $inFile = q{xxx.c};
open my $inFH, q{<}, $inFile
    or die qq{open: $inFile: $!\n};

# Discard the output so that only the matching is timed.
my $outFile = q{/dev/null};
open my $outFH, q{>}, $outFile
    or die qq{open: $outFile: $!\n};

cmpthese(
    -10,    # run each sub for at least 10 CPU seconds
    {
        # One regex alternation finds every keyword on the line.
        JohnGG => sub {
            seek $inFH, 0, 0;
            $. = 0;    # seek does not reset the line counter
            my $rxPrims = do {
                local $" = q{|};
                qr{\b(@prims)\b};
            };
            while ( <$inFH> ) {
                next unless my @found = m{$rxPrims}g;
                print $outFH qq{Found @found on line $.\n};
            }
        },
        # Extract every word, then grep the keyword list for it.
        Narveson => sub {
            seek $inFH, 0, 0;
            $. = 0;    # seek does not reset the line counter
            while ( <$inFH> ) {
                for my $word ( m{\b(\w+)\b}g ) {
                    if ( grep { $word eq $_ } @prims ) {
                        print $outFH qq{Found $word on line $.\n};
                    }
                }
            }
        },
    }
);

close $inFH  or die qq{close: $inFile: $!\n};
close $outFH or die qq{close: $outFile: $!\n};

The benchmark output.

           Rate Narveson   JohnGG
Narveson 1.39/s       --     -63%
JohnGG   3.78/s     173%       --
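
For anyone adapting the alternation to a different word list, here is a minimal sketch of building the same kind of pattern with join and quotemeta instead of localising $". The sample line and the $alternation variable are made up for illustration, and quotemeta only matters if the words could contain regex metacharacters.

```perl
use strict;
use warnings;

my @prims = qw{ int char long double static };

# Escape each word, then join the words into a single alternation.
my $alternation = join q{|}, map { quotemeta($_) } @prims;
my $rxPrims     = qr{\b($alternation)\b};

# Hypothetical sample line, not taken from the benchmarked file.
my $line  = q{static long count; int printer; char c;};

# In list context, m//g returns every capture on the line.
my @found = $line =~ m{$rxPrims}g;

print qq{Found @found\n};
```

The \b anchors keep "printer" from being reported as a match for "int".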

I hope this is of interest.

Cheers,

JohnGG

Re^4: Pattern match array
by Narveson (Chaplain) on May 09, 2008 at 06:22 UTC

    Thanks, this is of interest.

    I think if I had known benchmarks would be run, I would have hashed instead of grepping.

    my %sought = map {$_ => 1} @prims;

    and later

    if ($sought{$word})

    And then there's List::MoreUtils::any, which would at least quit on the first match instead of checking the rest of the list.
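
    Put together, the two alternatives might look like the sketch below. It is only an illustration under some assumptions: the sample line is invented, and any is imported from the core List::Util (which exports it from version 1.33 on) rather than List::MoreUtils, so that the snippet runs without a CPAN install.

    ```perl
    use strict;
    use warnings;
    use List::Util q{any};    # assumption: List::Util 1.33 or later

    my @prims  = qw{ int char long double static };
    my %sought = map { $_ => 1 } @prims;

    # Hypothetical sample line, for illustration only.
    my $line = q{static long count; int printer;};

    my ( @byHash, @byAny );
    for my $word ( $line =~ m{\b(\w+)\b}g ) {
        # One hash probe per word, however long @prims is.
        push @byHash, $word if $sought{$word};

        # any stops at the first equal element instead of scanning all of @prims.
        push @byAny, $word if any { $word eq $_ } @prims;
    }

    print qq{hash: @byHash\n};
    print qq{any:  @byAny\n};
    ```

    Both lists should come out the same; the difference is only in how much work is done per word.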

      Using a hash instead of grepping (Narveson2 below) does improve performance, but the regex alternation still seems to retain the advantage.

                  Rate Narveson Narveson2 JohnGG
      Narveson  1.39/s       --      -30%   -64%
      Narveson2 1.99/s      43%        --   -48%
      JohnGG    3.81/s     174%       91%     --

      Cheers,

      JohnGG