Re^2: Performance optimization question

Replies are listed 'Best First'.
Re^3: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 01:20 UTC
Re^4: Performance optimization question by jwkrahn (Abbot) on Apr 03, 2008 at 03:41 UTC
Re^5: Performance optimization question by BrowserUk (Patriarch) on Apr 03, 2008 at 04:36 UTC
Re^4: Performance optimization question by vit (Friar) on Apr 03, 2008 at 02:06 UTC

Yes. I forget how much difference that can make. Though joost's idea (with several modifications) works out fastest:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];
use Benchmark qw[ cmpthese ];

our $string = join '|', map{
    join rand() < 0.2
        ? 'fred'
        : 'bill',
    'pqr', 'xyz'
} 1 .. 10000;

our $first = 0;
our %counts;
cmpthese -1, {
    orig => q[
        my @arr = split(/\|/, $string);
        my @arr1 = grep { /fred/ } @arr;
        $counts{ orig } = @arr1;
    ],
    Buk1 => q[
        my @arr1 = grep { /fred/ } split /\|/, $string;
        $counts{ Buk1 } = @arr1;
    ],
    jwkrahn => q[
        my @arr1 = grep /fred/, split /\|/, $string;
        $counts{ jwkrahn } = @arr1;
    ],
    Buk2 => q[
        my @arr1 = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;
        $counts{ Buk2 } = @arr1;
    ],
    JOOST => q[
        my @arr1 = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;
        $counts{ JOOST } = @arr1;
    ],
};

pp \%counts;
__END__
c:\test>junk6
          Rate    orig    Buk1   JOOST jwkrahn    Buk2
orig    20.2/s      --    -28%    -48%    -58%    -84%
Buk1    28.1/s     39%      --    -28%    -42%    -77%
JOOST   39.2/s     94%     39%      --    -19%    -68%
jwkrahn 48.4/s    140%     72%     24%      --    -61%
Buk2     124/s    515%    342%    217%    157%      --
{ Buk1 => 2010, Buk2 => 2010, JOOST => 2010, jwkrahn => 2010, orig => 2010 }
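
The hit counter above compares how many elements each variant returns, not what they contain. A minimal sketch of a stricter check, assuming a hypothetical three-field sample rather than the 10000-field string from the benchmark: because `.` also matches `|`, the Buk2 capture can start at an earlier field that does not contain the search text, so the counts agree while the captured values differ.

```perl
use strict;
use warnings;

# Hypothetical three-field sample; only the middle field contains "fred".
my $string = join '|', qw( pqrbillxyz pqrfredxyz pqrbillxyz );

# The three approaches from the benchmark, applied to the small sample.
my @grep  = grep /fred/, split /\|/, $string;
my @buk2  = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;
my @joost = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;

# Print the captured values, not just the counts.
print "grep : @grep\n";    # grep : pqrfredxyz
print "Buk2 : @buk2\n";    # Buk2 : pqrbillxyz|pqrfredxyz
print "JOOST: @joost\n";   # JOOST: pqrfredxyz
```

All three lists hold one element here, so a count-only comparison would report them as equal. Whether that matters depends on whether the caller needs the matching fields themselves or only how many there are.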

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."


I get slightly different results:

$ perl -le'
open D, q[/usr/share/dict/words] or die $!;
my $string = join "|", <D>;
use Benchmark qw/cmpthese/;
cmpthese -20, {
    orig => sub {
        my @arr = split /\|/, $string;
        my @arr1 = grep { /oug/ } @arr;
        return @arr1;
        },
    Buk1 => sub {
        my @arr1 = grep { /oug/ } split /\|/, $string;
        return @arr1;
        },
    jwkrahn => sub {
        my @arr1 = grep /oug/, split /\|/, $string;
        return @arr1;
        },
    Buk2 => sub {
        my @arr1 = $string =~ /(?:^|\|)(.*?oug.*?)(?=\||$)/g;
        return @arr1;
        },
    JOOST => sub {
        my @arr1 = $string =~ /(?:^|\|)([^|]*?oug[^|]*?)(?=\||$)/g;
        return @arr1;
        },
    };
'
          Rate   JOOST    Buk2    orig    Buk1 jwkrahn
JOOST   4.37/s      --    -10%    -13%    -38%    -67%
Buk2    4.87/s     11%      --     -4%    -31%    -63%
orig    5.05/s     15%      4%      --    -28%    -62%
Buk1    7.05/s     61%     45%     40%      --    -47%
jwkrahn 13.2/s    202%    171%    162%     87%      --

YMMV :-)


YMMV :-)

It did :)

If you add a hits counter as in my benchmark above, you'll see the reason why. Buk2 does not match anything at all (and fails slowly) in your benchmark:

c:\test>junk6-b
          Rate    orig    Buk1   JOOST    Buk2 jwkrahn
orig    1.59/s      --    -30%    -55%    -59%    -63%
Buk1    2.29/s     44%      --    -36%    -41%    -47%
JOOST   3.57/s    125%     56%      --     -8%    -18%
Buk2    3.88/s    144%     70%      9%      --    -11%
jwkrahn 4.35/s    173%     90%     22%     12%      --
{ Buk1 => 203, Buk2 => 0, JOOST => 203, jwkrahn => 203, orig => 203 }
...............*********
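
A minimal sketch of why the unchomped data defeats Buk2, assuming a hypothetical three-line sample in place of /usr/share/dict/words. Without `/s`, `.` does not match a newline, and without `/m`, `$` only matches at the very end of the string (or just before a final newline), so the lookahead can never succeed at the newline that precedes each `|`. JOOST's `[^|]` does match newlines, which is why it still finds its hits.

```perl
use strict;
use warnings;

# Hypothetical sample lines, newlines intact, as <D> would return them.
my @lines = ("bought\n", "dough\n", "cat\n");

# Joined without chomp, every "|" is preceded by a newline.
my $raw = join '|', @lines;

my @buk2  = $raw =~ /(?:^|\|)(.*?oug.*?)(?=\||$)/g;
my @joost = $raw =~ /(?:^|\|)([^|]*?oug[^|]*?)(?=\||$)/g;

# Prints: Buk2: 0 hits, JOOST: 2 hits
printf "Buk2: %d hits, JOOST: %d hits\n", scalar @buk2, scalar @joost;
```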

And that intrigued me, until I noticed that you are not chomping your data. Once you chomp it, you'll find that Buk2 matches the same number as the others, and runs much faster (about 1100%) than the fastest grep variant:

c:\test>junk6-b
          Rate    orig    Buk1   JOOST jwkrahn    Buk2
orig    1.59/s      --    -29%    -59%    -64%    -97%
Buk1    2.23/s     40%      --    -42%    -49%    -96%
JOOST   3.84/s    141%     72%      --    -12%    -93%
jwkrahn 4.37/s    175%     96%     14%      --    -92%
Buk2    53.0/s   3231%   2280%   1281%   1113%      --
{ Buk1 => 203, Buk2 => 203, JOOST => 203, jwkrahn => 203, orig => 203 }
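
The chomp fix can be sketched as follows, again assuming a hypothetical three-line sample instead of the dictionary file. In the one-liner, the equivalent change would be to read the lines into an array, chomp it, and only then join.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the dictionary file.
my @lines = ("bought\n", "dough\n", "cat\n");

chomp @lines;                     # strip the trailing newline from every element
my $string = join '|', @lines;    # "bought|dough|cat"

# Same Buk2 regex as in the benchmark; it now reaches each "|" boundary.
my @hits = $string =~ /(?:^|\|)(.*?oug.*?)(?=\||$)/g;

print "@hits\n";    # bought dough
```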

Alternatively, just add the /s modifier to the Buk2 regex in your benchmark. That also lets Buk2 match on the unchomped data, but it will then be only about 500% faster.

c:\test>junk6-b
          Rate    orig    Buk1   JOOST jwkrahn    Buk2
orig    1.59/s      --    -30%    -55%    -63%    -94%
Buk1    2.27/s     43%      --    -36%    -48%    -92%
JOOST   3.54/s    122%     56%      --    -18%    -88%
jwkrahn 4.33/s    172%     91%     22%      --    -85%
Buk2    28.5/s   1693%   1155%    707%    559%      --
{ Buk1 => 203, Buk2 => 203, JOOST => 203, jwkrahn => 203, orig => 203 }
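
A small sketch of the /s variant, using the same kind of hypothetical unchomped sample. With `/s`, `.` is allowed to match a newline, so the capture can run through the trailing "\n" up to the next `|`. Note that the captured elements then keep their newlines.

```perl
use strict;
use warnings;

# Hypothetical unchomped sample, joined exactly as in the one-liner.
my $raw = join '|', ("bought\n", "dough\n", "cat\n");

# /s lets "." match "\n", so each capture extends to the "|" boundary.
my @hits = $raw =~ /(?:^|\|)(.*?oug.*?)(?=\||$)/gs;

# Prints: 2 hits
printf "%d hits\n", scalar @hits;

# Each hit still ends in a newline, e.g. "bought\n".
print "first hit ends with a newline\n" if $hits[0] =~ /\n\z/;
```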


THANK YOU ALL!! I really appreciate it, this is extremely useful!
