Re^4: Sort/Uniq Help

Re^5: Sort/Uniq Help by poolpi (Hermit) on Mar 18, 2008 at 13:47 UTC
Out of curiosity:

This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Common qw /net/;
use Benchmark qw( cmpthese );

my $line = q{127.0.0.1};

cmpthese -10, {
    RE => '$line =~ /\A $RE{net}{IPv4} [|] password [|] (ssn=) \z/xmi',

    RE_O => '$line =~ /\A $RE{net}{IPv4} [|] password [|] (ssn=) \z/xmio',

    ORIG => '$line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\|password\|(ssn=)/i',

    RE_CHAR => 'use charnames qw( :full);
                $line =~ /\A $RE{net}{IPv4}
                             \N{LINE TABULATION}
                             password
                             \N{LINE TABULATION}
                             (ssn=) \z/xmi'

};
```

```
             Rate    RE_CHAR   RE     RE_O    ORIG
RE_CHAR    17366/s      --     -2%     -2%   -100%
RE         17704/s      2%      --     -0%   -100%
RE_O       17747/s      2%      0%      --   -100%
ORIG    12717477/s  73132%  71732%  71561%      --
```

PooLpi

'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
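A few caveats when reading these numbers. Benchmark compiles code passed as a string inside its own scope, so a file-level `my $line` is not visible there and `$line` would refer to a package variable instead. The sample `127.0.0.1` also has no `|password|ssn=` tail for the full patterns to match, and `\N{LINE TABULATION}` names U+000B, whereas the pipe character is `VERTICAL LINE`. The sketch below is one possible way to rerun the comparison with code references. The sample line is hypothetical (the thread does not show the real input), and `$ipv4` is a hand-written stand-in for `$RE{net}{IPv4}`, modelled on the pattern Regexp::Common prints, so that the script runs without that module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical input line; the real log format is not shown in the thread.
my $line = q{127.0.0.1|password|ssn=};

# Stand-in for $RE{net}{IPv4}: each octet is limited to 0-255.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $ipv4  = qr/(?:$octet\.$octet\.$octet\.$octet)/;

# Code references close over the lexical $line, unlike code strings.
my %candidates = (
    RE   => sub { $line =~ /\A $ipv4 [|] password [|] (ssn=) \z/xmi },
    ORIG => sub { $line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\|password\|(ssn=)/i },
);

# Run each candidate for at least one CPU second and print the comparison.
cmpthese( -1, \%candidates );
```

Rates will differ by machine and Perl version, so no expected figures are given here.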
Re^6: Sort/Uniq Help by moritz (Cardinal) on Mar 18, 2008 at 14:37 UTC
This example shows that the main speed difference is really due to Regexp::Common, which does a bit more than just match `\d{1,3}\.\d{1,3}\. ..`:

```
$ perl -MRegexp::Common=net -wle 'print $RE{net}{IPv4}'
(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))
```

The optimization I talked about kicks in when the string is much longer and the literal char occurs only once or twice. Then the literal is used as an anchor, thus reducing the need for backtracking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = ('a' x 500) . 'b!' . ('a' x 20);

use Benchmark qw( cmpthese );

cmpthese -3, {
    literal => sub {$line =~ /a.{1,10}b!/ },
    class   => sub {$line =~ /a.{1,10}[b][!]/},
};
__END__
            Rate   class literal
class     3855/s      --    -99%
literal 712766/s  18390%      --
```

Update: added benchmark
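To make "does a bit more" concrete: the pattern printed by Regexp::Common restricts each octet to 0-255, while `\d{1,3}` accepts any run of up to three digits. A minimal sketch, assuming anchored matching with `\A` and `\z` and using a hand-copied octet pattern instead of loading Regexp::Common (the names `$strict`, `$loose`, `%result` and the test addresses are illustrative):

```perl
use strict;
use warnings;

# Octet pattern as it appears in the printed $RE{net}{IPv4} regex.
my $octet  = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $strict = qr/\A $octet (?: [.] $octet ){3} \z/x;

# The looser digit-count check used in the ORIG pattern.
my $loose  = qr/\A \d{1,3} (?: \. \d{1,3} ){3} \z/x;

# Record, per address, whether each pattern accepts it.
my %result;
for my $addr (qw( 127.0.0.1 255.255.255.255 256.1.1.1 999.999.999.999 )) {
    $result{$addr} = {
        strict => ( $addr =~ $strict ? 1 : 0 ),
        loose  => ( $addr =~ $loose  ? 1 : 0 ),
    };
    printf "%-16s strict=%d loose=%d\n",
        $addr, @{ $result{$addr} }{qw( strict loose )};
}
```

Both patterns accept the valid addresses, but only the loose one accepts `256.1.1.1` and `999.999.999.999`.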
Re^7: Sort/Uniq Help by poolpi (Hermit) on Mar 20, 2008 at 09:36 UTC
Yes, it's interesting, and I've run a few more tests to look for further optimizations. See below:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $line = ('a' x 500) . 'b!' . ('a' x 20);

cmpthese -10, {
    literal         => sub {$line =~ /a.{1,10}b!/ },
    literal_exp     => sub {$line =~ /a(?>.{1,10})b!/ },
    literal_look    => sub {$line =~ /a(?>.{1,10})(?<=b!)/ },
    class           => sub {$line =~ / [a]    .{1,10}   [b] [!]   /x},
    class_exp       => sub {$line =~ /  a     .{1,10}    b  [!]   /x},
    class_back      => sub {$line =~ /  a (?> .{1,10} )  b  [!]   /x},
    class_back_look => sub {$line =~ /  a (?> .{1,10} ) (?<=b[!]) /x},
    class_b         => sub {$line =~ /  a     .{1,10}   (?<=b[!]) /x},
};
```

```
                    Rate class_b  class class_back_look literal_look literal class_back literal_exp class_exp
class_b           2172/s      --   -53%            -72%         -75%   -100%      -100%       -100%     -100%
class             4651/s    114%     --            -40%         -46%    -99%       -99%        -99%      -99%
class_back_look   7809/s    259%    68%              --         -10%    -99%       -99%        -99%      -99%
literal_look      8658/s    299%    86%             11%           --    -99%       -99%        -99%      -99%
literal         640333/s  29376% 13666%           8100%        7296%      --        -9%        -11%      -13%
class_back      704687/s  32338% 15050%           8924%        8039%     10%         --         -2%       -4%
literal_exp     722209/s  33145% 15426%           9149%        8242%     13%         2%          --       -2%
class_exp       733546/s  33667% 15670%           9294%        8373%     15%         4%          2%        --
```

Thanks for your interest and your enlightenment.

PooLpi
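Before comparing these rates, it may be worth checking that the variants match the same text, because they are not all equivalent. The atomic group `(?>.{1,10})` cannot give characters back once it has taken them, and the lookbehind forms let `.{1,10}` consume `b!` itself. A small sketch using the three `literal*` patterns from the benchmark, a short made-up second string (`aaab!`), and a hypothetical helper `span` that reports where a match starts and how long it is:

```perl
use strict;
use warnings;

my $line  = ('a' x 500) . 'b!' . ('a' x 20);
my $short = 'aaab!';    # illustrative input, not from the thread

my %pattern = (
    literal      => qr/a.{1,10}b!/,
    literal_exp  => qr/a(?>.{1,10})b!/,
    literal_look => qr/a(?>.{1,10})(?<=b!)/,
);

# Return "start+length" of the first match, or "no match".
sub span {
    my ($re, $str) = @_;
    return 'no match' unless $str =~ $re;
    return $-[0] . '+' . ( $+[0] - $-[0] );
}

for my $name (sort keys %pattern) {
    printf "%-12s long: %-9s short: %s\n",
        $name, span($pattern{$name}, $line), span($pattern{$name}, $short);
}
```

On the benchmark string the first two patterns agree, while the lookbehind form matches a shorter span that starts later. On `aaab!` the atomic form finds nothing, because the group swallows `b!` and cannot back off.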


