in reply to Re^2: Yet Another "Matching over a list of conditions"-like technique
in thread Yet Another "Matching over a list of conditions"-like technique

But isn't "surprisingly annoying" a bit excessive?!?

It is what it is. It took me a moment of staring to figure out that you were doing something much simpler than the code suggested at first glance.

As far as I'm concerned I do not like to iterate over indices

Instead you iterate over elements and increment $cnt for each one. A rose by any other name… Neither do I like to, btw. I'm usually the one telling people not to. But when that's what's gotta be done, then that's what's gotta be done. (Perl6, as always, will have the solution, but alas, it's so far away yet…)
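The two styles really are interchangeable here; a minimal side-by-side sketch on toy data (not from the thread):

```perl
use strict;
use warnings;

my @pre = qw(a b c);

# Style 1: iterate over indices.
for my $i ( 0 .. $#pre ) {
    print "$i: $pre[$i]\n";
}

# Style 2: iterate over elements, maintaining the counter by hand.
my $cnt = 0;
for my $e (@pre) {
    print "$cnt: $e\n";
    $cnt++;
}
```

Both loops print the same three lines; the only difference is where the bookkeeping lives.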

i.e. use some trick to let the "most probable" choice be the first one to be tested.

In that case I'd bubble single elements to the top. That retains memory of the second-, third-, etc. most likely matches based on previous data. Depending on the patterns in your data, that may be more, less, or equally efficient.

    while ( <> ) {
        chomp;
        for my $i ( 0 .. $#pre ) {
            local $_ = $pre[ $i ] . $_;
            next if not -e;
            print;
            @pre[ 0 .. $i ] = @pre[ $i, 0 .. $i - 1 ] if $i;
            last;
        }
    }
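The slice assignment in that loop is the move-to-front step; isolated on toy data (assumed, not from the benchmark) it does this:

```perl
use strict;
use warnings;

# Suppose the prefix at index 2 just matched.
my @pre = qw(a/ b/ c/ d/);
my $i   = 2;

# Move $pre[$i] to the front; the skipped prefixes shift right
# but keep their relative order.
@pre[ 0 .. $i ] = @pre[ $i, 0 .. $i - 1 ] if $i;

print "@pre\n";    # prints "c/ a/ b/ d/"
```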

I'm too lazy to set up a test environment to benchmark this, sorry. Depending on how much the array lookup costs it might pay to maintain the counter explicitly as you did in your code, though I can't quite believe that.

Note that in your case, a chdir '../../pics/' might speed things up quite a bit if you're testing a lot of files, since stat(2) won't have to traverse the same directories over and over for each and every single file test.
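A sketch of the chdir idea, using a temporary directory as a stand-in for the real pics/ tree (the file and directory names are hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a toy pics tree: $dir/00/foo.jpg
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/00" or die "mkdir: $!";
open my $fh, '>', "$dir/00/foo.jpg" or die "open: $!";
close $fh;

# After chdir'ing into the tree, file tests use short relative
# paths, so stat(2) no longer re-traverses the leading
# directories for every single -e.
chdir $dir or die "chdir: $!";
print -e '00/foo.jpg' ? "found\n" : "missing\n";    # prints "found"
```

In the original loop this would amount to building @pre without the leading path component.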

Makeshifts last the longest.

Re^4: Yet Another "Matching over a list of conditions"-like technique
by blazar (Canon) on Dec 27, 2004 at 12:38 UTC
    i.e. use some trick to let the "most probable" choice be the first one to be tested.
    In that case I'd bubble single elements to the top. That retains memory of second-, third-, etc most likely matches based on previous data. Depending on the patterns in your data that may or may not be more, less or equally efficient.
    <snip code>
    I'm too lazy to set up a test environment to benchmark this, sorry. Depending on how much the array lookup costs it might pay to maintain the counter explicitly as you did in your code, though I can't quite believe that.
    Well, first of all, I could have silently avoided mentioning this detail, but to be honest the benchmark in my previous post was not correct: briefly, '../../pics/' was wrong, so that C<-e> never succeeded. I ran the test again with the code attached at the end of this post. Here's what I get:
    • Original list:
      Benchmark: timing 1000 iterations of aristotle, blazar...
       aristotle: 3.07708 wallclock secs ( 1.88 usr  1.20 sys +  0.00 cusr  0.00 csys =  3.08 CPU) @ 324.68/s (n=1000)
          blazar: 2.96745 wallclock secs ( 1.59 usr  1.38 sys +  0.00 cusr  0.00 csys =  2.97 CPU) @ 336.70/s (n=1000)
                 Rate aristotle    blazar
      aristotle 325/s        --       -4%
      blazar    337/s        4%        --
    • Same list, shuffled:
      Benchmark: timing 1000 iterations of aristotle, blazar...
       aristotle: 10.6048 wallclock secs ( 5.79 usr  4.81 sys +  0.00 cusr  0.00 csys = 10.60 CPU) @ 94.34/s (n=1000)
          blazar: 14.7952 wallclock secs ( 6.58 usr  8.21 sys +  0.00 cusr  0.00 csys = 14.79 CPU) @ 67.61/s (n=1000)
                  Rate    blazar aristotle
      blazar    67.6/s        --      -28%
      aristotle 94.3/s       40%        --
    • List of non-existing files of same length as above:
      Benchmark: timing 1000 iterations of aristotle, blazar...
       aristotle: 26.4785 wallclock secs (11.17 usr 15.31 sys +  0.00 cusr  0.00 csys = 26.48 CPU) @ 37.76/s (n=1000)
          blazar: 25.285 wallclock secs ( 9.81 usr 15.47 sys +  0.00 cusr  0.00 csys = 25.28 CPU) @ 39.56/s (n=1000)
                  Rate aristotle    blazar
      aristotle 37.8/s        --       -5%
      blazar    39.6/s        5%        --
    I notice that my solution is still slightly more efficient than yours in both the 'regular' and the 'extremely adverse' conditions, whereas yours outperforms mine in the second (shuffled) case. I wonder if that may be due to the 'history' approach of your new version of the snippet...

    UPDATE: Indeed, I did some more experiments and noticed that it is definitely so. So as not to clobber the original text too much, I've added the new results below, at 'UPDATED RESULTS'.
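    For the record, the two reordering strategies differ: my original push/splice rotates all the skipped prefixes to the back, while the slice assignment moves only the match to the front and preserves the order of the rest. On toy data (assumed for illustration):

```perl
use strict;
use warnings;

my $cnt = 2;    # suppose the prefix at index 2 matched

# Rotation (my original version): skipped prefixes go to the back,
# so second- and third-place history is destroyed.
my @rot = qw(a b c d);
push @rot, splice @rot, 0, $cnt;
print "@rot\n";    # prints "c d a b"

# Move-to-front (Aristotle's version): skipped prefixes keep
# their relative order, so the history is retained.
my @mtf = qw(a b c d);
@mtf[ 0 .. $cnt ] = @mtf[ $cnt, 0 .. $cnt - 1 ];
print "@mtf\n";    # prints "c a b d"
```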

    Note that in your case, a chdir '../../pics/' might speed things up quite a bit if you're testing a lot of files, since stat(2) won't have to traverse the same directories over and over for each and every single file test.
    I am aware of this. Of course, this was originally a quick hack. The data set I'm testing against is a realistic one and the times involved are those shown above, so I'm not really that concerned about performance. As I wrote repeatedly, this was meant more as an illustrative example than as the definitive word on the issue: in another situation the same technique may apply to the frequently asked question of how to match against a list of regexen, or something similar...

    Here's the code:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Benchmark qw/:all :hireswallclock/;

    my @ARGV0 = @ARGV;
    my @pre   = map "pics/$_/", '00' .. '16';

    cmpthese 1000, { blazar => \&blazar, aristotle => \&aristotle }, 'all';

    sub blazar {
        @ARGV = @ARGV0;
        while (<>) {
            chomp;
            my $cnt;
            for my $p (@pre) {
                local $_ = $p . $_;
                if (-e) {
                    # print;
                    push @pre, splice @pre, 0, $cnt if $cnt;
                    last;
                }
                $cnt++;
            }
        }
    }

    sub aristotle {
        @ARGV = @ARGV0;
        while (<>) {
            chomp;
            for my $i ( 0 .. $#pre ) {
                local $_ = $pre[ $i ] . $_;
                next if not -e;
                # print;
                @pre[ 0 .. $i ] = @pre[ $i, 0 .. $i - 1 ] if $i;
                last;
            }
        }
    }

    __END__

    UPDATED RESULTS

    Due to the observation above I modified my sub; now the only difference from Aristotle's is that I iterate over the array and maintain a counter manually (which, incidentally, he considers surprisingly annoying to follow) whereas he iterates over indices. I also included a "naive" sub for comparison; the tested subs are now as follows:
    sub blazar {
        @ARGV = @ARGV0;
        while (<>) {
            chomp;
            my $cnt;
            for my $p (@pre) {
                local $_ = $p . $_;
                if (-e) {
                    # print;
                    @pre[ 0 .. $cnt ] = @pre[ $cnt, 0 .. $cnt - 1 ] if $cnt;
                    last;
                }
                $cnt++;
            }
        }
    }

    sub aristotle {
        @ARGV = @ARGV0;
        while (<>) {
            chomp;
            for my $i ( 0 .. $#pre ) {
                local $_ = $pre[ $i ] . $_;
                next if not -e;
                # print;
                @pre[ 0 .. $i ] = @pre[ $i, 0 .. $i - 1 ] if $i;
                last;
            }
        }
    }

    sub naive {
        @ARGV = @ARGV0;
        while (<>) {
            chomp;
            for my $p (@pre) {
                local $_ = $p . $_;
                # print, last if -e;
            }
        }
    }
    The results are:
    • Original list:
      Benchmark: timing 1000 iterations of aristotle, blazar, naive...
       aristotle: 3.05163 wallclock secs ( 1.86 usr  1.19 sys +  0.00 cusr  0.00 csys =  3.05 CPU) @ 327.87/s (n=1000)
          blazar: 2.7879 wallclock secs ( 1.63 usr  1.15 sys +  0.00 cusr  0.00 csys =  2.78 CPU) @ 359.71/s (n=1000)
           naive: 11.2071 wallclock secs ( 4.28 usr  6.93 sys +  0.00 cusr  0.00 csys = 11.21 CPU) @ 89.21/s (n=1000)
                  Rate     naive aristotle    blazar
      naive     89.2/s        --      -73%      -75%
      aristotle  328/s      268%        --       -9%
      blazar     360/s      303%       10%        --
      
    • Same list, shuffled:
      Benchmark: timing 1000 iterations of aristotle, blazar, naive...
       aristotle: 11.3231 wallclock secs ( 5.89 usr  5.41 sys +  0.00 cusr  0.00 csys = 11.30 CPU) @ 88.50/s (n=1000)
          blazar: 10.1344 wallclock secs ( 5.46 usr  4.67 sys +  0.00 cusr  0.00 csys = 10.13 CPU) @ 98.72/s (n=1000)
           naive: 7.86385 wallclock secs ( 3.19 usr  4.66 sys +  0.00 cusr  0.00 csys =  7.85 CPU) @ 127.39/s (n=1000)
                  Rate aristotle    blazar     naive
      aristotle 88.5/s        --      -10%      -31%
      blazar    98.7/s       12%        --      -23%
      naive      127/s       44%       29%        --
      
    • List of non-existing files of same length as above:
      Benchmark: timing 1000 iterations of aristotle, blazar, naive...
       aristotle: 32.2168 wallclock secs (11.26 usr 20.92 sys +  0.00 cusr  0.00 csys = 32.18 CPU) @ 31.08/s (n=1000)
          blazar: 25.3138 wallclock secs ( 9.91 usr 15.40 sys +  0.00 cusr  0.00 csys = 25.31 CPU) @ 39.51/s (n=1000)
           naive: 24.7686 wallclock secs ( 9.31 usr 15.46 sys +  0.00 cusr  0.00 csys = 24.77 CPU) @ 40.37/s (n=1000)
                  Rate aristotle    blazar     naive
      aristotle 31.1/s        --      -21%      -23%
      blazar    39.5/s       27%        --       -2%
      naive     40.4/s       30%        2%        --
      
    I also tried it on all three lists at once, and while I do not consider this result terribly indicative, I still find it encouraging:
    Benchmark: timing 1000 iterations of aristotle, blazar, naive...
     aristotle: 39.3629 wallclock secs (18.54 usr 20.81 sys +  0.00 cusr  0.00 csys = 39.35 CPU) @ 25.41/s (n=1000)
        blazar: 37.3122 wallclock secs (16.47 usr 20.84 sys +  0.00 cusr  0.00 csys = 37.31 CPU) @ 26.80/s (n=1000)
         naive: 38.9548 wallclock secs (15.16 usr 23.79 sys +  0.00 cusr  0.00 csys = 38.95 CPU) @ 25.67/s (n=1000)
                Rate aristotle     naive    blazar
    aristotle 25.4/s        --       -1%       -5%
    naive     25.7/s        1%        --       -4%
    blazar    26.8/s        5%        4%        --