I need to read many text files that represent 2D data (matrices), in which the "interesting" lines contain column and row indexes. Some of those lines should be skipped. The whole project actually works well and is fast enough; I'm just idly puzzled by some benchmarks I ran later, when I tried to "improve"/refactor it. The data and code below are reduced to nonsense to make an SSCCE.

use strict;
use warnings;
use feature 'say';

use List::Util 'any';
use Benchmark 'cmpthese';

my $data = '';
for my $r ( 0 .. 31 ) {
    for my $c ( 0 .. 31 ) {
        $data .= "$c $r whatever\n"
    }
}
# say $data; die;
my @skip = ( 0, 15, 16, 31 );

cmpthese -1, {
    ugly => sub {
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            next if $1 == 0 or $1 == 15 or $1 == 16 or $1 == 31;
            next if $2 == 0 or $2 == 15 or $2 == 16 or $2 == 31;
            
            # something useful happens here,
            # after uninteresting entries have been skipped
            
        }
        return 1
    },
    ugly_cr => sub {
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            my ( $c, $r ) = ( $1, $2 );
            next if $c == 0 or $c == 15 or $c == 16 or $c == 31;
            next if $r == 0 or $r == 15 or $r == 16 or $r == 31;
        }
        return 1
    },
    any => sub {
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            next if any { $1 == $_ } @skip;
            next if any { $2 == $_ } @skip;
        }
        return 1
    },
    any_cr => sub {
        while ( $data =~ /^(\d+) (\d+)/mg ) {
            my ( $c, $r ) = ( $1, $2 );
            next if any { $c == $_ } @skip;
            next if any { $r == $_ } @skip;
        }
        return 1
    }
};

Output:

          Rate  any_cr     any    ugly ugly_cr
any_cr   331/s      --    -54%    -64%    -74%
any      724/s    119%      --    -22%    -43%
ugly     930/s    181%     28%      --    -26%
ugly_cr 1265/s    282%     75%     36%      --

The initial, working code is similar to "ugly_cr". Then I thought I might postpone the assignment to lexicals until after the irrelevant lines have been filtered out. Would that be faster? No. I speculate that "ugly" is slower because $1, etc. are read-only, so they are numified again on each of the 4 comparisons. Is this correct?

Then I thought that "any", being XS, might be fast, nice to look at, and make it easy to add more rows/columns to skip later. It turns out to be a little slow for an array of just 4 elements to skip, and that result doesn't surprise me too much. What completely puzzles me is that "any_cr" is slower still. Why? And why the asymmetry between "ugly vs. ugly_cr" and "any vs. any_cr"? I don't understand.
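For the "easy to add more r/c to skip later" goal, one alternative that might be worth adding to the benchmark is a hash lookup instead of "any". This is only a sketch, and I haven't timed it: %skip and count_kept are names made up for illustration, and the counter just stands in for the real work.

```perl
use strict;
use warnings;

# Same synthetic 32 x 32 data as in the benchmark above.
my $data = '';
for my $r ( 0 .. 31 ) {
    for my $c ( 0 .. 31 ) {
        $data .= "$c $r whatever\n";
    }
}

# Hypothetical lookup table, built once from the indexes to skip.
my %skip = map { $_ => 1 } ( 0, 15, 16, 31 );

# count_kept() is an illustrative helper, not part of the original code.
sub count_kept {
    my ($text) = @_;
    my $kept = 0;
    while ( $text =~ /^(\d+) (\d+)/mg ) {
        # Hash keys are strings, so this relies on the indexes being
        # written canonically in the data (e.g. "15", not "015").
        next if $skip{$1} or $skip{$2};
        $kept++;    # something useful would happen here
    }
    return $kept;
}

print count_kept($data), "\n";
```

Adding another row/column to skip is then a one-element change to the list that builds %skip, and no block is called per element. Whether it actually beats "ugly_cr" would have to be measured with cmpthese on real data.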

In reply to Why is "any" slow in this case? by Anonymous Monk