I notice that you're trying to use the range operator, .., inside your character class. [A..Z] won't do what you think; it will match an A, or a period, or a Z. The way to specify a range in a character class is with a dash; [A-Z] will match any capital letter.
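
A quick way to see the difference for yourself is to run a few single characters through both classes. This is only a sketch; the sub names in_dots and in_dash are made up for the example and are not from your code.

```perl
#!perl -w
use strict;

# Hypothetical helpers: return 1 if the string is exactly one character
# from the class, 0 otherwise.
sub in_dots { return $_[0] =~ /^[A..Z]\z/ ? 1 : 0 }   # class holds 'A', '.', 'Z'
sub in_dash { return $_[0] =~ /^[A-Z]\z/  ? 1 : 0 }   # class holds 'A' through 'Z'

# Probe a few characters against both classes.
for my $c ('A', 'M', 'Z', '.', 'a') {
    printf "'%s'  [A..Z]: %d  [A-Z]: %d\n", $c, in_dots($c), in_dash($c);
}
```

'A' and 'Z' pass both tests, 'M' passes only the dash version, '.' passes only the dotted version, and 'a' passes neither.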

The part we really want to benchmark is just the code that does the check; the creation of the test character and the if block can be factored out. Note that, except for comp, the actual character doesn't matter, just whether or not it matches.

Due to your use of rand 256 to generate the characters, your benchmark is heavily weighted towards non-matching instances. I've split each sub into matching and non-matching versions (comp required three non-matching versions).
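
To put a rough number on that weighting, here is a small sketch that counts how many of the 256 possible values are capital letters. It assumes the test character came from something like chr(int rand 256), which is my guess at your setup; count_matches is a name I made up for the example.

```perl
#!perl -w
use strict;

# Count how many of the 256 possible byte values are capital letters.
# Assumption: the test character was produced by something like
# chr(int rand 256), so every value 0..255 is equally likely.
sub count_matches {
    return scalar grep { chr($_) =~ /^[A-Z]\z/ } 0 .. 255;
}

my $matches = count_matches();
printf "%d of 256 values match (%.1f%%)\n", $matches, 100 * $matches / 256;
```

Only 26 of the 256 values match, so under that assumption roughly nine out of ten iterations exercise the non-matching path.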

I shortened the names of the subs just so the chart would be more compact with all the extra subs I added. :)

#!perl -w

use strict;

use Benchmark qw/cmpthese/;

my %tran_types;
foreach ('A'..'Z') { $tran_types{$_} = "" };

my $char_y = 'J';       # yes
my $char_n = '_';       # no
my $char_n_l = 'AA';    # no; too long
my $char_n_A = '0';     # no; lt 'A'
my $char_n_Z = 'a';     # no; gt 'Z'

cmpthese(-(shift), {
    're_Y'  => sub { $char_y =~ /^[A-Z]\z/ },
    're_n'  => sub { $char_n =~ /^[A-Z]\z/ },
    'co_Y'  => sub { length $char_y == 1 and
                       $char_y ge "A" and $char_y le "Z" },
    'co_nA' => sub { length $char_n_A == 1 and
                       $char_n_A ge "A" and $char_n_A le "Z" },
    'co_nZ' => sub { length $char_n_Z == 1 and
                       $char_n_Z ge "A" and $char_n_Z le "Z" },
    'co_nl' => sub { length $char_n_l == 1 and
                       $char_n_l ge "A" and $char_n_l le "Z" },
    'ex_Y'  => sub { exists $tran_types{$char_y} },
    'ex_n'  => sub { exists $tran_types{$char_n} },
});

And the results... I first tried running each snippet for 3 seconds, but the results varied too much from run to run. I forced myself to be patient and run each snippet for 15 seconds. :)

Benchmark: running co_Y, co_nA, co_nZ, co_nl, ex_Y, ex_n, re_Y, re_n, each for at least 15 CPU seconds...
      co_Y: 18 wallclock secs (14.95 usr +  0.05 sys = 15.00 CPU) @ 266935.00/s (n=4004025)
     co_nA: 15 wallclock secs (15.63 usr + -0.03 sys = 15.60 CPU) @ 355186.35/s (n=5540907)
     co_nZ: 18 wallclock secs (15.71 usr +  0.10 sys = 15.81 CPU) @ 262582.80/s (n=4151434)
     co_nl: 17 wallclock secs (16.12 usr + -0.09 sys = 16.03 CPU) @ 720797.75/s (n=11554388)
      ex_Y: 14 wallclock secs (16.11 usr + -0.02 sys = 16.09 CPU) @ 712203.05/s (n=11459347)
      ex_n: 14 wallclock secs (16.72 usr + -0.62 sys = 16.10 CPU) @ 862264.35/s (n=13882456)
      re_Y: 16 wallclock secs (15.18 usr +  0.12 sys = 15.30 CPU) @ 202467.06/s (n=3097746)
      re_n: 17 wallclock secs (15.97 usr + -0.02 sys = 15.95 CPU) @ 243489.40/s (n=3883656)
          Rate  re_Y  re_n co_nZ  co_Y co_nA  ex_Y co_nl  ex_n
re_Y  202467/s    --  -17%  -23%  -24%  -43%  -72%  -72%  -77%
re_n  243489/s   20%    --   -7%   -9%  -31%  -66%  -66%  -72%
co_nZ 262583/s   30%    8%    --   -2%  -26%  -63%  -64%  -70%
co_Y  266935/s   32%   10%    2%    --  -25%  -63%  -63%  -69%
co_nA 355186/s   75%   46%   35%   33%    --  -50%  -51%  -59%
ex_Y  712203/s  252%  192%  171%  167%  101%    --   -1%  -17%
co_nl 720798/s  256%  196%  175%  170%  103%    1%    --  -16%
ex_n  862264/s  326%  254%  228%  223%  143%   21%   20%    --

No surprises here; the results are generally the same as in your benchmark, but they do show the value of considering matching and non-matching cases separately. exists is still the winner, executing quickly whether the string matches or not (ex_Y and ex_n). The revised regex solution is still slower than the others. comp is actually quite fast when the string is too long (co_nl), because the length check fails and the string comparisons never run. When it has to execute all three tests (co_Y and co_nZ) it's much slower, with co_nA, which stops after two tests, in between.
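
If you want to see how far each test string gets through the chain of tests in comp, here is a minimal sketch that counts the tests actually evaluated before the chain short-circuits. The sub name tests_run and the counter are mine, added for illustration; the test strings are the same ones used in the benchmark above.

```perl
#!perl -w
use strict;

# Hypothetical helper: run the same three tests as the co_* subs and
# return how many were evaluated before the chain short-circuited.
sub tests_run {
    my ($s) = @_;
    my $n = 0;
    # ++$n is always true here, so it only records that the test was reached.
    my $ok = (    ( ++$n && length($s) == 1 )
               && ( ++$n && $s ge "A" )
               && ( ++$n && $s le "Z" ) );
    return $n;
}

# Same test strings as in the benchmark.
for my $s ('J', '0', 'a', 'AA') {
    printf "%-4s %d test(s)\n", "[$s]", tests_run($s);
}
```

'AA' stops after the length test, '0' stops after the second test, and 'J' and 'a' run all three, which lines up with the ordering of the co_* rates in the chart.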

In reply to Re: Testing a string for a range of characters by chipmunk
in thread Testing a string for a range of characters by BoredByPolitics