blazar
Now, is this test flawed?

You are basically correct here. I was too eager to advertise the advantages of index() and tr//. They have their place elsewhere, but not in this special case. Thanks for pointing this out.

I abused your benchmark code (of course) to find out how good the index() optimization in Perl5 really is ;-)

    ...
    use Benchmark qw/cmpthese :hireswallclock/;

    my @a = map {
        my $s = 'PM is cool, ' x 10_000;
        substr($s, rand(length $s), 1, '_');
        $s
    } 1..1000;

    cmpthese -3 => {
        C_Idx => sub () { grep C_Idx($_, '_') < 0, @a },
        Index => sub () { grep index($_, '_') < 0, @a },
        Regex => sub () { grep ! /_/, @a },
        Tr    => sub () { grep ! tr/_//, @a }
    };

    use Inline C => qq[
        int C_Idx(SV* src, SV* chr)
        {
            STRLEN srclen, chrlen;
            char *ssrc = SvPV(src, srclen), *schr = SvPV(chr, chrlen);
            char *p = ssrc;
            if( chrlen != 1 ) croak("single characters only for now!");
            return (p=memchr(p, *schr, srclen)) != NULL ? p-ssrc : -1;
        }
    ];
    ...

On my system, somewhere above a string length of 60-70K characters, index() falls behind the C library function for finding a character (memchr). For the strings above:

              Rate    Tr Regex Index C_Idx
    Tr      3.17/s    --  -74%  -74%  -87%
    Regex   12.2/s  284%    --   -0%  -52%
    Index   12.2/s  285%    0%    --  -52%
    C_Idx   25.2/s  696%  107%  107%    --
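
For anyone who wants to look at the memchr part without the Perl API around it, here is a minimal sketch in plain C of what the body of C_Idx does. The name find_char and its signature are my own assumptions for illustration; they are not part of the benchmark above, and this sketch says nothing about how index() is implemented inside perl.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the benchmark above): returns the
   offset of the first occurrence of c within the first len bytes of s,
   or -1 if c does not occur. It mirrors the return expression of C_Idx,
   minus the SV* unpacking and the single-character check. */
static long find_char(const char *s, size_t len, char c)
{
    /* memchr scans at most len bytes and returns a pointer to the
       first match, or NULL when there is none. */
    const char *p = memchr(s, c, len);
    return p != NULL ? (long)(p - s) : -1;
}
```

As with C_Idx, the grep in the benchmark would keep a string only when the result is negative, i.e. when it contains no underscore. For example, find_char("PM is_cool", 10, '_') evaluates to 5, and the same call on "PM is cool" evaluates to -1.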

I personally believe it would be much better if I read my own posts and thought about their assumptions much more thoroughly next time ;-)

Regards

mwa


In reply to Re^3: Regex for Differentiating Underscore and Whitespace by mwah
in thread Regex for Differentiating Underscore and Whitespace by neversaint
