in reply to counting regex hits

I know that it is well-known that tr is faster than m//; nevertheless, I thought I'd write my first Benchmark.pm-using, ahem, well, benchmark :-)

#!/usr/bin/perl -w
use strict;
use Benchmark qw(:all);

my @rands = qw(
    A---B-B---B---D----F----G----H----J----K-----L---F-D--F---G--H---_R_-f-F-ff-----f
    ----F----G--
    ----F----G------F----G------F----G--
    ----F----G------F----G------F----G------F----G--
    ----F----G------F----G------F----G------F----G-----F----G---
    ----F----G-----F----G---
    ----F----G------F----G------F----G--
);
my $max = @rands;

cmpthese(1000000, {
    'tr' => sub {
        $_ = $rands[int(rand($max))];
        my $x = tr/-//;
    },
    'm//' => sub {
        $_ = $rands[int(rand($max))];
        my $x = () = $_ =~ /-/g;
    },
});

I'd be glad to hear if I did something considerably stupid here :-). With that proviso, here are the (well-known!) results:

__END__
Benchmark: timing 1000000 iterations of m//, tr...
       m//: 57 wallclock secs (54.53 usr +  0.07 sys = 54.60 CPU) @ 18315.02/s (n=1000000)
        tr:  1 wallclock secs ( 1.92 usr +  0.00 sys =  1.92 CPU) @ 520833.33/s (n=1000000)
        Rate   m//    tr
m//  18315/s    --  -96%
tr  520833/s 2744%    --

Quite impressive difference, isn't it?

regards,
tomte


Re: Re: counting regex hits Benchmark
by cees (Curate) on Mar 18, 2003 at 15:35 UTC

    The only thing that I would change is the 'random feature' you put in the benchmark. This will throw off your results, since you are not providing the same value to each function. In the worst case, the 'tr' function could get the shortest string and the 'm//' function the longest string every time.

    Of course, this imbalance should even out over a million iterations, and I'll bet your results will not change much by fixing this. Still, I would drop the random selection and just loop through every item in each iteration.

    cmpthese(100000, {
        'tr' => sub {
            foreach (@rands) {
                my $x = tr/-//;
            }
        },
        'm//' => sub {
            foreach (@rands) {
                my $x = () = $_ =~ /-/g;
            }
        },
    });

    This way you are guaranteed an even distribution of your sample data.

    I agree with you that the results are impressive. Definitely something to keep in the bin of useful Perl knowledge...

      No regex required:
      $strg = qq[This has- some- d-a-she-s -in it];
      %cnt = ();
      @chrs = split('', $strg);
      foreach (@chrs) {
          $cnt{$_}++;
      }
      print qq[$cnt{'-'}\n];

        Why is this a reply to me?
        I do not get your point...

        ...but I "can't hold my water" anyways:
        If this is a reminder that I left a viable alternative out of my (possibly silly ;-) benchmark, I assure you that a for loop counting every character's occurrences isn't a viable alternative if you're interested in /one/ character only, as the OP is. ¹
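
As a rough sanity check on the three approaches discussed in this thread, the sketch below counts the dashes in the sample string from the reply above in three ways. The helper names count_tr, count_match and count_split are made up for this illustration and appear nowhere in the original posts; the point is only that all three agree on the count, while the split version also tallies every other character along the way.

```perl
use strict;
use warnings;

# Hypothetical helpers, named for this illustration only.

# Count dashes with the transliteration operator (does not modify $s).
sub count_tr {
    my ($s) = @_;
    return ($s =~ tr/-//);
}

# Count dashes by forcing a global match into list context.
sub count_match {
    my ($s) = @_;
    my $n = () = $s =~ /-/g;
    return $n;
}

# Count every character in a hash, then look up the dash.
sub count_split {
    my ($s) = @_;
    my %cnt;
    $cnt{$_}++ for split //, $s;
    return exists $cnt{'-'} ? $cnt{'-'} : 0;
}

my $strg = 'This has- some- d-a-she-s -in it';
printf "tr=%d m//=%d split=%d\n",
    count_tr($strg), count_match($strg), count_split($strg);
```

If only one character matters, the hash built by the split version is mostly wasted work, which is the objection raised above.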

        kind regards,
        tomte


        ¹ silly me: ;-D

Re^2: counting regex hits Benchmark (except)
by tye (Sage) on Mar 19, 2003 at 00:13 UTC
    I know that it is well-known that tr is faster than m//

    Except when it is not.

    Quite impressive difference, isn't it?
    No. The difference is 1/18982.5th of a second, which isn't impressive at all. q-: The quotient looks impressive, but Benchmark has to go to a lot of work to be able to guess at that, so you won't see a quotient anything close to that in practice.
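
For anyone who wants to check that figure, here is a small sketch of the arithmetic, using only the two rates reported in the benchmark output earlier in the thread. The variable names are chosen for this illustration; nothing here is re-measured.

```perl
use strict;
use warnings;

# Rates copied from the cmpthese output above (iterations per second).
my %rate = ( 'm//' => 18315.02, 'tr' => 520833.33 );

# Time per call in seconds is the reciprocal of the rate.
my %per_call = map { $_ => 1 / $rate{$_} } keys %rate;

# Absolute saving per call versus relative speed-up.
my $difference = $per_call{'m//'} - $per_call{'tr'};
my $quotient   = $rate{'tr'} / $rate{'m//'};

printf "m//: %.2f microseconds per call\n", $per_call{'m//'} * 1e6;
printf "tr:  %.2f microseconds per call\n", $per_call{'tr'}  * 1e6;
printf "difference: %.2f microseconds (1/%.1f of a second)\n",
    $difference * 1e6, 1 / $difference;
printf "quotient: %.1fx\n", $quotient;
```

The quotient is what cmpthese prints as a percentage; the difference is the time actually saved per call.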

    Congratulations, you've now prematurely optimized this nano-operation.

    I find that the difference is usually more indicative of the practical value of the optimization than the quotient.

                    - tye (cheap philosophy shouldn't cost a lot)

      I know that it is well-known that tr is faster than m//
      Except when it is not.

      Point taken: I amend my original proposition with:
      If the problem is as simple as the OP's, that is, if a simple transliteration (or a side effect of one) is your goal, then it seems to be a well-known fact that you may as well use the transliteration operator.

      :-)

      Congratulations, you've now prematurely optimized this nano-operation.

      Do not fall prey to false conclusions:
      I was interested in the abstract comparison of two nano-operations and in how to use Benchmark.pm myself. I didn't optimize anything; how could I? I don't have code lying around that uses any of these operators to actually count anything. So as this wasn't optimization, how could it be premature? ;-p

      I find that the difference is usually more indicative of the practical value of the optimization than the quotient.

      Point taken again.

      kind regards,
      tomte