in reply to Ways to delete start of string

I just found out the time taken by a string assignment is not constant for a given argument. It's dependent on the previous state of the variable to which the string is assigned.

In your test, the time taken by $_ = '|0|0|0|0|0|0|'; is not constant because the previous state of $_ isn't constant. That means you aren't testing what you think you are testing. Using a lexical instead of $_ solves that problem.

Your tests really shouldn't be in subs either. They add serious overhead, especially since your data is so small.

use strict;
use warnings;

use Benchmark qw( cmpthese );

my %tests = (
    subst       => '$x =~ s/.//;',
    substr_lval => 'substr($x,0,1) = "";',
    substr_mod  => 'substr($x,0,1,"");',
    reverse     => '$x = reverse $x; chop($x); $x = reverse($x);',
    substr_copy => '$x = substr($x,1);',
);

# values() returns aliases, so this rewrites each snippet in place:
# every snippet now starts from a fresh lexical $x instead of $_.
for (values %tests) {
    $_ = 'use strict; use warnings; my $x = "|0|0|0|0|0|0|"; ' . $_;
}

cmpthese(-5, \%tests);
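Before reading anything into the timings, it's worth checking that all five snippets actually do the same thing. A minimal sketch, assuming the same %tests snippets as above; `run_snippet` is a hypothetical helper written just for this check, not part of Benchmark:

```perl
use strict;
use warnings;

# The snippets under test, copied from the benchmark above.
my %tests = (
    subst       => '$x =~ s/.//;',
    substr_lval => 'substr($x,0,1) = "";',
    substr_mod  => 'substr($x,0,1,"");',
    reverse     => '$x = reverse $x; chop($x); $x = reverse($x);',
    substr_copy => '$x = substr($x,1);',
);

# Hypothetical helper: compile one snippet around a fresh lexical $x
# and return the value of $x after the snippet has run once.
sub run_snippet {
    my ($code) = @_;
    my $result = eval "my \$x = '|0|0|0|0|0|0|'; $code; \$x";
    die "snippet failed: $@" if $@;
    return $result;
}

my %results;
$results{$_} = run_snippet($tests{$_}) for keys %tests;

print "$_: $results{$_}\n" for sort keys %results;
```

If the results differ, the benchmark would be comparing different operations, and the rates would mean nothing.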

Re^2: Ways to delete start of string
by BrowserUk (Patriarch) on May 27, 2008 at 10:13 UTC

    You are still testing subroutine call speed rather than the snippets you purport to be testing.

    Whatever code snippets you supply to Benchmark get wrapped internally into subs (see Benchmark::runloop).

    By using strings instead of subs, you have removed one layer of indirection, but the time taken by the code under test is still swamped by the time taken to invoke the subroutine that gets wrapped around it.

    The only way to get anything like an accurate measurement for this type of micro-benchmark is to add a multiplier loop inside the subroutine Benchmark constructs, so as to amortise the cost of calling that sub over a large number of iterations. If a and b are the times taken by the two snippets and k is the per-call overhead, a 1e4* iteration inner loop gives a measured ratio of (a + k/1e4) / (b + k/1e4) ~= a/b. (* or whatever multiplier is appropriate.)
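    To put numbers on the amortisation argument, here is a small arithmetic sketch. The snippet costs a and b and the per-call overhead k are made-up figures in arbitrary units, chosen only to show the shape of the effect; `measured_ratio` is a hypothetical helper:

```perl
use strict;
use warnings;

# Made-up costs in arbitrary time units (illustrative assumptions only).
my ($a_cost, $b_cost) = (1, 2);   # true cost of snippet A and snippet B
my $k = 20;                       # overhead of one sub call

# Ratio a benchmark would report when each sub call runs the snippet
# $n times: the per-iteration cost is snippet cost + k/$n.
sub measured_ratio {
    my ($n) = @_;
    return ($a_cost + $k / $n) / ($b_cost + $k / $n);
}

for my $n (1, 10, 100, 1e4) {
    printf "n = %-6d measured A/B = %.4f (true A/B = %.4f)\n",
        $n, measured_ratio($n), $a_cost / $b_cost;
}
```

    With a single iteration per call the overhead dominates and the two snippets look almost identical; by 1e4 iterations the measured ratio is close to the true 0.5.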

    Also, I'm not sure what the cost of use strict; and use warnings; is when they have already been loaded, but there must be some, if only to discover that they are already loaded, plus the call (or attempted call) of their import subs.

    As Benchmark already adds use strict to the subs it constructs, that's pure duplication. And as it already has use warnings in force internally when it evals the subs into existence, I don't think you gain anything by adding them to the code that gets eval'd. You are simply muddying the waters further by adding another fixed cost to the tests.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Whatever code snippets you supply to benchmark, get wrapped internally into subs

      That's true. That's why I usually do

      $_ = "use strict; use warnings; for (1..10_000) { my \$x = '|0|0|0|0|0|0|'; $_ }";

      to minimize the cost of that sub call.

      Also, I'm not sure what the cost of use strict; and use warnings is, when they have already been loaded,

      Zero. use is executed once, at compile-time. It doesn't generate any code in the tree.

      >perl -MO=Concise -e"use strict; print 'a'"
      6  <@> leave[1 ref] vKP/REFC ->(end)
      1     <0> enter ->2
      2     <;> nextstate(main 2 -e:1) v/2 ->3
      5     <@> print vK ->6
      3        <0> pushmark s ->4
      4        <$> const[PV "a"] s ->5
      -e syntax OK
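      Another way to see that use costs nothing per iteration: perlfunc documents use Module LIST as equivalent to BEGIN { require Module; Module->import( LIST ); }, so the import runs once, when the code is compiled, not each time it executes. The sketch below uses a hypothetical in-file package, Counted, whose import just counts how often it is called:

```perl
use strict;
use warnings;

# Hypothetical module defined in this file; its import() counts calls.
package Counted;
our $imports = 0;
sub import { $imports++ }

package main;

# Pretend Counted.pm is already loaded so that "use Counted" finds it.
BEGIN { $INC{'Counted.pm'} = __FILE__ }

# Compile a sub whose loop body contains "use Counted"...
my $code = eval 'sub { my $n = 0; for (1 .. 1000) { use Counted; $n++ } $n }'
    or die $@;

# ...then run it several times.
my $total = 0;
$total += $code->() for 1 .. 5;

print "loop body ran $total times; Counted->import ran $Counted::imports time(s)\n";
```

      The loop body runs 5000 times, but the import only runs once, during the string eval that compiles the sub.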

      'purport'! You are suggesting that I'm not testing? Or is this one of those across-the-pond translation errors? It must be, since I'm clearly posting results.

      Good point about strict and warnings. They are an artifact of a 'new' file in my editor.

      Could you give an example of 'add a multiplier loop inside the subroutine'? While I still insist that I'm looking for better ways to pre-chop, learning about benchmarking is both fascinating and useful.

      --hsm

      "Never try to teach a pig to sing...it wastes your time and it annoys the pig."

        For an example of multiplier loops inside the subroutines, have a look at Re: Bug or WAD in lvalue substr? (again.).

        If you call a sub in a loop, you are timing how long it takes to call the sub as well as the time it takes to execute the contents of the sub. Often this is not an issue since the contents of the sub take a lot longer than just calling the sub itself. If the loop is inside the sub instead, then you can focus more on the contents of the loop rather than the time it takes to call subs, which is important when you're timing something that's as fast as sub calls.
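        A minimal sketch of the two shapes, using Benchmark's cmpthese with code refs. The operation (dropping the first character with four-argument substr), the inner count $N and the sub names are assumptions made up for illustration; the absolute rates will vary by machine and perl version:

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $N = 10_000;    # operations per timed call (illustrative)

# The operation under test: drop the first character of a string.
sub chip_once { my $x = '|0|0|0|0|0|0|'; substr($x, 0, 1, ''); return $x }

# Shape 1: a sub call per operation, so every operation pays for a call.
sub sub_in_loop {
    my $last;
    $last = chip_once() for 1 .. $N;
    return $last;
}

# Shape 2: the operation is repeated inside one sub, so the call is amortised.
sub loop_in_sub {
    my $last;
    for (1 .. $N) {
        my $x = '|0|0|0|0|0|0|';
        substr($x, 0, 1, '');
        $last = $x;
    }
    return $last;
}

# Both shapes must compute the same thing before their timings mean anything.
die "mismatch\n" unless sub_in_loop() eq loop_in_sub();

cmpthese(-1, {
    sub_in_loop => \&sub_in_loop,
    loop_in_sub => \&loop_in_sub,
});
```

        Since both shapes do the same substr work, any difference in the reported rates is, roughly, the cost of the extra sub calls.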

        For more on benchmarking in general, have a look at the "Benchmarking Perl" chapter in Mastering Perl.

        'purport'! You are suggesting that I'm not testing?

        No. I should have said: "That your benchmark purports to be testing" (where 'your' referred to ikegami). The point is that it looks like it's testing the right thing, but once you factor in the overhead of the tests, that overhead completely obscures the results.

        Like others, I think this is a fairly fruitless test. If you are doing this once, then the method used will make very little difference. If you are doing it hundreds of thousands of times, then there are far more efficient ways of doing it. For example,

        1. If the idea is to chip off characters until you reach a particular character, then search for that character and chop the whole lump off in one go:
             ## Either
             substr( $x, 0, index( $x, $char ), '' );
             ## or
             $x =~ s[^.*?(?=\Q$char\E)][]s;
        2. If the requirement is to remove and process the first N chars of a string, then chop off the lump first and separate the characters afterwards:
             for my $char ( unpack '(a1)*', substr $x, 0, 100, '' ) {
                 ## process $char
             }
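        Both ideas can be tried in a self-contained script. The sample strings, the delimiter ':' and N = 3 are made up for illustration. One caveat worth knowing: index returns -1 when the character is absent, so the substr/index form is guarded here; the regex form simply fails to match and leaves the string alone.

```perl
use strict;
use warnings;

# 1. Remove everything before the first occurrence of $char, in one step.
my $char = ':';
my $x    = 'abcdef:rest';
my $y    = $x;

my $pos = index($x, $char);
substr($x, 0, $pos, '') if $pos >= 0;    # guard: index returns -1 if absent
$y =~ s[^.*?(?=\Q$char\E)][]s;           # same effect with a regex

print "$x\n$y\n";                        # both are now ':rest'

# 2. Remove the first N characters in one go, then process them one by one.
my $str   = 'hello world';
my $N     = 3;
my @chars = unpack '(a1)*', substr($str, 0, $N, '');
print join(',', @chars), " | $str\n";    # h,e,l | lo world
```

        The 'a1' template is used rather than 'A1' because 'A' strips trailing spaces, which would turn a space character into an empty string.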

        The only time the relative performance of these methods is likely to make a significant difference is if you were applying it once to each of a large number of strings, say a large array, much as you might with chop( @array ). And it would have to be on the order of 10e7 elements before it had any significant effect on an application. If it were a common requirement, there would probably be a built-in leading-character equivalent of chop (chip()? :).
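        For the many-strings case there is no built-in chip, but four-argument substr applied across an array does the job. A minimal sketch; `chip` is a hypothetical helper, not a Perl built-in:

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first character of each argument in place
# (the elements of @_ are aliases to the caller's variables) and return
# the characters that were removed.
sub chip {
    return map { substr($_, 0, 1, '') } @_;
}

my @array   = ('|0|0|', 'abc', '');
my @removed = chip(@array);

print join(',', map { "'$_'" } @array), "\n";    # prints '0|0|','bc',''
```

        Without a helper, substr($_, 0, 1, '') for @array; does the same job inline.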

        Could you give an example of 'add a multiplier loop inside the subroutine'?

        This was how I constructed my variation:

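use strict;
use warnings;

use Benchmark qw( cmpthese );

my %tests = (
    subst       => '$x =~ s/.//;',
    substr_lval => 'substr($x,0,1) = "";',
    substr_mod  => 'substr($x,0,1,"");',
    reverse     => '$x = reverse $x; chop($x); $x = reverse($x);',
    substr_copy => '$x = substr($x,1);',
);

our $loops ||= 1e4;

# Wrap each snippet in its own loop, on a string long enough to survive
# $loops one-character deletions, so the cost of the sub Benchmark builds
# around the code is amortised over $loops iterations.
for (values %tests) {
    $_ = <<EOT;
my \$x = 'X' x $loops;
for( 1 .. $loops ) {
    $_
}
EOT
}

cmpthese(-3, \%tests);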

Re^2: Ways to delete start of string
by hsmyers (Canon) on May 27, 2008 at 06:47 UTC
    The results of this formulation are:
                     Rate substr_lval subst reverse substr_mod substr_copy
    substr_lval  489949/s          --   -9%    -37%       -67%        -69%
    subst        539218/s         10%    --    -31%       -63%        -66%
    reverse      776376/s         58%   44%      --       -47%        -52%
    substr_mod  1473954/s        201%  173%     90%         --         -8%
    substr_copy 1606272/s        228%  198%    107%         9%          --
    Not hard to see the problem with $_. That said, I'm not sure that what you say about the tests not belonging in subs is correct. Shouldn't the sub overhead factor out, since it is the same for all cases?


      Shouldn't it factor out since it would be true for all cases?

      No. With a per-call overhead k, the measured ratio is (a+k)/(b+k), which is not equal to a/b. And the overhead diminishes the value of the absolute numbers.
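      A quick arithmetic check, with made-up costs in arbitrary units (a = 1, b = 3, k = 10, chosen only for illustration), shows both effects: the ratio is pulled towards 1, and the reported rate of each snippet is mostly the rate of the overhead.

```perl
use strict;
use warnings;

# Made-up costs in arbitrary units (illustrative assumptions only):
# snippet A takes 1, snippet B takes 3, and each sub call adds k = 10.
my ($a_cost, $b_cost, $k) = (1, 3, 10);

my $true_ratio     = $a_cost / $b_cost;                  # A is 3x faster than B
my $measured_ratio = ($a_cost + $k) / ($b_cost + $k);    # overhead included

# Rates as a benchmark would report them (operations per time unit).
my $true_rate_a     = 1 / $a_cost;
my $measured_rate_a = 1 / ($a_cost + $k);

printf "A/B: true %.3f, measured %.3f\n", $true_ratio, $measured_ratio;
printf "rate of A: true %.3f, measured %.3f\n", $true_rate_a, $measured_rate_a;
```

      A threefold difference between the snippets shows up as roughly 18%, and the absolute rate for A is off by a factor of 11.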