in reply to Re^4: Speed Improvement
in thread Speed Improvement

I got the same results once I fixed the typo: yours and McA's algorithms performed slightly worse, toolic's only marginally better when study was added. So something happened, definitely not a no-op, but nothing useful either! With such short strings, I shouldn't be surprised.

1 Peter 4:10

Re^6: Speed Improvement
by BrowserUk (Patriarch) on Dec 02, 2014 at 16:24 UTC
    yours and McA's algorithms performed slightly worse,

    studying takes time and rarely yields enough extra performance to cover the cost, unless you are searching very long strings -- eg. whole files -- repeatedly. Ie. study once; search many times.
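
    A minimal sketch of the study-once/search-many shape study was designed for -- on a perl old enough for study to still do something; the file name and word list here are placeholders:

        use strict; use warnings;

        # Slurp one large file, study it once, then run many searches against it.
        my $text = do { local( @ARGV, $/ ) = 'big.log'; <> };
        study $text;    # historically: build lookup tables for $text

        for my $word ( qw[ error timeout refused ] ) {
            print "found: $word\n" if $text =~ /\Q$word\E/;
        }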

    definitely not a no-op,

    I think if you look back at Dave_the_M's post you'll see that he said "a noop since 5.16". (I'm assuming the use 5.10; is indicative of what you are using?)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      No, I'm running 5.20.1, at least on my pc. I took out the use 5.010 and the results are pretty much the same -- something is happening, but nothing good. Yes, study takes the time to build a table to make searching more efficient, something akin to a database index. If the strings are short, or there are only a very few searches to perform, study isn't worth the effort. I should really try it on a sample problem like the one suggested in the docs.
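
      Something along these lines would do for that experiment (the corpus file and word list are stand-ins):

          use strict; use warnings;
          use Benchmark qw[ cmpthese ];

          # Many searches of the same large string, with and without study.
          # On a modern perl, expect no meaningful difference.
          my $text  = do { local( @ARGV, $/ ) = 'corpus.txt'; <> };
          my @words = qw[ foo bar baz quux ];

          cmpthese -5, {
              plain   => sub { my $n = grep { $text =~ /\Q$_\E/ } @words },
              studied => sub { study $text; my $n = grep { $text =~ /\Q$_\E/ } @words },
          };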

      1 Peter 4:10
        No, the study keyword does absolutely nothing except return an appropriate boolean for backwards compatibility. In particular, it is not building up any tables. Here is the C source code for pp_study() in 5.18.0:
        PP(pp_study)
        {
            dVAR; dSP; dPOPss;
            STRLEN len;

            (void)SvPV(sv, len);
            if (len == 0 || len > I32_MAX || !SvPOK(sv) || SvUTF8(sv) || SvVALID(sv)) {
                /* Historically, study was skipped in these cases. */
                RETPUSHNO;
            }

            /* Make study a no-op. It's no longer useful and its existence
               complicates matters elsewhere. */
            RETPUSHYES;
        }

        Dave.

        something is happening,

        I'm guessing the differences you are seeing are just down to the overhead of calling into C, doing (almost) nothing, and returning again.

        From pp.c/5.18

        /* Pattern matching */

        PP(pp_study)
        {
            dVAR; dSP; dPOPss;
            STRLEN len;

            (void)SvPV(sv, len);
            if (len == 0 || len > I32_MAX || !SvPOK(sv) || SvUTF8(sv) || SvVALID(sv)) {
                /* Historically, study was skipped in these cases. */
                RETPUSHNO;
            }

            /* Make study a no-op. It's no longer useful and its existence
               complicates matters elsewhere. */
            RETPUSHYES;
        }
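
        You can get a rough handle on that pure call overhead by timing study against a sub that does nothing at all (a sketch; the names are mine):

            use strict; use warnings;
            use Benchmark qw[ cmpthese ];

            my $s = 'x' x 32;

            # Both subs do essentially nothing; any gap is opcode-dispatch cost.
            cmpthese -3, {
                empty => sub { 1 },
                study => sub { study $s },
            };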

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^6: Speed Improvement (insignificant)
by tye (Sage) on Dec 03, 2014 at 08:07 UTC
    [...] performed slightly worse, toolic's only marginally better [...] definitely not a no-op

    You seem to be unaware that Benchmark's attempts to subtract "overhead" from its calculations mean that differences of about 20% or less are likely to have absolutely no meaning (especially for ridiculously short operations, for which optimizing is almost always just a waste of your time -- such as the operations y'all have been optimizing). You can easily get a report of a 20% performance difference after just changing the names you used so they get run in a different order, or just from running it enough times to get a big enough sample.
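
    If you doubt that, benchmark two byte-for-byte identical subs; Benchmark will happily report one as "faster" than the other (a sketch):

        use strict; use warnings;
        use Benchmark qw[ cmpthese ];

        # Identical code under two names; any reported difference is pure noise.
        cmpthese -3, {
            alpha => sub { my $x = join ',', 1 .. 20 },
            omega => sub { my $x = join ',', 1 .. 20 },
        };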

    I will be genuinely shocked if somebody produces something useful that uses this operation where the reported up-to 300% performance improvement actually leads to a noticeable change in total script run-time (where "noticeable" means "over 20%").

    The line from the original question:

    Every bit of speed I can squeeze out of this sub will help!

    just made me laugh. :)

    - tye        

      For (really hardcore) optimizing, the current bleadperl has Porting/bench.pl by dave_the_m, which uses the cachegrind tool to count the CPU cycles executed for a benchmark. It's mostly intended for benchmarking perl itself, but I guess it could well be used for microbenchmarking what approach is faster, after having arrived at a minimal op count.
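
      Invocation is roughly like this, from a built source tree with valgrind installed (check the POD at the top of the script for the real options; the binary names and labels here are made up):

          $ ./perl -Ilib Porting/bench.pl \
                --benchfile=t/perf/benchmarks \
                ./perl-old=before ./perl-new=after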

      Hi tye,

      first of all: I just like these kinds of threads. Really. You see several approaches, the participating monks (me too) turn into little boys presenting what they hope is the fastest approach. Many approaches are sources for new ideas or points of view. Some approaches are just beautiful (a personal taste in source code beauty).

      There have been many discussions about whether microbenchmarks are meaningful. Yes, of course. The context of the original question is missing, so we can't judge whether an optimization in this area matters or not.

      But we also learned: don't trust any statistics you haven't forged yourself. ;)

      I learned that a short period of studying before starting to work (even when that studying amounts to doing nothing at all) can result in an extraordinary performance boost (just by doing something different and letting people believe you did what they wanted). These are lessons of and for life... :))

      Last but not least: as you said yourself, the original post made you laugh. That's not the worst outcome, is it?

      Regards
      McA

        You seem to think that I was advocating that this thread should have never happened. Trying to optimize such tiny operations (certainly in Perl) is really just a waste of time as far as achieving actual practical performance improvements. Certainly, it can be fun.

        I occasionally enjoy Perl "golf". But I really doubt anybody thinks such code is meant to provide examples of how best to get real work done. Have fun, but where is the harm in having the fun while being aware of the practical meaninglessness of the scores being measured? Do you think the original author is going to pick an approach based on much of anything beyond these scores after such an opening sentence (especially when nobody will even hint at how insignificant the scores will end up being in practice)?

        Do you object to GotToBTru perhaps learning a little such that they could avoid again jumping to invalid conclusions based on reports from Benchmark?

        The context of the original question is missing to judge whether an optimization in this field matters or not.

        Not really. That these alternatives will have almost no impact in a full script is easy to predict from evidence already presented: The high iterations/sec of even the slowest version, the fact that something practical is desired, and that this program will be written in Perl. That "messages" are being constructed just adds another nail to the coffin.

        You can even tell Benchmark to notify you when it is producing especially pointless numbers. Just set the iteration count to something reasonable like 10_000. If you get told "too few iterations", then the resulting numbers (no matter how high you raise the iteration count) are more fiction than they are useful.
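
        That is (the op being timed here is just a stand-in):

            use strict; use warnings;
            use Benchmark qw[ timethese ];

            # A fixed, modest count instead of -N CPU-seconds; if Benchmark
            # warns "too few iterations for a reliable count", treat the
            # numbers as fiction no matter how high you raise the count.
            my $msg = 'abc {\d10} def';
            timethese 10_000, {
                subst => sub { ( my $s = $msg ) =~ s/\{\\d(\d+)\}/'9' x $1/ge },
            };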

        The math of how performance changes of very small operations very quickly become completely insignificant is quite simple (more "arithmetic" than "math"). And I didn't plan to rehash such in this thread.

        Sorry to have interrupted your "learning" with stuff I guess you didn't want to learn? :)

        - tye        

      I will be genuinely shocked if somebody produces something useful that uses this operation where the reported up-to 300% performance improvement actually leads to a noticeable change in total script run-time (where "noticeable" means "over 20%").

      The lines being modified are obviously some kind of templating system. For what? We can only guess, but perhaps some kind of scientific Monte Carlo simulation runs... At some point, the templated random values have to be populated with actual numbers.

      To that end, I generated 100 files averaging 500k using this code:
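
      A stand-in for that generator (the original was tucked away in a spoiler; this one merely produces files of the right shape -- ~500k of filler text sprinkled with {\dN} tokens -- and its names and token mix are illustrative, not the original):

          #! perl -slw
          use strict;

          # Write templ00.txt .. templ99.txt, each ~500k of filler text
          # sprinkled with {\dN} tokens for the substitution code to find.
          for my $n ( 0 .. 99 ) {
              open my $out, '>', sprintf( 'templ%02d.txt', $n ) or die $!;
              my $size = 0;
              while ( $size < 500_000 ) {
                  my $line = join ' ', map {
                      rand() < 0.2 ? sprintf( '{\d%d}', 1 + int rand 9 )
                                   : 'x' x ( 3 + int rand 8 )
                  } 1 .. 10;
                  print {$out} $line;
                  $size += 1 + length $line;
              }
              close $out;
          }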

      Not exotic, but 'good enough'.

      I then used the following code to slurp the files in turn, do the substitutions and then write the modified data to new files in another directory. Once using Nar's OP code (tweaked to work) and once using my posted version:

      #! perl -slw
      use strict;

      use Time::HiRes qw[ time ];

      sub nar_substitute {
          my @numeric_chars = ( 0 .. 9 );
          my $message = shift;
          my @numeric_matches = ( $message =~ m/\{\\d\d+\}/g );
          foreach (@numeric_matches) {
              my $substitution = $_;
              my ($substitution_count) = ( $substitution =~ m/(\d+)/ );
              my $number = '';
              for ( 1 .. $substitution_count ) {
                  $number .= $numeric_chars[ int rand @numeric_chars ];
              }
              $message =~ s[\Q$substitution][$number]e;
          }
          return $message;
      }

      sub buk_substitute {
          my $s = shift;
          $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
          return $s;
      }

      our $O //= 0;
      $|++;

      my $start = time;
      for my $fname ( glob 'templ*.txt' ) {
          printf "\rProcessing $fname";
          my $file = do { local( @ARGV, $/ ) = $fname; <> };
          $file = $O ? buk_substitute( $file ) : nar_substitute( $file );
          open O, '>', "modified/$fname" or die $!;
          print O $file;
          close O;
      }
      printf "\n\nTook %.6f secs\n", time() - $start;
      [16:39:37.79] C:\test\junk>..\junk63
      Processing templ99.txt

      Took 2064.037575 secs

      [17:16:13.24] C:\test\junk>del modified\*
      C:\test\junk\modified\*, Are you sure (Y/N)? y

      [17:25:20.32] C:\test\junk>..\junk63 -O=1
      Processing templ99.txt

      Took 6.626883 secs

      [17:25:35.03] C:\test\junk>

      100 - ( 6.626883 / 2064.037575 * 100 ) = 99.7% saving or 311 times faster!

      Worth the effort I would say.

      Maybe this needs doing once per run of the simulation. Maybe once a day; maybe hundreds of times. Maybe 100 files is overkill; maybe it requires thousands of files. Maybe 500k average is oversized; maybe they are in the GB range. I don't know... and neither do you.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Note that it is not the case that a 300% speed increase on a micro operation resulted in an over-all speed-up of 31100%. This demonstration is impressive but, to be clear, is also rather tangential to what I was actually talking about.

        This is a decisive refutation, but of a straw-man argument that I wasn't actually making. It resoundingly defends the idea that optimization can sometimes be quite significant. But I was not attacking that idea (and I never have). (Though, I certainly allow that BrowserUk might have sincerely believed I was doing so.)

        It might be interesting to reuse the code to test my actual prediction.

        I was a bit surprised that I didn't see anybody benchmark the original code (but perhaps I just missed it). Significant performance degradation on this type of stuff is usually due to regexes making entire copies of very long strings (especially if applying a large number of regexes per string) or due to pathological backtracking. The latter seems unlikely after a quick scan of the original code. So I'll take a wild guess (without having even glanced at the code hidden above) and predict that the disconnect has much, much more to do with the size of strings being used and rather little to do with the original implementation actually being even close to 31100% slower on strings similar to those actually specified in the root node.

        It should be fairly simple to show how the gulf between my "<20%" prediction and the above "31100%" demonstration is distributed over the various disconnects:

        • Inaccuracy of my prediction
        • Using the original implementation not one of the Benchmark'ed ones
        • Using much, much longer strings
        • Applying many more regexes per string than in the Benchmarks
        • Pathological backtracking
        • Something else?

        Perhaps something informative, even useful, could result. It might even provide practical lessons for some on a few differences between micro optimizations and useful optimizations.
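
        One of those disconnects, the string length, would be cheap to probe with the two subs already posted (a sketch; nar_substitute and buk_substitute as defined in the parent node):

            use strict; use warnings;
            use Time::HiRes qw[ time ];

            # How does the gap scale with the number of tokens per string?
            for my $tokens ( 10, 100, 1_000 ) {
                my $template = join ' ', ( 'filler {\d8} text' ) x $tokens;
                for my $sub ( [ nar => \&nar_substitute ], [ buk => \&buk_substitute ] ) {
                    my $t0 = time;
                    $sub->[1]->( $template ) for 1 .. 10;
                    printf "%s at %5d tokens: %.4f secs\n", $sub->[0], $tokens, time() - $t0;
                }
            }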

        - tye        

      just made me laugh. :)

      What made me laugh is the way you ignored stuff you find so ludicrous, at such great length.

      Thanks for the entertainment :)


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.