in reply to Re^7: Speed Improvement (Is 311 times faster: "insignificant"?)
in thread Speed Improvement

Note that this is not a case of a 300% speed increase on a micro operation producing an overall speed-up of 31100%. The demonstration is impressive but, to be clear, also rather tangential to what I was actually talking about.

This is a decisive refutation, but of a straw-man argument that I wasn't actually making. It resoundingly defends the idea that optimization can sometimes be quite significant. But I was not attacking that idea (and I never have). (Though, I certainly allow that BrowserUk might have sincerely believed I was doing so.)

It might be interesting to reuse the code to test my actual prediction.

I was a bit surprised that I didn't see anybody benchmark the original code (but perhaps I just missed it). Significant performance degradation on this type of stuff is usually due to regexes making entire copies of very long strings (especially if applying a large number of regexes per string) or due to pathological backtracking. The latter seems unlikely after a quick scan of the original code. So I'll take a wild guess (without having even glanced at the code hidden above) and predict that the disconnect has much more to do with the size of the strings being used and rather little to do with the original implementation actually being even close to 31100% slower on strings similar to those actually specified in the root node.

It should be fairly simple to show how the gulf between my "<20%" prediction and the above "31100%" demonstration is distributed over the various disconnects.
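
Something along these lines would do it. (Just a sketch: nar_substitute is the original sub, buk_substitute is a stand-in name for the optimised version hidden above, and the token density and message sizes are guesses.)

    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # Compare the two implementations as the message grows, to see how
    # much of the gap is simply a function of string size.
    for my $tokens ( 10, 1_000, 100_000 ) {
        my $message = join ' ', ( 'some filler text {\d8} more filler' ) x $tokens;
        printf "--- %d tokens, %d bytes ---\n", $tokens, length $message;
        cmpthese( -3, {
            original  => sub { my $out = nar_substitute( $message ) },
            optimised => sub { my $out = buk_substitute( $message ) },
        } );
    }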

Perhaps something informative, even useful, could result. It might even provide practical lessons for some on a few differences between micro optimizations and useful optimizations.

- tye        


Re^9: Speed Improvement (Is 311 times faster: "insignificant"?)
by BrowserUk (Patriarch) on Dec 03, 2014 at 20:43 UTC
    I was a bit surprised that I didn't see anybody benchmark the original code

    I did. If you look back at my first post, you'll see nar_func listed in the benchmark results.

    So I'll take a wild guess ...

    Why guess. Do some work.

    It should be fairly simple to show ...

    Go for it.

    on strings similar to those actually specified in the root node.

    Perhaps those strings are ... oh, I don't know ... just short, non-typical examples. You know, in keeping with the typical requests for "some short examples to demonstrate the problem".

    The bottom line is that the scope for optimisation is clearly shown in the OP's code:

    sub nar_substitute {
        my @numeric_chars = ( 0 .. 9 );
        my $message = shift;

        # Regex call 1: collect every {\dN} token in the message.
        my @numeric_matches = ( $message =~ m/\{\\d\d+\}/g );

        foreach my $substitution ( @numeric_matches ) {
            # Regex call 2: pull the digit count N out of the token.
            my( $substitution_count ) = ( $substitution =~ m/(\d+)/ );

            # Inner Perl-level loop: one call to rand per digit.
            my $number = '';
            for ( 1 .. $substitution_count ) {
                $number .= $numeric_chars[ int rand @numeric_chars ];
            }

            # Regex call 3: substitute the digits back into the
            # full-length message.
            $message =~ s[\Q$substitution][$number]e;
        }
        return $message;
    }

    Two nested, Perl-level loops; three calls into the regex engine per substitution; and N calls to rand for each N-digit substitution.

    All of which can be, and was, replaced by a single, C-level loop (/g); a single call to the regex engine per file; and a single call to rand per substitution.
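
    In sketch form (an illustration of that shape only, not the exact code posted above; rand's floating-point precision caps this at roughly 15 digits per token):

        use strict;
        use warnings;

        sub one_pass_substitute {
            my $message = shift;

            # One /ge substitution: the regex engine walks the string in a
            # single C-level loop, and each {\dN} token costs exactly one
            # call to rand, formatted to N zero-padded digits.
            $message =~ s{ \{ \\d (\d+) \} }
                         { sprintf '%0*d', $1, int rand 10 ** $1 }gex;
            return $message;
        }

        print one_pass_substitute( 'acct: {\d8}; pin: {\d4}' ), "\n";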

    'Tain't rocket science.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.