in reply to should this backspace removal code be done better?

#!/usr/bin/perl use strict; use warnings; use Benchmark qw(cmpthese); my $code = { while_index => sub { # BrowserUk local $_ = $::TESTSTRING; s{(?:[^\cH]\cH|^\cH)}{}g while 1+index $_, chr(8); $_; }, altr_subst => sub { # smackdab local $_ = $::TESTSTRING; 1 while s/(?:[^\cH]\cH|^\cH+)//g; $_; }, noaltr_subst => sub { # smackdab local $_ = $::TESTSTRING; s/^\cH+//; 1 while s/[^\cH]\cH//g; $_; }, sexeger_while => sub { # shenme local $_ = reverse $::TESTSTRING; s{\cH[^\cH]|\cH$}{}g while 1+index $_, chr(8); scalar reverse $_; }, true_sexeger => sub { # Aristotle local $_ = reverse $::TESTSTRING; s/(\cH+)(??{ '.{0,'.length($1).'}' })//g; scalar reverse $_; }, substr => sub { # Aristotle local $_ = $::TESTSTRING; while(/([\b]+)/g) { my $len = 2 * length($1); substr($_, pos() - $len, $len, ''); } $_; }, }; $::TESTSTRING = "\bthis is an\b correct\b\b\b usage\b"; print "\nBenchmarking length ", length($::TESTSTRING), "\n"; cmpthese(-5, $code); $::TESTSTRING x= 100; print "\nBenchmarking length ", length($::TESTSTRING), "\n"; cmpthese(-5, $code); __END__
Benchmarking length 30
Benchmark: running altr_subst, noaltr_subst, sexeger_while,
substr, true_sexeger, while_index for at least 5 CPU seconds...
altr_subst:  5 wallclock secs ( 5.05 usr +  0.02 sys =  5.07 CPU) @ 35011.64/s (n=177509)
noaltr_subst:  5 wallclock secs ( 5.18 usr +  0.01 sys =  5.19 CPU) @ 143818.11/s (n=746416)
sexeger_while:  5 wallclock secs ( 5.20 usr +  0.01 sys =  5.21 CPU) @ 49023.42/s (n=255412)
substr:  5 wallclock secs ( 5.20 usr +  0.02 sys =  5.22 CPU) @ 16090.04/s (n=83990)
true_sexeger:  6 wallclock secs ( 5.30 usr +  0.02 sys =  5.32 CPU) @ 24390.23/s (n=129756)
while_index:  5 wallclock secs ( 5.29 usr +  0.03 sys =  5.32 CPU) @ 41607.89/s (n=221354)
Rate substr true_sexeger altr_subst while_index sexeger_while noaltr_subst
substr 16090/s -- -34% -54% -61% -67% -89%
true_sexeger 24390/s 52% -- -30% -41% -50% -83%
altr_subst 35012/s 118% 44% -- -16% -29% -76%
while_index 41608/s 159% 71% 19% -- -15% -71%
sexeger_while 49023/s 205% 101% 40% 18% -- -66%
noaltr_subst 143818/s 794% 490% 311% 246% 193% --
Benchmarking length 3000
Benchmark: running altr_subst, noaltr_subst, sexeger_while, substr, true_sexeger, while_index for at least 5 CPU seconds...
altr_subst:  6 wallclock secs ( 5.28 usr +  0.01 sys =  5.29 CPU) @ 411.53/s (n=2177)
noaltr_subst:  5 wallclock secs ( 5.25 usr +  0.01 sys =  5.26 CPU) @ 2662.74/s (n=14006)
sexeger_while:  5 wallclock secs ( 5.26 usr +  0.00 sys =  5.26 CPU) @ 630.80/s (n=3318)
substr:  6 wallclock secs ( 5.39 usr +  0.00 sys =  5.39 CPU) @ 167.72/s (n=904)
true_sexeger:  5 wallclock secs ( 5.39 usr +  0.02 sys =  5.41 CPU) @ 367.47/s (n=1988)
while_index:  6 wallclock secs ( 5.23 usr +  0.02 sys =  5.25 CPU) @ 516.57/s (n=2712)
Rate substr true_sexeger altr_subst while_index sexeger_while noaltr_subst
substr 168/s -- -54% -59% -68% -73% -94%
true_sexeger 367/s 119% -- -11% -29% -42% -86%
altr_subst 412/s 145% 12% -- -20% -35% -85%
while_index 517/s 208% 41% 26% -- -18% -81%
sexeger_while 631/s 276% 72% 53% 22% -- -76%
noaltr_subst 2663/s 1488% 625% 547% 415% 322% --

The winner is pretty clear. This isn't something that sexeger do much better than normal regexes either, since you need to process nested pairs of characters. It also proves once again that Perl optimization consists in offloading as much work into a single operation as possible. In C, something close to the algorithm I posted would have beaten all others; not so in Perl, where it requires that the interpreter traverse the optree and dispatch opcodes all the time, where the other solutions do a most of their work implicitly in the regex engine.

Makeshifts last the longest.

Replies are listed 'Best First'.
Re: Re: should this backspace removal code be done better?
by shenme (Priest) on Oct 05, 2003 at 19:21 UTC
Re: Re: should this backspace removal code be done better?
by BrowserUk (Patriarch) on Oct 05, 2003 at 19:22 UTC

    A couple of thoughts. First, the 'substr' test code doesn't return the same results as the others.

    I'm also not sure that it makes sense to remove a backspace at the beginning of the line? What would that mean?

    I don't think that it makes too much sense to benchmark strings much beyond a 80 chars or so. Mostly we don't type lines much longer than that, and optimising for the occasional use is pretty pointless. Actually, optimising this whole thing is pretty pointless unless we are going to write programs that simulate a person typing including their typos and corrections:) Even the slowest version is way faster than (I) can type a line of input.

    Finally, if the optimisation is just for fun, a way to while away a boring Sunday afternoon, here's a version that does pretty well on the short tests, but comes in second on the longer strings.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.