Re: Difference Of Two Strings (complete benchmarks)

Another implementation for you. This one seems pretty speedy (update: optimized and benchmarks updated with everyone else's updates):

sub fastolfe {
        my $source = shift;
        my $chop   = shift;
        local($_);

        my %found;
        $found{$_}++ while ($_ = chop($source)) ne '';
        while (($_ = chop($chop)) ne '') {
                return if --$found{$_} < 0;
        }

        my $result;
        foreach (sort keys %found) {
                $result .= $_ while $found{$_}--;
        }
        $result;
}
[download]

Hybridizing the above with merlyn's version, we can squeeze out a bit more speed:

sub fast_merl {
        my ($source, $chop) = @_;
        local($_);

        my %found;
        $found{$_}++ while ($_ = chop($source)) ne '';
        while (($_ = chop($chop)) ne '') {
                return if --$found{$_} < 0;
        }

        return join "", map { $_ x $found{$_} } keys %found;
}
[download]

Benchmarking most of the versions I see thus far (minus a couple of the less interesting ones, because these benchmarks are getting big), using test input from above (2 success, 2 fail) as well as demerphq's tests below:

# demerphq's test set
             Rate blakem merlyn demq_scan demerphq fastolfe fast_merl 
+fast_c scan_c
blakem      479/s     --   -24%      -39%     -44%     -46%      -47% 
+  -97%   -98%
merlyn      629/s    31%     --      -20%     -27%     -29%      -30% 
+  -96%   -97%
demq_scan   790/s    65%    26%        --      -8%     -11%      -12% 
+  -95%   -97%
demerphq    861/s    80%    37%        9%       --      -3%       -4% 
+  -95%   -96%
fastolfe    887/s    85%    41%       12%       3%       --       -1% 
+  -94%   -96%
fast_merl   901/s    88%    43%       14%       5%       2%        -- 
+  -94%   -96%
fast_c    15708/s  3179%  2396%     1889%    1723%    1671%     1644% 
+    --   -31%
scan_c    22648/s  4627%  3499%     2767%    2529%    2453%     2415% 
+   44%     --

# simple success case
             Rate blakem merlyn demerphq fastolfe fast_merl demq_scan 
+fast_c scan_c
blakem     6244/s     --   -14%     -26%     -32%      -35%      -43% 
+  -90%   -93%
merlyn     7221/s    16%     --     -14%     -21%      -24%      -34% 
+  -89%   -92%
demerphq   8429/s    35%    17%       --      -8%      -12%      -23% 
+  -87%   -91%
fastolfe   9181/s    47%    27%       9%       --       -4%      -16% 
+  -86%   -90%
fast_merl  9563/s    53%    32%      13%       4%        --      -12% 
+  -85%   -90%
demq_scan 10908/s    75%    51%      29%      19%       14%        -- 
+  -83%   -88%
fast_c    65634/s   951%   809%     679%     615%      586%      502% 
+    --   -28%
scan_c    91428/s  1364%  1166%     985%     896%      856%      738% 
+   39%     --

# simple failure case
              Rate blakem merlyn demerphq demq_scan fastolfe fast_merl
+ scan_c fast_c
blakem      7759/s     --   -39%     -51%      -60%     -63%      -63%
+   -94%   -94%
merlyn     12666/s    63%     --     -20%      -35%     -39%      -40%
+   -89%   -91%
demerphq   15783/s   103%    25%       --      -19%     -24%      -26%
+   -87%   -89%
demq_scan  19581/s   152%    55%      24%        --      -5%       -8%
+   -84%   -86%
fastolfe   20720/s   167%    64%      31%        6%       --       -2%
+   -83%   -85%
fast_merl  21209/s   173%    67%      34%        8%       2%        --
+   -82%   -85%
scan_c    119642/s  1442%   845%     658%      511%     477%      464%
+     --   -14%
fast_c    139171/s  1694%   999%     782%      611%     572%      556%
+    16%     --
[download]

Source: http://fastolfe.net/transient/2001/11/02/pm.string.difference.bench

Comment on Re: Difference Of Two Strings (complete benchmarks) Select or Download Code

Replies are listed 'Best First'.
Re: Re: Difference Of Two Strings (Benchmarks 2) by demerphq (Chancellor) on Nov 03, 2001 at 08:01 UTC
UPDATED to reflect Fastolfes reply. Heh. While I was setting up my benchmark there everyone else was as well. I came up with a different set of results. First jungleboy, runrig and yuckfoo failed outright on some of my test cases. Second both merlyn and fast_merl failed because the resulting set of letters are out of order, but when I put a sort clause in as was recommended they passed fine. Yours was the fastest. Only yours, mine and blakem succeded outright. # Updated: Removed unecessary debug code, minor tidy. use strict; use warnings; use Benchmark 'cmpthese'; #----------------------------------------------------------- our %subs = ( yuckfoo => sub { my ( $full, $part ) = @_; my $left = ""; my ($regx) = join ( '+.', ( split ( '', $part ) ) ); if ( $full =~ m{$regx} ) { my (@fulls) = split ( '', $full ); my (@parts) = split ( '', $part ); while ( my $ch = shift (@fulls) ) { ( @parts && $ch eq $parts[0] ) ? shift (@parts) : ( $left .= $ch ); } } else { $left = "__undef__"; } return $left; }, jungleboy => sub { my ( $full, $part ) = @_; my ($ch); my (@parts) = split ( '', $part ); my ($regx) = join ( '+.', @parts ); if ( $full =~ m{$regx} ) { foreach $ch (@parts) { $full =~ s/[$ch]{1}//; } } else { $full = "__undef__"; } return $full; }, demerphq => sub { my ( $from, $to ) = @_; my %ltrs; $ltrs{ substr( $from, $_, 1 ) }++ foreach 0 .. length($from) - + 1; --$ltrs{ substr( $to, $_, 1 ) } < 0 && return "__undef__" foreach 0 .. length($to) - 1; return join ( "", sort map { $_ x $ltrs{$_} } keys %ltrs ); }, demq_scan => sub { my ( $from, $to ) = @_; my $ret = ""; my ( $f, $t ) = ( 0, 0 ); while (1) { my ( $fc, $tc ) = ( substr( $from, $f, 1 ), substr( $to, $ +t, 1 ) ); if ( $fc eq $tc ) { $t++; $f++; if ( substr( $to, $t, 1 ) ne $tc ) { $f++, $ret .= $fc while substr( $from, $f, 1 ) eq +$fc; } last if $t == length($to); } elsif ( $fc lt $tc ) { $ret .= $fc; $f++; return "__undef__" if $f >= length $from; } else { return "__undef__"; } } return $ret . substr( $from, $f ); }, blakem => sub { my $string = shift; my $letters = shift; $string =~ s/$_// \|\| return "__undef__" for split // => $lette +rs; return $string; }, fastolfe => sub { my $source = shift; my $chop = shift; local ($_); my %found; $found{$_}++ while ( $_ = chop($source) ) ne ''; while ( ( $_ = chop($chop) ) ne '' ) { return "__undef__" if --$found{$_} < 0; } my $result = ""; #fixed demerphq foreach ( sort keys %found ) { $result .= $_ while $found{$_}--; } $result; }, merlyn => sub { my ( $full, $part ) = @_; my %count; $count{$_}++ for split //, $full; for ( split //, $part ) { return "__undef__" if --$count{$_} < 0; } return join "", map { $_ x $count{$_} } sort keys %count; }, fast_merl => sub { my ( $source, $chop ) = @_; local ($_); my %found; $found{$_}++ while ( $_ = chop($source) ) ne ''; while ( ( $_ = chop($chop) ) ne '' ) { return "__undef__" if --$found{$_} < 0; } return join "", map { $_ x $found{$_} } sort keys %found; } # Not sure why it fails.. # runrig => sub { # my $str = shift; # my $letters = shift; # my ( %hash1, %hash2 ); # my @arr1 = split //, $str; # @hash1{@arr1} = @arr1; # my @arr2 = split //, $letters; # my @deleted = grep $_, delete @hash1{@arr2}; # @hash2{@arr2} = @arr2; # delete @hash2{@deleted}; # return "__undef__" if %hash2; # join '', keys %hash1; # }, ); sub test { my ($sub) = @_; foreach my $t ( # tests [0] - [1] = [2] [qw"ab a b"], [qw"acciimmnnnnoootu acimnotu cimnnnoo"], [qw"acciimmnnnnoootu acimnotuu __undef__"], [qw"ab ab", ""], [qw"aaaaaaabbbbbbbcccccccddddddddde e aaaaaaabbbbbbbcccccccddd +dddddd"], [qw"aaaaaaabbbbbbbcccccccddddddddde aaaaaaabbbbbbbcccccccddddd +dddd e"], [qw"aaaaaaabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxyz aaaaa +aabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxy z"], [qw"aaaaaaabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxyz aaaaa +bbbbbbcccccddddddddghijklmnopqrstuvwxy aabccdefz"], [qw"abbcccdde bccd abcde"], [qw"abbcccdde bccdf __undef__"], [qw"bbcccdde abccdf __undef__"], #uncomment me to kill the regex versions #[qw"aaaaaaabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxz aaaaa +bbbbbbcccccddddddddghijklmnopqrstuvwxy __undef__"], ) { #warn "$sub @$t\n"; my $r = $subs{$sub}->( $t->[0], $t->[1] ); $r = ( !defined($r) ? "undef" : $r ); #warn "'$r'\n"; die "in $sub expected $t->[2] got $r from $t->[0] - $t->[1]" if $t->[2] ne $r; } } my %tests = map { $_ => "test($_)" } keys %subs; cmpthese( -10, \%tests ); __END__ [download] _{Benchmark: running blakem, demerphq, demerphq_scan, fast_merl, fastolfe, jungleboy, merlyn, yuckfoo, each for at least 10 CPU seconds... blakem: 11 wallclock secs (10.52 usr + 0.00 sys = 10.52 CPU) @ 497.48/s (n=5231) demerphq: 11 wallclock secs (10.72 usr + 0.00 sys = 10.72 CPU) @ 755.88/s (n=8100) demq_scan: 11 wallclock secs (10.57 usr + 0.00 sys = 10.57 CPU) @ 731.63/s (n=7737) fast_merl: 10 wallclock secs (10.61 usr + 0.00 sys = 10.61 CPU) @ 768.34/s (n=8149) fastolfe: 10 wallclock secs (10.55 usr + 0.00 sys = 10.55 CPU) @ 779.23/s (n=8217) jungleboy: 10 wallclock secs (10.18 usr + 0.00 sys = 10.18 CPU) @ 20.03/s (n=204) merlyn: 10 wallclock secs (10.56 usr + 0.00 sys = 10.56 CPU) @ 595.40/s (n=6285) yuckfoo: 10 wallclock secs (10.50 usr + 0.00 sys = 10.50 CPU) @ 20.10/s (n=211) Rate jungleboy yuckfoo blakem merlyn demerphq_scan demerphq fast_merl fastolfe jungleboy 20.0/s -- -0% -96% -97% -97% -97% -97% -97% yuckfoo 20.1/s 0% -- -96% -97% -97% -97% -97% -97% blakem 497/s 2383% 2374% -- -16% -32% -34% -35% -36% merlyn 595/s 2872% 2861% 20% -- -19% -21% -23% -24% demq_scan 732/s 3552% 3539% 47% 23% -- -3% -5% -6% demerphq 756/s 3673% 3660% 52% 27% 3% -- -2% -3% fast_merl 768/s 3736% 3722% 54% 29% 5% 2% -- -1% fastolfe 779/s 3790% 3776% 57% 31% 7% 3% 1% --} Yves / DeMerphq -- Have you registered your Name Space?	[reply] [d/l]
Re: Re: Re: Difference Of Two Strings (Benchmarks 2) by Fastolfe (Vicar) on Nov 03, 2001 at 08:07 UTC
Yah I took out the sort in fast_merl because the requirements didn't state the returned string had to be sorted, just that the input was. The sort has a very slight performance penalty. It's pretty easy to put back. Good set of exhaustive tests.	[reply]
Re: Re: Difference Of Two Strings (complete benchmarks) by YuckFoo (Abbot) on Nov 07, 2001 at 04:27 UTC
Thanks for the chop modification. Nice touch. It seems to be at least 10-20% faster than split, I'll take it! I also appreciate the extensive bench compliation. Very interesting, and very informative.	[reply]