in reply to Re: Difference Of Two Strings (complete benchmarks)
in thread Difference Of Two Strings

UPDATED to reflect Fastolfes reply.

Heh. While I was setting up my benchmark there everyone else was as well. I came up with a different set of results. First jungleboy, runrig and yuckfoo failed outright on some of my test cases. Second both merlyn and fast_merl failed because the resulting set of letters are out of order, but when I put a sort clause in as was recommended they passed fine. Yours was the fastest. Only yours, mine and blakem succeded outright.

# Updated: Removed unecessary debug code, minor tidy. use strict; use warnings; use Benchmark 'cmpthese'; #----------------------------------------------------------- our %subs = ( yuckfoo => sub { my ( $full, $part ) = @_; my $left = ""; my ($regx) = join ( '+.*', ( split ( '', $part ) ) ); if ( $full =~ m{$regx} ) { my (@fulls) = split ( '', $full ); my (@parts) = split ( '', $part ); while ( my $ch = shift (@fulls) ) { ( @parts && $ch eq $parts[0] ) ? shift (@parts) : ( $left .= $ch ); } } else { $left = "__undef__"; } return $left; }, jungleboy => sub { my ( $full, $part ) = @_; my ($ch); my (@parts) = split ( '', $part ); my ($regx) = join ( '+.*', @parts ); if ( $full =~ m{$regx} ) { foreach $ch (@parts) { $full =~ s/[$ch]{1}//; } } else { $full = "__undef__"; } return $full; }, demerphq => sub { my ( $from, $to ) = @_; my %ltrs; $ltrs{ substr( $from, $_, 1 ) }++ foreach 0 .. length($from) - + 1; --$ltrs{ substr( $to, $_, 1 ) } < 0 && return "__undef__" foreach 0 .. length($to) - 1; return join ( "", sort map { $_ x $ltrs{$_} } keys %ltrs ); }, demq_scan => sub { my ( $from, $to ) = @_; my $ret = ""; my ( $f, $t ) = ( 0, 0 ); while (1) { my ( $fc, $tc ) = ( substr( $from, $f, 1 ), substr( $to, $ +t, 1 ) ); if ( $fc eq $tc ) { $t++; $f++; if ( substr( $to, $t, 1 ) ne $tc ) { $f++, $ret .= $fc while substr( $from, $f, 1 ) eq +$fc; } last if $t == length($to); } elsif ( $fc lt $tc ) { $ret .= $fc; $f++; return "__undef__" if $f >= length $from; } else { return "__undef__"; } } return $ret . substr( $from, $f ); }, blakem => sub { my $string = shift; my $letters = shift; $string =~ s/$_// || return "__undef__" for split // => $lette +rs; return $string; }, fastolfe => sub { my $source = shift; my $chop = shift; local ($_); my %found; $found{$_}++ while ( $_ = chop($source) ) ne ''; while ( ( $_ = chop($chop) ) ne '' ) { return "__undef__" if --$found{$_} < 0; } my $result = ""; #fixed demerphq foreach ( sort keys %found ) { $result .= $_ while $found{$_}--; } $result; }, merlyn => sub { my ( $full, $part ) = @_; my %count; $count{$_}++ for split //, $full; for ( split //, $part ) { return "__undef__" if --$count{$_} < 0; } return join "", map { $_ x $count{$_} } sort keys %count; }, fast_merl => sub { my ( $source, $chop ) = @_; local ($_); my %found; $found{$_}++ while ( $_ = chop($source) ) ne ''; while ( ( $_ = chop($chop) ) ne '' ) { return "__undef__" if --$found{$_} < 0; } return join "", map { $_ x $found{$_} } sort keys %found; } # Not sure why it fails.. # runrig => sub { # my $str = shift; # my $letters = shift; # my ( %hash1, %hash2 ); # my @arr1 = split //, $str; # @hash1{@arr1} = @arr1; # my @arr2 = split //, $letters; # my @deleted = grep $_, delete @hash1{@arr2}; # @hash2{@arr2} = @arr2; # delete @hash2{@deleted}; # return "__undef__" if %hash2; # join '', keys %hash1; # }, ); sub test { my ($sub) = @_; foreach my $t ( # tests [0] - [1] = [2] [qw"ab a b"], [qw"acciimmnnnnoootu acimnotu cimnnnoo"], [qw"acciimmnnnnoootu acimnotuu __undef__"], [qw"ab ab", ""], [qw"aaaaaaabbbbbbbcccccccddddddddde e aaaaaaabbbbbbbcccccccddd +dddddd"], [qw"aaaaaaabbbbbbbcccccccddddddddde aaaaaaabbbbbbbcccccccddddd +dddd e"], [qw"aaaaaaabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxyz aaaaa +aabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxy z"], [qw"aaaaaaabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxyz aaaaa +bbbbbbcccccddddddddghijklmnopqrstuvwxy aabccdefz"], [qw"abbcccdde bccd abcde"], [qw"abbcccdde bccdf __undef__"], [qw"bbcccdde abccdf __undef__"], #uncomment me to kill the regex versions #[qw"aaaaaaabbbbbbbcccccccdddddddddefghijklmnopqrstuvwxz aaaaa +bbbbbbcccccddddddddghijklmnopqrstuvwxy __undef__"], ) { #warn "$sub @$t\n"; my $r = $subs{$sub}->( $t->[0], $t->[1] ); $r = ( !defined($r) ? "undef" : $r ); #warn "'$r'\n"; die "in $sub expected $t->[2] got $r from $t->[0] - $t->[1]" if $t->[2] ne $r; } } my %tests = map { $_ => "test($_)" } keys %subs; cmpthese( -10, \%tests ); __END__
Benchmark: running blakem, demerphq, demerphq_scan, fast_merl, fastolfe, jungleboy, merlyn, yuckfoo, 
each for at least 10 CPU seconds...
    blakem: 11 wallclock secs (10.52 usr +  0.00 sys = 10.52 CPU) @ 497.48/s (n=5231)
  demerphq: 11 wallclock secs (10.72 usr +  0.00 sys = 10.72 CPU) @ 755.88/s (n=8100)
 demq_scan: 11 wallclock secs (10.57 usr +  0.00 sys = 10.57 CPU) @ 731.63/s (n=7737)
 fast_merl: 10 wallclock secs (10.61 usr +  0.00 sys = 10.61 CPU) @ 768.34/s (n=8149)
  fastolfe: 10 wallclock secs (10.55 usr +  0.00 sys = 10.55 CPU) @ 779.23/s (n=8217)
 jungleboy: 10 wallclock secs (10.18 usr +  0.00 sys = 10.18 CPU) @ 20.03/s (n=204)
    merlyn: 10 wallclock secs (10.56 usr +  0.00 sys = 10.56 CPU) @ 595.40/s (n=6285)
   yuckfoo: 10 wallclock secs (10.50 usr +  0.00 sys = 10.50 CPU) @ 20.10/s (n=211)
                Rate jungleboy yuckfoo blakem merlyn demerphq_scan demerphq fast_merl fastolfe
jungleboy     20.0/s        --     -0%   -96%   -97%          -97%     -97%      -97%     -97%
yuckfoo       20.1/s        0%      --   -96%   -97%          -97%     -97%      -97%     -97%
blakem         497/s     2383%   2374%     --   -16%          -32%     -34%      -35%     -36%
merlyn         595/s     2872%   2861%    20%     --          -19%     -21%      -23%     -24%
demq_scan      732/s     3552%   3539%    47%    23%            --      -3%       -5%      -6%
demerphq       756/s     3673%   3660%    52%    27%            3%       --       -2%      -3%
fast_merl      768/s     3736%   3722%    54%    29%            5%       2%        --      -1%
fastolfe       779/s     3790%   3776%    57%    31%            7%       3%        1%       --

Yves / DeMerphq
--
Have you registered your Name Space?

Replies are listed 'Best First'.
Re: Re: Re: Difference Of Two Strings (Benchmarks 2)
by Fastolfe (Vicar) on Nov 03, 2001 at 08:07 UTC
    Yah I took out the sort in fast_merl because the requirements didn't state the returned string had to be sorted, just that the input was. The sort has a very slight performance penalty. It's pretty easy to put back.

    Good set of exhaustive tests.