comment on

> found bug in benchmark, will correct later. vec still among slowest

as promised:

Integrity of the results are tested in a way that excludes possible sideeffects of benchmark-routines.

Benchmarks are run with different datasets with shrinking densities ("vec"-approach breaks >1e10)

Result: "vec" doesn't scale well for larger values...

Perlversion v5.10.0


 Setting values range: 1..1e4

ok 1 - hash_grep ~~ hash_key (3231 entries)
ok 2 - hash_key ~~ hash_values (3231 entries)
ok 3 - hash_values ~~ using_vec (3231 entries)
ok 4 - using_vec ~~ hash_grep (3231 entries)
ok 5 - hash_grep ~~ hash_key (3231 entries)
ok 6 - hash_key ~~ hash_values (3231 entries)
ok 7 - hash_values ~~ using_vec (3231 entries)
1..7


 Setting values range: 1..1e4

              Rate hash_values    hash_key   using_vec   hash_grep
hash_values 29.8/s          --        -45%        -48%        -48%
hash_key    53.8/s         81%          --         -7%         -7%
using_vec   57.7/s         93%          7%          --         -0%
hash_grep   57.7/s         93%          7%          0%          --


 Setting values range: 1..1e6

              Rate hash_values    hash_key   using_vec   hash_grep
hash_values 28.5/s          --        -24%        -24%        -25%
hash_key    37.3/s         31%          --         -0%         -2%
using_vec   37.3/s         31%          0%          --         -2%
hash_grep   38.2/s         34%          2%          2%          --


 Setting values range: 1..1e8

              Rate   using_vec hash_values   hash_grep    hash_key
using_vec   17.1/s          --        -41%        -53%        -57%
hash_values 29.1/s         70%          --        -20%        -26%
hash_grep   36.2/s        112%         24%          --         -8%
hash_key    39.5/s        131%         36%          9%          --


 Setting values range: 1..1e9

              Rate   using_vec hash_values   hash_grep    hash_key
using_vec   3.29/s          --        -88%        -91%        -92%
hash_values 28.2/s        759%          --        -21%        -27%
hash_grep   36.0/s        993%         27%          --         -7%
hash_key    38.8/s       1081%         38%          8%          --

Compilation finished at Wed Nov 27 02:08:15
[download]

code:

use strict;
use warnings;
use Benchmark qw/cmpthese/;
use Data::Dump qw/pp/;
use feature 'say';
use Test::More;


say "Perlversion $^V";

my (@arr_1,@arr_2);


sub init_data{
  my $range_limit=shift;
    
  say "\n\n Setting values range: 1..$range_limit\n";

  my %unique;
  $unique{int rand $range_limit}=undef while keys %unique <8000;
  @arr_1=keys %unique;
  
  %unique=();
  $unique{int rand $range_limit}=undef while keys %unique <6000;
  @arr_2=keys %unique;
}
  

{
  package CMP;
  my @res= ("") x 8000;
  
  sub hash_values_diff {
    my %diff3;
    @diff3{@arr_1} = @arr_1;
    delete @diff3{@arr_2};
    @res = values %diff3 ;
  }

  sub hash_key_diff {
    my %diff3;
    undef @diff3{@arr_1};
    delete @diff3{@arr_2};
    @res = keys %diff3 ;
  }

  sub using_vec_diff {
    my $vec = '';
    vec( $vec, $_, 1 ) = 1 for @arr_2;
    @res = grep !vec( $vec, $_, 1 ), @arr_1;
  }

  sub hash_grep_diff {
    my %arr_2_hash;
    undef @arr_2_hash{@arr_2};
    @res = grep !exists $arr_2_hash{$_}, @arr_1;
  }
}





#--- Test benchmarks for correctness
init_data("1e4");
test_subs('_diff$',"CMP");



#--- Compare benchmarks
foreach my $range (qw/1e4 1e6 1e8 1e9/) {
  init_data($range);
  cmpthese(-5, pckg_subs('_diff$',"CMP") );
} 



#====== Utility functions for smarter benchmarks


sub pckg_subs {
  my $filter    = shift // qr[.];
  my $pckg_name = shift // "CMP";

  my $stash = do {
    no strict 'refs';
    \ %{ "${pckg_name}::" };
  };
  
  # filter all subs from package
  my $codehashref;
  while (my ($name,$glob)= each %$stash) {
    if ( my $cref = *{$glob}{CODE} and $name =~ s/$filter// ) {
#      print "$name:\t$glob\n";
      $codehashref->{$name}=$cref;
    }
  }
  return $codehashref; 
}


sub test_subs {
  
  my $h_subs=pckg_subs(@_);

  my ($last_name,@last_res,$name,$code,@res);

  # compair pairwise, loop twice to exclude sideeffects
  for (1..2) {
    while (($name,$code) =each %$h_subs ) {
      @res = sort $code->();
    
      is_deeply(\@last_res,\@res,"$last_name ~~ $name (". scalar @res 
+. " entries)")
        if $last_name;
 
      @last_res  = @res;
      $last_name = $name;    
    }
  }
  done_testing( );
}
[download]

Cheers Rolf

( addicted to the Perl Programming Language)

In reply to Re^5: Best method to diff very large array efficiently by LanX
in thread Best method to diff very large array efficiently by newbieperlperson

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.