comment on

I agree that the OP's data structure as described had some deficiencies (such as only one ad. per invidual probability value). Also, I think that the "reverse" I put in your code is evidence that I saw the string reversal. However, your main point is:

Additionally, your code is unfair. In your benchmark, all solutions are being timed for creating a structure to return an add and then returning that add each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy it but your solution does. You would need to alter the bench to be fair.

Well, if so I should then count time to serialize and deserialize that structure, since as far as I can tell the OP was looking for something that would be useful in a cgi script. (otherwise, his original complaints about performance make little sense) But for the sake of argument, let's refactor the benchmark so that each method has a setup step and a run step, and see what we get. I'll call the way I tested things before "cgi style" and the build-structure-once, extract-many method "mod_perl style". (I've eliminated QM's code until he can provide a corrected version, and added ccn's second version. I've also cut the correctness tests from this output for clarity)

==== Speed tests ====

As cgi: (from scratch each time):
               Rate origCode    duff     L~R     ccn  fizbin ccn_fast 
+ccn_faster
origCode     3799/s       --    -33%    -83%    -87%    -89%     -96% 
+      -97%
duff         5671/s      49%      --    -75%    -80%    -84%     -94% 
+      -95%
L~R         23009/s     506%    306%      --    -19%    -36%     -75% 
+      -79%
ccn         28461/s     649%    402%     24%      --    -21%     -69% 
+      -75%
fizbin      36068/s     849%    536%     57%     27%      --     -61% 
+      -68%
ccn_fast    92559/s    2337%   1532%    302%    225%    157%       -- 
+      -17%
ccn_faster 111920/s    2846%   1874%    386%    293%    210%      21% 
+        --

As mod_perl: (NOT from scratch each time):
               Rate  fizbin ccn_fast     ccn ccn_faster     L~R origCo
+de    duff
fizbin      36073/s      --     -61%    -62%       -68%    -92%     -9
+2%    -92%
ccn_fast    92123/s    155%       --     -2%       -17%    -80%     -8
+0%    -80%
ccn         94311/s    161%       2%      --       -15%    -79%     -8
+0%    -80%
ccn_faster 111200/s    208%      21%     18%         --    -75%     -7
+6%    -76%
L~R        450572/s   1149%     389%    378%       305%      --      -
+3%     -3%
origCode   462599/s   1182%     402%    391%       316%      3%       
+--     -1%
duff       466249/s   1193%     406%    394%       319%      3%       
+1%      --
[download]

ccn_fast is the version ccn posted in response to my benchmark the first time. ccn_faster is a minor tweak I made to ccn_fast to have it not use an additional variable on repeated invocations.

A few curious things to notice here - first off, using a mod_perl-style test, the original code is almost as fast as possible. (or at least, "almost as fast as what anyone else has put forward") So my first suggestion to the OP would be: if performance is your problem, find a way to switch to mod_perl and do the building of the array once. (but consider the disadvantages others have mentioned of your code only allowing one ad. per percentage value) Secondly, although the code for the methods duff and origCode is identical after the initial building of the data structure, duff was consistently (I ran this test program a few times) about 1% faster. I have no explanation for this, though I'd like one from someone with intimate knowledge of perl internals. (perl version 5.8.2 as shipped with cygwin)

And yes, my code gets spanked in the mod_perl scenario, but that's expected. What I hadn't expected is just how blinding fast ccn_fast wound up being, enough so that it's still almost competitive under a mod_perl scenario.

And now, the improved benchmark code:

#!perl

use strict;

use Benchmark qw(cmpthese);

my $str = "17:43:33:21:23:19:27:6";

my $setup_codehash = {
    origCode => sub {
        my @ary;
        my %hash = split /:/, $str;
        foreach my $k ( keys %hash ) {
            push( @ary, map { $hash{$k} } ( 1 .. $k ) );
        }
        \@ary;
    },

    ccn => sub {
        my %hash = split /:/, $str;
        \%hash,
    },

    "L~R" => sub {
        my $lookup;
        my %hash = reverse split /:/, $str;
        while ( my ( $key, $val ) = each %hash ) {
            $lookup .= pack( "C*", ($key) x $val );
        }
        $lookup;
    },

    duff => sub {
        my @ary;
        my @a = split /:/, $str;
        @a % 2 && die;    # not an even number of items
        while ( my ( $p, $ad ) = splice @a, 0, 2 ) {
            push @ary, ($ad) x $p;
        }
        \@ary;
    },
};

my $codehash = {
    origCode => sub {
        my $adId = $_[0]->[ int( rand(100) ) ];
    },

    ccn => sub {
        my $adno;

        my $rand = rand 100;

        my $sum = 0;
        for ( keys %{$_[0]} ) {    # there is no need of sorted keys
            $adno = $_[0]->{$_};
            last if ( $sum += $_ ) > $rand;
        }
        $adno;
    },

    "L~R" => sub {
        my $addid = unpack( "C", substr( $_[0], rand 100, 1 ) );
    },

    duff => sub {
        my $adId = $_[0]->[ int( rand(100) ) ];
    },

    fizbin => sub {
        my @a = split /:/, $str;
        my $r = int( rand(100) );
        my ( $adId, $p );
        while ( ( $p, $adId ) = splice @a, 0, 2 ) {
            if ( $r < $p ) { last; }
            $r -= $p;
        }
        $adId;
    },

    ccn_fast => sub {
        my $rand = rand 100;
        my $mystr = $str; # this line is needed for repetitive test on
+ly
        1 while $mystr =~ /([^:]+):([^:]+):?/g and (($rand -= $1) > 0)
+;
        $2;
    },

    ccn_faster => sub {
        my $rand = rand 100;
        pos($str)=0; # this line is needed for repetitive test only
        1 while $str =~ /([^:]+):([^:]+):?/g and (($rand -= $1) > 0);
        $2;
    },
};

sub phash (%) {
    my %h = @_;
    return "{"
      . join( ", ",
        map { sprintf( '%s => %3.3f', $_, $h{$_} ); }
          sort { $a <=> $b } keys(%h) )
      . "}";
}


sub make_cgistyle_routine ($) {
    my $subname = shift;
    if ($setup_codehash->{$subname}) {
        return sub {$codehash->{$subname}->($setup_codehash->{$subname
+}->());};
    }
    else
    {
        return sub {$codehash->{$subname}->();};
    }
}

sub make_mpstyle_routine ($) {
    my $subname = shift;
    if ($setup_codehash->{$subname}) {
        my @setupvals = $setup_codehash->{$subname}->();
        return sub {$codehash->{$subname}->(@setupvals);};
    }
    else
    {
        return sub {$codehash->{$subname}->();};
    }
}


print "==== Correctness tests ==== \n\n";
my %strhash = reverse split( /:/, $str );

print "Probabilities are: ", phash(%strhash), "\n";

foreach my $subname ( keys %$codehash ) {
    my %resultshash = map { $_ => 0 } keys %strhash;
    my $routine = make_mpstyle_routine($subname);
    do { $resultshash{ $routine->() } += 0.001; }
      for ( 1 .. 100000 );

    print "$subname yielded ", phash(%resultshash), "\n";
    do {
        print "$subname failed\n" and last
          if ( abs( $resultshash{$_} - $strhash{$_} ) > 0.3 );
      }
      for keys(%strhash);
}

print "\n==== Speed tests ====\n";

print "\nAs cgi: (from scratch each time):\n";

cmpthese( -5,
    {map {$_ => make_cgistyle_routine($_)} keys %$codehash });

print "\nAs mod_perl: (NOT from scratch each time):\n";

cmpthese( -5,
    {map {$_ => make_mpstyle_routine($_)  } keys %$codehash });
[download]

Finally, I want to note that I trust the "cgi style" tests of this code less than the previous benchmark code. In this version, code that has two halves (setup and eval) gets penalized extra in the cgi-style tests because of the additional indirection and sub invocation which didn't happen in the previous tests, and wouldn't happen in a real cgi script. I'm not quite sure what to do here; perhaps I could try to penalize the one-piece scripts in a similar fashion to compensate. (Or, for cgi-style tests, use the benchmark I posted earlier)

--
@/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/;
map{y/X_/\n /;print}map{pop@$_}@/for@/
[download]

In reply to Re^4: $str to %hash to @ary by fizbin
in thread $str to %hash to @ary by abaxaba

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.