abaxaba has asked for the wisdom of the Perl Monks concerning the following question:
So, I've got a colon-delimited string, that looks like this:
my $str="17:43:33:21:23:19:27:6";
which is really a stringified hash, that looks like this:
%hash=(17=>43,
33=>21,
23=>19,
27=>6);
This hash represents proportion-weights for advertisement roatation. That is, 17% of the time, ad #43 shows. 33% is ad #21, and so on.
The way I am codifying this is to have an array, 0..99, where
scalar(grep/$hash{$key}/@ary)==$key;
So, the whole thing looks something like this:
my $str="17:43:33:21:23:19:27:6";
my %hash=split/:/,$str;
foreach my $k(keys %hash)
{
push (@ary, map{$hash{$k}}(1..$k));
}
my $adId=$ary[int(rand(100))];
I've ran some timed tests on this, and am not pleased with the results. Testing has determined that the part where the array is populated (via the push/map while foreaching over keys %hash) is where much of my load is coming in.
Any thoughts on how to streamline that (or any other part of the code) to make run faster?
ÅßÅ×ÅßÅ
"It is a very mixed blessing to be brought back from the dead." -- Kurt Vonnegut
|
Re: $str to %hash to @ary
by ccn (Vicar) on Jul 22, 2004 at 21:17 UTC
|
my $str="17:43:33:21:23:19:27:6";
my @aux = split /:/, $str;
my @ary = map {($aux[2*$_+1]) x $aux[2*$_]} 0 .. @aux/2-1;
UPDATE:
You can try the following code also:
my $str="17:43:33:21:23:19:27:6";
my %hash = split /:/, $str;
my $adno;
my $rand = rand 100;
my $sum = 0;
for (keys %hash) { # there is no need of sorted keys
$adno = $hash{$_};
last if ($sum += $_) > $rand;
}
print $adno;
UPDATE: sort keys eliminated in 'for' | [reply] [d/l] [select] |
Re: $str to %hash to @ary
by duff (Parson) on Jul 23, 2004 at 03:30 UTC
|
Here's another method that doesn't use a hash at all (since the hash appears to be a synthetic variable not related to the problem you're trying to solve).
my $str="17:43:33:21:23:19:27:6";
my @a = split/:/,$str;
@a % 2 && die; # not an even number of items
while (my ($p,$ad) = splice @a,0,2) {
push @ary, ($ad) x $p;
}
my $adId=$ary[int(rand(100))];
Basically it just splits the string into an array, then grabs elements 2 at a time from the array until the array is exhausted. The code also makes use of the repetition operator in list context to get the ads repeated the proper number of times.
| [reply] [d/l] |
Re: $str to %hash to @ary
by pg (Canon) on Jul 22, 2004 at 21:32 UTC
|
At first sight, I spotted a problem right the way. Your code will never work base on your description of the data. Try this (it is your code, with slightly modified testing data):
use Data::Dumper;
my $str="27:43:33:21:23:19:27:6";
my %hash=split/:/,$str;
foreach my $k(keys %hash)
{
push (@ary, map{$hash{$k}}(1..$k));
}
my $adId=$ary[int(rand(100))];
print Dumper(\%hash);
Your code does not handle the situation where two ads has the same percentage of show time. Well, the fix is not difficult, so I just leave it to you, just switch key/value, use show number as key. | [reply] [d/l] |
Re: $str to %hash to @ary
by Eimi Metamorphoumai (Deacon) on Jul 22, 2004 at 21:21 UTC
|
I can't say about the performance per se (I'm actually quite surprised that would be noticably slow unless you're running it a lot of times, since you'll only create a total of 100 entries), but I can say that your design appears to have some serious problems. One of the main ones is that you're using the percentages as keys to a hash, so no two advertisers can have the same percentage (or the second one would disappear entirely). Once you create @ary, do you only use it once? If so, then you might look at Weighted random numbers generator. If you are going to use it several times, you still might do better to reverse the direction of your hash, if you even need the hash at all--is there a reason for it? | [reply] |
Re: $str to %hash to @ary
by Limbic~Region (Chancellor) on Jul 22, 2004 at 22:34 UTC
|
abaxaba,
- Repeatedly doing hash lookups is slower than doing it once
- The order appears reversed since as is you can't have two ads with the same weight nor lookup the weight of an add.
Assuming it is ok to reverse the order, I have presented the following unique solution that does not require an array:
#!/usr/bin/perl
use strict;
use warnings;
my $lookup;
my %hash = split /:/ , '43:17:21:33:19:23:6:27';
while (my ($key, $val) = each %hash) {
$lookup .= pack("C*", ($key) x $val);
}
print unpack("C", substr($lookup, rand 100, 1));
| [reply] [d/l] |
Re: $str to %hash to @ary
by QM (Parson) on Jul 22, 2004 at 21:27 UTC
|
At the expense of making the lookup a little slower:
my $str="17:43:33:21:23:19:27:6";
my %hash=split/:/,$str;
my $count;
my %ad_lookup;
foreach my $k(keys %hash)
{
$count += $hash{$k};
$ad_lookup{$count} = $k;
}
my $rand = rand(100);
my $adid;
foreach ( sort {$a <=> $b} keys %ad_lookup )
{
$adid = $ad_lookup{$_} unless defined( $adid);
if ( $_ <= $rand )
{
$adid = $ad_lookup{$_};
}
else
{
last;
}
}
Of course, if you want the lookup to be faster, use a binary search instead.
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] [d/l] |
|
| [reply] [d/l] |
|
You're correct. I didn't test the code well, and seem to have gotten several things backward or out of whack. Here's properly tested code (I hope):
#!/your/perl/here
use strict;
use warnings;
my $str="17:43:33:21:23:19:27:6";
my %hash=split/:/,$str;
my $count;
my %ad_lookup;
foreach my $k(keys %hash)
{
$count += $k;
$ad_lookup{$count} = $hash{$k};
}
my @ad_lookup_sorted_keys = sort {$b <=> $a} keys %ad_lookup;
my %seen;
for my $rand ( 1..$count )
{
# my $rand = rand($count);
my $adid;
foreach my $key ( @ad_lookup_sorted_keys )
{
if ( $rand <= $key )
{
$adid = $ad_lookup{$key};
}
else
{
last;
}
}
# print "$adid\n";
$seen{$adid}++;
}
foreach my $k ( sort {$a <=> $b} keys %seen )
{
print "\$seen{$k}: $seen{$k}\n";
}
Which outputs:
$seen{6}: 27
$seen{19}: 23
$seen{21}: 33
$seen{43}: 17
[There would be a slight shift going back to rand, because of the 0..99 and 1..100 difference. I believe you just need to change:
$ad_lookup{$count} = $hash{$k};
to:
$ad_lookup{$count-1} = $hash{$k};
]
I should note there are several ways to do this and get it right, and this is but one. Also, the key sort is only done once. A binary search on the keyspace would be faster, if there are many keys, but I'd estimate you would need a hundred keys or so to make it worth while. [Feel free to test that if you like]
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] [d/l] [select] |
Re: $str to %hash to @ary
by NetWallah (Canon) on Jul 23, 2004 at 00:10 UTC
|
If you want to keep your hash structure, here are 2 alternatives that do not lose data:
The first one uses the ad# as the key (inverse of your example). The second one uses the percent as key, but stores the Ad#'s as Values in a HOA.
#First way - Ad# as key:
my $str='27:43:33:21:23:19:27:6';
my( $temp,%h);
%h= map {if($temp){
my $y=$temp;
$temp=undef;
$_=>$y
}else{
$temp=$_;
undef()=>undef;
}
} split/:/,$str;
delete $h{undef()};
print qq($_=>$h{$_}\n) for keys %h;
#Second way - using a HOA
my $str='27:43:33:21:23:19:27:6';
my( $temp,%h);
for ( split/:/,$str){
if($temp){
push @{$h{$temp}},$_;
$temp=undef;
}else{
$temp=$_;
}
};
print qq($_=> @{$h{$_}}\n) for keys %h;
Earth first! (We'll rob the other planets later)
| [reply] [d/l] |
Re: $str to %hash to @ary
by fizbin (Chaplain) on Jul 23, 2004 at 14:42 UTC
|
This is a variant on duff's method that's a bit faster, and I have here performance numbers to demonstrate it:
my @a = split/:/,$str;
@a % 2 && die; # not an even number of items
my $r = int(rand(100));
my ($adId, $p);
while (($p,$adId) = splice @a,0,2) {
if ($r < $p) {last;}
$r -= $p;
}
Now, the performance numbers; it's pretty easy to test all the variants given so far with Benchmark.pm:
Rate origCode duff QM L~R ccn fizbin
origCode 4029/s -- -34% -77% -82% -87% -89%
duff 6137/s 52% -- -65% -73% -81% -83%
QM 17360/s 331% 183% -- -24% -46% -53%
L~R 22908/s 469% 273% 32% -- -29% -38%
ccn 32106/s 697% 423% 85% 40% -- -12%
fizbin 36690/s 811% 498% 111% 60% 14% --
I have the benchmark code, but it's basically just a bunch of cuts and pastes of the code that's been posted on this thread, so I'll only post it if asked.
Update: Fixed oops in code, updated performance figures.
--
@/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/;
map{y/X_/\n /;print}map{pop@$_}@/for@/
| [reply] [d/l] [select] |
|
==== Correctness tests ====
Probabilities are: {6 => 27.000, 19 => 23.000, 21 => 33.000, 43 => 17.
+000}
origCode yielded {6 => 26.955, 19 => 22.825, 21 => 33.142, 43 => 17.07
+8}
duff yielded {6 => 27.111, 19 => 22.956, 21 => 33.008, 43 => 16.925}
fizbin yielded {6 => 27.117, 19 => 23.057, 21 => 32.862, 43 => 16.964}
L~R yielded {6 => 26.929, 19 => 22.946, 21 => 33.151, 43 => 16.974}
QM yielded {6 => 59.902, 19 => 0.000, 21 => 17.062, 43 => 23.036}
QM failed
ccn yielded {6 => 26.991, 19 => 22.882, 21 => 33.064, 43 => 17.063}
==== Speed tests ====
Rate origCode duff QM L~R ccn fizbin
origCode 4101/s -- -34% -78% -83% -87% -89%
duff 6191/s 51% -- -66% -75% -81% -83%
QM 18348/s 347% 196% -- -26% -43% -50%
L~R 24743/s 503% 300% 35% -- -24% -32%
ccn 32411/s 690% 424% 77% 31% -- -11%
fizbin 36567/s 792% 491% 99% 48% 13% --
And here's the code I used to make that determination. The formatting is from being run through perltidy, since it was even uglier before.
| [reply] [d/l] [select] |
|
fizbin,
You have missed that I reversed the ordering of the string. As is, there is no way to have two adds with the same percentage nor is there a straight forward way of determining the percentage of any given add. I was trying to be creative, provide a significant speed increase, and maintain manageability.
Additionally, your code is unfair. In your benchmark, all solutions are being timed for creating a structure to return an add and then returning that add each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy it but your solution does. You would need to alter the bench to be fair.
| [reply] |
|
|
|
|
ccn_fast => sub {
my $rand = rand 100;
my $mystr = $str; # this line is needed for repetitive test on
+ly
1 while $mystr =~ /([^:]+):([^:]+):?/g and (($rand -= $1) > 0)
+;
$2;
},
| [reply] [d/l] |
|
|
|