in reply to Re^2: $str to %hash to @ary
in thread $str to %hash to @ary

fizbin,
You have missed that I reversed the ordering of the string. As is, there is no way to have two ads with the same percentage, nor is there a straightforward way of determining the percentage of any given ad. I was trying to be creative, provide a significant speed increase, and maintain manageability.

Additionally, your code is unfair. In your benchmark, all solutions are being timed on both creating a structure to return an ad and then returning that ad each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy the structure, but your solution does. You would need to alter the benchmark to be fair.

Cheers - L~R

Replies are listed 'Best First'.
Re^4: $str to %hash to @ary
by fizbin (Chaplain) on Jul 25, 2004 at 03:08 UTC
    I agree that the OP's data structure as described had some deficiencies (such as only one ad per individual probability value). Also, I think that the "reverse" I put in your code is evidence that I saw the string reversal. However, your main point is:
    Additionally, your code is unfair. In your benchmark, all solutions are being timed on both creating a structure to return an ad and then returning that ad each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy the structure, but your solution does. You would need to alter the benchmark to be fair.
    Well, if so, I should then also count the time to serialize and deserialize that structure, since as far as I can tell the OP was looking for something that would be useful in a CGI script. (Otherwise, his original complaints about performance make little sense.) But for the sake of argument, let's refactor the benchmark so that each method has a setup step and a run step, and see what we get. I'll call the way I tested things before "cgi style" and the build-structure-once, extract-many method "mod_perl style". (I've eliminated QM's code until he can provide a corrected version, and added ccn's second version. I've also cut the correctness tests from this output for clarity.)
    ==== Speed tests ====

    As cgi (from scratch each time):

                    Rate origCode  duff   L~R   ccn fizbin ccn_fast ccn_faster
    origCode      3799/s       --  -33%  -83%  -87%   -89%     -96%       -97%
    duff          5671/s      49%    --  -75%  -80%   -84%     -94%       -95%
    L~R          23009/s     506%  306%    --  -19%   -36%     -75%       -79%
    ccn          28461/s     649%  402%   24%    --   -21%     -69%       -75%
    fizbin       36068/s     849%  536%   57%   27%     --     -61%       -68%
    ccn_fast     92559/s    2337% 1532%  302%  225%   157%       --       -17%
    ccn_faster  111920/s    2846% 1874%  386%  293%   210%      21%         --

    As mod_perl (NOT from scratch each time):

                    Rate fizbin ccn_fast   ccn ccn_faster   L~R origCode  duff
    fizbin       36073/s     --     -61%  -62%       -68%  -92%     -92%  -92%
    ccn_fast     92123/s   155%       --   -2%       -17%  -80%     -80%  -80%
    ccn          94311/s   161%       2%    --       -15%  -79%     -80%  -80%
    ccn_faster  111200/s   208%      21%   18%         --  -75%     -76%  -76%
    L~R         450572/s  1149%     389%  378%       305%    --      -3%   -3%
    origCode    462599/s  1182%     402%  391%       316%    3%       --   -1%
    duff        466249/s  1193%     406%  394%       319%    3%       1%    --
    ccn_fast is the version ccn posted in response to my first benchmark. ccn_faster is a minor tweak I made to ccn_fast so that it does not use an additional variable on repeated invocations.

    A few curious things to notice here. First off, using a mod_perl-style test, the original code is almost as fast as possible (or at least, "almost as fast as what anyone else has put forward"). So my first suggestion to the OP would be: if performance is your problem, find a way to switch to mod_perl and do the building of the array once. (But consider the disadvantages others have mentioned of your code only allowing one ad per percentage value.) Secondly, although the code for the duff and origCode methods is identical after the initial building of the data structure, duff was consistently (I ran this test program a few times) about 1% faster. I have no explanation for this, though I'd like one from someone with intimate knowledge of perl internals. (perl version 5.8.2 as shipped with cygwin)

    And yes, my code gets spanked in the mod_perl scenario, but that's expected. What I hadn't expected is just how blindingly fast ccn_fast wound up being, enough so that it's still almost competitive under a mod_perl scenario.

    And now, the improved benchmark code:

    Finally, I want to note that I trust the "cgi style" tests of this code less than the previous benchmark code. In this version, code that has two halves (setup and eval) gets penalized extra in the cgi-style tests because of the additional indirection and sub invocation, which didn't happen in the previous tests and wouldn't happen in a real CGI script. I'm not quite sure what to do here; perhaps I could penalize the one-piece scripts in a similar fashion to compensate. (Or, for cgi-style tests, use the benchmark I posted earlier.)
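    As a rough illustration of the setup/run split being discussed (this is a hedged sketch, not the actual benchmark posted; sub names are mine, and I'm assuming the sample string alternates percentage:ad_id pairs as in the thread):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = "17:43:33:21:23:19:27:6";   # percentage:ad_id pairs

# setup step: build the weighted lookup array ("mod_perl style" pays
# this cost once; "cgi style" pays it on every request)
my $setup = sub {
    my @ads;
    my @fields = split /:/, $str;
    while (@fields) {
        my ($pct, $id) = splice @fields, 0, 2;
        push @ads, ($id) x $pct;      # one array slot per percentage point
    }
    return \@ads;
};

my $built = $setup->();               # built once for the mod_perl case

cmpthese(100_000, {
    cgi_style     => sub { my $ads = $setup->(); $ads->[rand @$ads] },
    modperl_style => sub { $built->[rand @$built] },
});
```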
    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
      fizbin,
      I knew I was trading speed in setup time for speed in lookup time. If you want to see me on top, change the following:
      "L~R" => sub { my @foo; my $str = "17:43:33:21:23:19:27:6"; while (my $p = $str =~ /([^:]+)/g && $1 and my $add_id = $str +=~ /([^:]+)/g && $1) { push @foo, ($add_id) x $p; } \@foo; }, "L~R" => sub { my $addid = $_[0]->[ rand 100 ]; }, As mod_perl: (NOT from scratch each time): Rate fizbin ccn_fast ccn ccn_faster origCode du +ff L~R fizbin 35680/s -- -52% -53% -60% -91% -9 +1% -92% ccn_fast 74642/s 109% -- -2% -16% -80% -8 +1% -83% ccn 76321/s 114% 2% -- -14% -80% -8 +1% -82% ccn_faster 88368/s 148% 18% 16% -- -76% -7 +8% -80% origCode 375784/s 953% 403% 392% 325% -- - +6% -13% duff 398928/s 1018% 434% 423% 351% 6% +-- -8% L~R 432419/s 1112% 479% 467% 389% 15% +8% --
      I did notice that not all methods were "passing" on every pass. If someone wanted to make it even faster, they could randomize the array as part of setup time and then just loop through the array.
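      That randomize-then-loop idea could be sketched like this (a hypothetical next_ad helper of my own, assuming the sample string from the thread; the point is that the per-request work drops to an array fetch and an increment, with no rand call):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $str = "17:43:33:21:23:19:27:6";

# setup: build the weighted array, then randomize its order once up front
my @fields = split /:/, $str;
my @ads;
while (@fields) {
    my ($pct, $id) = splice @fields, 0, 2;
    push @ads, ($id) x $pct;
}
@ads = shuffle @ads;

# lookup: walk the pre-shuffled array with a wrapping cursor
my $next = 0;
sub next_ad {
    my $ad = $ads[$next++];
    $next = 0 if $next >= @ads;
    return $ad;
}
```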

      Cheers - L~R

        I still want to know why there's a consistent time difference among the top three in the mod_perl case each time - it's the same code on the lookup end, once you drop the unnecessary call to int that's in the duff and origCode methods.
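        (For what it's worth, the int() is unnecessary because Perl truncates a fractional array subscript to an integer on its own; a small illustration, with a made-up array:)

```perl
use strict;
use warnings;

my @ads = (0 .. 99);

# both lookups select the same range of elements with the same
# distribution; the subscript is truncated either way
my $with_int    = $ads[ int(rand(100)) ];
my $without_int = $ads[ rand(100) ];     # int() here would be redundant work
```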

        I'll note that your fast version is essentially what duff did, except with regexes to extract the array instead of split followed by splice.
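        To illustrate that equivalence, here is a rough sketch of the two build styles side by side (my own code, assuming the sample string alternates percentage:ad_id pairs); both produce identical weighted arrays:

```perl
use strict;
use warnings;

my $str = "17:43:33:21:23:19:27:6";

# split-then-splice build (the duff-style approach)
my @fields = split /:/, $str;
my @by_split;
while (@fields) {
    my ($pct, $id) = splice @fields, 0, 2;
    push @by_split, ($id) x $pct;
}

# regex-pair build (roughly what the fast regex versions do)
my @by_regex;
while ($str =~ /(\d+):(\d+)/g) {
    push @by_regex, ($2) x $1;
}
```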

        -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/