in reply to patchable settings

Here's my current attempt to approach y'all's desires, without doing too much that I find offensive :). I considered switching to backslash escaping, but that meant replacing the splits with more complex regexes, so I discarded that approach. The nested hash stuff is an extra bonus and shows a benefit of using a different delimiter between key and value than between vars.

I tried to increase readability of the stored string (which seemed important to demerphq) by escaping only the necessary characters: &, =, and %. I did not consider using unprintable characters as delimiters, and I am a little uneasy about not escaping any unprintable characters that are in the data; I'd prefer the stored string to be always all printable. This also meant bypassing the calls to escape/unescape, which would slow things down with a lot of vars.
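A minimal sketch of the escaping idea described above. The helper names `esc` and `unesc` are hypothetical (the posted code inlines the substitutions rather than calling helpers); the point is only that three characters need escaping for the string to split unambiguously.

```perl
use strict;
use warnings;

# Hypothetical helper: escape only the three characters that are
# structurally significant in the packed string.
sub esc {
    my ($s) = @_;
    $s =~ s/([%&=])/sprintf '%%%02x', ord($1)/ge;
    return $s;
}

# Hypothetical helper: reverse the escaping. Only '%' followed by two
# hex digits is decoded.
sub unesc {
    my ($s) = @_;
    $s =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
    return $s;
}

my $raw    = 'a=b&c 100%';
my $packed = esc($raw);      # the space is left alone, so it stays readable
print "$packed\n";
print unesc($packed) eq $raw ? "round trip ok\n" : "round trip FAILED\n";
```

Because % itself is escaped, every % left in the packed string is followed by exactly two hex digits, which is what makes the reverse substitution safe.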

cmpVars is what we need for current patch detection to work reliably; I haven't tested it at all yet.

    sub packVars {
        my $varsref = $_[0];

        # current format version: 01
        return join "&", "==01", map {
            my $typ;
            my $v = $varsref->{$_};

            # special data to pack?
            if (ref $v) {
                # only hash refs supported now
                $typ = "H";
                $v   = packVars( $v );
            }
            # undef becomes empty, protect empty values
            # (test for undef/empty explicitly so that "0" survives)
            elsif (!defined($v) || $v eq '') {
                $v = $typ = '';
            }

            join '=', map { s/([%&=])/ sprintf '%%%02x', ord($1) /ge; $_ }
                $_, $v, (defined $typ ? $typ : ());
        } sort keys %$varsref;
    }

    sub unpackVars {
        my $vars_str = $_[0];

        # version 00: original format
        # version 01: keys are escaped, not just values
        my $format_version = "00";
        # an empty hash packs to a bare "==01", so accept end of string too
        $format_version = $1 if $vars_str =~ s/^==(\d\d)(?:&|\z)//;

        return {} unless $vars_str;

        my %vars;
        if ($format_version eq "01") {
            for (split /&/, $vars_str) {
                my ($k, $v, $typ) =
                    map { s/%(\w\w)/ chr(hex($1)) /ge; $_ } split /=/, $_, -1;

                # special data to unpack?
                if ($typ) {
                    # nested hash
                    if ($typ eq "H") {
                        $v = unpackVars($v);
                    }
                }
                $vars{$k} = $v;
            }
        }
        # format version "00"
        else {
            %vars = map split(/=/, $_, 2), split /&/, $vars_str;
            unescape( values %vars );    # the site's existing helper
            $vars{$_} eq ' ' and $vars{$_} = '' for keys %vars;
        }

        return \%vars;
    }

    sub cmpVars {
        my ($var1str, $var2str) = @_;

        # return false immediately if strings match,
        return 0 if $var1str eq $var2str;

        # otherwise return true if both are current format version,
        return 1 if $var1str =~ /^==01(?:&|\z)/ && $var2str =~ /^==01(?:&|\z)/;

        # otherwise compare after repacking both as current format version.
        return packVars( unpackVars( $var1str ) )
           cmp packVars( unpackVars( $var2str ) );
    }

Re^2: patchable settings (big readmores)
by demerphq (Chancellor) on Oct 16, 2004 at 10:58 UTC

    I took the liberty of taking your code and running with it. Unsurprisingly, I added handling for arrays (which required a modification to your packing scheme: the type is tagged at the root level, right after the version string, and not at the key/value level as you had it) as well as code to prevent possible accidental infinite recursion. Additionally, I used my personal settings as a test set and benchmarked the two approaches to see what the comparative performance was. The results were very interesting. First off, if we rip out the DB-related code and benchmark the two variants, we find that the results are as follows:
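The code behind that description is not shown in this thread, so the following is only a sketch of the two ideas under stated assumptions: a container type tag (H or A) placed right after the version marker, and a guard against circular structures. The names `pack_any`, `pack_item` and `esc` are hypothetical, and nesting by escaping the whole inner string is an assumption, not necessarily what was actually done.

```perl
use strict;
use warnings;

my $VER = '==01';    # version marker as used in the thread

sub esc { my $s = shift; $s =~ s/([%&=])/sprintf '%%%02x', ord($1)/ge; $s }

# Hypothetical: pack one value, recursing into nested containers.
sub pack_item {
    my ($v, $seen) = @_;
    return esc( ref($v) ? pack_any($v, $seen) : defined($v) ? $v : '' );
}

# Hypothetical packer: the container type follows the version marker.
# %$seen holds the containers on the current path so that a structure
# that refers back to itself is rejected instead of recursing forever.
sub pack_any {
    my ($data, $seen) = @_;
    $seen ||= {};
    die "circular structure\n" if $seen->{$data}++;

    my @parts;
    if (ref $data eq 'HASH') {
        @parts = ("${VER}:H",
            map { esc($_) . '=' . pack_item($data->{$_}, $seen) }
            sort keys %$data);
    }
    elsif (ref $data eq 'ARRAY') {
        @parts = ("${VER}:A", map( pack_item($_, $seen), @$data ));
    }
    else {
        die "unsupported data type\n";
    }

    delete $seen->{$data};    # leaving this container: siblings may reuse it
    return join '&', @parts;
}

print pack_any({ list => [ 1, 'a&b' ], name => 'x=y' }), "\n";

my $loop = {};
$loop->{self} = $loop;
eval { pack_any($loop) };
print "caught: $@";
```

In this sketch a plain string value that happens to begin with the version marker would be ambiguous on unpack, so a real implementation needs some way to tell the two apart.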

             Rate  old  new
    old   115/s   -- -47%
    new   215/s  87%   --

    When I then ran the benchmark on the server _with_ the update code, the performance changed as follows:

             Rate  old  new
    old   107/s   -- -61%
    new   271/s 153%   --

    These results were somewhat gratifying to me as they appear to validate my point that by using a more compact encoding we reduce the size of the stored data and thus the time required to marshal that data back and forth to the db server. (The update was forced each time by incrementing a counter in the stored hash.) These are the lengths and first chars of the two packed vars:

    L: 11549>external_user=demerphq&scratchpublic=%20
    L: 6902 >==01:H&DomainNodeletExtras=&allow_dupe_p
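A rough sketch of how such a size and speed comparison can be set up with the core Benchmark module. Everything here is a stand-in: `esc_all`, `esc_min`, `pack_with` and the sample data are made up for illustration and are not the site's code or the settings used in the measurements above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins for the two escaping styles.
sub esc_all { my $s = shift; $s =~ s/([^A-Za-z0-9_.-])/sprintf '%%%02x', ord($1)/ge; $s }
sub esc_min { my $s = shift; $s =~ s/([%&=])/sprintf '%%%02x', ord($1)/ge; $s }

# Made-up sample data with plenty of spaces and punctuation.
my %vars = map { ("key $_" => "some value, with (punctuation) no. $_") } 1 .. 200;

# Pack %vars using whichever escaping routine is passed in.
sub pack_with {
    my ($esc) = @_;
    return join '&',
        map { join '=', $esc->($_), $esc->($vars{$_}) } sort keys %vars;
}

printf "escape everything: %d bytes\n", length( pack_with(\&esc_all) );
printf "escape minimally:  %d bytes\n", length( pack_with(\&esc_min) );

# Fixed iteration count so the run finishes quickly; a negative count
# (CPU seconds) gives steadier numbers.
cmpthese(1000, {
    all     => sub { pack_with(\&esc_all) },
    minimal => sub { pack_with(\&esc_min) },
});
```

This only measures packing in memory; it says nothing about the database round trip, which has to be timed on the server.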

    So overall I think this code is worthy. We probably pack and unpack four or five vars per page fetch. Reducing the load this takes has got to be a good thing for all concerned.

    Home test code (needs DDS to be installed)

    And the Dumper Prompt Benchmark Code (in IE this means you have to save the response as a file, as the cmpthese output comes before the HTML; I haven't figured out a better way to do this yet.)


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi

      Flux8


      Played with it a little, mostly reindenting; removed the eval checks: we aren't preserving blessed refs, so there's no point in even allowing them. Put the hash check first in packVars. Don't need to escape = for arrays (unless I'm missing something). (Update: added back =.) Started to mess with cmpVars but ran out of time. My inclination would be just to treat too-high version numbers as if they were the current version rather than return an error. All the error conditions should get logged rather than die or return.
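A small sketch of that inclination, assuming a hypothetical helper `effective_version` and using `warn` as a stand-in for whatever logging the site provides: unknown future versions are logged and then decoded as the current version instead of raising an error.

```perl
use strict;
use warnings;

my $CURRENT_VERSION = 1;    # assumed to correspond to the "==01" marker

# Hypothetical helper: decide which format version to decode with.
# Strings without a marker are the original format (0); versions newer
# than the one we know are logged and treated as current.
sub effective_version {
    my ($str) = @_;
    return 0 unless $str =~ /^==(\d\d)/;

    my $ver = $1 + 0;
    if ($ver > $CURRENT_VERSION) {
        warn "packed vars claim version $ver; decoding as $CURRENT_VERSION\n";
        return $CURRENT_VERSION;
    }
    return $ver;
}

print effective_version('==01&a=1'), "\n";    # current version
print effective_version('==07&a=1'), "\n";    # too high: warns, uses current
print effective_version('a=1&b=2'),  "\n";    # no marker: original format
```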

        OK, I took yours and ran further. Neither of ours handled strings starting with $PackVer. This does, as well as handling undef properly. It also includes enhanced benchmark results from the Dumper Prompt here on PM.


        ---
        demerphq



      Your benchmarks show that adding the DB update work to the new method makes it run about 30% faster. This is a red flag and you should explain it before accepting the results.

      I did scan the code quickly and didn't see an obvious reason, but the code is hard to read from here.

      - tye        

        Umm, sorry, I should have been more clear. The actual timings were done on two different machines: one set of benchmarks on my box here, the other set via the Dumper Prompt. When I get more time I'll run both via the Dumper Prompt so the runs-per-second numbers are comparable (I was more interested in the relative percentages, not the runs per second).


        ---
        demerphq



        I reran all the tests from the Dumper prompt with the updated code I just posted in reply to ysth elsewhere in this thread. 'new_u' and 'old_u' are with DB updates, 'new' and 'old' are without.

        new, new_u, old, old_u, each for at least 2 CPU seconds...
               new:  2 wallclock secs ( 2.04 usr + 0.00 sys = 2.04 CPU) @ 505.62/s (n=1031)
             new_u:  8 wallclock secs ( 2.09 usr + 0.05 sys = 2.15 CPU) @ 236.45/s (n=508)
               old:  3 wallclock secs ( 2.12 usr + 0.00 sys = 2.12 CPU) @ 130.82/s (n=278)
             old_u:  7 wallclock secs ( 2.05 usr + 0.06 sys = 2.12 CPU) @ 102.49/s (n=217)
               Rate old_u   old new_u   new
        old_u 102/s    --  -22%  -57%  -80%
        old   131/s   28%    --  -45%  -74%
        new_u 236/s  131%   81%    --  -53%
        new   506/s  393%  286%  114%    --

        new, new_u, old, old_u, each for at least 2 CPU seconds...
               new:  3 wallclock secs ( 2.19 usr + 0.00 sys = 2.19 CPU) @ 500.11/s (n=1094)
             new_u:  5 wallclock secs ( 1.94 usr + 0.12 sys = 2.06 CPU) @ 246.30/s (n=508)
               old:  2 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 135.50/s (n=289)
             old_u:  4 wallclock secs ( 2.10 usr + 0.03 sys = 2.13 CPU) @ 106.43/s (n=227)
               Rate old_u   old new_u   new
        old_u 106/s    --  -21%  -57%  -79%
        old   136/s   27%    --  -45%  -73%
        new_u 246/s  131%   82%    --  -51%
        new   500/s  370%  269%  103%    --

        new, new_u, old, old_u, each for at least 2 CPU seconds...
               new:  2 wallclock secs ( 2.09 usr + 0.00 sys = 2.09 CPU) @ 510.56/s (n=1065)
             new_u:  5 wallclock secs ( 1.89 usr + 0.13 sys = 2.02 CPU) @ 260.94/s (n=528)
               old:  2 wallclock secs ( 2.14 usr + 0.00 sys = 2.14 CPU) @ 139.21/s (n=298)
             old_u:  3 wallclock secs ( 2.01 usr + 0.03 sys = 2.04 CPU) @ 111.33/s (n=227)
               Rate old_u   old new_u   new
        old_u 111/s    --  -20%  -57%  -78%
        old   139/s   25%    --  -47%  -73%
        new_u 261/s  134%   87%    --  -49%
        new   511/s  359%  267%   96%    --

        Note that new_u() is still considerably faster than old_u(), and that new() has an even greater advantage over old().


        ---
        demerphq



      Will look more at this soon, but I wanted to note that because of the lack of sort, cmpVars can't rely on just cmp when comparing version 0 strings.
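To illustrate the point about sorting: two old-style strings can describe the same hash with the pairs in a different order, so a plain cmp reports a difference where there is none. The helper `normalize_v0` below is hypothetical and only sorts the pairs; it is a sketch, not a replacement for unpacking and repacking.

```perl
use strict;
use warnings;

# Hypothetical normaliser for old-style strings: sort the key=value pairs
# so that two packings of the same hash become byte-identical.
sub normalize_v0 {
    my ($str) = @_;
    return join '&', sort split /&/, $str;
}

my $x = 'b=2&a=1';
my $y = 'a=1&b=2';    # same hash, different pair order

print "raw cmp:        ", ($x cmp $y), "\n";
print "normalised cmp: ", (normalize_v0($x) cmp normalize_v0($y)), "\n";
```

The raw comparison is non-zero even though nothing has changed, while the normalised one is zero, which is why unsorted strings have to be brought into a canonical order before being compared.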

        Yep, that's a good point. So forget that change. :-)


        ---
        demerphq
