Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Hashes aren't being differently randomized

by Hue-Bond (Priest)
on Jun 26, 2006 at 17:28 UTC ( #557616=perlquestion: print w/replies, xml ) Need Help??

Hue-Bond has asked for the wisdom of the Perl Monks concerning the following question:

$ perldoc -f each each HASH Entries are returned in an apparently random order. [...] Since Perl 5.8.1 the ordering is different even between different runs of Perl for security reasons (see "Algorithmic Complexity Attacks" in perlsec).

Great feature, so let's try it!

$ PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; print $_ for each %a' +| md5sum HASH_SEED = 3379561142 b6d67a24906e8a8541291882f81d31ca - $ PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; print $_ for each %a' +| md5sum HASH_SEED = 4068799219 b6d67a24906e8a8541291882f81d31ca -

Hmm, same MD5? And what about keys and values?

$ PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; print $_ for keys %a' +| md5sum HASH_SEED = 1151419008 dd56233cf84603df0d47d272da1af003 - $ PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; print $_ for keys %a' +| md5sum HASH_SEED = 2731377116 dd56233cf84603df0d47d272da1af003 - $ PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; print $_ for values %a +' | md5sum HASH_SEED = 1861095523 c1a75ab8e3bf1ff6c07b01025a1219e9 - $ PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; print $_ for values %a +' | md5sum HASH_SEED = 788024661 c1a75ab8e3bf1ff6c07b01025a1219e9 -

This implies I'm always getting the same output, so each, keys and values are returning the elements in the same order even when the seeds are different! My Perl was compiled without -DUSE_HASH_SEED_EXPLICIT (according to $Config::Config{ccflags}) so I shouldn't be setting the seed manually but anyway I tried, just in case. I used bash's $RANDOM variable, that returns a different value each time it's evaluated:

$ echo $RANDOM 8035 $ echo $RANDOM 797 $ PERL_HASH_SEED=$RANDOM PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; + print $_ for values %a' | md5sum HASH_SEED = 26815 c1a75ab8e3bf1ff6c07b01025a1219e9 - $ PERL_HASH_SEED=$RANDOM PERL_HASH_SEED_DEBUG=1 perl -e 'my %a=0..299; + print $_ for values %a' | md5sum HASH_SEED = 30449 c1a75ab8e3bf1ff6c07b01025a1219e9 -

Super Search found this but it doesn't seem to apply here since perl -V | grep SEED shows nothing. The test here also returns the same output every time I run it. This is a 5.8.8 running on Debian/Linux. Ideas?

--
David Serrano

Replies are listed 'Best First'.
Re: Hashes aren't being differently randomized
by vkon (Curate) on Jun 26, 2006 at 17:58 UTC
    In my opinion, hash randomization issue do not mean that you're guaranteed different keys order on different runs (but sometimes you will), it means something different.

    algorithm complexity attack only means you'll not get hash slowness because all hash items felt into same bucket so extracting hash element will be O(N), instead of O(log(N))

    However the moral is - you must not pretend that keys order is same between runs (so some programs probably stopped working when hash randomization was introduced)

      This is explicitly stated in perlsec:
      Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order.

      Also note that while the order of the hash elements might be randomised, this "pseudoordering" should not be used for applications like shuffling a list randomly (use List::Util::shuffle() for that, see List::Util, a standard core module since Perl 5.8.0; or the CPAN module Algorithm::Numerical::Shuffle), or for generating permutations (use e.g. the CPAN modules Algorithm::Permute or Algorithm::FastPermute), or for any cryptographic applications.

      greets,
      --shmem

      _($_=" "x(1<<5)."?\n".q/)Oo.  G\        /
                                    /\_/(q    /
      ----------------------------  \__(m.====.(_("always off the crowd"))."
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        Your post is good and do not contradicts mine,
        I suspect you intended to reply to OP, isn't it?

      And yet, it seems very unlikely that effectively changing the hash function would result in identical output order. The output order is based on the hashed key values modulo the number of buckets, so changing the hash function is very likely to change the output order.

      So test the effectiveness of the randomization directly: See which hash keys collide. If the same hash keys collide between runs, then the randomization isn't doing what it is supposed to:

      my %hash; my $key= 'aa'; my $count= @ARGV ? shift(@ARGV) : 20; $hash{$key}= 'base'; while( 1 ) { $hash{++$key}= 'test'; if( %hash =~ /^1/ ) { print "Collision: $key\n"; exit if --$count <= 0; } delete $hash{$key}; }

      I always get the same output, but this PERL_HASH_SEED_DEBUG environment variable doesn't work for me so I won't jump to any conclusions (my 5.008_007 is new enough that it should include it, however).

      - tye        

        I always get the same output

        Same here:

        $ PERL_HASH_SEED_DEBUG=1 perl ./foo | md5sum HASH_SEED = 4155558892 da7b0b6c9adb2fbc01be283dc16aed19 - $ PERL_HASH_SEED_DEBUG=1 perl ./foo | md5sum HASH_SEED = 51798468 da7b0b6c9adb2fbc01be283dc16aed19 - $ PERL_HASH_SEED_DEBUG=1 perl ./foo | md5sum HASH_SEED = 2930542944 da7b0b6c9adb2fbc01be283dc16aed19 -

        --
        David Serrano

        "perl -V" reports " ... ccflags =' ... -DNO_HASH_SEED ... ", which likely explains why /I/ always get the same results. My Perl was built by Randy Kobes. I'm a bit curious why that setting was chosen.

        - tye        

Re: Hashes aren't being differently randomized (explanation)
by demerphq (Chancellor) on Jun 28, 2006 at 07:58 UTC

    Reviewing the docs and the source I think I can explain what the deal is. Hashes arent "randomized" at all. In fact from what I can tell all perls default to using the exact same hashing at the beginning. But when a hash goes pathological (which appears from the hash testing code to be defined as having twenty keys in a single bucket) then randomization kicks in, on a per hash basis.

    I dont know if this behaviour makes sense, or if it matches with how its documented, but that appears to be the way it works.

    The code below shows how you can put 10 keys in a single bucket but that twenty will cause the hash to randomize itself. I had to change tyes routine a little as the strings we are dealing with are all zero bytes (which effectively guarantees that they go in the same bucket in a non randomized hash).

    I guess you could use this technique to randomize a hash. Stuff it with "\0" x 1 to "\0" x 20 and then empty it and then refill it with what you really want to store in it.

    my %hash= map { "\0" x $_ => $_ }1..10; print "Ten keys:\n"; for my $av ( getKeyCollisions_Len( \%hash ) ) { print "@$av\n"; } print "Twenty keys:\n"; %hash= map { "\0" x $_ => $_ }1..20; for my $av ( getKeyCollisions_Len( \%hash ) ) { print "@$av\n"; } sub getKeyCollisions_Len { my( $hv )= @_; no warnings 'numeric'; my $buckets= 0 + %$hv; my @keys= keys %$hv; my( @clash, @return ); while( @keys ) { my $key= shift @keys; delete $hv->{$key}; my $b= 0 + %$hv; if( $b == $buckets ) { push @clash, length $key; } elsif( @clash ) { push @return, [@clash,length $key]; @clash= (); } $buckets= $b; } return @return; } __END__ Ten keys: 10 9 8 7 6 5 4 3 2 1 Twenty keys: 17 12 4 5 9 14
    ---
    $world=~s/war/peace/g

Re: Hashes aren't being differently randomized
by demerphq (Chancellor) on Jun 27, 2006 at 21:56 UTC

    I think you need a debugging perl for the environment var to work. But you can get the same effect from Hash::Util::hash_seed(), which returns 0 if hash seed randomization is NOT in effect. From what i can tell Activestates builds disable this feature.

    ---
    $world=~s/war/peace/g

      I think you need a debugging perl for the environment var to work. [...] you can get the same effect from Hash::Util::hash_seed()

      Both environment variables shown in the OP seem to be working:

      $ PERL_HASH_SEED_DEBUG=1 perl -MHash::Util=hash_seed -le'print hash_se +ed' HASH_SEED = 165032588 165032588 $ PERL_HASH_SEED_DEBUG=1 perl -MHash::Util=hash_seed -le'print hash_se +ed' HASH_SEED = 725441538 725441538

      tye's getKeyCollisions shows always the same output here:

      $ PERL_HASH_SEED_DEBUG=1 perl gkc.pl | md5sum HASH_SEED = 501557310 e4d6401f730eede592733482fac0ec61 - $ PERL_HASH_SEED_DEBUG=1 perl gkc.pl | md5sum HASH_SEED = 1324598606 e4d6401f730eede592733482fac0ec61 -

      I don't know if it's supposed to do so or not. I'm beginning to take a fancy to this :) since it seems to be impossible to obtain some different output of anything between runs of perl (well, anything except hash seeds!).

      --
      David Serrano

A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://557616]
Approved by Corion
Front-paged by bart
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2022-11-29 05:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?