Problems with defining hashes

Annemarie has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Problems with defining hashes by polettix (Vicar) on Mar 30, 2005 at 13:00 UTC
You're using floating point numbers as keys in the hash, and this is probably something that you shouldn't do unless you know exactly what you're doing. Consider that 5.0, 5.5 and 0.1 do not have an exact representation inside the computer, due to its binary nature and the finite number of bits per number. Given that your loops start from different points (5.0 for the first, 5.5 for the second), you end up with slightly different numbers in $mag in the first and second loops, i.e. different keys in the hash. I would suggest that you fix the magnitudes somewhere (an array, for example), then use an integer index just to be sure not to mess things up. Flavio Don't fool yourself.
Re^2: Problems with defining hashes by ysth (Canon) on Mar 30, 2005 at 18:47 UTC
5.0 and 5.5 do have exact representations; 0.1 does not.
Re^3: Problems with defining hashes by polettix (Vicar) on Mar 30, 2005 at 23:38 UTC
Touché :-) Another case of "negative laziness"... Flavio Don't fool yourself.
Re^2: Problems with defining hashes by Annemarie (Acolyte) on Mar 30, 2005 at 23:27 UTC
Thanks, Flavio. I had completely forgotten about the issues with using floating point numbers.
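The point made in this subthread can be checked with a minimal sketch along the following lines. It is not from the original posters; the hash names `%from_5_0` and `%from_5_5` are made up for illustration, and the exact keys you see may depend on your Perl build.

```perl
use strict;
use warnings;

# 5.0 and 5.5 are exactly representable in binary; 0.1 is not.
printf "%.20f\n", $_ for 5.0, 5.5, 0.1;

# Collect the hash keys produced by each loop (numbers are
# stringified when they are used as hash keys).
my ( %from_5_0, %from_5_5 );
for ( my $mag = 5.0; $mag < 9.0; $mag += 0.1 ) { $from_5_0{$mag} = 1 }
for ( my $mag = 5.5; $mag < 9.0; $mag += 0.1 ) { $from_5_5{$mag} = 1 }

# Keys generated by the second loop that the first loop never created.
my @missing = grep { !exists $from_5_0{$_} } sort { $a <=> $b } keys %from_5_5;
print "keys only in the second loop: @missing\n";
```

Any key listed on the last line is one that a lookup from the second loop would not find, because the error from repeatedly adding 0.1 accumulates differently depending on the starting value.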
Re: Problems with defining hashes by Fletch (Bishop) on Mar 30, 2005 at 13:08 UTC
Seconded that using floats as hash keys is a bad idea. Run this version to see why:
`my @keys = qw( 0 0.1 0.3 1 3 10 30 1000 );
my %n;
for ( my $mag = 5.0 ; $mag < 9.0 ; $mag += 0.1 ) {
    @{ $n{ $mag } }{ @keys } = ( 0 ) x @keys;
}
for ( sort { $a <=> $b } keys %n ) {
    print "mag: $_\t$n{$_}->{30}\n";
}`
Perhaps if you stepped back and described what you're trying to accomplish with this hash . . .
Re: Problems with defining hashes by tlm (Prior) on Mar 30, 2005 at 13:47 UTC
To expand a bit on frodo72's reply, if you change your code slightly to:
`for ($mag=5.0;$mag<9.0;$mag=$mag+0.1){
    $n{$mag}{0}=$n{$mag}{0.1}=$n{$mag}{0.3}=$n{$mag}{1}=0;
    $n{$mag}{3}=$n{$mag}{10}=$n{$mag}{30}=$n{$mag}{1000}=0;
}
for ($mag=5.5; $mag<9.0; $mag=$mag+0.1){
    print"$mag $n{$mag}{30}\n";
}`
you'll see that you are not initializing what you think you are. The output of the above looks like this:
`5.5 0
5.6 0
5.7 0
5.8 0
5.9 0
6 0
6.1 0
6.2 0
6.3 0
6.4 0
6.5
6.6
6.7
6.8
6.9
6.99999999999999 0
7.09999999999999 0
7.19999999999999 0
7.29999999999999 0
7.39999999999999 0
<truncated>`
Try this:
`use strict;
my @bin_labels = qw( 0 0.1 0.3 1 3 10 30 1000 );
{
    my $step_size = 0.1;
    my $start = 5.0;
    my $finish = 9.0;
    sub index_to_mag {
        my $i = shift;
        return $start + $i * $step_size;
    }
    sub mag_to_index {
        my $mag = shift;
        return sprintf "%.f", ($mag - $start) / $step_size;
    }
}
my @n;
for ( my $i = 0; $i < mag_to_index( 9.0 ); ++$i ) {
    $n[ $i ]{ $_ } = 0 for @bin_labels;
}
for ( my $mag = 5.5; $mag < 9.0; $mag = sprintf "%.1f", $mag + 0.1 ) {
    printf "%.1f %f\n", $mag, $n[ mag_to_index( $mag ) ]{ 30 };
}`
Afterthought: Maybe some clarification will make the code above more useful. I switched from using a hash (`%n`) to using an array (`@n`), for the reasons that frodo72 already explained. In this case, there is a simple way to convert between array indexes and magnitudes; this is encapsulated in `index_to_mag` and `mag_to_index`. The frequent use of `(s)printf` rounds the values at each step, so round-off errors do not accumulate.
Update: Made proper closures out of `mag_to_index` and `index_to_mag`.
the lowliest monk
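A variation on the same idea, sketched here under the assumption that the magnitudes are always whole tenths from 5.0 to 8.9: count in integer tenths and convert to a magnitude only for display. The variable name `$tenths` is hypothetical and does not appear in the posts above.

```perl
use strict;
use warnings;

my @bin_labels = qw( 0 0.1 0.3 1 3 10 30 1000 );

# Count in whole tenths (50 .. 89 stands for 5.0 .. 8.9), so the loop
# variable is always an exact integer.
my @n;
for my $tenths ( 50 .. 89 ) {
    $n[ $tenths - 50 ]{$_} = 0 for @bin_labels;
}

# Convert back to a magnitude only when printing.
for my $tenths ( 55 .. 89 ) {
    printf "%.1f %d\n", $tenths / 10, $n[ $tenths - 50 ]{30};
}
```

Because no floating point value is ever accumulated, the two loops cannot drift apart.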
Re: Problems with defining hashes by graff (Chancellor) on Mar 30, 2005 at 15:36 UTC
This might be an easier way to avoid the floating-point problem for your hash keys:
`# first, generate fixed-precision strings for magnitudes
# and bin values:
my @mag_keys = map { sprintf("%.1f", $_/10) } ( 50 .. 89 );
my @bins = qw/0 0.1 0.3 1 3 10 30 1000/;
# now initialize hash bins for counting:
# (this is probably unnecessary, unless you're re-using
# the hash on multiple separate data sets)
my %n;
for my $mag ( @mag_keys ) {
    for my $bin ( @bins ) {
        $n{$mag}{$bin} = 0;
    }
}`
The next thing to watch out for, when actually counting things up, is to avoid using the "==" operator to test whether a floating point value from your input data matches a given hash key. Use only "<", ">", "<=", ">=" as needed, or else use sprintf on the data value to get it into the same precision as the hash key, then use "eq" (or "gt", "ge", "le", "lt"). I'm still scratching my head about the "0, 0.1, 0.3, ..." series -- that jump from 30 to 1000 seems odd.
Re^2: Problems with defining hashes by Annemarie (Acolyte) on Mar 30, 2005 at 23:32 UTC
As I just said to Flavio, I had completely forgotten about issues with floating point numbers. Your suggestions work beautifully and are brief. Thank you! The values 0.1 to 30 cover logarithmic time intervals during which I assume my data to be complete. 1000 is an arbitrary default value for aftershocks outside my time intervals. I could have used any other name, maybe 'outside'. The '0' collects foreshock data. I hope the explanation saves your head from being scratched. ;-)
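graff's advice about formatting data values with sprintf before using them as keys might look roughly like this when counting. This is only a sketch: the `@events` list is invented sample data, not Annemarie's, and skipping out-of-range magnitudes is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

my @mag_keys = map { sprintf( "%.1f", $_ / 10 ) } ( 50 .. 89 );
my @bins     = qw/0 0.1 0.3 1 3 10 30 1000/;

my %n;
for my $mag (@mag_keys) {
    $n{$mag}{$_} = 0 for @bins;
}

# Hypothetical (magnitude, bin) observations, made up for illustration.
my @events = ( [ 5.5, 30 ], [ 5.4 + 0.1, 30 ], [ 6.5, 1000 ] );

for my $event (@events) {
    my ( $mag, $bin ) = @$event;
    my $key = sprintf "%.1f", $mag;    # same precision as the hash keys
    next unless exists $n{$key};       # skip magnitudes outside 5.0 .. 8.9
    $n{$key}{$bin}++;
}

print "5.5 / 30: $n{'5.5'}{30}\n";
```

Both 5.5 and the computed 5.4 + 0.1 are formatted to the string "5.5" before the lookup, so they land in the same bucket regardless of any round-off in the sum.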
Re: Problems with defining hashes by melora (Scribe) on Mar 30, 2005 at 23:35 UTC
What about using the initialization as written and then using a foreach loop to work with the contents? Just a thought; it would depend on how you need to use the hash.
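One reading of this suggestion, sketched under the assumption that you only need to visit the entries that were actually created: keep the float-keyed initialization and iterate over the existing keys rather than regenerating them in a second loop. The `$visited` counter is added here purely for illustration.

```perl
use strict;
use warnings;

my %n;
for ( my $mag = 5.0; $mag < 9.0; $mag += 0.1 ) {
    $n{$mag}{$_} = 0 for qw( 0 0.1 0.3 1 3 10 30 1000 );
}

# Walk the keys that actually exist instead of regenerating them.
my $visited = 0;
for my $mag ( sort { $a <=> $b } keys %n ) {
    next if $mag < 5.5;
    print "$mag $n{$mag}{30}\n";
    $visited++;
}
```

Every lookup succeeds because the keys come from the hash itself, although the printed labels may still show round-off and looking up a magnitude read from input data would still need the sprintf treatment described earlier in the thread.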
Re: Problems with defining hashes by tphyahoo (Vicar) on Mar 31, 2005 at 14:20 UTC
For those who, like me, haven't learned sprintf yet, here's a nice explanation/tutorial on sprintf. What got me, specifically, was `%.1f`. I thought the 1 (one) was an l (ell). Dur... ********** Those new to the x operator (aka repetition operator) may want to have a look at this explanation of the x operator.
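For readers without the linked tutorials to hand, the two constructs mentioned here can be tried in a few lines. This is a small sketch; the names `%count`, `$rule` and `$label` are made up for the example.

```perl
use strict;
use warnings;

my @keys = qw( 0 0.1 0.3 1 3 10 30 1000 );

# List repetition: ( 0 ) x @keys yields one zero per element of @keys.
my %count;
@count{@keys} = ( 0 ) x @keys;

# String repetition: a scalar on the left is repeated as a string.
my $rule = "-" x 10;

# "%.1f" (digit one, not the letter ell) rounds to one decimal place.
my $label = sprintf "%.1f", 6.99999999999999;

print "$rule\n$label\n";    # the label printed here is 7.0
```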

