Re^2: 64-bit digest algorithms

Replies are listed 'Best First'.
Re^3: 64-bit digest algorithms by dHarry (Abbot) on Nov 13, 2008 at 09:43 UTC
A list of Hash functions. There area a few 64 bits alternatives. Maybe elf64? HTH	[reply]
Re^4: 64-bit digest algorithms by BrowserUk (Patriarch) on Nov 13, 2008 at 18:25 UTC
Many thanks for this link. After browsing around and reading a lot of very interesting information, it led me to FNV-1 & FNV-1a which seem to have been explicitly designed for my purposes with particular attention paid to the problems of clustering and collisions. This seems to be my best choice. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^5: 64-bit digest algorithms by BrowserUk (Patriarch) on Nov 13, 2008 at 19:11 UTC
Then again, maybe the FNV isn't such a good choice after all :( Though it only seems to produce the bad results when XOR-folding is used to truncate it, and I don't need that. At this point, I'm still looking. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^3: 64-bit digest algorithms by gone2015 (Deacon) on Nov 13, 2008 at 10:26 UTC
When I was playing around recently I used CRC32, taken once forwards across the input and a second time backwards across it. This certainly increased the effective space of the hash -- my sums is not up to any assertion about whether that gives a full 64 bits of "entropy". There are also two other respectable 32 bit CRCs -- avoiding the need to reverse the input. There is also CRC64 itself, but a quick poke at CPAN doesn't reveal an implementation (except buried in some "Bio..." stuff). Wouldn't be too difficult to bang out a table driven CRC64. You don't want anything crypto-secure, so a CRC could well be sufficient, and a good implementation will run like stink compared to any of the crypto-digest stuff.	[reply]
Re^4: 64-bit digest algorithms by massa (Hermit) on Nov 13, 2008 at 11:17 UTC
There is an arbitrary-output-size hash function family called RadioGatún... []s, HTH, Massa (κς,πμ,πλ)	[reply]
Re^4: 64-bit digest algorithms by BrowserUk (Patriarch) on Nov 15, 2008 at 06:49 UTC
This (see the red on the graphic and the conclusions) is why you don't use cyclic redundancy check algorithms for hashing purposes.	[reply]
Re^5: 64-bit digest algorithms by gone2015 (Deacon) on Nov 20, 2008 at 20:03 UTC
why you don't use cyclic redundancy check algorithms for hashing purposes. This surprised me. So I've been running a few tests. I've dug out my copy of Knuth, and refreshed my memory on how you calculate the probability of 'c' collisions if you throw 'm' balls at random into 'n' bins. Then I ran three tests. Each test uses randomly constructed 8-12 byte strings, each byte being 0..255 -- the distribution of string lengths and byte values should be even, assuming my 64-bit random number generator is good (which I believe it is). The tests were: take 1M (2^20) strings and count the number of collisions when using (a) CRC-32 and (b) FNV (see below). This was done 500 times, and the resulting distribution of numbers of collisions plotted against the expected distribution. See: Test-1. take 16,384 (2^14) strings and count the number of collisions when using (a) CRC-32 and (b) FNV, and XORing the MS and LS halves to give a 16-bit hash. This was done 1,000 times, and the resulting distribution of numbers of collisions plotted against the expected distribution. See: Test-2. as (2) but using (a) the LS and (b) the MS halves of CRC-32 to give a 16-bit hash. This was done 1,000 times. See: Test-3. Visual inspection of these plots does not suggest that CRC-32 is clearly distinguishable from uniform randomness. The middle part of the FNV distributions is a little taller, suggesting it is a little less variable in it's results. The full 32-bit CRC-32 hash leans a little towards a few more collisions than expected, a little, perhaps... I'm not sure whether to apply chi-squared or Kolmogorov-Smirnov to the data I have... but the plots don't suggest to me that CRC-32 is a bad hash. If anyone can suggest a better test or a better statistical approach, or has contrary results, I'd love to hear ! The FNV hash is taken from http://bretm.home.comcast.net/~bretm/hash/6.html. What it comes down to is: `U32 FNV_Hash(U8 * p, int l) { U32 hash = 2166136261 ; while (l--) { hash = (hash * 16777619) ^ *p++ ; } ; return hash ; } ;` [download]	[reply] [d/l]
Re^6: 64-bit digest algorithms by BrowserUk (Patriarch) on Nov 21, 2008 at 16:23 UTC
Re^7: 64-bit digest algorithms by gone2015 (Deacon) on Nov 22, 2008 at 03:05 UTC
Re^6: 64-bit digest algorithms by BrowserUk (Patriarch) on Nov 20, 2008 at 23:28 UTC
Re^7: 64-bit digest algorithms by gone2015 (Deacon) on Nov 22, 2008 at 02:48 UTC
Some notes below your chosen depth have not been shown here
Re^3: 64-bit digest algorithms by ikegami (Patriarch) on Nov 13, 2008 at 18:21 UTC
~~Doubles have 52 bits. The 53rd bit is the implied leading "1".~~ Update: As pointed out by oshalla, the implied bit does count. As pointed out by BrowserUk, I forgot about the sign bit. So there's 54 bits of precision for signed integers.	[reply]
Re^4: 64-bit digest algorithms by BrowserUk (Patriarch) on Nov 15, 2008 at 06:39 UTC
No! Doubles have 64-bits. Of which: 1 bit is used for the sign; 11 bits are used for the exponent; 52 bits are used for the mantissa; 1 extra bit is implied in the mantissa by the specification; With the result being that a 64-bit IEEE-754 double precision floating point value can act as a 53-bit* signed integer* for some purposes. If you're going to try to be a pedant, rather than contribute something useful--at least be right.	[reply]
Re^5: 64-bit digest algorithms by gone2015 (Deacon) on Nov 16, 2008 at 15:33 UTC
can act as a 53-bit* signed integer for some purposes.* To be even more pedantic... :-) ...that would be a 53 bit unsigned integer, or a 54-bit sign and magnitude signed integer (using one bit for the sign); equivalent to a 54-bit 1's complement signed integer. Since 1's complement machines are as rare as rocking horse fumet, one might worry about the `-0x80....0` minimum value. Happily that is quite comfortably accommodated in floating point form. The `-0` value is generally treated as zero, so that shouldn't intrude. So, for practical purposes a 64-bit IEEE-754 double precision floating point value can act as a 54-bit signed integer (with the usual 2's complement -ve values). So: `for my $n (53..55) { my $max = 2($n-1) - 1 ; my $min = -(2($n-1)) ; printf "%4d: -%s..%s\n", $n, show($min), show($max) ; } ; sub show { my ($x) = @_ ; my $ms = int(abs($x)/2*32) ; my $ls = abs($x) - ($ms 2*32) ; sprintf '0x%04X_%04X_%04X_%04X', ($ms >> 16), ($ms & 0xFFFF), ($ls >> 16), ($ls & 0xFFFF) ; } ;` [download] gives: 53: -0x0010_0000_0000_0000..0x000F_FFFF_FFFF_FFFF 54: -0x0020_0000_0000_0000..0x001F_FFFF_FFFF_FFFF 55: -0x0040_0000_0000_0000..0x0040_0000_0000_0000 showing that 53-bit (2's complement) signed integer is on the small side, 55-bit is to big, but 54-bits is just right. For extra credit, those with 64-bit integers can consider (a) why: `my $z = 0x0FED_CBA9_8765_4321 ; my $d = 9 ; printf " a = %20s %d + i, for i = 0..%d\n", "$z", $d, $d ; # 12: 12345678901234567890 : +2345 print " i: a / $d : Error\n" ; foreach my $i (0..$d) { my $a = ($z * $d) + $i ; my $q = $a / $d ; printf " %2d: %20s : %+5d\n", $i, "$q", $q - ($z + int($i/$d)) ; } ;` [download] gives: a = 1147797409030816545 * 9 + i, for i = 0..9 i: a / 9 : Error 0: 1147797409030816545 : +0 1: 1.14779740903082e+18 : +128 2: 1.14779740903082e+18 : +128 3: 1.14779740903082e+18 : +128 4: 1.14779740903082e+18 : +128 5: 1.14779740903082e+18 : +128 6: 1.14779740903082e+18 : +128 7: 1.14779740903082e+18 : +128 8: 1.14779740903082e+18 : +128 9: 1147797409030816546 : +0 and (b) whether this could be fixed, and finally (c) whether round-to-even is a Good Thing, with particular reference to this case. Bug #53784 refers.	[reply] [d/l] [select]
Re^6: 64-bit digest algorithms by ikegami (Patriarch) on Nov 16, 2008 at 17:57 UTC
Re^7: 64-bit digest algorithms by gone2015 (Deacon) on Nov 16, 2008 at 20:37 UTC
Some notes below your chosen depth have not been shown here