comment on

This is a follow-up/in addition to my previous reply. I finally got around to adapting some code I wrote to assess md5 to access CRC32, The result is this image

This image plots the state affect of changing bits in a randomly selected 128-bit input to CRC32 (plotted vertically), against

the 32-bits in the output (plotted horizontally in the left 2/3rds of the image) and:
the 16-bits resulting from XORing the upper and lower halves of the CRC32 together (plotted horizontally in the right third of the image).
There is a "1-pixel" band of mid-grey surrounding the image, and vertically between the two sets of data.
The image has been stretched x4 both horizontally and vertically to make discerning the individual pixels easier.
The image was generated from accumulating counts of changes of pixels in the (two) outputs for each change of state of the 128-bits in the inputs, for each of 1 million 16-byte randomly chosen inputs.
Although, the number of trials is in effect irrelevant as it doesn't vary (at all) from the state reached after just 1000 trials.
The totals of the state changes accumulated, have had (slightly modified) chi-squared algorithm applied to them, to stretch the differences from the expected average.
Though as will be explained, the results of this are minimal as the results are very "black & white" (a pun which will also be explained.)

So, for each randomly chosen input, the CRC is calculated. Then, each bit of the input in turn is toggled, the CRC32 recalculated, and the differences between the two CRCs is are counted and accumulated for the number of trials. And the chi-squared is applied to emphasis the differences from the expected averages.

Thus, for each pixel, a mid-grey color (matching the border, represents the expected 50% of the pixels change ideal. A pure black pixel indicates that that pixel never changed. A pure white pixel indicates that that pixel always changed. And, as I said above, the results are very black & white. And they indicate that CRC32 makes for a very poor hashing algorithm, because with some columns are entirely white regardless of the input, and other entirely black, it means that some hash values will be way over used and other never.

The mid-grey and black vertical strips on the right hand third of the image is the result of performing the same processing upon a 16-bit XOR-fold of the CRC32s produced above. The absence of white would seem to indicate that this has improved the result, but the presence of pure black just goes to show that you cannot generate bits using XOR, only dilute them. Upshot: XOR folding is bad for hashes. Truncation is okay because it uses the bits available without diluting them.

And that brings me back to your collisions graphs. I think you got your math wrong. I simply cannot believe that you should expect 1890 collisions from 16384 randomly chosen samples from a domain of 2**32 possibilities. Birthday paradox or not, that is way, way too high a collision rate. Way too high.

~~Hmm. I uploaded the image and the size displayed on the update page conformed it was seen, but for 20 minutes now, it fails to be rendered back to my browser. So you may not be able to see it.~~

However, if you download the code below, save it as CRCtest.pl and run it with the command line: perl -s CRC32test.pl -TRIALS=1e5 -MAP=BW, it will be reproduced for you locally, though you will probably need to arrange to load it into an image app yourself (if you use *nix), and then stretch it a little to see it clearly. Updated to pack the CRC. Image updated.

Note:There are various other command line arguments available, including color ramping. If you cannot work out how to use them, ask.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^6: 64-bit digest algorithms by BrowserUk
in thread 64-bit digest algorithms by BrowserUk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.