Oaty has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! After searching, I found many nodes complaining that md5_hex produces different output from identical input. I have the opposite problem: different input producing identical output! I'm trying to index my website's visitor logs, so I call md5_hex on a string that consists of (state, city, resolution, lang, user_agent, browser plugins) concatenated together. Some of these values are passed via JavaScript/hidden form elements. My thinking was that this would be diverse enough to produce "fingerprints". However, I've been getting some duplicate fingerprints for different data, for example:
$string1 = "CA" . "Los Angeles" . "1440x900" . "en-us"
    . "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" . "";
$string2 = "CA" . "Los Angeles" . "1280x1024" . "en-us"
    . "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)" . "";
Both produced '40d704470259297f93bee626c12b71fb' as output. This CGI script is running on a load-balanced server and inserting records into a central database. It is not using mod_perl. Does MD5 not use the whole string? Any light that you could shed would be appreciated. Oaty

Replies are listed 'Best First'.
Re: md5_hex diff input produces same output?
by Fletch (Bishop) on Mar 24, 2009 at 16:26 UTC

    A simple test script shows that that's not the case.

    use strict;
    use Digest::MD5 qw( md5_hex );

    my $string1 = "CA" . "Los Angeles" . "1440x900" . "en-us"
        . "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" . "";
    my $string2 = "CA" . "Los Angeles" . "1280x1024" . "en-us"
        . "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)" . "";

    print md5_hex( $string1 ), "\n", md5_hex( $string2 ), "\n";

    exit 0;

    __END__
    256645dc3bb33cfd395c51faf4241cfc
    c3fbc65ae0b12ee4cd22ccadd3d9a3f7

    Something Else Is Wrong™, but then you haven't shown any actual code so it's all pretty much going to be conjecture.

    The cake is a lie.

      Thanks Fletch, my test script proved the same as yours. ... I'm prepared for a well-deserved beating now. While preparing to copy my code for you, I noticed that the value I was passing to the database was not the output from MD5 but an uninitialized hashref value. I so wish I could use strict at this job! Thanks for pointing me back to the code, Fletch! Oaty
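A bug of this kind (an unset hash value reaching the database in place of the digest) can be caught with a small guard before the insert. The following is only a sketch: `checked_fingerprint`, the `%visit` hash, and its keys are hypothetical names made up for illustration, not taken from the original script.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Hypothetical helper: return the value only if it looks like md5_hex output
# (32 lowercase hex characters); otherwise die with a description.
sub checked_fingerprint {
    my ($digest) = @_;
    die "fingerprint is undefined\n" unless defined $digest;
    die "fingerprint '$digest' is not 32 hex characters\n"
        unless $digest =~ /\A[0-9a-f]{32}\z/;
    return $digest;
}

# Hypothetical record; the real script builds its string from more fields.
my %visit = ( fingerprint => md5_hex( "CA" . "Los Angeles" . "1440x900" ) );

# A genuine digest passes through unchanged.
print checked_fingerprint( $visit{fingerprint} ), "\n";

# A mistyped key yields undef, which the guard rejects before any insert.
my $ok = eval { checked_fingerprint( $visit{fingerprnt} ); 1 };
print "rejected: $@" unless $ok;
```

Note that strict does not complain about a missing or mistyped hash key, so an explicit check like this (or at least `use warnings`, which reports uninitialized values when they are used) is what would flag the problem.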

        You can always use strict;, as it's a lexically scoped pragma:

        ... horrible code ...

        sub my_new_code {
            use strict;
            ...
        };

        ... more code from the wasteland ...

        You might or might not need to declare the global variables you need using use vars;, but at least all new code can be run under strict that way.
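A minimal runnable sketch of that idea follows. The variable `$legacy_counter` and the body of `my_new_code` are invented for illustration; the point is only that strict applies inside the sub's block while the surrounding legacy code stays as it is.

```perl
#!/usr/bin/perl
# Legacy section: no strict here, so this global springs into existence
# as the package variable $main::legacy_counter.
$legacy_counter = 41;

sub my_new_code {
    use strict;                     # in effect only until the end of this block
    use vars qw( $legacy_counter ); # declare the existing global for strict

    my $next = $legacy_counter + 1; # new lexicals must be declared with my
    return $next;
}

print my_new_code(), "\n";          # prints 42
```

Without the `use vars` line, the reference to `$legacy_counter` inside the sub would fail to compile under strict with a "Global symbol ... requires explicit package name" error.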

Re: md5_hex diff input produces same output?
by andreas1234567 (Vicar) on Mar 25, 2009 at 08:31 UTC
    Although your example is flawed, it has been demonstrated that two working programs with different behavior can produce the same MD5 output:

    C:\TEMP> md5sum hello.exe
    cdc47d670159eef60916ca03a9d4a007
    C:\TEMP> .\hello.exe
    Hello, world!
    (press enter to quit)
    C:\TEMP>
    C:\TEMP> md5sum erase.exe
    cdc47d670159eef60916ca03a9d4a007
    C:\TEMP> .\erase.exe
    This program is evil!!!
    Erasing hard drive...1Gb...2Gb... just kidding!
    Nothing was erased.
    (press enter to quit)
    C:\TEMP>
    --
    No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
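If deliberately constructed collisions like the pair above are a concern for the fingerprints, one option is to swap in a SHA-2 digest. This is a sketch of a substitute technique, not something proposed in the thread: `fingerprint_sha256` is a hypothetical helper, and it assumes the core Digest::SHA module is available.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use Digest::SHA qw( sha256_hex );

# Hypothetical helper: concatenate the fields as in the original post,
# then hash the result with SHA-256 instead of MD5.
sub fingerprint_sha256 {
    my (@fields) = @_;
    return sha256_hex( join "", @fields );
}

my @visitor = ( "CA", "Los Angeles", "1440x900", "en-us",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", "" );

my $md5    = md5_hex( join "", @visitor );
my $sha256 = fingerprint_sha256(@visitor);

# MD5 hex digests are 32 characters long, SHA-256 hex digests are 64,
# so a database column sized for MD5 would need to be widened.
print length($md5), " ", length($sha256), "\n";    # prints "32 64"
```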