in reply to Re: How good is gzip data as digest?

I'd have to agree with Fletch. Also, compression algorithms normally produce binary data, which would cause issues when used in string context (null characters), leading to, at a minimum, collisions. Also, I don't think MD5 or SHA1 will buy you much savings given the string lengths you're talking about.

-derby

Update: As jethro points out ... I was confusing C strings with perl strings ... silly me.

Re^3: How good is gzip data as digest?
by jethro (Monsignor) on Apr 02, 2009 at 17:47 UTC
    Except that perl doesn't mind null characters in strings. Don't confuse them with C strings.
    > perl -e '$str="bla\x00bla"; print length($str),$str,"\n";'
    7blabla
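For the "seen" lookup case discussed in this thread, the same point can be sketched with a hash. This is only an illustration: the key strings and variable names are made up, and it assumes nothing beyond core Perl.

```perl
use strict;
use warnings;

# Two hypothetical keys that differ only AFTER an embedded NUL byte.
my $key_a = "bla\x00one";
my $key_b = "bla\x00two";

my %seen;
$seen{$key_a} = 1;

# Perl strings carry their own length, so the NUL does not cut the key short.
my $len   = length $key_a;
my $has_a = exists $seen{$key_a} ? 1 : 0;
my $has_b = exists $seen{$key_b} ? 1 : 0;

print "length=$len seen_a=$has_a seen_b=$has_b\n";
```

If the keys were truncated at the NUL as in C, both would collapse to "bla" and the second lookup would wrongly succeed. In Perl they stay distinct.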
Re^3: How good is gzip data as digest?
by isync (Hermit) on Apr 02, 2009 at 17:50 UTC
    "Also, compression algorithms normally produce binary data which would cause issues when used in string context (null characters) leading to, at a minimum, collisions"
    But, if I get you right, for "seen" lookups that never touch the null characters, and with the data eval{}-ed in, it should work?

    "Also I don't think MD5 or SHA1 will buy you much savings given the string lengths you're talking about."
    Right. Some strings are even shorter than the 32bit MD5's. So the idea of not blowing up (MD5) but even compressing down (gzip) came up.
      32bit MD5's

      128-bit, actually.
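The sizes being compared can be checked directly. Below is a rough sketch, assuming the core modules Digest::MD5 and IO::Compress::Gzip are available; the sample string is hypothetical. An MD5 digest is 128 bits, which is 16 raw bytes or 32 hex characters, while the gzip format adds a fixed header and trailer, so very short inputs tend to grow rather than shrink.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex);
use IO::Compress::Gzip qw(gzip $GzipError);

my $input = "short key";    # hypothetical sample, not from the thread

my $md5_raw = md5($input);      # binary digest, 128 bits
my $md5_hex = md5_hex($input);  # same digest as hex text

# Compress from one in-memory scalar to another.
my $gz;
gzip(\$input => \$gz) or die "gzip failed: $GzipError";

printf "input:     %d bytes\n", length $input;
printf "md5 (raw): %d bytes\n", length $md5_raw;
printf "md5 (hex): %d chars\n", length $md5_hex;
printf "gzip:      %d bytes\n", length $gz;
```

For strings this short, the gzip output comes out longer than the input, so it does not help as a compact key. Unlike a digest, though, it is reversible, which is a different trade-off.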