in reply to Re^3: Character Length Requirement & String Conversion
in thread Character Length Requirement & String Conversion

I have found the solution I was looking for, using String::CRC32 or String::CRC.

That is a very bad idea! CRC32 is designed to detect bit corruption within a single string, not to hash many strings to distinct values.

The following code checks for duplicates using just 5-character strings, 'aaaaa' .. 'zzzzz', and finds thousands, the first after just 18026 tries:

    use String::CRC32;

    # 256 bit-vectors of 2 MB (2**24 bits) each: one bit per possible 32-bit CRC.
    @v = ( chr(0) ) x 256;
    $_ x= 2*1024*1024 for @v;

    # Return 1 if this CRC has been seen before; otherwise mark it as seen.
    sub testAndSet {
        my( $hi, $lo ) = ( $_[0] >> 24, $_[0] & 0x00ffffff );
        return 1 if vec( $v[$hi], $lo, 1 );
        vec( $v[$hi], $lo, 1 ) = 1;
        return;
    }

    $n = 0;
    # Note: per the String::CRC32 docs, crc32()'s optional second argument is the
    # initial CRC value, so as written each string is checksummed with $n as its seed.
    testAndSet( crc32( $_, ++$n ) ) and warn "Dup after $n strings"
        for 'aaaaa'..'zzzzz';

    Dup after 18026 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18027 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18042 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18043 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18728 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18729 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18744 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 18745 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 19378 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 19379 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 116559 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 116574 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 117261 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 117276 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126026 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126027 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126030 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126031 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126042 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126043 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126046 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126047 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126728 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126729 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126732 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126733 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126744 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126745 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126748 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 126749 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 176385 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 176388 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 176389 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 176400 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 176404 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 176405 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 250001 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 250512 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 250513 strings at (eval 12) line 1, <STDIN> line 4.
    Dup after 250516 strings at (eval 12) line 1, <STDIN> line 4.
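For a sense of scale, the counts above can be set against what an idealised hash that spreads its inputs uniformly over 2**32 values would be expected to give. The following is only a rough sketch using the standard birthday approximation; the helper names expected_pairs and collision_probability are made up for this illustration and are not part of String::CRC32.

```perl
use strict;
use warnings;

# Expected number of colliding pairs among $n inputs for an idealised
# hash with $bits output bits (birthday approximation).
sub expected_pairs {
    my ( $n, $bits ) = @_;
    return $n * ( $n - 1 ) / 2 / 2**$bits;
}

# Approximate probability of at least one collision among $n inputs.
sub collision_probability {
    my ( $n, $bits ) = @_;
    return 1 - exp( -$n * ( $n - 1 ) / 2 / 2**$bits );
}

my $n = 26**5;    # number of strings in 'aaaaa' .. 'zzzzz'
printf "%d strings, 32 bits: about %.0f colliding pairs expected\n",
    $n, expected_pairs( $n, 32 );

# Chance of having seen a duplicate by the time $_ strings were hashed.
printf "P(dup within %7d strings) = %.3f\n", $_, collision_probability( $_, 32 )
    for 18_026, 77_163, 250_516;
```

Whatever the exact figures, the point this is meant to illustrate is that 32 bits is a small space: any 32-bit value, CRC or otherwise, should be expected to repeat long before millions of strings have been processed.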

Frankly, you'd be better off just truncating the URLs to 4 or 5 characters. (That is not a recommendation!)

And it's not much better with long strings:

    use String::CRC32;

    # 256 bit-vectors of 2 MB (2**24 bits) each: one bit per possible 32-bit CRC.
    @v = ( chr(0) ) x 256;
    $_ x= 2*1024*1024 for @v;

    # Return 1 if this CRC has been seen before; otherwise mark it as seen.
    sub testAndSet {
        my( $hi, $lo ) = ( $_[0] >> 24, $_[0] & 0x00ffffff );
        return 1 if vec( $v[$hi], $lo, 1 );
        vec( $v[$hi], $lo, 1 ) = 1;
        return;
    }

    $n = 0;
    # Note: per the String::CRC32 docs, crc32()'s optional second argument is the
    # initial CRC value, so as written each string is checksummed with $n as its seed.
    testAndSet( crc32( $_, ++$n ) ) and warn "Dup after $n strings"
        for 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'..'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz';

    Dup after 142376 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551424 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551425 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551426 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551427 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551428 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551429 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551430 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551431 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551684 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551685 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551686 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 551687 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 587768 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 587769 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 587770 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 587771 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 832410 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 832411 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 832472 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 832473 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 833434 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 833435 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 833502 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 833503 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903490 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903491 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903494 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903495 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903498 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903501 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903516 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 903517 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 994476 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 994477 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 994788 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 994789 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019528 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019529 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019532 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019533 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019536 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019537 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019560 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019565 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019840 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019841 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019844 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019845 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019848 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019849 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019872 strings at (eval 13) line 1, <STDIN> line 5.
    Dup after 1019877 strings at (eval 13) line 1, <STDIN> line 5.
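The bit-vector bookkeeping in the tests above keeps memory fixed at 512 MB regardless of how many strings are checked. For smaller experiments, the same kind of duplicate check can be sketched with an ordinary hash and a pluggable digest function, which makes it easy to compare digests of different widths. This is only an illustrative sketch: the helper name first_duplicate is hypothetical, and Digest::MD5 (a core module) is an assumption made here for comparison, not something the post recommends.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Return the 1-based position of the first string whose digest was already
# produced by an earlier string, or 0 if every digest is distinct.
sub first_duplicate {
    my ( $digest, @strings ) = @_;
    my %seen;
    my $n = 0;
    for my $s (@strings) {
        ++$n;
        return $n if $seen{ $digest->($s) }++;
    }
    return 0;
}

my @strings = ( 'aaaa' .. 'zzzz' );    # 26**4 = 456_976 strings

# Full 128-bit MD5, as 32 hex digits.
print "128-bit: ", first_duplicate( \&md5_hex, @strings ), "\n";

# Deliberately tiny 16-bit digest: only the first 4 hex digits of the MD5.
my $tiny = sub { substr md5_hex( $_[0] ), 0, 4 };
print " 16-bit: ", first_duplicate( $tiny, @strings ), "\n";
```

With only 65,536 possible 16-bit values, the second call is guaranteed to report a duplicate somewhere in the first 65,537 strings, which shows on a small scale why the width of the digest matters so much.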
