in reply to Test data masking tools/techniques?

The trick mentioned earlier, using MD5 hashes and cutting them into suitable lengths, seems like a nifty idea, if a bit heavyweight.
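For comparison, the MD5 trick might look roughly like this. This is only a guess at that approach, not the earlier poster's code: `md5_mask` and its salt parameter are hypothetical, and it relies on the core Digest::MD5 module. Because the mask is deterministic, the same input always yields the same output, which keeps joins between tables consistent. The trade-off is that the output is hex, so it does not keep letters as letters or digits as digits.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: hash the value (with an assumed secret salt) and
# cut the hex digest down to the length of the original value.
sub md5_mask {
    my ($value, $salt) = @_;
    my $digest = md5_hex($salt . $value);    # 32 hex characters
    # Extend the digest by re-hashing if the value is longer than 32 chars.
    $digest .= md5_hex($digest) while length($digest) < length($value);
    return substr($digest, 0, length($value));
}

print md5_mask('4111111111111111', 'my-secret'), "\n";
```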

If it is sufficient to replace letters and digits with randomly selected other letters and digits (masking personally identifying information such as names and credit-card numbers without disturbing the pattern of letters, digits, and punctuation), something like this might do (not tested):

    use strict;
    use warnings;

    sub mask_it {
        my ($instr) = @_;
        my $retstr = '';
        while ($instr =~ /[0-9a-z]/i) {
            my $orig;
            if ($instr =~ /^([^0-9a-z]*)([a-z]+)/i) {
                $retstr .= $1;    # pass leading non-alphanumerics as-is
                $orig = $2;       # replace letters with random upper-case letters
                $retstr .= join('', map { chr(65 + int(rand(26))) } split(//, $orig));
            }
            elsif ($instr =~ /^([^0-9a-z]*)([0-9]+)/i) {
                $retstr .= $1;    # pass leading non-alphanumerics as-is
                $orig = $2;       # replace digits with random digits
                $retstr .= join('', map { chr(48 + int(rand(10))) } split(//, $orig));
            }
            # remove the part just handled (prefix + letters/digits) from the input
            $instr =~ s/^[^0-9a-z]*\Q$orig\E//i;
        }
        $retstr .= $instr;    # pass on anything that's left over
        return $retstr;
    }

    print mask_it('John Smith, card 4111-1111-1111-1111'), "\n";
Note that every input letter is replaced with an upper-case letter; if you want something more "flexible", it should be easy to add. (There are probably a few ways to optimize this, but it gets the basic idea across.)
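One way to be more flexible, and to drop the explicit loop at the same time, is a single `s///ge` pass that preserves case. This is a sketch under my own assumptions (the name `mask_it_case` is hypothetical): upper-case letters map to random upper-case letters, lower-case to lower-case, ASCII digits to digits, and everything else, including underscores, passes through unchanged.

```perl
use strict;
use warnings;

# Hypothetical case-preserving variant: one substitution pass, where the
# /e flag evaluates the replacement as code for each matched character.
sub mask_it_case {
    my ($instr) = @_;
    (my $out = $instr) =~ s{([A-Z])|([a-z])|([0-9])}{
          defined $1 ? chr(ord('A') + int(rand(26)))    # upper-case letter
        : defined $2 ? chr(ord('a') + int(rand(26)))    # lower-case letter
        :              chr(ord('0') + int(rand(10)))    # digit
    }ge;
    return $out;
}

print mask_it_case('Jane Doe, card 4111-1111-1111-1111'), "\n";
```

Since each character is matched and replaced independently, there is no input to "consume", so characters outside the three classes cannot cause an endless loop.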

update: added the "join('',...)" around each "map {...}" so that the replacement characters are concatenated into a string (in scalar context, "map" returns only the number of elements). Also changed the "while" condition from /\w/ to /[0-9a-z]/i (and similarly for the "if" conditions), so that underscores, which /\w/ matches but neither branch consumes, don't throw it into an endless loop.