The trick mentioned earlier about using md5 hashes and cutting them up into suitable lengths seems like a nifty idea, if a bit heavy-weight.
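
For reference, the hash idea might look roughly like the sketch below. The details of the earlier suggestion aren't repeated here, so this is only one guess at it: md5_mask is a made-up name, and chaining extra digests for values longer than 32 characters is my own assumption. It uses md5_hex from the core Digest::MD5 module and expects plain byte strings (md5_hex croaks on wide characters).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not from the thread): replace a value with a
# same-length slice of its MD5 hex digest.
sub md5_mask {
    my ($value) = @_;
    my $len     = length $value;
    my $hex     = '';
    my $counter = 0;
    # md5_hex returns 32 hex characters; chain more digests for longer input
    while (length($hex) < $len) {
        $hex .= md5_hex($counter++ . ':' . $value);
    }
    return substr($hex, 0, $len);
}

print md5_mask('John Smith'), "\n";
```

Unlike random replacement, the same input always yields the same mask, which helps if masked values still have to join across tables; the cost is that the output is all lower-case hex, so character classes are not preserved.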

If it would be sufficient to replace letters and digits with other randomly selected letters and digits (to mask personally identifying data such as names and credit-card numbers, without disturbing which positions hold letters, digits, or punctuation), something like this might do (not tested):

sub mask_it {
    my ($instr) = @_;
    my $retstr = '';
    while ($instr =~ /[0-9a-z]/i) {
        # each s/// strips the leading chunk from $instr and leaves it in $1/$2
        if ($instr =~ s/^([^0-9a-z]*)([a-z]+)//i) {
            $retstr .= $1;    # pass non-alphanumerics as-is
            # replace letters with new (upper-case) ones
            $retstr .= join('', map { chr(65 + int(rand(26))) } split(//, $2));
        }
        elsif ($instr =~ s/^([^0-9a-z]*)([0-9]+)//i) {
            $retstr .= $1;    # pass non-alphanumerics as-is
            # replace digits with new ones
            $retstr .= join('', map { chr(48 + int(rand(10))) } split(//, $2));
        }
    }
    $retstr .= $instr;    # pass on anything that's left over
    return $retstr;
}
Note that this only outputs upper-case replacements for any input letters; if you want to be more "flexible", it should be easy to add that in. (There are probably a few ways to optimize this, but this gets the basic idea across.)
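
As one way to add that flexibility, here is a minimal sketch that keeps the case of each letter. The name mask_keep_case is made up for illustration, and it swaps the loop for a single substitution with /e, so treat it as a variant rather than a drop-in patch. It only handles ASCII letters and digits.

```perl
use strict;
use warnings;

# Hypothetical variant: every character keeps its class
# (upper-case letter, lower-case letter, or digit).
sub mask_keep_case {
    my ($instr) = @_;
    (my $out = $instr) =~ s{([A-Z])|([a-z])|([0-9])}{
        defined $1 ? chr(65 + int(rand(26)))    # upper-case letter
      : defined $2 ? chr(97 + int(rand(26)))    # lower-case letter
      :              chr(48 + int(rand(10)))    # digit
    }ge;
    return $out;
}

print mask_keep_case('Jane Doe, 4111-1111-1111-1111'), "\n";
```

Punctuation, whitespace, and underscores pass through untouched because the pattern never matches them.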

update: added the "join('',...)" around each "map {...}" to make sure the string assignment would work properly. Also changed the "while" condition from /\w/ to /[0-9a-z]/i (and similarly for the "if" conditions), to make sure that underscores don't throw it into an endless loop.


In reply to Re: Test data masking tools/techniques? by graff
in thread Test data masking tools/techniques? by Anonymous Monk
