I use things like this quite a lot, but I'm not allowed to publish the code. The project I use it for involves "anonymizing" databases, so I can create reproducible customer situations.

In that process, I first collect all surnames and given names from several databases and split them on whitespace, grouped by gender. Then I shuffle the list of names and create new names by randomly picking 2 to 5 names from the list for the correct gender, putting them in a random order, and assigning the result to the anonymized victim.
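
As a rough illustration of the name step described above (not the unpublished code), here is a minimal sketch. The `%source` data, the gender keys `f` and `m`, and the helper name `random_name` are all assumptions made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical input: full names as they might come out of the source
# databases, keyed by gender. The real schema is not shown in the post.
my %source = (
    f => [ "Anna Maria Jansen", "Eva Vries", "Sophie Bakker" ],
    m => [ "Jan Pieter Visser", "Willem Smit", "Hendrik Boer" ],
);

# Split every name on whitespace and keep the unique parts per gender.
my %pool;
for my $gender (keys %source) {
    my (%seen, @parts);
    for my $name (@{ $source{$gender} }) {
        push @parts, grep { !$seen{$_}++ } split ' ', $name;
    }
    $pool{$gender} = \@parts;
}

# Build a new name from 2 to 5 randomly picked parts, in random order.
sub random_name {
    my ($gender) = @_;
    my @parts = shuffle @{ $pool{$gender} };
    my $count = 2 + int rand 4;            # 2, 3, 4 or 5
    $count = @parts if $count > @parts;    # a small pool cannot give more
    return join " ", @parts[0 .. $count - 1];
}

print random_name("f"), "\n";
```

Because the parts are shuffled before the slice is taken, no part is used twice within one generated name.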

The problem here is that I also have to go through all related databases to change the names of the parents and children, so that the relations still match.
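
One way to keep the relations matching is to remember every replacement and reuse it wherever the same original shows up. The sketch below assumes that approach; the records, the column names `name` and `parent_name`, and the stand-in generator `new_name` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records from two related databases; column names are assumed.
my @persons = (
    { id => 1, name => "Jan Visser" },
    { id => 2, name => "Eva Bakker" },
);
my @children = (
    { id => 7, parent_name => "Jan Visser" },
    { id => 8, parent_name => "Eva Bakker" },
);

my %replacement;    # original name => anonymized name, filled on first use
my $counter = 0;

# Stand-in generator; a real one would build names from the shuffled pool.
sub new_name { return sprintf "Person %03d", ++$counter }

# Return the same replacement every time the same original is seen.
sub anonymize {
    my ($original) = @_;
    return $replacement{$original} //= new_name();
}

$_->{name}        = anonymize($_->{name})        for @persons;
$_->{parent_name} = anonymize($_->{parent_name}) for @children;

printf "%d -> %s\n", $_->{id}, $_->{parent_name} for @children;
```

Keying the lookup on the name alone treats two different people with the same name as one person; keying on a person ID would avoid that, if the databases have one.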

I do the same for date of birth and place of birth. And for ZIP codes.

The best part, however, is the addresses. First I collect all the street names from all the databases I have access to, then I split the street names on known extensions: "street", "alley", "boulevard", "road", "way", "path", etc. Then I take the first part of those, shuffle them, and generate new street names from the prefix plus any of the known extensions. "Bondstreet" thus creates "Bondstreet", "Bondalley", "Bondroad", etc. I then shuffle the new list and replace each original street name with a random pick from the new list.
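
The street-name step could look roughly like this. It is only a sketch: the sample `@streets` are invented, the suffix list is just the one quoted above, and the variable names are assumptions.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @suffixes = qw(street alley boulevard road way path);     # from the text, not exhaustive
my @streets  = qw(Bondstreet Parkroad Millway Churchalley);  # hypothetical sample data

my $suffix_re = join "|", @suffixes;

# Strip a known suffix from the end of each street to get unique prefixes.
my (%seen, @prefixes);
for my $street (@streets) {
    (my $prefix = $street) =~ s/(?:$suffix_re)\z//i;
    next unless length $prefix;      # skip names that are only a suffix
    push @prefixes, $prefix unless $seen{$prefix}++;
}

# Combine every prefix with every suffix, then shuffle the result.
my @candidates;
for my $prefix (@prefixes) {
    push @candidates, map { $prefix . $_ } @suffixes;
}
@candidates = shuffle @candidates;

# Replace each original street with a random pick from the new list.
my %new_street = map { $_ => $candidates[ rand @candidates ] } @streets;

print "$_ -> $new_street{$_}\n" for sort keys %new_street;
```

A random pick can land on the original name by chance, and two originals can end up with the same new name; whether that matters depends on the data set.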

Together with some other changes, someone with knowledge of the original database said he was unable to "see" which persons were involved in the new data set. This way we can mimic problems at any size of customer database, as we now generate a new one from an existing one with the same size and relations and then "anonymize" the complete set.

This has proven to be a very useful approach. All done in Perl of course, and nothing to do with spam or hackers.


Enjoy, Have FUN! H.Merijn

In reply to Re^3: Random personal names by Tux
in thread Random personal names by ambrus
