
I don't think you want to generate fake data from real, sensitive data. Anything derived from the original could be brute-forced back into it.

What you probably want instead is a list of possible values for each field, with values combined at random to generate complete entries.
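
A minimal sketch of that idea, in Python for brevity. The field names and the values in each pool are made up for illustration; you would substitute pools that fit your own schema. Each field is picked independently, so no generated entry corresponds to any real record.

```python
import random

# Hypothetical value pools, one per field. None of these come from real data.
FIELD_VALUES = {
    "name": ["John Smith", "Cher", "k d lang", "Hans-Peter van Scoter"],
    "city": ["Springfield", "St. John's", "Ho Chi Minh City"],
    "phone": ["555-0100", "+44 20 7946 0000", ""],
}


def make_entry(rng):
    """Build one fake record by choosing each field's value independently."""
    return {field: rng.choice(values) for field, values in FIELD_VALUES.items()}


def make_entries(count, seed=None):
    """Return `count` fake records; pass a seed to get a repeatable test set."""
    rng = random.Random(seed)
    return [make_entry(rng) for _ in range(count)]


if __name__ == "__main__":
    for entry in make_entries(3, seed=1):
        print(entry)
```

Seeding the generator is optional, but it makes a failing test case reproducible.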

Since you mention this is for test data, you should think about edge cases for each field, so that your lists are broad and your testing more robust.

For instance, a name field could be one of these:

John Smith
Cher
k d lang
Mr. William Peterson III, Ph.D., M.D., J.D.
The Mamas and the Papas
Mssr. Jacque Blacque du Laurier, Esquire
Hans-Peter van Scoter
TAFKAP [The Artist Formerly Known As Prince]
8) [frog smiley]
Steve & Sherry Smith
Steve Smith & Sherry Shortcake [married, preserving surname]
Tenchi Kanaka-san

(Don't forget Unicode, various Asian forms, and Celtic rune forms.) If you really plan to test something, you should consider the boundaries of your input filter and attack them deliberately. (Of course, if you allow data like this, parsing it into first name, last name, and titles will be daunting.)
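
One way to attack a length boundary is sketched below, again in Python. It assumes the input filter enforces some maximum length; the limit of 50 in the usage lines is a made-up figure, and `boundary_strings` is a hypothetical helper, not part of any library. Using a non-ASCII fill character also shows where a limit counted in characters differs from one counted in bytes.

```python
def boundary_strings(max_len, fill="x"):
    """Return strings whose lengths sit on and around a length limit.

    Covers empty, single-character, one under the limit, exactly the
    limit, and one over the limit, with duplicates and negatives dropped.
    """
    lengths = {0, 1, max_len - 1, max_len, max_len + 1}
    return [fill * n for n in sorted(lengths) if n >= 0]


if __name__ == "__main__":
    # 50 is an assumed limit; use whatever your filter actually enforces.
    for candidate in boundary_strings(50, fill="é"):
        # Character count and UTF-8 byte count diverge for non-ASCII fill.
        print(len(candidate), len(candidate.encode("utf-8")))
```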

-QM
--
Quantum Mechanics: The dreams stuff is made of

In reply to Re: Anonymising data by QM
in thread Anonymising data by Anonymous Monk
