in reply to Test data masking tools/techniques?

Perhaps we could be more helpful if you explained what you mean by "masking production data". Are you talking about manipulating real live data in some way to create test data which is very similar to real-world data but without the risk of exposing real private data to testers?

Re: Re: Test data masking tools/techniques?
by dwarrell (Initiate) on Apr 01, 2003 at 01:27 UTC
    Yes, that's what I'm looking to do. Sorry for the lack of clarity. I actually don't agree with this assignment, as I feel that anything I can find on the 'Net would need to be heavily modified to fit the data it will be dealing with here, but my boss has insisted I search for existing tools. I personally feel that techniques are going to be of greater assistance to me :/

      Well, I'm not aware of any tools for doing that and I didn't manage to pull anything up with a Google search (I guess you didn't either). I imagine such tools would be very domain-specific.

      As far as techniques go, one approach to taking a randomish sample is to select every 'nth' row from a table (using % (modulo) and the row number). If you can slurp a dataset into memory, you could use Shuffle to randomise it. For randomising names, you could do a random sample of first names, then a random sample of last names using a different seed or modulo and then paste them together.
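
      To make that a bit more concrete, here is a rough sketch of the techniques above, written in Python purely for illustration. The function names (every_nth, mask_names), the seeds, and the sample rows are all made up for this example; the original suggestion doesn't specify any of them, and your real tables will certainly look different.

```python
import random


def every_nth(rows, n, offset=0):
    """Keep the rows whose position satisfies position % n == offset."""
    return [row for i, row in enumerate(rows) if i % n == offset]


def mask_names(first_names, last_names, seed_first, seed_last):
    """Shuffle first and last names independently, then pair them up.

    Two separately seeded generators are used so the two columns are
    reordered differently, which breaks the original first/last pairing.
    """
    firsts = list(first_names)
    lasts = list(last_names)
    random.Random(seed_first).shuffle(firsts)
    random.Random(seed_last).shuffle(lasts)
    return [f"{first} {last}" for first, last in zip(firsts, lasts)]


# Hypothetical "production" rows: (first name, last name).
people = [
    ("Alice", "Archer"),
    ("Bob", "Baker"),
    ("Carol", "Cooper"),
    ("Dave", "Draper"),
    ("Erin", "Eastman"),
    ("Frank", "Fletcher"),
]

# Take every 2nd row as a rough sample, then recombine the names.
sample = every_nth(people, 2)
masked = mask_names(
    [first for first, _ in sample],
    [last for _, last in sample],
    seed_first=1,
    seed_last=2,
)
```

      One caveat: with a small sample, an independent shuffle can still put a first name back with its real last name by chance, so you would want to check the result against the originals before handing it to testers.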