First, assuming you have mutual non-disclosure agreements with your customers, find one that will let you use a chunk of their data. Explain that you want to do performance tuning on representative data, and that if you use their data, they'll see the biggest benefit.
Failing that, pick some subset of their data and obscure the strings. The last time I did something like this, we calculated the MD5 hash of each string, then replaced the string with a substring of the hash that was the same length as the original. It made for messy, unreadable data, but was good enough for our (performance tuning) purposes.
Update: I forgot to mention that when you use MD5 hashes (or substrings of them), foreign key integrity is maintained, because the same input string always produces the same hash. That is, if you've got customer data with textual keys, and references to those keys from other tables, when you go the MD5 route you won't lose those references.
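For illustration, here is a rough Python sketch of the idea, not the code we actually used. The `mask` helper, the table layout, and the sample values are all made up for the example, and repeating the digest to cover strings longer than 32 characters is my own assumption about how to handle that case.

```python
import hashlib

def mask(value: str) -> str:
    """Replace a string with MD5 hex digits, keeping the original length."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()  # always 32 hex chars
    # Assumption: repeat the digest so values longer than 32 characters
    # can still be filled out to their original length.
    repeats = len(value) // len(digest) + 1
    return (digest * repeats)[:len(value)]

# Hypothetical sample rows: a textual key, and a reference to it from another table.
customers = [{"cust_key": "ACME-001", "name": "Acme Corporation"}]
orders = [{"order_id": 1, "cust_key": "ACME-001"}]

masked_customers = [
    {"cust_key": mask(c["cust_key"]), "name": mask(c["name"])} for c in customers
]
masked_orders = [
    {"order_id": o["order_id"], "cust_key": mask(o["cust_key"])} for o in orders
]

# The same input always masks to the same output, so the reference still matches.
print(masked_orders[0]["cust_key"] == masked_customers[0]["cust_key"])
```

One thing to watch: very short strings keep only a few hex digits, so two different short keys can end up with the same masked value. If your keys are short, check for collisions before trusting the joins.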
In reply to Re: Test data masking tools/techniques? by dws, in thread Test data masking tools/techniques? by Anonymous Monk