Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'll start off by acknowledging that this is an odd question with a very limited scope of usage.

I've got a string, say $_ = 'My name is Bob.', an array @obs = ('a'..'z'), and a percentage $percentage = '50%'. I want to obscure $percentage of $_ with characters from @obs consistently: if I do it thirty times to the same string with the same percentage, I should always get the same result. (By "obscure" I mean substituting a (random) character of the string with a random character from @obs.)

My initial reaction was to call srand with a predetermined value, break up the string, quasi-randomly pick which positions of the array to replace, pick a quasi-random value to replace each one with, make the switch, and then join the array back together. That seems like a really clumsy way to proceed, and applying $percentage to such a short string made for some messy math.

I don't want to pollute other rand calls by calling srand, and all in all I think my process is clumsy at best. Any insight into how I might streamline this would be appreciated.
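A minimal sketch of the steps above that leaves Perl's global rand()/srand() state alone, assuming a perl built with 64-bit integers: a small linear-congruential generator lives in a closure, and a partial Fisher-Yates shuffle picks distinct positions to replace. The helper names `make_rng` and `obscure`, the LCG constants (1664525 and 1013904223, a commonly published pair for modulus 2**32), and reading '50%' as 0.5 are all assumptions for illustration, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: a private linear-congruential generator held in a
# closure, so Perl's global rand()/srand() state is never touched.
# Assumed constants: a = 1664525, c = 1013904223, m = 2**32 (needs 64-bit ints).
sub make_rng {
    my ($state) = @_;
    return sub {
        my ($n) = @_;    # returns an integer in 0 .. $n-1
        $state = (1664525 * $state + 1013904223) % 4294967296;
        # scale instead of using % so the better-mixed high bits pick the value
        return int($state / 4294967296 * $n);
    };
}

# Hypothetical helper: replace int(length * $percent) distinct positions
# of $string with letters from @obs, driven entirely by $seed.
sub obscure {
    my ($string, $percent, $seed, @obs) = @_;
    my $rng   = make_rng($seed);
    my @chars = split //, $string;
    my @pos   = (0 .. $#chars);
    my $swaps = int(@chars * $percent);
    $swaps = @chars if $swaps > @chars;

    # partial Fisher-Yates: after step $i, $pos[0 .. $i] is a random
    # selection of distinct positions
    for my $i (0 .. $swaps - 1) {
        my $j = $i + $rng->(@pos - $i);
        @pos[$i, $j] = @pos[$j, $i];
        $chars[$pos[$i]] = $obs[$rng->(scalar @obs)];
    }
    return join '', @chars;
}

my $percentage = '50%';
(my $fraction = $percentage) =~ s/%$//;
print obscure('My name is Bob.', $fraction / 100, 42, 'a' .. 'z'), "\n";
```

The same seed always yields the same output, and a different seed gives a different (but equally repeatable) obscuring.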

Re: Obscuring a String
by ton (Friar) on Jan 29, 2002 at 05:03 UTC
    Hmm... I don't think it has to be clumsy. Here's how I did it:
    use strict;
    use warnings;

    my $string  = 'My name is Bob.';
    my $percent = 0.5;
    my @obs     = ('a'..'z');   # this array had better hold at least
                                # int(length($string) * $percent) elements

    my @stringArray = split(//, $string);          # break the string up
    my @positions   = (0 .. length($string) - 1);  # the locations that may be swapped
    my $totalSwaps  = int(length($string) * $percent);

    srand(1);   # fixed seed, so every run gives the same result
                # (note: this reseeds the global rand() for the whole program)
    _shuffle(\@positions);
    _shuffle(\@obs);

    for (my $i = 0; $i < $totalSwaps; ++$i) {
        $stringArray[$positions[$i]] = $obs[$i];
    }
    $string = join('', @stringArray);
    print $string . "\n";

    # In-place shuffle of the array referenced by $aref
    sub _shuffle {
        my $aref = shift;
        for (my $i = 0; $i < scalar(@$aref); ++$i) {
            # pick only from the not-yet-shuffled tail (Fisher-Yates),
            # so every permutation is equally likely
            my $pos = $i + int(rand(scalar(@$aref) - $i));
            ($aref->[$i], $aref->[$pos]) = ($aref->[$pos], $aref->[$i]);
        }
    }
    -Ton
    -----
    Be bloody, bold, and resolute; laugh to scorn
    The power of man...
Re: Obscuring a String
by theorbtwo (Prior) on Jan 29, 2002 at 03:34 UTC
Re: Obscuring a String
by hossman (Prior) on Jan 29, 2002 at 14:32 UTC
    I'm guessing that when you say you need to do it thirty times and get the same answer, you just mean you need it to be deterministic.

    I also agree that (s)rand is the wrong way to go, because of its impact on other calls to rand (not to mention the potential for other rand calls to mess you up).

    You should be able to pick a sequence of letters from @obs deterministically based on the input string. For example: convert all the letters in $_ to their ASCII codes, add them up, take the sum mod the length of $_ to pick a letter to cut from $_, and take it mod the size of @obs to pick a replacement letter. Do that however many times you need to reach $percentage. (Actual code is left as an exercise for the reader... who probably isn't as tired as I am right now.)
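    One possible reading of that checksum idea, sketched under a few assumptions that are not in the reply: the ASCII sum is recomputed after every substitution (so each round can land somewhere new), a position that was already replaced is skipped by stepping forward to the next unreplaced one, and the swap count is capped at the string length so the loop always terminates. The helper name `obscure_by_checksum` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: deterministic obscuring driven only by the input
# string's character codes; rand() and srand() are never touched.
sub obscure_by_checksum {
    my ($string, $percent, @obs) = @_;
    my @chars = split //, $string;
    my $len   = @chars;
    my $swaps = int($len * $percent);
    $swaps = $len if $swaps > $len;    # cap so the probing below terminates
    my %done;                          # positions already replaced

    for (1 .. $swaps) {
        my $sum = 0;
        $sum += ord($_) for @chars;    # ASCII sum of the current string
        my $pos = $sum % $len;         # letter to cut
        $pos = ($pos + 1) % $len while $done{$pos};    # skip replaced spots
        $chars[$pos] = $obs[$sum % @obs];              # replacement letter
        $done{$pos} = 1;
    }
    return join '', @chars;
}

print obscure_by_checksum('My name is Bob.', 0.5, 'a' .. 'z'), "\n";
```

    Because nothing but the string itself feeds the choices, the same input always produces the same output; the trade-off is that similar strings can be obscured in visibly similar ways.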

Re: Obscuring a String
by Ryszard (Priest) on Jan 29, 2002 at 10:20 UTC
    I'm quite interested in the reason for the problem. Can you give any background? Is 30x an arbitrary figure, or is it related to something?
Re: Obscuring a String
by mstone (Deacon) on Jan 31, 2002 at 22:34 UTC

    Okay..

    You're asking for a deterministic pseudo-random sequence. I use the term 'pseudo-random' because, as von Neumann once said:

    "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."

    For the sake of discussion, though, let's assume that a pseudo-random sequence is one that meets the following criteria:

    • it distributes numbers evenly
    • it distributes the residues between numbers evenly:
      ($residue[$N][$i] = $random[$i] - $random[$i-$N], for instance)
    • and it can still be generated algorithmically.
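    A rough way to poke at the first two criteria, sketched here as an illustration rather than a real statistical test: count how many distinct values a sequence of integers in 0 .. $m-1 produces, and how many distinct lag-$lag residues appear. Taking the residues mod $m (so they stay in the same range as the values) is an assumption, as is the hypothetical helper name `spread`. The sample sequence is one full period of the x' = (5x + 113) mod 1024 generator from the code further down.

```perl
use strict;
use warnings;

# Hypothetical checker: count distinct values and distinct lag-$lag
# residues ($r[$i] - $r[$i-$lag], taken mod $m) in a sequence.
sub spread {
    my ($m, $lag, @r) = @_;
    my (%value, %residue);
    $value{$_}++ for @r;
    $residue{ ($r[$_] - $r[$_ - $lag]) % $m }++ for $lag .. $#r;
    return (scalar keys %value, scalar keys %residue);
}

# One full period of x' = (5x + 113) mod 1024, starting from seed 42.
my @seq = (42);
push @seq, (5 * $seq[-1] + 113) % 1024 while @seq < 1024;

my ($values, $residues) = spread(1024, 1, @seq);
print "distinct values: $values of 1024\n";
print "distinct lag-1 residues: $residues of 1024\n";
```

    A real evaluation would look at how evenly the counts are spread (for example with a chi-square test), not just at how many distinct buckets are hit.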

    The problem isn't hard to solve, but you probably want to stick to an accepted solution. Randomness is subtle, and iterative systems tend to collapse into orderly patterns. There's a lovely section in Knuth's Seminumerical Algorithms (Volume 2 of The Art of Computer Programming) where he talks about trying to invent a 'better' random number generator and getting something that collapsed into short-period sequences almost instantly (1). He then goes on to show that most naive RNGs do exactly the same thing: they start off well, then fall into a loop that generates the same (short) sequence of numbers over and over again.

    (1) - Ya gotta love a guy who knows that much about programming, and still tells "boy, did I screw up" stories. ;-)

    All algorithmic RNGs loop eventually, and the size of the loop is called the RNG's period. The algorithms we use are the ones that we can prove generate sequences with the longest possible period for their input.
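    As an illustration that is not part of the original post: von Neumann's own middle-square method, run here on 4-digit states, is a classic naive generator that soon falls into a loop. The sketch (hypothetical helper names `middle_square` and `find_cycle`) records every state it has seen and reports how many steps pass before a state repeats, and how long the resulting loop (the period) is.

```perl
use strict;
use warnings;

# Middle-square step for a 4-digit state: square it, zero-pad the result
# to 8 digits, and keep the middle four.
sub middle_square {
    my ($x) = @_;
    return substr(sprintf('%08d', $x * $x), 2, 4) + 0;
}

# Iterate $step from $seed until some state repeats.
# Returns (steps before entering the loop, length of the loop).
sub find_cycle {
    my ($seed, $step) = @_;
    my %seen;                # state => step index where it first appeared
    my ($x, $i) = ($seed, 0);
    until (exists $seen{$x}) {
        $seen{$x} = $i++;
        $x = $step->($x);
    }
    return ($seen{$x}, $i - $seen{$x});
}

my ($tail, $period) = find_cycle(1234, \&middle_square);
print "seed 1234: loop entered after $tail steps, period $period\n";
```

    Because `find_cycle` takes any step function as a code reference, the same routine can measure the period of other small generators too.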

    For what you're doing, a simple linear-congruential RNG should be good enough. LCRNGs are not good enough for applications that require serious randomness (like cryptography), because successive outputs, plotted as points in three dimensions, fall on a small number of planes (a 'lattice'). For crypto, I'd suggest you pick up Bruce Schneier's Applied Cryptography and read the chapter on random numbers. (Actually, I'd suggest everyone read the whole thing, because it's a damn good book.)
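    To make the 'planes' remark concrete, here is a sketch using RANDU, a historically notorious LCG (multiplier 65539, modulus 2**31, no increment) that the original post does not mention. Since 65539 = 2**16 + 3, squaring the multiplier gives x[k+2] = 6*x[k+1] - 9*x[k] (mod 2**31), which is why every triple of consecutive outputs lies on one of only 15 planes in the unit cube. The code checks that identity over 1000 outputs; it assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# RANDU: x' = 65539 * x mod 2**31 (odd seed, no increment).
my $M = 2_147_483_648;    # 2**31 as an integer literal
sub randu { my ($x) = @_; return (65539 * $x) % $M }

# Generate 1001 states starting from seed 1.
my @x = (1);
push @x, randu($x[-1]) for 1 .. 1000;

# Every consecutive triple should satisfy x[k+2] - 6*x[k+1] + 9*x[k] == 0 (mod 2**31).
my $bad = 0;
for my $k (0 .. $#x - 2) {
    my $r = ($x[$k + 2] - 6 * $x[$k + 1] + 9 * $x[$k]) % $M;
    $bad++ if $r != 0;
}
print "triples off the planes: $bad\n";
```

    Perl's % returns a non-negative result for a positive right operand, so negative intermediate sums are handled without extra work.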

    Anyway.. code:

    use strict;
    use warnings;

    my $MAX  = 1024;  ## the modulus: raw values run 0 .. $MAX-1, and with the
                      ## constants below it is also the length of our RNG's
                      ## period. You can use any power of two if you want a
                      ## larger (or smaller) period.
    my $A    = 5;     ## ($A-1) must be divisible by every prime factor of
                      ## $MAX (here just 2), and by 4 if $MAX is divisible
                      ## by 4 (it is).
    my $C    = 113;   ## $C must be relatively prime to $MAX, and
                      ## 113 is prime, full stop.
    my $SEED = 42;    ## just a starting value, and all good
                      ## _Hitchhiker's Guide_ fans love 42.

    my $CURRENT;
    sub lcrng {
        $CURRENT = $SEED unless defined $CURRENT;
        $CURRENT = (($CURRENT * $A) + $C) % $MAX;
        return $CURRENT / $MAX;       ## a value in [0, 1)
    }

    ## and some test code: 1032 values, 8 per line
    for my $i (1 .. 1032) {
        printf "%5.3f ", lcrng();
        print "\n" if 0 == $i % 8;
    }

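    The first and last lines of output should be identical (the period is 1024, and 1032 = 1024 + 8, so the last eight values repeat the first eight), unless something's very wrong with your system.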
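    If you'd rather not compare printed lines by eye, a small sketch (hypothetical `lcg_period` helper, not from the original post) can count the period directly. It steps x' = (a*x + c) mod m from the seed until the seed comes back. With the constants above that takes exactly 1024 steps, while changing $A to 3 (so ($A-1) is no longer divisible by 4) gives a shorter loop. Any odd multiplier makes the step a bijection mod a power of two, so the seed is guaranteed to recur.

```perl
use strict;
use warnings;

# Count how many steps x' = (a*x + c) mod m takes to return to its seed.
# The $n > $m guard stops runaway loops if the seed never recurs.
sub lcg_period {
    my ($a_mult, $c, $m, $seed) = @_;
    my $x = $seed;
    my $n = 0;
    do {
        $x = ($a_mult * $x + $c) % $m;
        $n++;
    } until $x == $seed || $n > $m;
    return $n;
}

printf "A=5: period %d\n", lcg_period(5, 113, 1024, 42);   # 1024: full period
printf "A=3: period %d\n", lcg_period(3, 113, 1024, 42);   # (A-1) not divisible by 4
```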