efficient string translation?

5mi11er has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

After diligent searching, I've been unable to find any discussion of how to efficiently translate (tr// or y//) strings when one or both of the translation maps are variable.

I understand that many hold the strong belief that passwords of any sort do not belong in scripts or files. Nevertheless, I'm forced to deal with this very situation.

So, after perusing the applicable archives, I've written my own modified version of a rot-13 type routine that includes, in the obfuscation, all printable ASCII characters (the space being considered non-printable).

Within the routine, included below, we generate the appropriate "destination" translation string depending on how far we wish to rotate the printable ascii table. This then makes it necessary to eval the tr//.

As far as programmer efficiency goes, this is a good option: it is short, effective, and relatively easy to understand and maintain. Machine-wise, however, evals are expensive. AYRNIEU has posted a rot-13 module that goes through each character of the string and modifies it as needed. That option might be speed and space efficient, but it is, at least arguably, less programmer/maintenance efficient.
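For comparison, a per-character version without any eval might look roughly like the sketch below. It is not AYRNIEU's module; the name `pwdrot_chars` is made up for illustration, and it simply does the mod-94 arithmetic on each printable character.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from any posted module): rotate every
# printable ASCII character (33..126) by $degree positions, mod 94.
sub pwdrot_chars {
    my ($pwd, $degree) = @_;
    $degree = 47 unless defined $degree;
    $degree %= 94;
    return $pwd if $degree == 0;

    # Only '!' through '~' are touched; anything else passes through as is.
    $pwd =~ s/([!-~])/chr(33 + (ord($1) - 33 + $degree) % 94)/ge;
    return $pwd;
}

print pwdrot_chars('Hello', 47), "\n";    # prints w6==@
```

The s///e runs a small piece of Perl for every character, so it avoids the compile step of a string eval but does more work per character than a tr// would.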

Viewing the caesar solving routine, tachyon's solution, which uses a static translation of one position and applies it as many times as you wish to rotate, does a pretty good job of balancing programmer and speed efficiency. Space efficiency is still not an issue.
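A minimal sketch of that rotate-by-one idea, adapted to the 94-character range used here, could look like this. The sub name `pwdrot_iter` is an assumption for illustration and this is not tachyon's actual code.

```perl
use strict;
use warnings;

# Sketch of "rotate by one, repeat N times". The tr/// is fixed at
# compile time, so no eval is needed. pwdrot_iter is a made-up name.
sub pwdrot_iter {
    my ($pwd, $degree) = @_;
    $degree = 47 unless defined $degree;
    $degree %= 94;

    # '!'..'}' each move up one position; '~' wraps around to '!'.
    $pwd =~ tr{\041-\176}{\042-\176\041} for 1 .. $degree;
    return $pwd;
}

print pwdrot_iter('Hello', 47), "\n";
```

The cost grows with the rotation amount: up to 93 passes over the string in the worst case.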

Swinging wildly toward space inefficiency, one could create a separate static translation for every possible rotation. That should be faster than tachyon's iteration option, but it is ugly in the programmer and space areas.

Another option to reduce iterations, and space needed, would be to have static translations for 1, 2, 4, 8, 16, 32, 64 rotations off...
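Roughly, the powers-of-two idea could be sketched as follows: seven fixed translations, one per bit of the rotation amount, applied only where that bit is set. The names `@rot_pow2` and `pwdrot_pow2` are invented for this sketch.

```perl
use strict;
use warnings;

# One compile-time tr/// per power of two; index N rotates by 2**N.
# Each sub modifies its argument in place through the @_ alias.
my @rot_pow2 = (
    sub { $_[0] =~ tr{\041-\176}{\042-\176\041} },         # rotate by 1
    sub { $_[0] =~ tr{\041-\176}{\043-\176\041-\042} },    # rotate by 2
    sub { $_[0] =~ tr{\041-\176}{\045-\176\041-\044} },    # rotate by 4
    sub { $_[0] =~ tr{\041-\176}{\051-\176\041-\050} },    # rotate by 8
    sub { $_[0] =~ tr{\041-\176}{\061-\176\041-\060} },    # rotate by 16
    sub { $_[0] =~ tr{\041-\176}{\101-\176\041-\100} },    # rotate by 32
    sub { $_[0] =~ tr{\041-\176}{\141-\176\041-\140} },    # rotate by 64
);

sub pwdrot_pow2 {
    my ($pwd, $degree) = @_;
    $degree = 47 unless defined $degree;
    $degree %= 94;

    # Apply the translation for every bit that is set in $degree.
    for my $bit (0 .. $#rot_pow2) {
        $rot_pow2[$bit]->($pwd) if $degree & (1 << $bit);
    }
    return $pwd;
}

print pwdrot_pow2('Hello', 47), "\n";
```

Since rotations add up mod 94, a rotation of 47 becomes 32 + 8 + 4 + 2 + 1, that is five passes instead of 47, and no rotation needs more than seven.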

So, how efficient are the various non-eval versions? (I'm a neophyte when it comes to benchmarks, but I've seen an example benchmark that shows just how bad evals are when done in bulk.) Which options do you like, and why? What other options have I not explored?
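One way to start answering the efficiency question is the core Benchmark module. The sketch below compares three variants for a fixed rotation of 47; the sub names and the sample input are made up for illustration, and no results are claimed here since they will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $input = 'S3cr3t!pass';    # arbitrary sample string

# Variant 1: build and string-eval the tr/// on every call.
sub rot_eval_each {
    my $s = shift;
    my $range = sprintf '\\%03o-\\176\\041-\\%03o', 47 + 33, 47 + 32;
    eval "\$s =~ tr{\\041-\\176}{$range}; 1" or die $@;
    return $s;
}

# Variant 2: per-character arithmetic, no eval.
sub rot_chars {
    my $s = shift;
    $s =~ s/([!-~])/chr(33 + (ord($1) - 33 + 47) % 94)/ge;
    return $s;
}

# Variant 3: a tr/// fixed at compile time (only possible for a known degree).
sub rot_static {
    my $s = shift;
    $s =~ tr{\041-\176}{\120-\176\041-\117};
    return $s;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    eval_each => sub { rot_eval_each($input) },
    per_char  => sub { rot_chars($input) },
    static_tr => sub { rot_static($input) },
});
```

The static variant is only a baseline: it shows what tr// costs once the compile step is out of the picture.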

-Scott

########################################################################
# pwdrot uses the idea of rot13 and expands the characters affected by
# the rotations to include all 94 normal printable ASCII characters,
# i.e. chr(33) '!' through chr(126) '~'. The rotations are thus mod 94.
#
# Used to obfuscate passwords, NOT encrypt them.
#
# Default rotation, if not supplied, is 47
########################################################################
sub pwdrot {
    my $pwd = shift;
    my $degree = (@_ > 0) ? ((shift) % 94) : 47;

    # Nothing to do for a zero rotation or an empty string.
    if ($degree == 0) {
        return $pwd;
    }

    if (length($pwd) == 0) {
        return $pwd;
    }

    # Destination range as octal escapes, e.g. \120-\176\041-\117 for 47.
    my $rangestr = "\\" . sprintf("%03lo", $degree + 33) . "-\\176\\041-\\"
                 . sprintf("%03lo", $degree + 32);

    {
        local $_ = $pwd;

        # Can't do string interpolation within 'tr' without 'eval'ing it.
        # Die on a compile error instead of silently returning the input.
        eval "tr[\041-\176][$rangestr]; 1" or die $@;

        $pwd = $_;
    }

    return $pwd;
}

Replies are listed 'Best First'.
Re: efficient string translation? by Tanktalus (Canon) on Jan 27, 2005 at 19:31 UTC
If all you're trying to accomplish is to reduce the effects of an eval in a tight loop, you could cache it.

{
    my %pwdrots;

    sub pwdrot {
        my $pwd = shift;
        my $degree = (@_ > 0) ? ((shift) % 94) : 47;

        if ($degree == 0) {
            return $pwd;
        }

        if (length($pwd) == 0) {
            return $pwd;
        }

        unless ($pwdrots{$degree}) {
            my $rangstr = "\\" . sprintf("%03lo", $degree + 33) . "-\\176\\041-\\"
                        . sprintf("%03lo", $degree + 32);
            # \$_[0] is escaped so it is not interpolated into the eval string.
            $pwdrots{$degree} = eval "sub { \$_[0] =~ tr[\041-\176][$rangstr] }";
        }

        # $_[0] inside the cached sub aliases $pwd, so $pwd is modified in place.
        $pwdrots{$degree}->($pwd);
        return $pwd;
    }
}

The idea is that you generate an anonymous sub which does the tr. Once that has been compiled, you don't need to compile it again (for the same $degree).
Re^2: efficient string translation? by 5mi11er (Deacon) on Jan 27, 2005 at 19:59 UTC
Ah, good thought. So, by doing this, the "compiled" eval is kept in memory, and this eval, if called again, is much more efficient? Oh, wait! The eval is only done once per degree; the compiled translation subroutine is assigned to the %pwdrots hash, then used as needed. Excellent! This makes processing a file of x strings that all use the same transformation a lot more efficient. In reality, I'm hoping for a slightly more philosophical discussion of the pros and cons of the various methods available. I'm still holding out some hope for a magical "use this instead of an eval'ed tr//".
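There may be no magic replacement, but one eval-free possibility is a lookup table wrapped in a closure and cached per degree. The sketch below is only an illustration under that assumption; `make_rot` and `%rot_cache` are invented names, and whether this beats a cached eval'ed tr// would need benchmarking.

```perl
use strict;
use warnings;

my %rot_cache;    # degree => closure that rotates a string

# Hypothetical factory: build (once per degree) a closure that maps each
# printable ASCII character through a precomputed hash. No eval involved.
sub make_rot {
    my $degree = shift;
    $degree = 47 unless defined $degree;
    $degree %= 94;

    return $rot_cache{$degree} ||= do {
        my @from = map { chr($_) } 33 .. 126;
        my %map;
        # Pair each character with the one $degree positions further on.
        @map{@from} = (@from[$degree .. $#from], @from[0 .. $degree - 1]);
        sub { my $s = shift; $s =~ s/([!-~])/$map{$1}/g; $s };
    };
}

my $rot47 = make_rot(47);
print $rot47->('Hello'), "\n";
```

The table is built once per degree, much like the cached sub above, but each character still goes through a regex substitution rather than tr//'s internal table.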
Re: efficient string translation? by ambrus (Abbot) on Jan 28, 2005 at 09:04 UTC
If you want to use the same transliteration more than once, you could try evalling it to a sub like this:

my $tr = "/a-z/n-za-m/";
my $trs = eval qq{sub {\$_[0]=~tr$tr}};
$s = "uryyb, jbeyq\n";
&$trs($s);
print $s;

This way, you can use the same transliteration any number of times and compile it only once. You can even cache multiple tr patterns like this:

{
    my %trd;

    sub trd {
        my $tr = $_[0];
        # Compile each distinct pattern only on first use.
        $trd{$tr} ||= do {
            warn "compiling tr$tr";
            eval qq{sub {\$_[0]=~tr$tr}};
        };
    }
}

$t = $s = "hello, world\n";
&{trd("/a-z/n-za-m/")}($s); print $s;
&{trd("/a-z/A-Z/")}($t); print $t;
&{trd("/a-z/n-za-m/")}($s); print $s;

Update: If compiling translation tables only once matters, one could equally ask why a printf/scanf pattern or a pack/unpack pattern can't be compiled once. I don't think creating a translation table is that slow. One would have to benchmark it to say anything, of course.