in reply to How to make this substitutions without splitting the strings?

AnomalousMonk pretty much nailed it. In the benchmark below, the string-based approach comes out roughly 7.6 times as fast as unpack/pack (about 662% faster).

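#! /usr/bin/perl

# Turn the pattern into an unpack template: '-' becomes 'x' (skip a byte),
# anything else becomes 'c' (keep it).
sub m_upk {
    (my $p = shift) =~ tr/-\0-\377/xc/;
    pack 'c*', unpack $p, shift;
}

# Turn the pattern into a bit mask: '-' becomes NUL, anything else 0xFF.
# AND the mask with the other string, then delete the NULs.
sub m_and {
    (my $p = shift) =~ tr/-\0-\377/\0\377/;
    ($p &= shift) =~ tr/\0//d;
    $p;
}

chomp (our ($str1, $str2) = <DATA>);

use Benchmark 'cmpthese';
cmpthese -5, {
    unpack => q( m_upk $str1, $str2 ),
    string => q( m_and $str1, $str2 ),
}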
          Rate unpack string
unpack  8128/s     --   -87%
string 61949/s   662%     --
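
As a sanity check that the two subs being timed do the same job, here is a minimal sketch. The sample strings are made up for illustration (they are not the thread's actual DATA lines), and it assumes the second string contains no NUL bytes and is at least as long as the pattern.

```perl
use strict;
use warnings;

# The two subs from the benchmark above, unchanged apart from explicit returns.
sub m_upk {
    (my $p = shift) =~ tr/-\0-\377/xc/;      # '-' -> 'x' (skip), anything else -> 'c' (keep)
    return pack 'c*', unpack $p, shift;
}

sub m_and {
    (my $p = shift) =~ tr/-\0-\377/\0\377/;  # '-' -> NUL, anything else -> 0xFF
    ($p &= shift) =~ tr/\0//d;               # string AND, then delete the NULs
    return $p;
}

# Hypothetical sample input: '-' marks the positions to drop from $text.
my ($mask, $text) = ('ab--cd-e', 'ABCDEFGH');

print m_upk($mask, $text), "\n";   # ABEFH
print m_and($mask, $text), "\n";   # ABEFH
```

Both calls keep only the characters of the second string whose position in the first string is not a hyphen.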

Re^2: How to make this substitutions without splitting the strings? (tr/// behavior)
by AnomalousMonk (Archbishop) on Aug 01, 2014 at 04:49 UTC
    (my $p = shift) =~ tr/-\0-\377/\0\377/;

    In the  tr/-\0-\377/\0\377/ expression, the '-' (hyphen) character appears twice in the search list: initially, and also within the  \0-\377 range. In tests I did with some Win32 Perls in the range 5.8 to 5.14, the test code

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = 'XXXooX';
     (my $t = $s) =~ tr/XoX/ab/;
     print qq{'$t'};
    "
    'aaabba'
    (and identically for  tr/X\x00-\xff/ab/) always produced the same result: the leftmost occurrence of a character in the search list is selected for matching to and replacement by the corresponding character in the replacement list.

    I considered using the much neater  tr/-\0-\377/\0\377/ version, but I couldn't find anything in the docs to guarantee the behavior shown in my tests must always prevail. Despite the tests, I didn't feel comfortable using an "undocumented feature". Do you know of any documentation of this "leftmost match" feature in the  tr/// built-in? In a regex, the rule would be "leftmost longest match", but  tr/// isn't really a regex, it's a transliterator — isn't it?

      Do you know of any documentation of this "leftmost match" feature in the tr/// built-in?
      From perlop:

      "If multiple transliterations are given for a character, only the first one is used:

      tr/AAA/XYZ/

      will transliterate any A to X."

        Ah, the undocumented, documented! That feels better! I looked for just this sort of assertion (a couple of times) and couldn't find it. Thanks!
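
To see the perlop rule at work on the expression from this thread, a small sketch may help. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

# perlop's example: 'A' is listed three times, but only the first
# mapping (A -> X) is ever used.
(my $t = 'BANANA') =~ tr/AAA/XYZ/;
print "$t\n";                       # BXNXNX

# '-' is listed first and then again inside the \0-\377 range.
# The first mapping wins, so '-' -> NUL; every other byte maps to
# 0xFF, because the last replacement character is repeated.
(my $mask = 'ab--cd-e') =~ tr/-\0-\377/\0\377/;
print unpack('H*', $mask), "\n";    # ffff0000ffff00ff
```

The hex dump shows 00 at exactly the hyphen positions and ff everywhere else, which is what the string AND in m_and relies on.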