in reply to How to make this substitutions without splitting the strings?

AnomalousMonk pretty much nailed it. In the benchmark below, the string-based approach runs about 7.6 times as fast as unpack/pack (61949/s versus 8128/s).

#! /usr/bin/perl

sub m_upk {
    (my $p = shift) =~ tr/-\0-\377/xc/;
    pack 'c*', unpack $p, shift;
}

sub m_and {
    (my $p = shift) =~ tr/-\0-\377/\0\377/;
    ($p &= shift) =~ tr/\0//d;
    $p;
}

chomp (our ($str1, $str2) = <DATA>);

use Benchmark 'cmpthese';

cmpthese -5, {
    unpack => q( m_upk $str1, $str2 ),
    string => q( m_and $str1, $str2 ),
};

__DATA__
---DAAAGLRG--G--G-P-LT-I--A--PG----A-----T----LG---G-YG--------------- +-------------------------SVT----------------------------------------- +--------------------------------------------G-------NV-T------NN---G- +---TI----SVANALPSLASSLPGDFRIF---------------------------------------- +-------------------GTLTNAGVVELRGRVVGN--G-LA-V-S------------G--------N +---Y---VGQN----------------------GAVN-------------MN-TT---------L--AG +--D------------------------------------------------------------------ +--------------------------------------------------------------------- +--------------------------------------------------------------------- +---------------------------------------------G---------------------A- +------PS-------D-TL-LI---------------GGVPA-VATAS---------G----K------ +--T----T---------L--------------------------------------------------- +-------N-----VTNVGG---------------AGAL------------------------------- +-----------------------------------TK-SDGI---------RL-VY------------- +---AVNFA-N---------T-------------------G---N-A--F--TLAG----GTVS--AG-- +--------------------------------------------------------------------- +--AYSYY--------------LV--KGGV-T-------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +-------------A-----------------LTG---------EDWYLR-S------------------ 
+--------------------------------------------TVPPR-P-DQ---P----------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------- +---------------------------------------------------------------T-QQ-- +PPF------------------------------------------------------------------ +--------------S--V-A---DG-TP-ES--I-----------V------------------E--AV +-K---N----------------A--AP-DA--------------------------------------- +-------------------------------------------------K-------PEP--------- +-----------------------------------------------------V--------------- +---YR---------------------------------------------------------------- +---PEV--PL-YS-----------EVP------------------------------------------ +--------------------------------------------------------------------- +--------------------------------------A--VARQ----------------------LG +---L-L------------Q--------IDT-F-H----------------DRQ-------G-EQG--LL +-----AEN-G-S--------------------------------------------------------- +--------------------------------------------------------------------- +--------------------------------------------------------------------V +P----VSWSRVW-----------GGY---SN------IKQ-NG-------------------------- +DVTPSY--DGTVW-----G--MQVGQ---DLY-----ADNRP-------SGHRNHYGFF----LGF--- +---SR--AIGDVNGFA--------------------------------------LAQPDL--------G +VGSLQVN-A-Y-N----L--G--G-YWT-----------------------------H----IGPG--- +-----------GWYTDA--------------------------VV--MGS-V--LT---V--RTHSN-- +-----------------------------N------NVSGS--T-D--GNA--VTGS-V--EAGV--P- +-I------------SL------G-YG----------L--------------T----L---------E-P 
+QA-QLLW-QWLS-LA--RFND------G-------V--------------------------------- +-SDV----T--W-----NN-GNTFLGR----IG-ARL--------QY-----AFDAN------GVSWK- +-------------------PYLRVNVLR--S--FG-S--DD----------RTT-----FG-----GS- +---TT------------------------IG-TQ-VG-------Q--T--AGQIGA-GL-VA-Q--LT- +KR----GSVYA--T--V--S---Y---------LT-NL-----GG----E----H----QR----T--- +I--T---GNAGVRW-- XXXXXXXXXXX..X..X.X.XX.X..X..XX....X.....X....XX...X.XX............... +.........................XXX......................................... +............................................X.......XX.X......XX...X. +...XX....XXXXX....................................................... +..................................XXX..X.XX.X.X............X........X +...X...XXXX......................XXXX.............XX.XX.........X..XX +..X.................................................................. +..................................................................... +..................................................................... +.............................................X.....................X. +......XX.......X.XX.XX...............XXX......XX.........X....X...... +..X....X.........X................................................... +.......X.....XXXXXX................XXX............................... +...................................XX.XXXX.........XX.X.............. +....XXXX.X..X....X.X...................X...X.X..X..XXX......XXX..XX.. +..................................................................... +..XXXXX..............XX..XXXX.X...................................... +..................................................................... +..................................................................... +..................................................................... +..................................................................... +..................................................................... 
+.............X..............XXXXXX.........XXXXXX.X.................. +............................................XXXXX.X.XX...X........... +..................................................................... +..................................................................... +..................................................................... +..................................................................... +..................................................................... +..................................................................... +...............................................................X.XX.. +XXX.........XX..X...X................................................ +..............X..X.X...XX.XX.XX..X...........X..................X..XX +.X...X................X..XX.XX....................................... +.................................................X.......XXX......... +.....................................................X...........X..X +.X.XX................................................................ +...XXX..XX.XX...........XXX.......................................... +..................................................................... +......................................X..XXXX......................XX +...X.XX.........XXX........XXX.X.X................XXX.......X.XXX..XX +......XX.X.X......................................................... +..................................................................... +....................................................................X +X....XIIIIII...........III...II.......XXXX........................... +.XXXXX..XXXXX.....X..XXXXX...XXX.....XXX............XXXXXXX....XXX... +...XX..XXXXXX.............................................X.......... +.XXXXXX.X.X.X....X..X..X.XXX.............................X....XXXX... +............XXXXX..........................XX..XXX.X..XX...X..XXXXXX. +XX..XX......................XX......XXXXX..X.X..XXX..XXXX.X..XXXX..X. 
+.X............XX......X..X..........X..............X....X.........X.X +XX.XXXX.XXXX.XX..XXXX......X.......XX....X....................X.X.... +.XXX....X..X.....XX.XXXXXXX....XX.XXX........XX.....XXXXX......XXXXX. +...................XXXXXXXXX..X..XX.X..XX....XX...XXXX.....XX.....XX. +...XX............X....XX....XXX.XX.XX.......X..X..XXXXXX.XX.XX.X..XX. +XX....XXXXX..X..X..X...X.........X...X......X....X....X....XX....X... +X..X...XXXXXXXXX
          Rate unpack string
unpack  8128/s     --   -87%
string 61949/s   662%     --
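
For anyone puzzling over why the two subs give the same answer, here is a minimal sketch on a toy pair of strings. The inputs 'AB--C-D' and '1234567' and the names keep_by_unpack and keep_by_mask are made up for illustration; the bodies follow m_upk and m_and from the benchmark. It assumes the 'bitwise' feature is not enabled, so that & on two strings is a bytewise string AND.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring m_upk: build an unpack template from
# $template ('-' => 'x' skips a byte, anything else => 'c' reads one),
# then repack the bytes that were read.
sub keep_by_unpack {
    my ($template, $data) = @_;
    (my $fmt = $template) =~ tr/-\0-\377/xc/;
    return pack 'c*', unpack $fmt, $data;
}

# Hypothetical helper mirroring m_and: build a byte mask from $template
# ('-' => \0, anything else => \377), AND it with $data, drop the NULs.
sub keep_by_mask {
    my ($template, $data) = @_;
    (my $mask = $template) =~ tr/-\0-\377/\0\377/;
    my $kept = $mask & $data;    # string AND: \377 keeps a byte, \0 zeroes it
    $kept =~ tr/\0//d;           # delete the zeroed positions
    return $kept;
}

print keep_by_unpack('AB--C-D', '1234567'), "\n";   # prints 1257
print keep_by_mask('AB--C-D', '1234567'), "\n";     # prints 1257
```

One caveat with the mask version: because it deletes every NUL afterwards, it would also drop any genuine NUL bytes in the second string. That does not matter for data like the strings benchmarked here.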

Re^2: How to make this substitutions without splitting the strings? (tr/// behavior)
by AnomalousMonk (Archbishop) on Aug 01, 2014 at 04:49 UTC
    (my $p = shift) =~ tr/-\0-\377/\0\377/;

    In the  tr/-\0-\377/\0\377/ expression, the '-' (hyphen) character appears twice in the search list: initially, and also within the  \0-\377 range. In tests I did with some Win32 Perls in the range 5.8 to 5.14, the test code

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = 'XXXooX';
     (my $t = $s) =~ tr/XoX/ab/;
     print qq{'$t'};
    "
    'aaabba'
    (and identically for  tr/X\x00-\xff/ab/) always produced the same result: the leftmost occurrence of a character in the search list is selected for matching to and replacement by the corresponding character in the replacement list.

    I considered using the much neater  tr/-\0-\377/\0\377/ version, but I couldn't find anything in the docs to guarantee the behavior shown in my tests must always prevail. Despite the tests, I didn't feel comfortable using an "undocumented feature". Do you know of any documentation of this "leftmost match" feature in the  tr/// built-in? In a regex, the rule would be "leftmost longest match", but  tr/// isn't really a regex, it's a transliterator — isn't it?

      Do you know of any documentation of this "leftmost match" feature in the tr/// built-in?
      From perlop:

      "If multiple transliterations are given for a character, only the first one is used:

      tr/AAA/XYZ/

      will transliterate any A to X."
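
The perlop rule quoted above is easy to check. The following is only a quick sketch: the test strings are invented, and the replacement list 01 stands in for the \0\377 used in the thread so that the output is printable.

```perl
use strict;
use warnings;

# perlop's own example: A is listed three times, but only the first
# pairing (A => X) is used.
(my $t1 = 'AAA') =~ tr/AAA/XYZ/;
print "$t1\n";    # prints XXX

# The case from this thread: '-' is listed first and also falls inside
# the \0-\377 range. The first pairing ('-' => '0') wins; every other
# byte gets '1', since the last replacement character is repeated to
# pad the replacement list.
(my $t2 = 'A-B') =~ tr/-\0-\377/01/;
print "$t2\n";    # prints 101
```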

        Ah, the undocumented, documented! That feels better! I looked for just this sort of assertion (a couple of times) and couldn't find it. Thanks!