hawtin has asked for the wisdom of the Perl Monks concerning the following question:

I have an array of strings (some of which may contain characters in the range \x7f-\xff). I need to concatenate all the strings together at some point. Originally I had:

$combine = join('',@ret);

However, for some reason (lost in the mists of time; I suspect it had something to do with efficiency in a few important cases), I changed all the concatenations to:

$combine = pack("a*" x ($#ret + 1), @ret);

Now that worked for a considerable time until I recently discovered an unexpected behaviour. This can be illustrated by:

foreach my $item (@ret) {
    if($item =~ /([\x7f-\xff])/) {
        print "Array ...".substr($`,-5)." ||".$1. "|| ".substr($',0,5)."\n";
    }
}
my $via_join = join('',@ret);
if($via_join =~ /([\x7f-\xff])/) {
    print "Join ...".substr($`,-5)." ||".$1. "|| ".substr($',0,5)."\n";
}
my $via_pack = pack("a*" x ($#ret+1),@ret);
if($via_pack =~ /([\x7f-\xff])/) {
    print "Pack ...".substr($`,-5)." ||".$1. "|| ".substr($',0,5)."\n";
}

Which prints out:

Array ...s Neg ||≤|| cios
Join ...s Neg ||≤|| cios
Pack ...s Neg ||├|| │cios

Why does the pack modify the character? Is there a better way to concatenate or will I have to discover why I rejected join() all those months ago?

This is running under 5.8.6 on a Windows XP machine. The text strings originated in an XML file (i.e. using ó) and are meant to be "utf-8".
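One way to read the output above, offered as a sketch rather than a confirmed diagnosis: the two symbols in the Pack line are what the UTF-8 encoding of ó (bytes C3 B3) looks like on a console using code page 437, while the single symbol in the Array and Join lines is what the lone byte F3 looks like there. The code page is an assumption (the post only says Windows XP), and whether pack on 5.8.6 really emits the internal UTF-8 bytes depends on the strings carrying Perl's UTF-8 flag, which the post does not show.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00F3 (o with acute accent), the character from the example output.
my $char = "\x{f3}";

# Its UTF-8 encoding is the two bytes C3 B3.
my $utf8_bytes = encode('UTF-8', $char);
my $hex = join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;

# Decode raw bytes the way a CP437 console would display them
# (CP437 is an assumption about the poster's console).
my $single_as_cp437 = decode('cp437', "\xf3");        # byte F3 alone
my $pair_as_cp437   = decode('cp437', $utf8_bytes);   # bytes C3 B3

binmode STDOUT, ':encoding(UTF-8)';
print "UTF-8 bytes:    $hex\n";
print "F3 on CP437:    $single_as_cp437\n";
print "C3 B3 on CP437: $pair_as_cp437\n";
```

If this reading is right, the strings in @ret are character strings, and the Pack result is the same text seen as encoded bytes.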

Thanks for any help

Replies are listed 'Best First'.
Re: Various ways to concatenating an array of strings
by GrandFather (Saint) on Mar 30, 2006 at 18:59 UTC

    I'd be inclined to use join even if it were somewhat slower. However, my guess is that join is faster, and a benchmark bears that out:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @strings = qw(
        xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
        zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
        ppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
        qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
        rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
        sssssssssssssssssssssssssssssssssssssssssssssssssssssssss
        ttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    );

    cmpthese (-1,
        {
            join => sub {join '', @strings},
            pack => sub {pack("a*" x ($#strings + 1), @strings)},
        }
    );

    Prints:

              Rate pack join
    pack  452302/s   -- -58%
    join 1073851/s 137%   --

    DWIM is Perl's answer to Gödel

      As I said in my original post I rejected join() for some good reason months ago. All your reply says is that join is the most obvious solution (which I agree it is), but it didn't work in my case.

      Thank you for your input, but your test of course proves nothing: my issue was with thousands of differently sized strings, not 8 strings of the same size. If I rewrite your test as:

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      my @strings;
      for(my $i=0;$i<1000;$i++) {
          $strings[$i] = (sprintf("%06d",$i)) x rand(1000);
      }

      my $asign1;
      my $asign2;
      cmpthese (-1,
          {
              pack => sub {$asign2 = pack("a*" x($#strings + 1),@strings)},
              join => sub {$asign1 = join '', @strings},
          }
      );

      Then the results are:

             Rate join pack
      join 108/s   -- -21%
      pack 137/s  27%   --

      Having played with this, I now recall that the efficiency problem was not with speed but with memory usage. In some pathological cases the join was consuming vast quantities of memory (that was under a previous version of Perl, and maybe I should study it again).

      So thank you for helping me illustrate that pack is quicker than join (in some cases) :-)

        First off, I don't like that rand there - it makes things much more difficult to replicate. So, after running your test as-is a few times, I went and just changed "rand(1000)" to "1000" and reran it. I got slightly more consistent answers when the strings were consistently sized.

        At no time have I had pack outperform join. Although my CPU seems to be a bit slower than yours ... I never got over 100 runs per second.

        As-is:
               Rate pack join
        pack 80.0/s   -- -15%
        join 93.7/s  17%   --

        Always 1000:
               Rate pack join
        pack 46.0/s   --  -8%
        join 50.0/s   9%   --

        x $i rather than x 1000 or x rand(1000):
               Rate pack join
        pack 74.0/s   -- -14%
        join 86.0/s  16%   --

        Since it seems that the longer the strings, the closer they were, I tried 10,000.

               Rate pack join
        pack 5.66/s   --  -5%
        join 5.94/s   5%   --

        It doesn't seem that I can get pack outperforming join on this issue. And that's probably despite being utf-8 aware...

        Tests above were run on a threaded perl 5.8.7 on Linux

        According to your Benchmarks, join is taking 0.009 seconds to execute compared to pack's 0.007. Are you sure you need those extra 2 milliseconds?
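The per-call figures quoted above follow directly from the reported rates (108/s for join, 137/s for pack). A small sketch of that arithmetic, using only those two numbers from the earlier benchmark output:

```perl
use strict;
use warnings;

# Rates taken from the benchmark output quoted earlier in the thread.
my $join_rate = 108;    # calls per second
my $pack_rate = 137;    # calls per second

# Time per call is the reciprocal of the rate.
my $join_seconds = 1 / $join_rate;
my $pack_seconds = 1 / $pack_rate;
my $diff_ms      = ($join_seconds - $pack_seconds) * 1000;

printf "join: %.4f s per call\n", $join_seconds;
printf "pack: %.4f s per call\n", $pack_seconds;
printf "difference: %.1f ms per call\n", $diff_ms;
```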
Re: Various ways to concatenating an array of strings
by swampyankee (Parson) on Mar 30, 2006 at 18:59 UTC

    Could pack be treating the individual characters in the string as signed (lower case "c" in pack) vs unsigned characters (upper case "C" in pack)?
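For reference, a minimal sketch of what the signed/unsigned distinction means for pack and unpack. The "c" and "C" templates read the same byte as different numbers, while the "a" template used in the original code copies string data and has no notion of sign. The byte value F3 is chosen here only because it falls in the \x7f-\xff range from the question.

```perl
use strict;
use warnings;

my $byte = "\xf3";    # one byte in the \x7f-\xff range

# The same byte read as a signed and as an unsigned char.
my $signed   = unpack 'c', $byte;
my $unsigned = unpack 'C', $byte;

# Packing either number produces the identical byte.
my $same_byte = pack('c', -13) eq pack('C', 243);

# "a*" copies the string as-is; no numeric interpretation is involved.
my $copied = pack 'a*', $byte;

print "signed: $signed, unsigned: $unsigned\n";
print "same byte from c and C: ", ($same_byte ? "yes" : "no"), "\n";
print "a* preserved the byte: ", ($copied eq $byte ? "yes" : "no"), "\n";
```

On a plain byte string like this one, signedness alone does not change the data, so it seems unlikely to explain a one-character-to-two-character change.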

    emc

    "Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. "
    —G. Steele
Re: Various ways to concatenating an array of strings
by whio (Beadle) on Mar 30, 2006 at 21:11 UTC
    I have wished before that the language had a built-in to handle the special case of join('', ...), although that has mostly been when I'm trying to write obfuscated Perl, so it is not a significant lack. You could do it with interpolation:
    { local $"; $scalar = "@array"; }
    but my guess is that join is the best way to go about it. TMTOWTDI but as duff pointed out, using another method will be much less clear when reading the code.
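A slightly fuller sketch of the interpolation approach, with the list separator set to an explicit empty string rather than left undefined (my understanding is that an undefined $" can trigger an uninitialized-value warning under "use warnings"). The array contents and variable names are made up for illustration.

```perl
use strict;
use warnings;

my @array = ('foo', 'bar', 'baz');    # hypothetical sample data

my $via_interp;
{
    # $" is the separator used when an array is interpolated into a string.
    # local restores the default (a single space) when the block exits.
    local $" = '';
    $via_interp = "@array";
}

my $via_join    = join '', @array;
my $default_sep = "@array";           # outside the block: space-separated

print "interpolated: $via_interp\n";
print "joined:       $via_join\n";
print "default:      $default_sep\n";
```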

      Thank you for actually answering my question, rather than telling me I don't know what I am doing. I had not considered interpolation; it sounds like a good alternative.

      As I mentioned in the original post, join caused some serious issues in certain pathological cases. The fact that it is the "obvious" solution doesn't help if it kills the machine in some (rare) cases.

      Having a solution that works reliably is of slightly higher priority than the extra few lines of comments that an obscure construct will require.

Re: Various ways to concatenating an array of strings
by duff (Parson) on Mar 30, 2006 at 19:09 UTC

    You are clearly insane for abandoning the obvious and clear solution in favor of something more obfuscated. :-) Go back to the join.