kyle has asked for the wisdom of the Perl Monks concerning the following question:

Gentlemonks,

At $work, a test suite has these three different methods of building a string sprinkled throughout it. Since the tests are for UTF-8 handling, I'm assuming that these are designed to produce a string with particular bytes in it without alerting Perl to their UTF-8 nature. The methods are:

  1. The double quoted string of \x escapes.
    "\x{aa}\x{42}\x{fe}"
  2. Concatenation of chr(hex).
    ( chr(hex('AA')) . chr(hex('42')) . chr(hex('FE')) )
  3. Packed hex numbers.
    pack( "C*", 0xaa, 0x42, 0xfe )

Is any of these methods preferable over the others? Is there any difference between them? Is there a way still better than any of them?

Re: Building binary strings.
by pc88mxer (Vicar) on Nov 28, 2007 at 05:29 UTC
    They all will do the same thing, so choose the method that you find the most appealing.
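
    A quick way to convince yourself is to build the string all three ways and compare the results. This is only a sketch, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# The three construction methods from the question.
my $quoted = "\x{aa}\x{42}\x{fe}";
my $chr    = chr(hex('AA')) . chr(hex('42')) . chr(hex('FE'));
my $packed = pack("C*", 0xaa, 0x42, 0xfe);

# Show each string's contents, its length, and perl's internal UTF-8 flag.
for my $s ($quoted, $chr, $packed) {
    printf "%s len=%d utf8=%d\n",
        join(' ', map { sprintf '%02x', ord($_) } split //, $s),
        length($s),
        utf8::is_utf8($s) ? 1 : 0;
}

print "all equal\n" if $quoted eq $chr && $chr eq $packed;
```

    The utf8 flag is printed only out of curiosity; it is an internal detail and should not be used to decide whether a string is text or binary data.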

    However, if the bytes are meant to be the UTF-8 encoding of some characters, consider working with code-points like this:

        use Encode;
        ...
        $x = chr(65533);                     # U+FFFD
        $y = Encode::encode('utf-8', $x);    # -> "\x{ef}\x{bf}\x{bd}"

    When dealing with perl strings, it is helpful to keep in mind the following:

    1. A perl string is just a sequence of numbers, and the numbers (characters) can be interpreted either as Unicode code-points or as byte values.

    2. If the characters (numbers) in a string are meant to be interpreted as code-points, we call the string "text"; if they are meant to be interpreted as byte values, we call it "binary data".

    The point is that the string "\x{aa}\x{42}\x{fe}" can be interpreted as either three Unicode code-points (U+00AA, U+0042, U+00FE) or as three bytes (0xaa, 0x42, 0xfe), and only the programmer knows what the correct interpretation is.
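
    To make the two readings concrete, here is a small sketch. The hexdump helper is a hypothetical name introduced just for this example; Encode::encode is the real function from the Encode module:

```perl
use strict;
use warnings;
use Encode ();

# Helper (hypothetical name): show each element of a string as two hex digits.
sub hexdump { join ' ', map { sprintf '%02x', ord($_) } split //, $_[0] }

my $s = "\x{aa}\x{42}\x{fe}";

# Read as binary data, the string already is the three bytes we want.
my $as_bytes = hexdump($s);

# Read as text, it is three code-points, and encoding them as UTF-8
# yields a different, longer byte string.
my $as_utf8 = hexdump(Encode::encode('UTF-8', $s));

print "bytes: $as_bytes\n";   # aa 42 fe
print "utf-8: $as_utf8\n";    # c2 aa 42 c3 be
```

    Nothing in $s itself says which of the two outputs is the "right" one; that depends entirely on what the programmer intended the string to hold.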

    Here are some examples of the difference. If a string (say $x) is meant to contain code-points, then the following uses of $x are logically incorrect even if perl does not report an error:

        $y = Encode::decode('some encoding', $x);
        binmode STDOUT, ":bytes"; print $x;
        ...

    Conversely, if $x contains byte values, the following are incorrect uses of $x:

        $y = Encode::encode('some encoding', $x);
        $n = rindex($x, "\N{WHITE SMILING FACE}");    # need: use charnames ':full';
        ...

    In these cases, perl may return a result, but the result is meaningless.
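
    One common instance of this kind of mistake is encoding a string that is already encoded. The following is a sketch with illustrative variable names; perl raises no error, but the final result is no longer the UTF-8 encoding of the original character:

```perl
use strict;
use warnings;
use Encode ();

my $text  = "\x{263A}";                        # one code-point (text)
my $bytes = Encode::encode('UTF-8', $text);    # correct: text -> 3 bytes

# Mistake: $bytes is binary data, but it is encoded again as if it were text.
my $twice = Encode::encode('UTF-8', $bytes);   # no error, but now 6 bytes

printf "%d %d %d\n", length($text), length($bytes), length($twice);   # 1 3 6
```

    Each of the three bytes in $bytes is treated as a code-point in the range U+0080 to U+00FF and expanded to two bytes, which is the usual cause of "double-encoded" output.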

    Hope this helps. Or better yet, hope this generates some more questions :-)

Re: Building binary strings.
by KurtSchwind (Chaplain) on Nov 28, 2007 at 14:28 UTC

    They all work. Choose your preference.

    I'd probably base the choice on where the data originates. Personally, I like working with hex, so I've used the pack option in the past as it just resonated with me. I also think it is the easiest to read.
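
    If the data already arrives as hex text, pack's "H*" template may be worth a look as well. This is a sketch of a fourth option that is not one of the three in the question, and the variable names are illustrative:

```perl
use strict;
use warnings;

# Build the bytes from one hex string rather than a list of numbers.
my $from_hex = pack('H*', 'aa42fe');

# unpack with the same template turns the bytes back into hex text.
my $round_trip = unpack('H*', $from_hex);

print "same\n" if $from_hex eq pack('C*', 0xaa, 0x42, 0xfe);
print "$round_trip\n";   # aa42fe
```

    Whether this reads better than the "C*" form is a matter of taste; it mostly pays off when the expected bytes are copied from a hex dump.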

    --
    I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.