Building binary strings.

kyle has asked for the wisdom of the Perl Monks concerning the following question:

Gentlemonks,

At $work, a test suite has these three different methods of building a string sprinkled throughout it. Since the tests are for utf8 handling, I'm assuming that these are designed to produce a string with particular bytes in it without alerting Perl to their UTF-8 nature. The methods are:

The double quoted string of \x escapes.
```
"\x{aa}\x{42}\x{fe}"
```

Concatenation of chr(hex).

```
( chr(hex('AA')) . chr(hex('42')) . chr(hex('FE')) )
```

Packed hex numbers.

```
pack( "C*", 0xaa, 0x42, 0xfe )
```

Is any of these methods preferable over the others? Is there any difference between them? Is there a way still better than any of them?

Re: Building binary strings. by pc88mxer (Vicar) on Nov 28, 2007 at 05:29 UTC
They all do the same thing, so choose the method that you find the most appealing. However, if the bytes are meant to be the UTF-8 encoding of some characters, consider working with code-points like this: `use Encode; ... $x = chr(65533); $y = Encode::encode('utf-8', $x); # -> "\x{ef}\x{bf}\x{bd}"`

When dealing with perl strings, it is helpful to keep the following in mind:

1. perl strings are just arrays of numbers, and the numbers (characters) can be interpreted either as Unicode code-points or as byte values
2. if the characters (numbers) in a string are meant to be interpreted as code-points, we call it "text", and if they are meant to be interpreted as byte values, we call the string "binary data".

The point is that the string "\x{aa}\x{42}\x{fe}" can be interpreted as either three Unicode code-points (U+00AA, U+0042, U+00FE) or as three bytes (0xaa, 0x42, 0xfe), and only the programmer knows what the correct interpretation is.

Here are some examples of the difference. If a string (say `$x`) is meant to contain code-points, then the following usage of `$x` is logically incorrect even if perl does not report an error: `$y = Encode::decode('some encoding', $x); binmode STDOUT, ":bytes"; print $x; ...`

Conversely, if `$x` contains byte values, the following are incorrect uses of `$x`: `$y = Encode::encode('some encoding', $x); $n = rindex($x, "\N{WHITE SMILING FACE}"); # need: use charnames ':full'; ...`

In these cases, perl may return a result, but the result is meaningless.

Hope this helps. Or better yet, hope this generates some more questions :-)
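The points in this reply can be checked with a short script. This is only a sketch: the variable names are made up for illustration, and it assumes the core Encode module is available. It compares the three constructions from the question, then shows what happens when the same string is treated as text and encoded to UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# The three constructions from the question.
my $quoted = "\x{aa}\x{42}\x{fe}";
my $chr    = chr(hex('AA')) . chr(hex('42')) . chr(hex('FE'));
my $packed = pack("C*", 0xaa, 0x42, 0xfe);

# All three compare equal and hold three characters, each below 0x100.
my $same = ($quoted eq $chr && $chr eq $packed) ? 1 : 0;
my $len  = length $quoted;

# Interpreted as binary data, the string already is the bytes aa 42 fe.
my $as_bytes = unpack("H*", $quoted);

# Interpreted as text (U+00AA, U+0042, U+00FE), encoding to UTF-8
# produces a different, longer byte sequence.
my $as_text = unpack("H*", Encode::encode('UTF-8', $quoted));

# A code-point above 0xFF, such as U+FFFD (65533), only becomes bytes
# once it is encoded.
my $replacement = unpack("H*", Encode::encode('UTF-8', chr(0xFFFD)));

print "same=$same length=$len\n";
print "as bytes: $as_bytes\n";
print "as text, UTF-8 encoded: $as_text\n";
print "U+FFFD, UTF-8 encoded: $replacement\n";
```

The second and third results differ from the first because encoding is only meaningful for text. Applying it to a string that was meant as binary data silently changes the bytes.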
Re: Building binary strings. by KurtSchwind (Chaplain) on Nov 28, 2007 at 14:28 UTC
They all work. Choose your preference. I'd probably base the choice on where the data originates. Personally, I like working with hex, so I've used the pack option in the past as it just resonated with me. I also think it reads the easiest.

-- I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
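For anyone who likes working with hex, pack also has an "H*" template that takes the whole byte sequence as one hex string. This is not one of the three methods in the question, just a possible alternative, sketched here with illustrative variable names:

```perl
use strict;
use warnings;

# The pack form from the question: one numeric argument per byte.
my $from_list = pack("C*", 0xaa, 0x42, 0xfe);

# Alternative: a single hex string, two hex digits per byte, high nybble first.
my $from_hex = pack("H*", "aa42fe");

my $match     = ($from_list eq $from_hex) ? 1 : 0;
my $roundtrip = unpack("H*", $from_hex);   # back to a hex string
my @values    = unpack("C*", $from_hex);   # back to a list of byte values

print "match=$match roundtrip=$roundtrip values=@values\n";
```

The unpack calls are handy in test suites for printing binary strings in a readable form when a comparison fails.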