I have a question about encoding, string comparison, and read() that is best expressed through an example. Here goes.

Suppose I am reading from a "binary" file. For this example let's use an IFF file, since the format is pretty familiar. Part of decoding IFF involves checking the "Type ID": a 4-byte indicator of chunk type, similar to a FourCC. Example type IDs might be "FORM", "LIST", etc. Because the ID is 32 bits long, it can also be compared quickly as an int32: read big-endian, "FORM" is 0x464f524d. Reading and doing something with the Type ID might look like this:

    open(my $fp, '<:raw', $filename) or die "Couldn't open $filename: $!";
    read($fp, my $type_id, 4);
    if ($type_id eq 'FORM') { ... }
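
For reference, here is a minimal, self-contained sketch of both comparisons mentioned above: the string form with eq and the integer form via unpack('N', ...). The temporary file and its contents ("FORM", a 4-byte size, "TEST") are made up for illustration and are not taken from a real IFF file; the checks on read()'s return value are an addition of the sketch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: a minimal IFF-style header
# ("FORM" + 32-bit big-endian size + a 4-byte form type).
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out;
print {$out} 'FORM', pack('N', 4), 'TEST';
close $out or die "Couldn't close $filename: $!";

open(my $fp, '<:raw', $filename) or die "Couldn't open $filename: $!";
my $got = read($fp, my $type_id, 4);
die "Read failed: $!" unless defined $got;
die "Short read: wanted 4 bytes, got $got" unless $got == 4;
close $fp;

# String comparison against an ASCII-only literal.
my $is_form_str = $type_id eq 'FORM' ? 1 : 0;

# Integer comparison: 'N' unpacks a 32-bit big-endian unsigned integer.
my $is_form_int = unpack('N', $type_id) == 0x464f524d ? 1 : 0;

print "eq: $is_form_str, int: $is_form_int\n";
```

Both tests should agree for this sample, since 'FORM' is exactly the bytes 0x46 0x4f 0x52 0x4d.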

Now my question is: is this string comparison always "safe"? In Python 3, I think the equivalent comparison would never match, because it compares a str to what is technically a bytes object, so that language forces you to make an explicit conversion. In Perl this is allowed, but I don't know whether there are dangers. One option would be to "unpack" like so:

    read($fp, my $buffer, 4);
    my $type_id = unpack('A4', $buffer);
    if ($type_id eq 'FORM') { ... }

Now I've guaranteed that it is an ASCII string, but have I really gained anything? Or is this overkill?
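
One way to see what the 'A4' template actually does is to try it on a couple of 4-byte values. The sketch below is only an experiment, not a statement about how IFF readers should be written; the sample values ("(c) " with a trailing space, and a string starting with byte 0xE9) are chosen for illustration. As far as the pack documentation describes it, 'A' strips trailing spaces and NULs when unpacking, while 'a' returns the bytes unchanged.

```perl
use strict;
use warnings;

my $padded = "(c) ";      # four bytes, the last one is a space
my $high   = "\xE9ABC";   # four bytes, the first one is outside ASCII

my $with_A = unpack('A4', $padded);   # trailing space is stripped
my $with_a = unpack('a4', $padded);   # bytes are returned as-is

printf "A4 length: %d, a4 length: %d\n", length $with_A, length $with_a;

# 'A' does not reject or alter bytes above 0x7F.
my $high_kept = unpack('A4', $high) eq $high ? 1 : 0;
print "high byte kept by A4: $high_kept\n";
```

So 'A4' changes space-padded IDs and lets non-ASCII bytes through untouched, which suggests it does not by itself guarantee an ASCII result.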

What's the encoding of a "string" read from a filehandle opened with :raw or changed with binmode()? What about the encoding of literal strings within my Perl script?
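
A small experiment that may help frame the encoding question: as I understand it, eq compares strings character by character, and a string read through :raw holds one character (0..255) per byte. The sketch below simulates such data with a literal instead of a real filehandle, and uses Encode::decode to show the difference between undecoded bytes and decoded text. The sample bytes (the UTF-8 encoding of U+00E9) are an assumption picked for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two bytes as they might arrive from a :raw handle:
# the UTF-8 encoding of U+00E9.
my $bytes = "\xC3\xA9";

# Decoding turns the two bytes into a single character.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, chars: length %d\n", length $bytes, length $chars;

# The undecoded bytes do not equal the one-character literal...
my $raw_matches = $bytes eq "\xE9" ? 1 : 0;
# ...but the decoded string does.
my $decoded_matches = $chars eq "\xE9" ? 1 : 0;

# An ASCII-only value such as 'FORM' is the same either way.
my $ascii_same = decode('UTF-8', 'FORM') eq 'FORM' ? 1 : 0;

print "raw: $raw_matches, decoded: $decoded_matches, ascii: $ascii_same\n";
```

If that reading is right, comparing raw bytes against an ASCII-only literal like 'FORM' gives the same answer whether or not anything was decoded; the difference only shows up for values above 0x7F.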


In reply to read() and string comparison by hornpipe2
