Eyeth has asked for the wisdom of the Perl Monks concerning the following question:

Hello.

I'm relatively new to Perl programming (approx. 1 month) and my only book (so far!) is the B&N $9.95 Perl book. I try to rely on existing (Internet) documentation for the bulk of all things Perl on my end.

My main question revolves around Perl's binary file handling. I still have a queasy feeling whenever I try to open a binary file under Perl. I use Perl v5.8.5 on FC3 Linux. Here's a code snippet:

open (FHDLR, "< :raw", "$fname") or die ("Cannot open $fname!\n"); # Opens the file in binary mode.
seek (FHDLR, 258, 0); # zero in on FORMAT_TYPE of partition
$ftype = ord (getc (FHDLR)); # just get the FORMAT_TYPE variable

When I first started, I thought I could use a C-language 'Jedi' trick by using the getc function to get an individual (unsigned) char and treat it as an 8-bit binary value for $ftype. Needless to say, I found out that it didn't work.

Thinking that I had to treat it as a pure binary file, I added the "< :raw" argument to the open function, and the getc function still didn't work, as Perl insisted on wedging in an actual character for $ftype.

I finally found the ord function and put that thing before the getc function call. This time, it worked, as Perl put an 8-bit binary (unsigned) integer into $ftype.

So, I've got it working now, but the whole experience has led me to some middling doubts about how Perl handles binary files. Just how does Perl buffer file reads? What about writes, do I have to 'reconvert' these values via the chr function calls?

I can't afford for Perl to munge a binary file by treating it as an ASCII file. My other concern is that by eventually overusing the ord/chr functions in many parts of my Perl script, I may actually slow the script down. I'm also using the read function to read in 256-byte chunks of data into a superarray. Even then, the 256-byte chunks are full of characters, printable and non-printable. So far, the script seems 'Snappy(tm)'.
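On the read/write question, a minimal sketch may help. It assumes a throwaway temporary file and made-up sample data (the names $original, @chunks, and $copy are purely illustrative, not from the script above). It reads 256-byte chunks with read on a binmode'd handle and checks that the bytes come back unchanged. Strings read this way can be printed straight back to a binmode'd output handle; chr (or pack) is only needed when turning a number into a byte.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: every byte value 0..255, twice (512 bytes).
my $original = pack 'C*', (0 .. 255) x 2;

# Write the bytes out in binary mode; no chr calls are needed for string data.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $original;
close $out or die "close failed: $!";

# Read the file back in 256-byte chunks.
open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;

my @chunks;
while (1) {
    my $n = read $in, my $buf, 256;
    die "read failed: $!" unless defined $n;
    last if $n == 0;                    # end of file
    push @chunks, $buf;
}
close $in;

# The chunks joined together are byte-for-byte identical to what was written.
my $copy = join '', @chunks;
print "chunks: ", scalar(@chunks), "\n";
print "round trip ok\n" if $copy eq $original;

# Convert to a number only at the point where arithmetic is needed.
my $value = ord substr($chunks[0], 65, 1);
```

Printable and non-printable bytes are treated alike here: each chunk is just a string of bytes, and ord is applied only to the one byte that is needed as a number.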

I was hoping to use Tk along with my initial Perl coding efforts. I'd hate to grapple with the complexities of C language programming and GTK+ programming.

Enjoy.

Re: Binary Reading Questions
by Corion (Patriarch) on Jun 30, 2005 at 06:41 UTC

    You will want to look at the pack and unpack functions if you are routinely converting between numbers and their binary representation as characters.

    Usually, Perl does the Right Thing, but if you want to be safe from Unicode encodings, use the binmode() call after opening your file, so you get full control over what Perl writes to the file:

    open my $fh, "<", $fname or die "Couldn't open $fname: $!";
    binmode $fh;
    seek $fh, 258, 0;
    read $fh, (my $buf), 1;
    $buf = unpack "C", $buf; # "C" is an unsigned char (0..255), matching ord; "c" would be signed
    ...

    Why do you want to convert the stuff from the character to the integer at all? If there are no calculations to be made, you will likely be faster by not converting at all. But still, the IO overhead of reading the file will likely dwarf any optimizations you might make in your code ...

    Update: PodMaster pointed me to Pack/Unpack Tutorial (aka How the System Stores Data) by pfaut - you should read that tutorial.
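To make the pack/unpack suggestion concrete, here is a small sketch comparing the signed "c" and unsigned "C" templates with ord and chr. The 4-byte record layout at the end is invented purely for illustration and does not come from any real file format.

```perl
use strict;
use warnings;

my $byte = "\xBF";                    # a single byte with value 191

my $unsigned = unpack 'C', $byte;     # unsigned char: 191
my $signed   = unpack 'c', $byte;     # signed char:   -65
my $via_ord  = ord $byte;             # same as 'C':   191

# Going the other way: number to byte.
my $packed  = pack 'C', 191;          # "\xBF"
my $via_chr = chr 191;                # the same one-byte string

# unpack can split several fields in one call. Hypothetical record:
# a type byte, a flags byte, and a 16-bit little-endian length.
my $record = "\x82\x01\x00\x02";
my ($type, $flags, $len) = unpack 'C C v', $record;

print "unsigned=$unsigned signed=$signed ord=$via_ord\n";
print "type=$type flags=$flags len=$len\n";
```

For a single byte, ord and unpack "C" are interchangeable; unpack starts to pay off when a record holds several fields.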

    Update 2: frodo72 pointed out a typo in binmmode - thanks frodo72!

      Hello.

      Quote:
      Why do you want to convert the stuff from the character to the integer at all? If there are no calculations to be made, you will likely be faster by not converting at all. But still, the IO overhead of reading the file will likely dwarf any optimizations you might make in your code ...

      Thanks for the information. I've never thought about this I/O overhead in that way. Makes me more comfortable about using the ord/chr functions a whole lot more.

      As for your other question, this Perl script needs to handle binary files anywhere from 2Mb to 16Mb (maximum). I need to utilize bitwise operators extensively, to separate out bitfields suitable for use within Perl. Here's an example:

      $type = ord (substr ($entry, 0, 1)) & 191; # get filetype & remove file lock flag
      Also, thank you for pointing me in the direction of pfaut's Pack/Unpack Tutorial. I've already found it quite illuminating. Further study on my end is needed to see if the unpack function can efficiently carve up bitfields.
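As a rough sketch of carving bitfields out of one byte: unpack "C" yields the byte as an integer, and masks and shifts do the rest. The entry contents and the meaning of each field below are assumptions for illustration; only the & 191 mask (0xBF, which clears bit 6) comes from the line above.

```perl
use strict;
use warnings;

# Hypothetical directory entry whose first byte is 0xC2 (binary 11000010).
my $entry = "\xC2" . "FILENAME";

my $raw    = unpack 'C', $entry;        # first byte as an unsigned integer
my $type   = $raw & 0xBF;               # 0xBF == 191: clear bit 6 (the lock flag)
my $locked = ($raw & 0x40) ? 1 : 0;     # test bit 6 on its own
my $low    = $raw & 0x0F;               # low nibble
my $high   = ($raw >> 4) & 0x0F;        # high nibble

# The "B8" template returns the first byte as a string of bits, most
# significant bit first, which is handy when debugging masks.
my $bits = unpack 'B8', $entry;

print "raw=$raw type=$type locked=$locked high=$high low=$low bits=$bits\n";
```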

      I'll use binmode from now on.

      Enjoy.

        You want to look into vec.

        $type = vec($entry,0,8) & 191;
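A short sketch of how vec relates to the ord/substr version, using a made-up two-byte $entry. With a width of 8 bits, vec($entry, N, 8) reads byte N as an unsigned integer, and vec can also be assigned to, which modifies the string in place.

```perl
use strict;
use warnings;

# Hypothetical two-byte entry, for illustration only.
my $entry = "\xC2\x10";

my $by_vec = vec($entry, 0, 8);            # byte 0 as an unsigned integer
my $by_ord = ord(substr($entry, 0, 1));    # the same value via ord/substr
my $second = vec($entry, 1, 8);            # byte 1

# vec is also an lvalue: store the masked value back into byte 0.
vec($entry, 0, 8) = $by_vec & 191;

printf "by_vec=%d by_ord=%d second=%d first byte now=0x%02X\n",
    $by_vec, $by_ord, $second, ord($entry);
```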

        The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. -- Cyrus H. Gordon

      Why do you recommend binmode over :raw?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

        I'm not too familiar with the PerlIO stuff, but it sure isn't in Perl 5.5.3 (but open my $fh isn't either), and I'm not sure if it is in Perl 5.6.1. In any case, Perl has to be built with the slower PerlIO layer, and I suspect that weird distros like RedHat build a Perl without it.
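For comparison, a sketch that reads the same bytes both ways, assuming a Perl with PerlIO layers (5.8 or later) and a throwaway temporary file. On such a Perl, calling binmode with no layer argument should behave like the :raw layer, so the two reads are expected to match.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write four awkward bytes: NUL, CR, LF, and 0xFF.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\x00\x0D\x0A\xFF";
close $out or die "close failed: $!";

# Variant 1: request the :raw layer in the open mode.
open my $fh1, '<:raw', $path or die "Cannot open $path: $!";
read $fh1, my $via_layer, 4;
close $fh1;

# Variant 2: plain open, then binmode (also works on Perls without layers).
open my $fh2, '<', $path or die "Cannot open $path: $!";
binmode $fh2;
read $fh2, my $via_binmode, 4;
close $fh2;

print "same bytes\n" if $via_layer eq $via_binmode;
```

The practical difference is portability: binmode is available on much older Perls, while the layer syntax in open needs PerlIO.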

Re: Binary Reading Questions
by polettix (Vicar) on Jun 30, 2005 at 10:18 UTC
    As Corion pointed out, binmode is your friend here.

    The trick you're trying to use derives from C treating characters as small (typically eight-bit) integers, aka chars. This is C's approach, which many other languages, Perl among them, do not share. That's why you're forced to use ord to perform the conversion.

    Remember that variables are not "strongly" typed in Perl, so it actually doesn't know what a single character is; moreover, Perl supports implicit conversions between numeric strings and the numbers they represent (e.g. "123" is also read as 123 in numeric contexts, like in a sum), so you would end up with a problem trying to auto-convert a single-char variable like "1": is it 1 or is it 49? C goes with the latter, Perl chooses the former.
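The "1 or 49" ambiguity can be seen directly in a few lines. This is only an illustration of the point above; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $char = "1";

my $as_number = $char + 0;    # numeric context: the string "1" becomes 1
my $as_code   = ord $char;    # explicit conversion: character code 49
my $sum       = "123" + 1;    # numeric strings convert implicitly: 124
my $back      = chr 49;       # and chr goes the other way: "1"

print "as_number=$as_number as_code=$as_code sum=$sum back=$back\n";
```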

    As a rule of thumb: Perl is definitely not C (even if it is friendly to those who know C :)

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.