in reply to Re^2: pack/unpack binary editing
in thread pack/unpack binary editing

Mostly because I am sure that I won't be getting any interference from IOLayers, Unicode conversions or whatever. That may be paranoia, but I believe that I have had the situation where a random piece of binary data has looked sufficiently like unicode to cause it to be upgraded by some action. This may have been on 5.6.1 before the unicode support was sorted out--but why risk it?

Also, if you're randomly accessing the file and reading bytes, any buffering Perl or the C-runtime does is unlikely to be helpful. I have some evidence that on Win32, you can get a non-useful interaction between PerlIO's caching efforts and those done by the OS itself. One I can avoid, the other not, so I avoid the one I can.

Let me turn your question around: Why wouldn't you use sysread/syswrite/sysseek when processing a binary file?
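
For what it's worth, here is a minimal sketch of the kind of in-place edit I have in mind, using only sysopen/sysseek/sysread/syswrite plus pack/unpack. The file name, the record layout (4-byte little-endian integers) and the patch_record helper are all made up for the illustration; they are not taken from the OP's data.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_TRUNC SEEK_SET);

# Hypothetical demo file holding eight 32-bit little-endian integers (0..7).
my $path = 'sysio_demo.bin';

sysopen( my $out, $path, O_RDWR | O_CREAT | O_TRUNC ) or die "sysopen $path: $!";
binmode $out;    # guard against CRLF translation on platforms that have it
my $data    = pack 'V8', 0 .. 7;
my $written = syswrite( $out, $data );
die "syswrite: $!" unless defined $written && $written == length($data);
close $out or die "close: $!";

# Overwrite one 4-byte record in place and return the value it held before.
sub patch_record {
    my ( $file, $index, $new_value ) = @_;
    my $offset = $index * 4;

    sysopen( my $fh, $file, O_RDWR ) or die "sysopen $file: $!";
    binmode $fh;

    # sysseek returns "0 but true" at offset 0, so test for definedness.
    defined( sysseek( $fh, $offset, SEEK_SET ) ) or die "sysseek: $!";
    my $got = sysread( $fh, my $buf, 4 );
    die "sysread: $!" unless defined $got;
    die "short read at offset $offset" unless $got == 4;
    my $old = unpack 'V', $buf;

    # Seek back to the start of the record before writing over it.
    defined( sysseek( $fh, $offset, SEEK_SET ) ) or die "sysseek: $!";
    my $put = syswrite( $fh, pack( 'V', $new_value ) );
    die "syswrite: $!" unless defined $put && $put == 4;

    close $fh or die "close: $!";
    return $old;
}

my $old = patch_record( $path, 3, 0xCAFE );
printf "record 3 was %d, is now 0x%X\n", $old, 0xCAFE;
```

The one thing to keep in mind is that the sys* calls should not be mixed with read/seek/print on the same handle, since the buffered calls keep their own idea of the file position.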


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Re^4: pack/unpack binary editing
by blazar (Canon) on Feb 08, 2005 at 14:20 UTC
    Mostly because I am sure that I won't be getting any interference from IOLayers, Unicode conversions or whatever. That may be paranoia, but I believe that I have had the situation where a random piece of binary data has looked sufficiently like unicode to cause it to be upgraded by some action.
    While there are indeed situations in which it can be necessary to use sys*(), it is also true that binmode(), or open()'s '<:raw' mode, should take care of your concerns altogether.

    Then one can use Perl's typical IO {operators,functions}. Since the OP underlined that he has to process a whole 9GB file, chances are that it may be possible to do it one chunk (whatever this may mean, size-wise) at a time with good ol' while (<$fh>), provided that $/ is set accordingly (e.g. local $/=\512).
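
A rough sketch of that approach, assuming a Perl with PerlIO layers (5.8 or later): the file is opened with the ':raw' layer and read in fixed-size records by setting $/ to a reference to an integer. The demo file, its 1300-byte payload and the read_chunks helper are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical demo file: 1300 bytes cycling through every byte value,
# so it contains CR, LF and bytes above 0x7F.
my $path    = 'raw_demo.bin';
my $payload = join '', map { chr( $_ % 256 ) } 0 .. 1299;

open( my $out, '>:raw', $path ) or die "open $path for writing: $!";
print {$out} $payload or die "print: $!";
close $out or die "close: $!";

# Read the file back in fixed-size chunks with the ordinary readline operator.
# Returns the list of chunk lengths and the concatenated data.
sub read_chunks {
    my ( $file, $size ) = @_;
    open( my $fh, '<:raw', $file ) or die "open $file: $!";

    # A reference to an integer makes <$fh> return records of at most
    # $size bytes instead of lines.
    local $/ = \$size;

    my @lengths;
    my $seen = '';
    while ( my $chunk = <$fh> ) {    # implicit defined() test, so "0" is safe
        push @lengths, length $chunk;
        $seen .= $chunk;
    }
    close $fh or die "close: $!";
    return ( \@lengths, $seen );
}

my ( $lengths, $seen ) = read_chunks( $path, 512 );
print "chunk sizes: @$lengths\n";    # prints "chunk sizes: 512 512 276"
print "round trip ok\n" if $seen eq $payload;
```

The last chunk is simply shorter when the file size is not a multiple of the record size, so real code has to decide what a short final record means.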

    This may have been on 5.6.1 before the unicode support was sorted out--but why risk it?
    AFAIKnew unicode support has not been "sorted out"; nay, notwithstanding the fact that I do not need it and have never used it, it is my understanding that it's being constantly improved. What has been sorted out is the automatic unicode handling (depending on an environment variable, which somehow forced *NIX users to use binmode() too, something they're not used to!).

      Call me old-fashioned, but I like to know (as far as possible) what processing is being done. With sys*(), I know nothing is getting between me and the OS.

      I'm never quite so sure when I am using the non-sys versions of the same calls. It seems reasonable to me that even if they end up calling the syscalls, there are layers and decision points to go through to get there?

      Maybe Re^5: pack/unpack binary editing would run just as quickly using read/seek/write--but maybe not. Maybe I'll try it to see.
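
One way such a comparison could be set up, as a sketch only: the core Benchmark module timing the same random-access pattern through read/seek and through sysread/sysseek. The scratch file, record size, access pattern and sub names are all assumptions made for the illustration, and the numbers will depend on platform, Perl build and OS caching, so nothing here says which variant wins.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Fcntl qw(SEEK_SET);

my $path    = 'bench_demo.bin';    # hypothetical scratch file
my $reclen  = 512;                 # assumed record size
my $records = 2048;                # 2048 x 512 bytes = 1 MiB

# Each record starts with its own index as a 32-bit big-endian integer.
open( my $out, '>:raw', $path ) or die "open $path: $!";
print {$out} pack( 'N', $_ ) . ( "\0" x ( $reclen - 4 ) ) for 0 .. $records - 1;
close $out or die "close: $!";

# Fixed access pattern so both variants visit the same records.
srand 42;
my @order = map { int rand $records } 1 .. 1000;

# Buffered variant: seek/read go through PerlIO.
sub sum_buffered {
    open( my $fh, '<:raw', $path ) or die "open: $!";
    my $sum = 0;
    for my $i (@order) {
        seek( $fh, $i * $reclen, SEEK_SET ) or die "seek: $!";
        my $n = read( $fh, my $buf, $reclen );
        die "read failed" unless defined $n && $n == $reclen;
        $sum += unpack 'N', $buf;
    }
    close $fh;
    return $sum;
}

# Unbuffered variant: sysseek/sysread bypass PerlIO's buffer.
sub sum_unbuffered {
    open( my $fh, '<:raw', $path ) or die "open: $!";
    my $sum = 0;
    for my $i (@order) {
        defined( sysseek( $fh, $i * $reclen, SEEK_SET ) ) or die "sysseek: $!";
        my $n = sysread( $fh, my $buf, $reclen );
        die "sysread failed" unless defined $n && $n == $reclen;
        $sum += unpack 'N', $buf;
    }
    close $fh;
    return $sum;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, { buffered => \&sum_buffered, unbuffered => \&sum_unbuffered } );
```

A 1 MiB file will sit entirely in the OS cache, so this mostly measures call overhead; a file much larger than RAM would be needed to say anything about the 9GB case.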

