Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, All,

I have a binary file that possesses a text header (lines begin with #). What is the absolute safest way to eliminate the text header *only* without compromising the integrity of the file?

Any suggestions would be much appreciated.

Thank You,

-fiddler42


Re: How to eliminate text portion of a binary file?
by matija (Priest) on May 14, 2004 at 21:01 UTC
    Since the file is binary, you don't want to be reading it with angle brackets (<>), because you don't know whether there are ANY newline characters in the part of the file after the headers. Use sysread and keep appending to a buffer until you see the end-of-header marker. Then remove the headers with a regex, and cycle through sysread and syswrite (with a suitably sized buffer, say 16 or 64 KB) until you reach the end of the file.

    Update: here is the example code:

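    my $buf = "";
    # Append to the buffer until the headers are found and removed;
    # /s lets .*? match across the newlines inside the header.
    until ( $buf =~ s/^.*?#### end_ascii_header\n//s ) {
        sysread( INP, $buf, 1024, length($buf) )
            or die "sysread failed or end-of-header marker not found\n";
    }
    syswrite( OUT, $buf, length($buf) );
    # Copy the rest of the file.
    while ( sysread( INP, $buf, 16384 ) ) {
        syswrite( OUT, $buf, length($buf) );
    }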
    P.S. In production code, checking the return values of syswrite would be prudent.
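    A fuller sketch of the sysread/syswrite approach above, written as a self-contained script. It adds the return-value checks the P.S. mentions (including retrying short writes) and opens both files in `:raw` mode. `strip_header` and `write_all` are hypothetical names, and the marker `#### end_ascii_header` is taken from the original poster's description of the header; treat this as an illustration, not a drop-in tool.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, dropping everything up to
# and including the line "#### end_ascii_header\n".
sub strip_header {
    my ( $in_path, $out_path ) = @_;
    open( my $in,  '<:raw', $in_path )  or die "open $in_path: $!";
    open( my $out, '>:raw', $out_path ) or die "open $out_path: $!";

    # Accumulate 1 KB reads until the marker line has been seen and removed.
    # /s lets .*? run across newlines; /m lets ^ anchor at any line start.
    my $buf = '';
    until ( $buf =~ s/\A.*?^#### end_ascii_header\n//ms ) {
        my $n = sysread( $in, $buf, 1024, length($buf) );
        die "sysread $in_path: $!" unless defined $n;
        die "end-of-header marker not found in $in_path\n" if $n == 0;
    }

    # Write whatever followed the marker in the last chunk, then copy the rest.
    while (1) {
        write_all( $out, $buf );
        my $n = sysread( $in, $buf, 65536 );
        die "sysread $in_path: $!" unless defined $n;
        last if $n == 0;
    }
    close $out or die "close $out_path: $!";
    close $in;
}

# syswrite may write fewer bytes than requested, so loop until all are out.
sub write_all {
    my ( $fh, $data ) = @_;
    my $off = 0;
    while ( $off < length($data) ) {
        my $w = syswrite( $fh, $data, length($data) - $off, $off );
        die "syswrite: $!" unless defined $w;
        $off += $w;
    }
}
```

    The header is scanned with a regex only until the marker is found; after that the payload is copied without ever being inspected, so newline or NUL bytes in the binary part cannot confuse it.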
      ... you don't want to be reading it with angle brackets (<>) because you don't know if there are ANY new line characters in the part of the file after the headers

      Unless, of course, you set $/ to a reference to a scalar value containing the number of bytes you want to read each time, in which case it'll work just fine and return up to that many bytes on each read, newlines or no.

      Hi,

      Ahhh...can you show that in an example? I would appreciate it very much!

      Thanks,

      -fiddler42
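      A sketch of the fixed-size-record idea applied to this file: read the header as ordinary lines, then switch `$/` to `\65536` so each `<$in>` returns a block of up to 64 KB. It assumes the handles are already open (in `:raw` mode for real files) and that the header ends with the line `#### end_ascii_header`; `copy_after_header` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: $in and $out are already-open handles (opened with
# '<:raw' / '>:raw' for real files). Skips the text header, then copies the
# binary payload in fixed-size records.
sub copy_after_header {
    my ( $in, $out ) = @_;

    # The header is ordinary text, so the default $/ ("\n") is fine here.
    my $found = 0;
    while ( defined( my $line = <$in> ) ) {
        if ( $line eq "#### end_ascii_header\n" ) { $found = 1; last }
    }
    die "end-of-header marker not found\n" unless $found;

    # With $/ set to a reference to an integer, <$in> returns up to that
    # many bytes per read, whatever newline bytes the binary data contains.
    local $/ = \65536;
    while ( defined( my $chunk = <$in> ) ) {
        print {$out} $chunk or die "print: $!";
    }
}
```

      For example, `open(my $in, '<:raw', 'mixed.dat')` and `open(my $out, '>:raw', 'bin.dat')` followed by `copy_after_header($in, $out)`; the file names are placeholders. The last record is simply shorter than 64 KB when the payload size is not a multiple of it.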

Re: How to eliminate text portion of a binary file?
by thor (Priest) on May 14, 2004 at 20:44 UTC
    Is there a fixed number of them? Are the lines of fixed length? The thing about "binary" data is that anything goes: any value is valid. So, the more we know about your header, the better equipped we are to help...

    thor

      The header will always end with the line:

      #### end_ascii_header

      Thanks,

      -fiddler42

        Try this, then:
        use strict;
        use warnings;

        open( my $fh, '<', "/path/to/file" ) or die $!;
        binmode $fh;    # no CRLF translation on the binary part
        while (<$fh>) {
            last if $_ eq "#### end_ascii_header\n";
        }
        # $fh is now positioned at the beginning of the binary data; do what you will

        thor

        Update: Fixed a typo in the code (missing 'if')
Re: How to eliminate text portion of a binary file?
by BrowserUk (Patriarch) on May 15, 2004 at 04:27 UTC

    This one-liner seems to do the job.

    perl -pe"BEGIN{ $/=qq{#### end_ascii_header\n}; <> }" mixed.dat > bin.dat
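    Spelled out as a script, the one-liner's trick is to make the marker line the record separator, so a single read swallows the whole header. A minimal sketch, assuming the marker is exactly `#### end_ascii_header` followed by a newline (as stated earlier in the thread) and that the caller opens both files in `:raw` mode; `strip_by_separator` is a hypothetical name. Raw mode matters on Windows, where Perl's default `:crlf` layer would otherwise translate line endings inside the binary data.

```perl
use strict;
use warnings;

# Hypothetical script form of the -p one-liner: with $/ set to the marker
# line, the first <$in> returns everything up to and including the marker.
sub strip_by_separator {
    my ( $in, $out ) = @_;
    my $marker = "#### end_ascii_header\n";

    my $header = do { local $/ = $marker; <$in> };
    # If the file has no marker, that one read returns the whole file
    # without the marker at its end.
    die "end-of-header marker not found\n"
        unless defined $header && $header =~ /\Q$marker\E\z/;

    # Copy the rest with buffered read(); sysread would be wrong here because
    # readline may already have pulled part of the payload into Perl's buffer.
    my $buf;
    while (1) {
        my $n = read( $in, $buf, 65536 );
        die "read: $!" unless defined $n;
        last if $n == 0;
        print {$out} $buf or die "print: $!";
    }
}
```

    Unlike the one-liner, this stops treating the data as records once the header is gone and copies the remainder in 64 KB blocks.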

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail