Manipulating Binary files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

(Ovid) Re: Manipulating Binary files by Ovid (Cardinal) on May 14, 2001 at 19:50 UTC
getc is very slow and generally shunned. How large is the file? If it can be read into a scalar, you could use the following: `$bytes =~ s/(?:[^\x0D]|\x0D(?!\x0A))*//;` You could also write that as: `$bytes =~ s/.*?(?=\x0D\x0A)//s;` It's easier to read, but it's very inefficient (see Death to Dot Star! for details).

Aside from that, I use read. Read in chunks of an appropriate size, and when you find what you need, substitute out what you don't need, write the rest of that chunk to a new file, and then continue copying the remainder of the input to that file. Of course, don't forget that if you read in, say, 20 bytes at a time, the two bytes you specify could be split, so you'll need to test whether `0x0D` is at the end of one read and `0x0A` is at the beginning of the next.

Hideously untested code:

    #!/usr/bin/perl -w
    use strict;

    my $in_file  = 'file1.txt';
    my $out_file = 'file2.txt';

    open IN,  "< $in_file"  or die "Can't open $in_file for reading: $!";
    open OUT, "> $out_file" or die "Can't open $out_file for writing: $!";
    binmode IN;    # in case we're on a Windows system
    binmode OUT;

    my $buffer;
    my $flag      = 0;    # set once 0x0D 0x0A has been found
    my $last_byte = 0;    # set when the previous read ended in 0x0D

    while ( read( IN, $buffer, 1024 ) ) {
        if ( $last_byte and substr( $buffer, 0, 1 ) eq "\x0A" ) {
            # The pair was split across two reads. Put back the 0x0D
            # from the previous chunk so the output starts with the pair.
            $flag   = 1;
            $buffer = "\x0D" . $buffer;
        }
        elsif ( $buffer =~ /\x0D\x0A/ ) {
            # Strip everything before the first 0x0D 0x0A in this chunk.
            $flag = 1;
            $buffer =~ s/(?:[^\x0D]|\x0D(?!\x0A))*//;
        }
        last if $flag;
        $last_byte = substr( $buffer, -1 ) eq "\x0D";
    }

    if ( $flag ) {
        print OUT $buffer or die "Could not write data to $out_file: $!";
        while ( read( IN, $buffer, 1024 ) ) {
            print OUT $buffer or die "Could not write data to $out_file: $!";
        }
    }
    else {
        warn "Did not find '0x0D 0x0A' in $in_file";
    }

Cheers, Ovid
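For the in-memory case, the idea can be tried out on a small string. This is only a minimal sketch: the helper name `strip_before_crlf` and the sample data are made up for illustration, and the guard before the substitution is an assumption about what should happen when the pair is absent (without it, the pattern would consume the whole string).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): drop everything before the
# first 0x0D 0x0A pair in an in-memory string, keeping the pair itself.
sub strip_before_crlf {
    my ($bytes) = @_;

    # Only substitute when the pair exists; otherwise the pattern would
    # match, and remove, the entire string.
    $bytes =~ s/(?:[^\x0D]|\x0D(?!\x0A))*// if $bytes =~ /\x0D\x0A/;
    return $bytes;
}

# A lone 0x0D in the leading part must not stop the match early.
my $sample = "head\x0Dstill head\x0D\x0Abody\x0D\x0Atail";
my $result = strip_before_crlf($sample);
print length($result), " bytes kept\n";
```

The alternation consumes either a byte that is not 0x0D or a 0x0D that is not followed by 0x0A, so the match stops exactly at the first complete pair.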
(tye)Re: Manipulating Binary files by tye (Sage) on May 14, 2001 at 20:12 UTC
Being paranoid about really big files, I'd probably do:

    {
        local $/ = \4096;
        binmode(INPUT);
        binmode(OUTPUT);
        while( <INPUT> ) {
            if( s/^.*?(\x0d\x0a)/$1/s ) {
                print OUTPUT $_;
                last;
            }
        }
        print OUTPUT $_ while <INPUT>;
    }

But setting $/ to be a reference to a block size is a recently added feature so be aware that your version of Perl may not support it yet. In which case you can change `<INPUT>` to: `read(INPUT,$_,4096)`

- tye (but my friends call me "Tye")
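To see what a reference in `$/` does to `<FH>`, here is a small sketch on an in-memory filehandle. The helper name `fixed_size_records` and the sample string are assumptions for illustration only, and in-memory filehandles need a reasonably modern Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: read $data through an in-memory filehandle in
# fixed-size records of $size bytes, as "local $/ = \4096" does for 4096.
sub fixed_size_records {
    my ( $data, $size ) = @_;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    binmode $fh;
    local $/ = \$size;    # a reference to an integer means "record length"
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = fixed_size_records( "abcdefghij", 4 );
print join( '|', @records ), "\n";    # prints "abcd|efgh|ij"
```

Each record is cut at a fixed byte offset regardless of content, and the last record is simply whatever is left over.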
(Ovid) Re: (tye)Re: Manipulating Binary files by Ovid (Cardinal) on May 14, 2001 at 20:16 UTC
For a one-shot program, I'd be happy to use your code. However, if one is likely to use this repeatedly (which it doesn't sound like), then there is a potential bug. What happens if `0x0D` is the 4,096th character and `0x0A` is the 4,097th? That would be annoying to track down (ain't boundaries a pain?).

Cheers, Ovid
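The boundary case is easy to reproduce without any files. In this sketch, the helper name `pair_visible_in_chunks` and the test data are hypothetical, and `substr` stands in for the fixed-size reads.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether any fixed-size chunk of $data
# contains the complete 0x0D 0x0A pair when each chunk is searched
# on its own.
sub pair_visible_in_chunks {
    my ( $data, $size ) = @_;
    for ( my $pos = 0; $pos < length $data; $pos += $size ) {
        return 1 if substr( $data, $pos, $size ) =~ /\x0D\x0A/;
    }
    return 0;
}

# 0x0D is the 4,096th byte and 0x0A is the 4,097th.
my $data = ( 'x' x 4095 ) . "\x0D\x0A" . 'payload';

print pair_visible_in_chunks( $data, 4096 ) ? "found\n" : "missed\n";    # missed
print pair_visible_in_chunks( $data, 4097 ) ? "found\n" : "missed\n";    # found
```

With 4096-byte chunks the first chunk ends in 0x0D and the second begins with 0x0A, so neither chunk matches on its own.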
(tye)Re2: Manipulating Binary files by tye (Sage) on May 14, 2001 at 20:23 UTC
Oops.

    {
        binmode(INPUT);
        binmode(OUTPUT);
        local $_ = "";
        while( read( INPUT, $_, 4096, length($_) ) ) {
            if( s/^.*?(\x0d\x0a)/$1/s ) {
                print OUTPUT $_;
                last;
            }
            substr( $_, 0, -1 ) = "";
        }
        print OUTPUT $_ while read INPUT, $_, 4096;
    }

Thanks for catching that.

- tye (but my friends call me "Tye")
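The carry-over trick (keep the last byte of each chunk and append the next read to it) can be exercised on in-memory filehandles. What follows is a sketch under assumptions, not tye's code: the sub name `copy_from_crlf`, the lexical filehandles, the return value, and the tiny chunk size are all made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the carry-over idea: copy $in to $out,
# skipping everything before the first 0x0D 0x0A. Returns true if the
# pair was found. $chunk is the read size in bytes.
sub copy_from_crlf {
    my ( $in, $out, $chunk ) = @_;
    binmode $in;
    binmode $out;

    my $buf   = '';
    my $found = 0;

    # The fourth argument to read() appends to what was carried over.
    while ( read( $in, $buf, $chunk, length $buf ) ) {
        if ( $buf =~ s/^.*?(?=\x0D\x0A)//s ) {
            print {$out} $buf;
            $found = 1;
            last;
        }

        # Keep only the final byte, in case it is the 0x0D of a split pair.
        $buf = substr( $buf, -1 );
    }
    return 0 unless $found;

    # Pair found: the rest of the input is copied through unchanged.
    while ( read( $in, $buf, $chunk ) ) {
        print {$out} $buf;
    }
    return 1;
}

# Demo with a chunk size chosen so that the pair straddles two reads.
my $source = "abc\x0D\x0Adef";
my $dest   = '';
open my $in,  '<', \$source or die "Can't open in-memory input: $!";
open my $out, '>', \$dest   or die "Can't open in-memory output: $!";
copy_from_crlf( $in, $out, 4 );
close $out;
```

Because at most one byte is carried between reads, memory use stays bounded by the chunk size no matter how late the pair appears.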
Re: Manipulating Binary files by MeowChow (Vicar) on May 14, 2001 at 22:07 UTC
Perhaps I'm missing something, but all the answers thus far seem terribly overcomplicated:

    {
        local $/ = "\x0D\x0A";
        binmode INPUT;
        binmode OUTPUT;
        while (<INPUT>) {
            print OUTPUT if $. > 1;
        }
    }

MeowChow s aamecha.s a..a\u$&owag.print
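A quick way to see what the record-separator approach yields is to list the records it produces. This is a sketch on an in-memory filehandle; the helper name `crlf_records` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records of $data as <FH> sees them
# when the input record separator is the two-byte pair 0x0D 0x0A.
sub crlf_records {
    my ($data) = @_;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    binmode $fh;
    local $/ = "\x0D\x0A";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = crlf_records("junk\x0D\x0Aone\x0D\x0Atwo");
print scalar(@records), " records\n";    # prints "3 records"

# Skipping the first record, as "if $. > 1" does, leaves the data that
# follows the first pair.
my $kept = join '', @records[ 1 .. $#records ];
```

Each record keeps its trailing separator, so the first pair belongs to the skipped record. The output therefore begins just after the first 0x0D 0x0A, whereas the substitution-based versions in this thread keep the pair itself.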
(tye)Re3: Manipulating Binary files by tye (Sage) on May 14, 2001 at 22:09 UTC
If the binary file is very large and the "\x0d\x0a" comes late in the file, then `<INPUT>` is going to read most of the file into memory, which may fail due to the above considerations.

- tye (but my friends call me "Tye")
Re: (tye)Re3: Manipulating Binary files by MeowChow (Vicar) on May 14, 2001 at 22:15 UTC
I would guess that the `0D-0A` sequence appears with regularity in the file, considering that it's the binary record separator for DOS/Win32 systems, equivalent to "\n" in the *nix world.

MeowChow s aamecha.s a..a\u$&owag.print