steidley has asked for the wisdom of the Perl Monks concerning the following question:
I have a file that is mixed text and binary. This file is also a "join" of multiple smaller files, all with the same format. I need to split this single file into its smaller parts.
Essentially, each individual file consists of a text header followed by a variable-length block of binary data. Here is the code:
```perl
use File::Find;
use strict;

#Create a list of files
my @file_list = ();
my $dir = shift(@ARGV);
find (\&find_files, $dir);
my $file_count = @file_list;
my $cur_file_num = 1;
my $file_name = "";
my $OFH;

foreach $file_name (@file_list){
    open my $IFH, "<", $file_name or die "Could not open input file: $file_name\n";
    print "Working on $cur_file_num/$file_count: $file_name\n";

    #Create the directory that will hold the results
    $file_name =~ /.*\/(.*).blg$/;
    my $directory_name = $dir . "\\" . $1;
    print "Creating directory: $directory_name\n";
    mkdir $directory_name;

    #Set up some variables
    my $line = "";
    my $output_file_name = "";
    my $int_file_count = 0;
    my $reading_data = 0;
    my $data_format = "";
    my $bytes = 0;
    my $data_struct = 0;

    #Now run through the file and split it up
    while (<$IFH>){
        $line = $_;
        if ($reading_data){
            $reading_data = 0;
            $/ = "\n";
        }
        if ($line =~ /FileName: (.*)/){
            chomp($output_file_name = $1);
            $int_file_count++;
            print "\t\t\tOpening: " . $directory_name . "\\" . $output_file_name . "\n";
            open $OFH, ">", $directory_name . "\\" . $output_file_name
                or die "Could not open output file: $output_file_name\n$!";
            print "\tFound: $output_file_name\n";
        }
        if ($line =~ /DataStruct: (.*)/){
            $data_struct = $1;
        }
        if ($line =~ /DataForm:/){
            $data_format = substr $_, 9;                  # remove header
            $data_format =~ tr/,//d;                       # remove commas
            $bytes  = ( () = $data_format =~ /D/g) * 8;   # Count the double floats
            $bytes += ( () = $data_format =~ /L/g) * 4;   # Count the longs
            $bytes += ( () = $data_format =~ /S/g) * 2;   # Count the shorts
            $bytes += ( () = $data_format =~ /F/g) * 4;   # Count the single floats
            $bytes *= $data_struct;                        # Multiply by the number of records
            print "\t\tData size: $bytes\n";
        }
        if ($line =~ /NextFile:/){
            $/ = \$bytes;    # change how much data is read in at one time
            $reading_data = 1;
        }
        print $OFH $line;
    }
}

###################################################
#find_files subroutine called by "find"
sub find_files {
    if ($_ =~ /.*\.blg/){
        push @file_list, $File::Find::name;
        # print "$File::Find::name\n";
    }
}
```
This script works.
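The splitting relies on Perl's record-read mode: when `$/` holds a reference to an integer, the next `readline` returns at most that many characters, ignoring newlines. A minimal sketch of that technique, assuming a made-up in-memory sample (the header text and the 6-byte payload are hypothetical) and using `local $/` so that line mode comes back automatically instead of via a flag:

```perl
use strict;
use warnings;

# Hypothetical sample: a text header line, a 6-byte binary payload that
# happens to contain 0x0A bytes, then another text header line.
my $payload = "\x01\x0A\x02\x0A\x03\x04";
my $data    = "NextFile:\n" . $payload . "FileName: b.dat\n";

# Read from an in-memory filehandle so the example needs no real file.
open my $in, '<:raw', \$data or die "Cannot open in-memory file: $!";

my @records;
while (defined(my $line = <$in>)) {
    push @records, $line;
    if ($line =~ /^NextFile:/) {
        # A reference to an integer in $/ switches readline to
        # fixed-size records; local restores "\n" when the block ends.
        my $size = length $payload;
        local $/ = \$size;
        my $chunk = <$in>;
        push @records, $chunk;
    }
}
close $in;

printf "%d records; payload intact: %s\n",
    scalar @records, ($records[1] eq $payload ? 'yes' : 'no');
```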
Note that the input file is created by a script on a Linux box, while I am working on a Win7 box. When I compare the data in a hex editor, I find that every occurrence of '0x0A' has been replaced with '0x0D 0x0A'.
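On Windows, Perl's default I/O layers include `:crlf`, which turns every `\n` (0x0A) written to a handle into `\r\n` (0x0D 0x0A). Because the translation happens when writing, the output handle needs `:raw` (or `binmode` after opening). A minimal sketch that pushes `:crlf` explicitly, so the effect is visible on any OS; the helper name `bytes_written` and the 3-byte sample are hypothetical:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $bytes through the given output layer, then read the file back
# unmodified (:raw) to see exactly what landed on disk.
sub bytes_written {
    my ($layer, $bytes) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    close $fh;
    open my $out, ">$layer", $path or die "Cannot open $path: $!";
    print {$out} $bytes;
    close $out or die "Cannot close $path: $!";
    open my $in, '<:raw', $path or die "Cannot reopen $path: $!";
    local $/;                    # slurp the whole file
    my $content = <$in>;
    close $in;
    return $content;
}

my $binary    = "\x01\x0A\x02";   # one embedded 0x0A byte
my $text_mode = bytes_written(':crlf', $binary);
my $raw_mode  = bytes_written(':raw',  $binary);

printf "crlf: %s\n", unpack('H*', $text_mode);
printf "raw:  %s\n", unpack('H*', $raw_mode);
```

The `:crlf` copy gains a 0x0D in front of the 0x0A; the `:raw` copy matches the input byte for byte.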
So I figured I needed some form of 'binmode' or the ':raw' layer when opening the input file. But when I do that, opening the output file fails with this error:
```
Invalid argument at split_file.pl line 47, <$IFH> line 1.
```

where line 47 is:

```perl
open $OFH, ">", $directory_name . "\\" . $output_file_name or die "Could not open output file: $output_file_name\n$!";
```
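The input side matters as well: a `:crlf` input layer collapses any 0x0D 0x0A pair into a single 0x0A, which would silently corrupt a binary payload and shift the boundaries of a `$/ = \$bytes` read. A minimal sketch, assuming a made-up 4-byte payload and a hypothetical helper `slurp_with`:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A binary payload that happens to contain the byte pair 0x0D 0x0A.
my $payload = "\x05\x0D\x0A\x06";

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                         # write the bytes exactly as given
print {$fh} $payload;
close $fh or die "Cannot close $path: $!";

# Read the whole file back through the given input layer.
sub slurp_with {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "Cannot open $path: $!";
    local $/;
    my $content = <$in>;
    close $in;
    return $content;
}

my $via_crlf = slurp_with(':crlf');  # what a Windows text-mode read sees
my $via_raw  = slurp_with(':raw');

printf "crlf read %d bytes, raw read %d bytes\n",
    length $via_crlf, length $via_raw;
```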
My question is this:
Is this the correct way to handle the apparent automatic conversion of '0x0A'? If so, why does opening the input file in this mode keep me from opening an output file?
My confusion comes from not understanding the message "Invalid argument". What is it about opening an input file in binary mode that would require something different in how I open an output file?
I even tried changing line 47 to:
```perl
open $OFH, ">:raw", $directory_name . "\\" . $output_file_name
```
This did not change the results.
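One plausible explanation for "Invalid argument" (an assumption, not confirmed by the post): if the header lines actually end in `\r\n`, the Windows `:crlf` input layer strips the `\r` in text mode, but with `:raw` it survives. `(.*)` matches `\r`, and `chomp` only removes `$/` (`"\n"`), so the output file name ends in a carriage return, a character Windows does not allow in file names. A layer on the output `open` cannot fix that, because the name itself is the problem. A sketch with a hypothetical header line:

```perl
use strict;
use warnings;

# Hypothetical header line, as it would arrive if the generating script
# ended its header lines with CR LF and the input were opened with :raw.
my $line = "FileName: part_001.dat\r\n";

my ($chomped, $cleaned);
if ($line =~ /FileName: (.*)/) {
    my $captured = $1;            # '.' stops before "\n" but matches "\r"

    # What the original loop does: chomp removes only $/ ("\n"),
    # so the trailing "\r" survives into the file name.
    chomp($chomped = $captured);

    # A more defensive cleanup: drop any trailing whitespace, CR included.
    ($cleaned = $captured) =~ s/\s+\z//;
}

printf "chomped name ends in CR: %s\n", ($chomped =~ /\r\z/ ? 'yes' : 'no');
printf "cleaned name: '%s'\n", $cleaned;
```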
Any help is greatly appreciated.
David Steidley
Re: Open file for output errors
by kennethk (Abbot) on Nov 21, 2013 at 21:09 UTC
by Kenosis (Priest) on Nov 21, 2013 at 21:31 UTC
by steidley (Initiate) on Nov 21, 2013 at 22:09 UTC
by Anonymous Monk on Nov 22, 2013 at 03:20 UTC