steidley has asked for the wisdom of the Perl Monks concerning the following question:

I have a file that is mixed text and binary. This file is also a "join" of multiple smaller files, all with the same format. I need to split this single file into its smaller parts.

Essentially, the individual files have a text header followed by a variable length of binary data. Here is the code:

use File::Find;
use strict;

#Create a list of files
my @file_list = ();
my $dir = shift(@ARGV);
find (\&find_files, $dir);

my $file_count = @file_list;
my $cur_file_num = 1;
my $file_name = "";
my $OFH;

foreach $file_name (@file_list){
    open my $IFH, "<", $file_name or die "Could not open input file: $file_name\n";
    print "Working on $cur_file_num/$file_count: $file_name\n";

    #Create the directory that will hold the results
    $file_name =~ /.*\/(.*).blg$/;
    my $directory_name = $dir . "\\" . $1;
    print "Creating directory: $directory_name\n";
    mkdir $directory_name;

    #Set up some variables
    my $line = "";
    my $output_file_name = "";
    my $int_file_count = 0;
    my $reading_data = 0;
    my $data_format = "";
    my $bytes = 0;
    my $data_struct = 0;

    #Now run through the file and split it up
    while (<$IFH>){
        $line = $_;
        if ($reading_data){
            $reading_data = 0;
            $/ = "\n";
        }
        if ($line =~ /FileName: (.*)/){
            chomp($output_file_name = $1);
            $int_file_count++;
            print "\t\t\tOpening: " . $directory_name . "\\" . $output_file_name . "\n";
            open $OFH, ">", $directory_name . "\\" . $output_file_name or die "Could not open output file: $output_file_name\n$!";
            print "\tFound: $output_file_name\n";
        }
        if ($line =~ /DataStruct: (.*)/){
            $data_struct = $1;
        }
        if ($line =~ /DataForm:/){
            $data_format = substr $_, 9;                 # remove header
            $data_format =~ tr/,//d;                     # remove commas
            $bytes = ( () = $data_format =~ /D/g) * 8;   # Count the doubles floats
            $bytes += ( () = $data_format =~ /L/g) * 4;  # Count the longs
            $bytes += ( () = $data_format =~ /S/g) * 2;  # Count the shorts
            $bytes += ( () = $data_format =~ /F/g) * 4;  # Count the single floats
            $bytes *= $data_struct;                      # Multiply by the number of records
            print "\t\tData size: $bytes\n";
        }
        if ($line =~ /NextFile:/){
            $/ = \$bytes;   # change how much data is read in at one time
            $reading_data = 1;
        }
        print $OFH $line;
    }
}

###################################################
#find_file subroutine called by "find"
sub find_files {
    if ($_ =~ /.*\.blg/){
        push @file_list, $File::Find::name;
        # print "$File::Find::name\n";
    }
}

This script works.

It should be noted that the input file is created by a script on a Linux box, while I am working on a Win7 box. When I compare the data in a hex editor, I find that every occurrence of '0x0A' is replaced with '0x0A 0x0D'.

So, I figure that I need to use some form of 'binmode' or ">:raw" when I open the input file. But, when I do, I get an error at the opening of the output file:

Invalid argument at split_file.pl line 47, <$IFH> line 1.

47 open $OFH, ">", $directory_name . "\\" . $output_file_name or die "Could not open output file: $output_file_name\n$!";

My question is this:

Is this the correct way to handle the apparent auto converting of '0x0A' and if so, why does opening an input file in this mode keep me from opening an output file?

My confusion comes from not understanding the statement "Invalid argument". What is it about opening an input file in binary mode that would require something different in how I open an output file?

I even tried changing line 47 to:

open $OFH, ">:raw", $directory_name . "\\" . $output_file_name

This did not change the results.

Any help is greatly appreciated

David Steidley

Re: Open file for output errors
by kennethk (Abbot) on Nov 21, 2013 at 21:09 UTC
    The open wouldn't fail because of intended output. My guess is there is a problem with your file specification. Are you sure you're getting a legal file name? Since Perl is agnostic to forward/backward slashes in paths on Windows, what happens when you change your open statement to:
    open $OFH, ">", "$directory_name/$output_file_name" or die "Could not open output file: $directory_name/$output_file_name\n$!";
    I suppose it might also be related to your odd filehandle scoping, but Perl should be handling that cleanly.
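
One way to check the "legal file name" question is to print the captured name with its non-printable characters made visible. The sketch below rests on an assumption that the thread does not confirm: that the header lines end in CR LF, so reading the input in raw mode leaves a trailing "\r" in the capture (chomp only removes "\n"). The helpers `clean_name` and `visible` are hypothetical, and the sample header line is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strip any trailing CR/LF from a captured file name.
sub clean_name {
    my ($name) = @_;
    $name =~ s/[\r\n]+\z//;
    return $name;
}

# Hypothetical helper: render non-printable bytes as \xNN so they show up.
sub visible {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

# Invented sample: a header line as it might look when read in raw mode,
# ASSUMING the header uses CR LF line endings.
my $line = "FileName: part1.dat\r\n";

if ($line =~ /FileName: (.*)/) {
    my $raw = $1;    # "." matches "\r", so a CR would survive in the capture
    print "captured: ", visible($raw), "\n";
    print "cleaned:  ", visible(clean_name($raw)), "\n";
}
```

If the captured name does turn out to carry a stray control character, that alone could explain an "Invalid argument" from open on Windows, independent of which slash is used.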

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Since Perl is agnostic to forward/backward slashes in paths on Windows...

      I'll be quietly retreating to the Monastery's iconostasis--visiting the Wall--to meditate upon the phrase Perl is agnostic...

      Typing in your suggestion allowed the script to run... That is kind of a head scratcher! Now all I need to do is figure out how to stop the automatic conversion of '0x0A'. I am not ready to throw in the towel on that just yet, but at least now I have some way of testing it. Thank you for pointing me in the right direction.
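
For the remaining 0x0A problem, here is a minimal sketch of the usual approach: put both the input and the output handle in binary mode (the `:raw` layer) so that no newline translation happens on either side. The temporary file names and the one-line header are simplified stand-ins invented for the demonstration, not the real .blg layout.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Invented scratch files for the demonstration.
my $dir = tempdir(CLEANUP => 1);
my ($in_name, $out_name) = ("$dir/in.bin", "$dir/out.bin");

# Build a small sample: one text header line, then 4 bytes of binary
# payload that happen to contain 0x0A.
my $payload = "\x01\x0A\x02\x0A";
open my $fh, '>:raw', $in_name or die "Cannot write $in_name: $!";
print {$fh} "NextFile:\n", $payload;
close $fh or die "Cannot close $in_name: $!";

# Both handles use :raw, so bytes pass through untranslated.
open my $IFH, '<:raw', $in_name  or die "Cannot open $in_name: $!";
open my $OFH, '>:raw', $out_name or die "Cannot open $out_name: $!";

my $header = <$IFH>;    # the header is still read line by line
print {$OFH} $header;

{
    # Fixed-size record read, the same $/ technique the script uses.
    my $bytes = length $payload;
    local $/ = \$bytes;
    my $data = <$IFH>;
    print {$OFH} $data;
}

close $IFH;
close $OFH or die "Cannot close $out_name: $!";

# With no translation, the copy has the same size as the original.
print "sizes match\n" if (-s $in_name) == (-s $out_name);
```

Using `local $/` inside a block restores the line-at-a-time behaviour automatically when the block ends, which avoids having to reset `$/` by hand on the next loop iteration.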