
I have a file that is mixed text and binary. This file is also a "join" of multiple smaller files, all with the same format. I need to split this single file into its smaller parts.

Essentially, each individual file has a text header followed by a variable-length block of binary data. Here is the code:

use strict;
use warnings;
use File::Find;

#Create a list of files
my @file_list = ();
my $dir = shift(@ARGV);
find (\&find_files, $dir);

my $file_count = @file_list;
my $cur_file_num = 1;
my $file_name = "";
my $OFH;

foreach $file_name (@file_list){
    open my $IFH, "<", $file_name
        or die "Could not open input file: $file_name\n";
    print "Working on $cur_file_num/$file_count: $file_name\n";

    #Create the directory that will hold the results
    $file_name =~ /.*\/(.*)\.blg$/;
    my $directory_name = $dir . "\\" . $1;
    print "Creating directory: $directory_name\n";
    mkdir $directory_name;

    #Set up some variables
    my $line = "";
    my $output_file_name = "";
    my $int_file_count = 0;
    my $reading_data = 0;
    my $data_format = "";
    my $bytes = 0;
    my $data_struct = 0;

    #Now run through the file and split it up
    while (<$IFH>){
        $line = $_;

        if ($reading_data){
            $reading_data = 0;
            $/ = "\n";
        }

        if ($line =~ /FileName: (.*)/){
            chomp($output_file_name = $1);
            $int_file_count++;
            print "\t\t\tOpening: " . $directory_name . "\\" . $output_file_name . "\n";
            open $OFH, ">", $directory_name . "\\" . $output_file_name
                or die "Could not open output file: $output_file_name\n$!";
            print "\tFound: $output_file_name\n";
        }

        if ($line =~ /DataStruct: (.*)/){
            $data_struct = $1;
        }

        if ($line =~ /DataForm:/){
            $data_format = substr $_, 9;            # remove header
            $data_format =~ tr/,//d;            # remove commas
            $bytes = ( () = $data_format =~ /D/g) * 8;     # Count the doubles
            $bytes += ( () = $data_format =~ /L/g) * 4;    # Count the longs
            $bytes += ( () = $data_format =~ /S/g) * 2;    # Count the shorts
            $bytes += ( () = $data_format =~ /F/g) * 4;    # Count the single floats
            $bytes *= $data_struct;                        # Multiply by the number of records
            print "\t\tData size: $bytes\n";
        }

        if ($line =~ /NextFile:/){
            $/ = \$bytes;    # change how much data is read in at one time
            $reading_data = 1;
        }

        print $OFH $line;
    }
}

###################################################
#find_files subroutine called by "find"

sub find_files {
    if ($_ =~ /\.blg$/){
        push @file_list, $File::Find::name;
#        print "$File::Find::name\n";
    }
}

This script works.
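The size arithmetic in the DataForm: branch can be pulled out into a small sub, which makes it easier to check against a hex dump. This is a minimal sketch: `record_bytes` and `%TYPE_SIZE` are hypothetical names, and the type sizes (D=8, L=4, S=2, F=4) are simply the ones the script above already assumes.

```perl
use strict;
use warnings;

# Byte size of each type code, as assumed by the original script.
my %TYPE_SIZE = (D => 8, L => 4, S => 2, F => 4);

# Hypothetical helper: size in bytes of the binary block, given the text after
# "DataForm:" and the record count from "DataStruct:".
sub record_bytes {
    my ($data_form, $record_count) = @_;
    my $per_record = 0;
    # Every D, L, S or F adds its size; commas, spaces and CR/LF are ignored.
    for my $code ($data_form =~ /([DLSF])/g) {
        $per_record += $TYPE_SIZE{$code};
    }
    return $per_record * $record_count;
}

my $bytes = record_bytes('D,D,L,S', 100);
print "Data size: $bytes\n";    # prints "Data size: 2200"
```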

It should be noted that the input file is created by a script on a Linux box, while I am working on a Win7 box. When I compare the data in a hex editor, I find that every occurrence of '0x0A' has been replaced with '0x0D 0x0A'.
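The extra byte is consistent with Perl's text-mode handling on Windows, where file handles get the :crlf layer by default and every "\n" written becomes CR LF. The sketch below makes the translation visible on any platform by requesting :crlf explicitly. `bytes_written_with_layer` is a hypothetical helper that writes to a temporary file and reads the bytes back through :raw.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $data through the given I/O layer and return
# the bytes that actually ended up on disk.
sub bytes_written_with_layer {
    my ($layer, $data) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    close $fh;

    open my $out, ">$layer", $path or die "open $path: $!";
    print {$out} $data;
    close $out or die "close $path: $!";

    # Read back through :raw so nothing is translated on the way in.
    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;    # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# A text line followed by binary data that happens to contain 0x0A.
my $data = "header\n\x01\x0A\x02";
for my $layer (':crlf', ':raw') {
    my $got = bytes_written_with_layer($layer, $data);
    printf "%-6s %s\n", $layer, join ' ', map { sprintf '%02X', ord } split //, $got;
}
```

With :crlf both 0x0A bytes, including the one inside the binary data, come out as 0D 0A; with :raw the bytes are written unchanged.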

So I figure that I need to use some form of 'binmode' or "<:raw" when I open the input file. But when I do, I get an error when opening the output file:

Invalid argument at split_file.pl line 47, <$IFH> line 1.

47     open $OFH, ">", $directory_name . "\\" . $output_file_name
        or die "Could not open output file: $output_file_name\n$!";

My question is this:

Is this the correct way to handle the apparent automatic conversion of '0x0A', and if so, why does opening the input file in this mode keep me from opening an output file?

What confuses me is the "Invalid argument" message. What is it about opening an input file in binary mode that would require a change in how I open an output file?
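One plausible explanation, assuming the header lines in the joined file end in CR LF (worth checking in the hex editor, since the post does not say): in text mode on Windows, the :crlf layer quietly turns each CR LF into "\n", but through :raw the CR survives. `(.*)` in `/FileName: (.*)/` then captures that CR, `chomp` removes only "\n", and the resulting name ends in "\r". Windows does not allow that character in file names, and `open` would report this as "Invalid argument". A sketch of the check and a fix; `header_file_name` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the output file name from a header line.
# (.*) stops before "\n" but captures a trailing "\r", and chomp only
# removes "\n", so the CR has to be stripped explicitly.
sub header_file_name {
    my ($line) = @_;
    return unless $line =~ /FileName: (.*)/;
    my $name = $1;
    $name =~ s/\s+\z//;    # remove a trailing CR (and any other trailing whitespace)
    return $name;
}

# What the original code would produce for a CR LF header line read via :raw.
my $raw_line = "FileName: part1.dat\r\n";
my ($naive) = $raw_line =~ /FileName: (.*)/;
chomp $naive;    # no-op: there is no "\n" in the capture, only "\r"
printf "naive: %d chars, cleaned: %d chars\n",
    length($naive), length(header_file_name($raw_line));
# prints "naive: 10 chars, cleaned: 9 chars"
```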

I even tried changing line 47 to:

open $OFH, ">:raw", $directory_name . "\\" . $output_file_name

This did not change the results.
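For comparison, here is a sketch of the split loop with both handles in :raw mode, so no bytes are translated in either direction. It is not the original script. `split_joined_file` is a hypothetical name. It reads the binary block with `read` instead of setting `$/ = \$bytes`; both return a fixed number of bytes, but `read` avoids having to restore `$/` afterwards. It also assumes the header layout the original code relies on: FileName: first, DataStruct: before DataForm:, and binary data immediately after the NextFile: line.

```perl
use strict;
use warnings;
use File::Spec;

# Byte sizes per type code, as assumed by the original script.
my %TYPE_SIZE = (D => 8, L => 4, S => 2, F => 4);

# Hypothetical sub: split one joined file into its parts under $out_dir
# and return the paths written.
sub split_joined_file {
    my ($in_path, $out_dir) = @_;

    # :raw on input: no CR LF -> LF translation, so binary data arrives intact
    open my $in, '<:raw', $in_path
        or die "Could not open input file $in_path: $!";

    my ($out, $records, $bytes) = (undef, 0, 0);
    my @written;

    while (my $line = <$in>) {
        if ($line =~ /FileName: (.*)/) {
            my $name = $1;
            $name =~ s/\s+\z//;    # drop a trailing CR that :raw no longer strips
            if ($out) { close $out or die "Could not close output file: $!"; }
            my $path = File::Spec->catfile($out_dir, $name);
            # :raw on output: "\n" is written as 0x0A, not 0x0D 0x0A
            open $out, '>:raw', $path
                or die "Could not open output file $path: $!";
            push @written, $path;
        }
        $records = $1 if $line =~ /DataStruct: (\d+)/;
        if ($line =~ /DataForm:(.*)/) {
            my $form = $1;
            my $per_record = 0;
            $per_record += $TYPE_SIZE{$_} for $form =~ /([DLSF])/g;
            $bytes = $per_record * $records;
        }

        die "Data before the first FileName: line in $in_path\n" unless $out;
        print {$out} $line;

        if ($line =~ /NextFile:/) {
            # Read exactly $bytes bytes of binary data, no line splitting.
            my $got = read($in, my $buf, $bytes);
            die "Read error in $in_path: $!" unless defined $got;
            die "Truncated data block in $in_path\n" if $got != $bytes;
            print {$out} $buf;
        }
    }

    if ($out) { close $out or die "Could not close output file: $!"; }
    close $in;
    return @written;
}
```

Called as `split_joined_file($file_name, $directory_name)` inside the existing foreach loop, it would take the place of the while loop; the directory creation and File::Find code can stay as they are.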

Any help is greatly appreciated.

David Steidley

In reply to Open file for output errors by steidley
