Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a main file that contains multiple bills, and I want to split it into individual bills. I have a script that starts a new output file each time it sees the header line of a new bill. The problem is that the bill detail is not always located right after the header info. For example, the file may contain the header info for 2 bills in a row and then the bill details (or any other order). So I want to split the file at both the header line and the bill detail line, and then merge the pieces according to the bill number. I'm not sure how to add this logic to the existing script.

DATA:
1010100.........123456..... #header line for bill # 123456
#some data
#some data
#some data
#some data
1010100.........678910..... #header line for bill # 678910
#some data
#some data
#some data
3010330.........123456..... #bill detail for bill 123456
#some data
#some data
#some data
#some data
#some data
3010330.........678910..... #bill detail for bill 678910

For example, each bill should be split and joined like this:
1010100.........123456..... #header line for bill # 123456
#some data
#some data
#some data
#some data
3010330.........123456..... #bill detail for bill 123456
#some data
#some data
#some data
#some data
#some data



This is what I have right now.

#!/usr/bin/perl -w
# This script takes a main file and separates it into individual bills by
# splitting at the header line.
#   [-v|--verbose]
#   [-d|--dest splitdir] filename
#
use Getopt::Long;

GetOptions("verbose" => \$verbose,
           "dest:s"  => \$destdir);

if (length $destdir == 0) {
    $split_file_prefix = $ARGV[0];
} else {
    $split_file_prefix = $destdir."/".`basename $ARGV[0]`;
    chomp($split_file_prefix);
}

## open file name passed in as arg 1 for reading
open(MAIN_BATCH_FILE, "<$ARGV[0]")
    or die "Couldn't open file : $ARGV[0] \n Message : $!";

$first_line = <MAIN_BATCH_FILE>;
$counter = 1;
$filename = $split_file_prefix."-SPLIT-".$counter.".txt";
if ($verbose) { print "$filename\n"; }
open(NEW_FILE,">".$filename);
print NEW_FILE $first_line;

while (<MAIN_BATCH_FILE>) {
    # a header line starts a new output file
    if ($_ =~ /^1010100/) {
        close(NEW_FILE);
        $counter++;
        $filename = $split_file_prefix."-SPLIT-".$counter.".txt";
        if ($verbose) { print "$filename\n"; }
        open(NEW_FILE,">".$filename);
    }
    print NEW_FILE "$_";
}
close(NEW_FILE);
exit 0;

Thank you all in advance.

Replies are listed 'Best First'.
Re: split file
by GrandFather (Saint) on Feb 09, 2006 at 04:36 UTC

    The following should get you started:

    use warnings;
    use strict;

    my %bills;
    my $currBill = 'void';
    my $type = 'header';

    while (<DATA>) {
        chomp;
        if (/^1010100\.+(\d+)/) {
            $currBill = $1;
            $type = 'header';
        } elsif (/^3010330\.+(\d+)/) {
            $currBill = $1;
            $type = 'data';
        } else {
            push @{$bills{$currBill}{$type}}, $_;
        }
    }

    for my $bill (sort keys %bills) {
        print ">>>> $bill\n";
        print "Header\n" . (join "\n", @{$bills{$bill}{'header'}}) . "\n"
            if exists $bills{$bill}{'header'};
        print "Data\n" . (join "\n", @{$bills{$bill}{'data'}}) . "\n"
            if exists $bills{$bill}{'data'};
    }

    __DATA__
    1010100.........123456..... #header line for bill # 123456
    #some data
    #some data
    #some data
    #some data
    1010100.........678910..... #header line for bill # 678910
    #some data
    #some data
    #some data
    3010330.........123456..... #bill detail for bill 123456
    #some data
    #some data
    #some data
    #some data
    #some data
    3010330.........678910..... #bill detail for bill 678910
    data for 678910

    Open your files in the for loop, print their contents as required and then close.
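A minimal sketch of that last step, in case it helps. It is only one way to do it and rests on a few assumptions: the sub names group_bills and write_bills and the "-BILL-<number>.txt" file naming are made up for illustration, and the input file name is taken from the first command-line argument. Unlike the code above, this version keeps the 1010100 and 3010330 marker lines in the output, since the desired result in the question shows them.

```perl
use strict;
use warnings;

# Group the lines of a batch file by bill number. Marker lines are kept,
# so each bill's header and detail sections start with their own marker.
sub group_bills {
    my ($fh) = @_;
    my (%bills, $bill, $type);
    while (my $line = <$fh>) {
        if ($line =~ /^1010100\.+(\d+)/) {
            ($bill, $type) = ($1, 'header');
        } elsif ($line =~ /^3010330\.+(\d+)/) {
            ($bill, $type) = ($1, 'detail');
        }
        next unless defined $bill;    # skip anything before the first marker
        push @{ $bills{$bill}{$type} }, $line;
    }
    return \%bills;
}

# Write one file per bill: header section first, then detail section.
# The "-BILL-<number>.txt" naming is an assumption for illustration.
sub write_bills {
    my ($bills, $prefix) = @_;
    my @written;
    for my $bill (sort keys %$bills) {
        my $name = "$prefix-BILL-$bill.txt";
        open my $out, '>', $name or die "Couldn't open $name: $!";
        for my $type (qw(header detail)) {
            print {$out} @{ $bills->{$bill}{$type} || [] };
        }
        close $out or die "Couldn't close $name: $!";
        push @written, $name;
    }
    return @written;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Couldn't open $file: $!";
    my $bills = group_bills($in);
    close $in;
    print "$_\n" for write_bills($bills, $file);
}
```

Because everything is collected in a hash before any file is written, the order in which header and detail sections appear in the main file does not matter. The trade-off is that the whole batch is held in memory, which should be fine unless the main file is very large.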


    DWIM is Perl's answer to Gödel
      When I ran this script, the output was the same as the DATA file. It also produced the following warning: "Useless use of a variable in void context at sp_new_load_script.pl line 6" (referring to my %bills;). I can't quite figure out where it's going wrong. Any ideas? Thanks.

        I copied the code exactly as posted and ran it using ActiveState Perl 5.8.7 (build 815), and it performed as expected. You could try adding use diagnostics; at the start of the code; it may give you more information about the warning.