
Thanks for your help so far. I've got past the previous error, and the code works fine with the data I posted. I made some changes to reflect the actual data, but now I have the same problem again: the output doesn't show up in the right order.
This is my actual data file:
    #In this example, the 13 digit bill numbers are right after the date (20061212), eg. 123456789A101, C234567891011, A001122334455. The header line starts with 101010000 and bill details line starts with 301033000. The end of bill details line is identified by 999999.
    101010000ABC 20060212123456789A1018880001000000001 123456789A10100216 4322489239649 342 10 00000 20060212
    105060000ABC 20060212123456789A1018880001000000002 ADDRESS 0
    107660000ABC 20060212123456789A1018880001000000003 ADDRESS LINE 2 20
    108250100ABC 20060212123456789A1018880001000000004 123456789A10100216
    109300100ABC 20060212123456789A1018880001000000005 303-0 0
    101250100ABC 20060212123456789A1018880001000000006 IUSER@MAIL.DOMAIN.COM
    108430100ABC 20060212123456789A1018880001000000008 00000 123456789A10100216 0000000840M0000000000{0000000000{0000000000{0000000000{0000000000{0000000840M 0000000000{
    100550000ABC 20060212123456789A1018880001000000009 123456789A10100216 NC2005110920051208200512090000000000{0000000000{0000000000{0000000000{0000000000{0000000000{0000000000{000000000520
    100610000ABC 20060212123456789A1018880001000000010 123456789A10100216 NC0000000000{0000000000{0000000000{0000000000{0000000000{0000000000{0000000000{0000000000{0000000000{0000000840M0000 520
    102210000ABC 20060212123456789A1018880001000000011 * * * REVISED FINAL BILL MESSAGE * * * THIS REVISED FINAL BILL INCLUDES CHARGES OR CREDITS NOT PREVIOUSLY
    104500000ABC 20060212123456789A1018880001000000012 APPLIED TO YOUR ACCOUNT. A SUMMARY IS PROVIDED ON ANOTHER PAGE OF THIS BILL. ANY CREDITS DUE YOU ARE REFLECTED ON THIS BILL.
    102250000ABC 20060212123456789A1018880001000000013 SHOULD YOU HAVE ANY QUESTIONS CONCERNING YOUR ACCOUNT, CONTACT
    101250100ABC 20060212123456789A1018880001000000014 YOUR CUSTOMER SERVICE CENTER
    105500000ABC 20060212123456789A1018880001000000015 THANK YOU FOR THE OPPORTUNITY TO SERVE YOU.
    108456100ABC 20060212123456789A1018880001000000016 123456789A10100216 0000000840M0000000000{0000000000{

Sometimes, the data can also look like this or any other variation.
Here's what the code looks like:
    use warnings;
    use strict;
    use diagnostics;

    my %bills;
    my $currBill = 'void';
    my $type = 'header';

    $datafile = "datafile.txt";
    open(DATA, $datafile) || die ("Cannot open $datafile: $!");

    while (<DATA>) {
        chomp;
        if ($_ =~ /^101010000/) {
            $currBill = $1;
            $type = 'header';
        } elsif ($_ =~ /^301033000/) {
            $currBill = $1;
            $type = 'data';
        } else {
            push @{$bills{$currBill}{$type}}, $_;
        }
    }
    close(DATA);

    open($outputdata, '>', 'outputfile.txt' || die "Cannot open output file output.txt: $!");

    for my $bill (sort keys %bills) {
        print $outputdata ">>>> $bill\n";
        print $outputdata "Header\n" . (join "\n", @{$bills{$bill}{'header'}}) . "\n"
            if exists $bills{$bill}{'header'};
        print $outputdata "Data\n" . (join "\n", @{$bills{$bill}{'data'}}) . "\n"
            if exists $bills{$bill}{'data'};
    }
I have spent a lot of time on this but still cannot get it working properly. Please help.
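A side note on the file handling in the posted script: under `use strict`, `$datafile` and `$outputdata` need to be declared with `my`, and in `'outputfile.txt' || die ...` the `||` binds to the file name string (which is always true), so the `die` can never run. The following is a minimal sketch of the usual idiom, not a drop-in replacement; the temporary file is a hypothetical stand-in for `outputfile.txt` and the printed line is just sample text.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for outputfile.txt.
my ($tmp_fh, $outfile) = tempfile(UNLINK => 1);
close $tmp_fh;

# Low-precedence "or" applies to the whole open() call,
# so a failed open is actually reported.
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
print {$out} ">>>> 123456789A101\n";
close($out) or die "Cannot close $outfile: $!";

# Read the line back to confirm it was written.
open(my $in, '<', $outfile) or die "Cannot open $outfile: $!";
my $first_line = <$in>;
close($in);
print $first_line;
```

Using lexical filehandles (`my $out`) also avoids clashing with Perl's built-in `DATA` handle.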

Re^5: split file
by GrandFather (Saint) on Feb 14, 2006 at 11:06 UTC

    You need to select the bill number for $currBill. Your current regex match patterns don't include capture brackets, so there is no meaningful value in $1. Try something like these:

    if (/^1010100.{17}(.{13})/) {
    } elsif (/^3010330.{17}(.{13})/) {
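As a minimal sketch of the capture idea, the snippet below pulls the bill number out of one header line. It assumes the bill number is the 13 characters immediately after an 8-digit date, as the data file's comment describes, and it anchors on the date rather than on a fixed character count, since the exact spacing in the real file is not known from the post. The sample line is the start of the first record as posted.

```perl
use strict;
use warnings;

# Start of the first header record from the post; real spacing may differ.
my $line = '101010000ABC 20060212123456789A1018880001000000001';

my $bill;
# \S*\s+  skips the rest of the first field and the whitespace after it
# \d{8}   matches the 8-digit date
# (.{13}) captures the next 13 characters into $1
if ($line =~ /^101010000\S*\s+\d{8}(.{13})/) {
    $bill = $1;
}
print defined $bill ? "bill: $bill\n" : "no match\n";
```

Without the parentheses the match can still succeed, but `$1` is not set by it, which is why `$currBill` never received a bill number in the original loop.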

    DWIM is Perl's answer to Gödel
      That doesn't seem to work for me. It outputs the original file as before. Also, what does the . represent? I'm not too good with regular expressions.

      Thanks.
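On the question about `.`: in a Perl regex, `.` matches any single character except a newline, `{17}` means "exactly 17 of the preceding item", and parentheses capture whatever they match into `$1`. So `.{17}(.{13})` reads as "skip any 17 characters, then capture the next 13". A small illustration with a made-up string (not taken from the data file):

```perl
use strict;
use warnings;

# Made-up sample text, only to show how the pieces of the pattern line up.
my $text = 'AB1234567XYZ';

# ^AB     literal "AB" at the start of the string
# .{3}    any three characters ("123")
# (.{4})  any four characters, captured ("4567")
my ($captured) = $text =~ /^AB.{3}(.{4})/;
print "$captured\n";
```

Because `.{17}` counts every character, spaces included, the pattern only matches if the real lines have exactly that many characters between the prefix and the bill number. If the count is off, no line matches and every line falls through to the `else` branch, which would be consistent with the output looking like the original file.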