
Thanks for your help so far. I've got past the previous error, and this code works fine with the data I posted. I made some changes to reflect the actual data, but I now have the same problem: the output doesn't show up in the right order.
This is my actual data file:
#In this example, the 13 digit bill numbers are right after the date ( +20061212), eg. 123456789A101, C234567891011, A001122334455. The heade +r line starts with 101010000 and bill details line starts with 301033 +000. The end of bill details line is identified by 999999. 101010000ABC 20060212123456789A1018880001000000001 123456789A101 +00216 4322489239649 342 10 00000 20060212 + 105060000ABC 20060212123456789A1018880001000000002 ADDRESS + 0 107660000ABC 20060212123456789A1018880001000000003 ADDRESS LINE +2 20 108250100ABC 20060212123456789A1018880001000000004 123456789A101 +00216 + 109300100ABC 20060212123456789A1018880001000000005 + 303-0 0 101250100ABC 20060212123456789A1018880001000000006 IUSER@MAIL.DO +MAIN.COM 108430100ABC 20060212123456789A1018880001000000008 00000 12345 +6789A10100216 0000000840M0000000000{0000000000{0000000000{0000000000{ +0000000000{0000000840M 0000000000{ 100550000ABC 20060212123456789A1018880001000000009 123456789A101 +00216 NC2005110920051208200512090000000000{0000000000{0000000000{0000 +000000{0000000000{0000000000{0000000000{000000000520 100610000ABC 20060212123456789A1018880001000000010 123456789A101 +00216 NC0000000000{0000000000{0000000000{0000000000{0000000000{000000 +0000{0000000000{0000000000{0000000000{0000000840M0000 520 102210000ABC 20060212123456789A1018880001000000011 * * * REVISED + FINAL BILL MESSAGE * * * THIS REVISED FINAL BILL + INCLUDES CHARGES OR CREDITS NOT PREVIOUSLY 104500000ABC 20060212123456789A1018880001000000012 APPLIED TO YO +UR ACCOUNT. A SUMMARY IS PROVIDED ON ANOTHER PAGE OF THIS BI +LL. ANY CREDITS DUE YOU ARE REFLECTED ON THIS BILL. 102250000ABC 20060212123456789A1018880001000000013 SHOULD YOU HA +VE ANY QUESTIONS CONCERNING YOUR ACCOUNT, CONTACT 101250100ABC 20060212123456789A1018880001000000014 YOUR CUSTOMER + SERVICE CENTER 105500000ABC 20060212123456789A1018880001000000015 THANK YOU FOR + THE OPPORTUNITY TO SERVE YOU. 
108456100ABC 20060212123456789A1018880001000000016 123456789A101 +00216 0000000840M0000000000{00 +00000000{0000000840M 107208000ABC 20060212123456789A1018880001000000017 0000000840M + 109080000ABC 20060212123456789A1018880001000000018 2006021200000 +000006000000000050000000000000000000000000000000020000000000000000000 +00000000000000000000000000000000000000000000000 102098100ABC 20060212123456789A1018880001000000019 2006021243224 +89239649 1 + 00000 109990000ABC 20060212123456789A1018880001000000020 N10 NC047 +0 0 101010000ABC 20060212C2345678910118880003000000021 C234567891011 +00114 40002416789 3939 20U 00000 20060114 + 105060000ABC 20060212C2345678910118880003000000022 ADDRESS + 105060000ABC 20060212C2345678910118880003000000023 ADDRESS LINE +2 107660000ABC 20060212C2345678910118880003000000024 C234567891011 +00114 + 109080000ABC 20060212C2345678910118880003000000025 IUSER@MAIL.DO +MAIN.COM 109990000ABC 20060212C2345678910118880003000000026 C234567891011 +00114 301033000ABC 20060212123456789A1018880002000000001 2006021243224 +89000000 306020100ABC 20060212123456789A1018880002000000002 400 306020100ABC 20060212123456789A1018880002000000003 1 306020100ABC 20060212123456789A1018880002000000005 1 CUSTOMER NA +ME ABBREVIATION 306020100ABC 20060212123456789A1018880002000000006 1 CUSTOMER TE +RMINAL LOCATION 306020100ABC 20060212123456789A1018880002000000007 1 CUSTOMER'S +CARRIER NAME ABBREVIATION 306020100ABC 20060212123456789A1018880002000000008 1 AREA IDENTI +FICATION 306020100ABC 20060212123456789A1018880002000000009 1TAR TAX ARE +A 309990000ABC 20060212123456789A1018880002000000010 1TAX TAX APP +LICATION 999999 200602121635 30 301033000ABC 20060212C2345678910118880001000000001 2006011643224 +89000000 306020100ABC 20060212C2345678910118880001000000002 306020100ABC 20060212C2345678910118880001000000003 1 306020100ABC 20060212C2345678910118880001000000004 1 CUSTOMER NA +ME ABBREVIATION 306020100ABC 20060212C2345678910118880001000000005 1 CUSTOMER TE +RMINAL LOCATION 
306020100ABC 20060212C2345678910118880001000000006 1 CUSTOMER'S +CARRIER NAME ABBREVIATION 306020100ABC 20060212C2345678910118880001000000007 1 AREA IDENTI +FICATION 306020100ABC 20060212C2345678910118880001000000008 1TAR TAX ARE +A 309990000ABC 20060212C2345678910118880001000000009 1TAX TAX APP +LICATION 999999 200601161635 30 #Sometimes, the data can also look like this or any other variation 101010000Z98765420060212A0011223344558880001000000001 123456789A101 +00216 4322489239649 3424 10N 00000 20060212 + 105060000Z98765420060212A0011223344558880001000000002 ADDRESS + 0 107660000Z98765420060212A0011223344558880001000000003 ADDRESS LINE +2 20 108250100Z98765420060212A0011223344558880001000000004 123456789A101 +00216 + 109300100Z98765420060212A0011223344558880001000000005 + 343-0 0 101250100Z98765420060212A0011223344558880001000000006 IUSER@MAIL.DO +MAIN.COM 108430100Z98765420060212A0011223344558880001000000008 00000 R494319 +57005313 0000000840M0000000000 301033000Z98765420060212A0011223344558880001000000001 2006011643224 +89000000 306020100Z98765420060212A0011223344558880001000000002 306020100Z98765420060212A0011223344558880001000000003 1 306020100Z98765420060212A0011223344558880001000000004 1 CUST NAME 306020100Z98765420060212A0011223344558880001000000005 1 CUSTOMER LO +C 306020100Z98765420060212A0011223344558880001000000006 1 CUSTOMER'S +ABBREVIATION 306020100Z98765420060212A0011223344558880001000000007 1 AREA IDENTI +FICATION 306020100Z98765420060212A0011223344558880001000000008 1TAR TAX ARE +A 309990000Z98765420060212A0011223344558880001000000009 1TAX TAX APP +LICATION 999999 200601161635 30
Here's what the code looks like:
use warnings;
use strict;
use diagnostics;

my %bills;
my $currBill = 'void';
my $type = 'header';
my $datafile = "datafile.txt";

open(DATA, $datafile) || die ("Cannot open $datafile: $!");
while (<DATA>) {
    chomp;
    if ($_ =~ /^101010000/) {
        $currBill = $1;
        $type = 'header';
    } elsif ($_ =~ /^301033000/) {
        $currBill = $1;
        $type = 'data';
    } else {
        push @{$bills{$currBill}{$type}}, $_;
    }
}
close(DATA);

open(my $outputdata, '>', 'outputfile.txt') || die "Cannot open output file outputfile.txt: $!";
for my $bill (sort keys %bills) {
    print $outputdata ">>>> $bill\n";
    print $outputdata "Header\n" . (join "\n", @{$bills{$bill}{'header'}}) . "\n" if exists $bills{$bill}{'header'};
    print $outputdata "Data\n" . (join "\n", @{$bills{$bill}{'data'}}) . "\n" if exists $bills{$bill}{'data'};
}

I have spent a lot of time on this, but I still cannot get it working properly. Please help.

Re^5: split file
by GrandFather (Saint) on Feb 14, 2006 at 11:06 UTC

    You need to select the bill number for $currBill. Your current regex match patterns don't include capture brackets, so there is no meaningful value in $1. Try something like these:

    if (/^1010100.{17}(.{13})/) {
    } elsif (/^3010330.{17}(.{13})/) {
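
    To illustrate the capture brackets, here is a minimal sketch using the two patterns above. The helper name bill_key and the sample lines are assumptions made for illustration; the padding after ABC in particular is a guess, since the posted data has lost its original spacing. In each pattern, . matches any single character except a newline, .{17} matches exactly 17 of them, and the bracketed (.{13}) stores the next 13 characters in $1.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the 13-character bill
# number and the section type for a header or detail-start line, or an
# empty list for any other line.
sub bill_key {
    my ($line) = @_;
    # 7 literal digits, then 17 arbitrary characters (assumed layout: rest of
    # the record code, a 7-character field, an 8-digit date), then the capture.
    return ($1, 'header') if $line =~ /^1010100.{17}(.{13})/;
    return ($1, 'data')   if $line =~ /^3010330.{17}(.{13})/;
    return;
}

# Sample lines built to the assumed fixed-width layout.
my @lines = (
    '101010000ABC    20060212123456789A1018880001000000001',
    '301033000Z98765420060212A0011223344558880001000000001',
    '105060000ABC    20060212123456789A1018880001000000002',
);

for my $line (@lines) {
    my ($bill, $type) = bill_key($line);
    print defined $bill ? "$type $bill\n" : "no match\n";
}
```

    In the original loop, the same idea applies: $currBill should be set from $1 only inside a branch whose pattern contains capture brackets and has just matched.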

    DWIM is Perl's answer to Gödel
      That doesn't seem to work for me. It outputs the original file as before. Also, what does the . represent? I'm not too good with regular expressions.

      Thanks.