drodinthe559 has asked for the wisdom of the Perl Monks concerning the following question:

Monks: I have take a fixed line feed text file and separate it into text fields. Then, I need to extract it back out into an Excel file. Below is what I have thus far. I would appreciate any suggestions. Is unpack an option?
#!/usr/bin/perl -w use strict; use warnings; open INPUT, "C:/INPUTFILE" or die "Couldn't open INPUT file: $!"; open OUTPUT, ">C:/OUTPUT.TXT"; my $rcount = 1; local $/ = \1173; while (<INPUT>) { my $string = $_; if ($rcount < 8) {++$rcount; next;} my $FIELD01 = substr($string,0,34); my $FIELD02 = substr($string,34,30); my $FIELD03 = substr($string,64,40); my $FIELD04 = substr($string,104,40); my $FIELD05 = substr($string,144,40); my $FIELD06 = substr($string,184,40); my $FIELD07 = substr($string,224,40); my $FIELD09 = substr($string,264,2); my $FIELD10 = substr($string,265,9); my $FIELD11 = substr($string,275,9); my $FIELD12 = substr($string,284,6); my $FIELD13 = substr($string,290,40); my $FIELD14 = substr($string,330,34); my $FIELD15 = substr($string,364,3); my $FIELD16 = substr($string,367,8); my $FIELD17 = substr($string,375,1); my $FIELD18 = substr($string,376,1); my $FIELD19 = substr($string,377,10); my $FIELD20 = substr($string,387,20); my $FIELD21 = substr($string,407,10); my $FIELD22 = substr($string,417,34); my $FIELD23 = substr($string,451,15); my $FIELD24 = substr($string,466,15); my $FIELD25 = substr($string,481,15); my $FIELD26 = substr($string,496,8); my $FIELD27 = substr($string,504,8); my $FIELD28 = substr($string,512,8); my $FIELD29 = substr($string,520,8); my $FIELD30 = substr($string,528,3); my $FIELD31 = substr($string,531,3); my $FIELD32 = substr($string,534,40); my $FIELD33 = substr($string,574,10); my $FIELD34 = substr($string,584,7); my $FIELD35 = substr($string,591,20); my $FIELD36 = substr($string,611,7); my $FIELD37 = substr($string,618,15); my $FIELD38 = substr($string,633,5); my $FIELD39 = substr($string,638,8); my $FIELD40 = substr($string,646,20); my $FIELD41 = substr($string,666,15); my $FIELD42 = substr($string,695,15); my $FIELD43 = substr($string,711,15); my $FIELD44 = substr($string,726,15); my $FIELD45 = substr($string,741,15); my $FIELD46 = substr($string,756,1); my $FIELD47 = substr($string,757,40); my $FIELD48 = substr($string,797,5); my $FIELD49 = substr($string,802,8); my $FIELD50 = substr($string,810,1); my $FIELD51 = substr($string,811,3); my $FIELD52 = substr($string,814,3); my $FIELD53 = substr($string,817,3); my $FIELD54 = substr($string,820,20); my $FIELD55 = substr($string,840,10); my $FIELD56 = substr($string,850,1); my $FIELD57 = substr($string,851,15); my $FIELD58 = substr($string,866,15); my $FIELD59 = substr($string,881,50); my $FIELD60 = substr($string,931,1); my $FIELD61 = substr($string,932,8); my $FIELD62 = substr($string,940,10); my $FIELD63 = substr($string,951,1); my $FIELD64 = substr($string,952,7); my $FIELD65 = substr($string,959,3); my $FIELD66 = substr($string,962,7); my $FIELD67 = substr($string,969,1); my $FIELD68 = substr($string,970,1); my $FIELD69 = substr($string,971,15); my $FIELD70 = substr($string,986,8); my $FIELD71 = substr($string,994,15); my $FIELD72 = substr($string,1009,3); my $FIELD73 = substr($string,1012,15); my $FIELD74 = substr($string,1027,8); my $FIELD75 = substr($string,1035,20); my $FIELD76 = substr($string,1055,6); my $FIELD77 = substr($string,1061,6); my $FIELD78 = substr($string,1067,20); my $FIELD79 = substr($string,1087,15); my $FIELD80 = substr($string,1102,15); my $FIELD81 = substr($string,1117,40); my $FIELD82 = substr($string,1157,15); printf OUTPUT $string . "\x0A"; ++$rcount } close INPUT; close OUTPUT;

Replies are listed 'Best First'.
Re: Seperating Fixed Line Feed into Fields
by JavaFan (Canon) on Jun 10, 2009 at 20:19 UTC
    I would use unpack to unpack the fields into an array - which can be done in a single command instead of the 82 substrings. Then I'd use one of the CPAN modules that knows excel to write out the file.
      How would that unpack command look? Below is what I have so far.
      push @array, unpack("A34", $string);
        my @fields = unpack('A34 A30 A40 ...', $string);
        or
        my @field_sizes = (34, 40, 40, ...); my $rec_format = join '', map "A$_", @field_sizes; my @fields = unpack($rec_format, $string);

        Then use $fields[0] instead of $FIELD01, $fields[1] instead of $FIELD02, etc.

Re: Seperating Fixed Line Feed into Fields
by Transient (Hermit) on Jun 10, 2009 at 20:31 UTC
    I'd say you'd definitely want to unpack, as JavaFan mentioned. Something similar to:
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; my $rcount = 1; while (<DATA>) { my $string = $_; next if $rcount++ < 8; my @fields = unpack ('A34 A30 A40', $string); print Dumper(@fields), "\n"; } __DATA__ Ignore Me Line 1 Ignore Me Line 2 Ignore Me Line 3 Ignore Me Line 4 Ignore Me Line 5 Ignore Me Line 6 Ignore Me Line 7 This is a very long sentence which should illustrate the unpacking fun +ctionality in accordance to what you require.
    Outputs:
    $VAR1 = 'This is a very long sentence which'; $VAR2 = ' should illustrate the unpacki'; $VAR3 = 'ng functionality in accordance to what y';
    For writing excel spreadsheets, Spreadsheet::WriteExcel does the trick for me.
      Now, that works but every time it loops through it adds to the array instead of creating a new index.
        If you want to save the array off each time (it didn't look like you were doing that in your original code) you can declare an array outside of the loop and push the arrayref of the fields into that larger array. e.g.
        my @big_list_of_fields; while (<DATA>) { ... # unpacking here ... push @big_list_of_fields, \@fields; }
        Edit: I may be reading your question incorrectly. It should declare a new @fields inside each loop iteration though, so there should be nothing added to the array, it's a fresh array each iteration.
Re: Seperating Fixed Line Feed into Fields
by johngg (Canon) on Jun 10, 2009 at 22:50 UTC

    Further to the good advice you have been given regarding the use of unpack and arrays, there are a few other potential issues with your code.

    Firstly, your local $/ ... is not really local unless you confine the scope to a bare block, e.g.

    ... { local $/ = \1173; ... } ...

    Getting into the habit of doing this will minimise the risk of unwanted side effects if reading from another file handle later in the script.

    Secondly, I would recommend the use of the three argument form of open and lexical rather than package filehandles. Also, you only check for success on one of your open statements and neither of your closes. E.g.

    ... my $inputFile = q{C:/INPUTFILE}; open my $inputFH, q{<}, $inputFile or die qq{open: < $inputFile: $!\n}; my $outputFile = q{C:/OUTPUT.TXT}; open my $outputFH, q{>}, $outputFile or die qq{open: > $outputFile: $!\n}; ... while( <$inputFH> ) ... print $outputFH ... ... close $inputFH or die qq{close: < $inputFile: $!\n}; close $outputFH or die qq{close: > $outputFile: $!\n};

    Thirdly, you should consider separating the skipping of the first seven records from reading the rest of the records so as to avoid testing whether to skip for every read, and do you actually use $rcount for anything other than the record skipping?. I'm not sure of the logic you use when incrementing $rcount so below I show the way I'd tackle it.

    ... my $rcount; { local $/ = \1173; my $discard = <$inputFH> for 1 .. 7; $rcount = 7; # We've seen and skipped 7 records so far. while( <$inputFH> ) { $rcount ++; my @fields = unpack q{A34 ... }, $_; ... } } ... # Do something with $rcount here? ...

    I hope these points are of interest.

    Cheers,

    JohnGG

      Thanks, JohnGG. Your suggestions made a huge improvement to the script. Moreover, you showed me several better techniques I can utilize. When I first started writing it, I knew it could be way better; hence the posting. I think I am going to take a breather now and work on generating the file to excel. Since you skipped the set of rows, I can leave out $rcount. Below is how it looks now. Thanks, Dave
      use strict; use warnings; use Win32::OLE; my $ex = Win32::OLE->new('Excel.Application') or die "oops\n"; my $book = $ex->Workbooks->Add; my $sheet = $book->Worksheets(1); my $inputFile = q{c:/INPUT.TXT}; open my $inputFH, q{<}, $inputFile or die qq{open: < $inputFile: $!\n} +; my $outputFile = q{C:/OUTPUT.TXT}; open my $outputFH, q{>}, $outputFile or die qq{open: > $outputFile: $! +\n}; { local $/ = \1173; my $discard = <$inputFH> for 1 .. 7; while (<$inputFH>) { $rcount ++; $sheet->Range("A1:CD100")->{value} = unpack("A34 A30 A40 A40 +A40 A40 A40 A2 A9 A9 A6 A40 A34 A3 A8 A1 A1 A10 A20 A10 A34 A15 A15 A15 A8 A8 A8 A8 A3 A3 A +40 A10 A7 A20 A7 A15 A5 A8 A20 A15 A15 A15 A15 A15 A15 A1 A4 +0 A5 A8 A1 A3 A3 A3 A20 A10 A1 A15 A15 AA50 A1 A8 A1 +0 A1 A7 A3 A7 A1 A1 A15 A8 A15 A3 A15 A8 A20 A6 A6 A +20 A15 A15 A15 A40 A15", $_); } } close $inputFH; close $outputFH; $book->SaveAs('C:\test.xls'); undef $book; undef $ex;