
Hello Monks, I've been struggling with a script for the past few days. I have a report that I need to parse into a CSV file. I have it somewhat working, but I could use some help making it better. The text file that I need to parse has the following format, which is repeated on every page:

Per CountZero's suggestion, here is a mock-up of the text file.

Date 08/17/11 Report Page 1
Time 12:46

Important Text: 1
Misc Text: All
Misc Text: Sec
** Indicates
APPTEXT

PLINE     SCODE    PCODE    FID    SEC    unsec     fcs
-------------------------------------------------------------------------------------------------
TEST     TT     TT00    TT00.1    NO    xxxx    TTD
TEST    TT    TT00    **TT00.2    YES    XXXXXX
TEST    TT    TT00    **TT00.3    YES    XXX
TEST    TT    TT01    TT01.1    NO    XXXXXXXXXXX    TT
TEST    TT    **TT02    TT02.1    YES    XXXXX

I need to combine the "Important Text" value ("text1") with each line of the columns into a CSV record. The most recent thing I found out is that the column lines vary in length, and the number of whitespace characters between the columns varies as well. Here is the script that I have, but I was wondering what I could do to take the variability of the lines into account. Also, I'm not very knowledgeable about Perl. I've put this together from skimming some books and picking up things on the internet.

The output then would be something like this:
1,TEST,TT,TT00,TT00.1,NO,xxxx
1,TEST,TT,TT00,**TT00.2,YES,XXXXXX

#! /usr/bin/perl

$OutPut= '>secout.txt';
open(INFILE,'sec_rpt3.txt') or die "Can't open file.\n";
open(OUT, $OutPut) or die "Can't open output.\n";

sub rtrim($)
{
    my $string = shift;
    $string =~ s/\s+$//;
    return $string;
}

sub trim($)
{
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

sub ltrim{
  my $string = shift;   # take the argument rather than relying on the global $_
  $string =~ s/^\s*//;
  return $string;
}

while (<INFILE>)
{ 
   $ThisLine=ltrim($_);
   chomp($ThisLine);
   $LineLen=length($ThisLine);
   if (index($ThisLine,'IMPORTANT TEXT') != -1)
       { 
           $LenSec=int($LineLen)-17; 
           $SecClass=substr($ThisLine,17,$LenSec);
       }
   if (index($ThisLine,"TEST") != -1)
     { 
        $pline = trim(substr($ThisLine,0,16));
        $mod = trim(substr($ThisLine,18,6));
        $tok = trim(substr($ThisLine,24,10));
        $form = trim(substr($ThisLine,34,13));
        $sec = trim(substr($ThisLine,47,7));
        $unsec = trim(substr($ThisLine,54,21));
        $secfc = trim(substr($ThisLine,76,21));
        $rec = join(',',$SecClass,$pline,$mod,$tok,$form,$sec,$unsec,$secfc);
        print OUT "$rec\n";
    };
}

close(INFILE);
close(OUT);
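
Since the fixed substr offsets are what break when the column widths and spacing vary, one possible direction is to split each data row on runs of whitespace instead. The following is only a minimal sketch under stated assumptions, not a tested replacement for the script above: the helper name report_to_csv is hypothetical, it assumes the label matches "Important Text:" in any letter case, that every data row begins with TEST, and that no field contains embedded spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turn report lines
# into CSV records by splitting on whitespace instead of fixed offsets.
sub report_to_csv {
    my @lines     = @_;    # copy, so the caller's data is not modified
    my $sec_class = '';
    my @records;

    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim both ends (also drops the newline)

        # Assumption: the label looks like "Important Text: 1", any case.
        if ($line =~ /^Important Text:\s*(.*)$/i) {
            $sec_class = $1;
            next;
        }

        # Assumption: every data row starts with the literal value TEST.
        next unless $line =~ /^TEST\b/;

        # split ' ' breaks the row on any run of whitespace, so the
        # column widths and the gaps between columns no longer matter.
        my @fields = split ' ', $line;
        push @records, join(',', $sec_class, @fields);
    }
    return @records;
}

my @sample = (
    "Important Text: 1\n",
    "TEST     TT     TT00    TT00.1    NO    xxxx    TTD\n",
    "TEST    TT    TT00    **TT00.2    YES    XXXXXX\n",
);
print "$_\n" for report_to_csv(@sample);
# prints:
# 1,TEST,TT,TT00,TT00.1,NO,xxxx,TTD
# 1,TEST,TT,TT00,**TT00.2,YES,XXXXXX
```

One caveat with this approach: whitespace splitting cannot tell which column is missing when a row has an empty column in the middle (for example, no "unsec" value but an "fcs" value), so the remaining fields would shift left. If that can happen in the real report, the column positions would have to come from somewhere else, such as the header line.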

In reply to Parsing text file to CSV by apok69
