Thanks pfaut,

The input file contents look like this:-
Ref_no     Supp_co Order_ Cat_num Carr_co Unit_pri Line_pric Ca User_ Auth_dat B
0312003620 SLUM02  M0551  RT3420  4.04    8.25     8.25      P  UFJGC 04/01/98 E
0312003619 SLUM02  M0550  RT3420  4.04    8.25     8.25      P  UFJGC 04/01/98 E
0312003617 SLUM02  M0548  RT3420  4.04    8.25     8.25      P  UFJGC 04/01/98 E
0312003616 SLUM02  M0547  RT3420  4.04    8.25     8.25      P  UFJGC 04/01/98 E
0312003684 SLUM02  M0615  RT3420  4.04    11.90    11.90     P  UFJGC 04/01/98 E
0312003613 SLUM02  M0544  RT3420  4.04    11.90    11.90     P  UFJGC 04/01/98 E
0312003586 SLUM02  M0517  RT3420  4.04    11.90    11.90     P  UFJGC 04/01/98 E

I have to check each line is a valid record rather than a header line or blank line (either of which appears a few hundred times throughout the file).

The actual record formatting is to remove the decimal places and insert leading zeroes in fields 5, 6 and 7. I also have to check the year in the penultimate field, because the year determines which file the record is written to.
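
To make the field formatting concrete, here is a minimal sketch of one way to do it. It assumes "remove the decimal places" means dropping the decimal point while keeping the digits (so 8.25 becomes 825), and it assumes a padded width of 7; neither is stated above. The name pad_price is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the decimal point and left-pad with zeroes.
# The default width of 7 is an assumption, not taken from the post.
sub pad_price {
    my ($value, $width) = @_;
    $width = 7 unless defined $width;
    (my $digits = $value) =~ tr/.//d;    # "11.90" becomes "1190"
    return sprintf "%0${width}d", $digits;
}

print pad_price("4.04"),  "\n";    # prints 0000404
print pad_price("11.90"), "\n";    # prints 0001190
```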

The code I have so far is:-
while ($line = <INPUT>) {
    chomp $line;

    # Check for lines to be discarded or kept
    if (substr($line,0,9) =~ /[0-9]{9}/) {
        # Lines are valid entries to be written to file if first 9 characters are
        # numeric, file used dependent on date of invoice details.
        ($newline, $year) = validLine($line);
        if ($year ne "02") {
            open (YEAR, ">>".$path."year$year.txt") || die "Cannot open file: $!\n";
            print YEAR "$newline\n";
            $y_count++;
            # Parentheses needed: "close YEAR || die" parses as close(YEAR || die)
            close(YEAR) || die "Cannot close file: $!\n";
        }
        else {
            print OUTPUT "$newline\n";
            $o_count++;
        }
    }
    else {
        print DISCARD "$line\n";
        $d_count++;
        next;
    }
}

with a subroutine, validLine(), that breaks each line up using substr to remove the decimals, insert leading zeroes, get the year and reconstruct the line. (I was using a split on spaces at this point, but have had to change it because not every record has the same number of fields, and the line must be reconstructed to take this into account.)
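
For comparison, a rough sketch of a routine in the same spirit that neither splits on spaces nor relies on fixed substr offsets. This is not my actual validLine(): the name valid_line_sketch is hypothetical, and it assumes the price fields are the only whitespace-delimited tokens of the form digits.digits, that the date is dd/mm/yy, and that the padded width is 7.

```perl
use strict;
use warnings;

# Sketch only, under the assumptions stated above.
sub valid_line_sketch {
    my ($line) = @_;

    # Rewrite each decimal token in place and leave all other text untouched,
    # so a varying number of fields per record needs no special handling.
    (my $newline = $line) =~ s{(?<!\S)(\d+)\.(\d+)(?!\S)}
                              {sprintf "%07d", "$1$2"}ge;

    # Two-digit year from the first dd/mm/yy token; undef if none is found.
    my ($year) = $line =~ m{\b\d\d/\d\d/(\d\d)\b};

    return ($newline, $year);
}

my ($newline, $year) = valid_line_sketch(
    "0312003620 SLUM02 M0551 RT3420 4.04 8.25 8.25 P UFJGC 04/01/98 E");
print "$year: $newline\n";
```

Because the substitution only touches the decimal tokens, the spacing between the other fields is carried over as it was, although the line gets longer wherever a price is padded.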

Appreciate any further comments!

elbow

In reply to Re: Re: Formatting a large number of records by elbow
in thread Formatting a large number of records by elbow
