I have a set of data that I need to parse into a csv file. The data looks like the following:
06 01720168-00000000257980 123 S Somewhere HWY 192 172016-8 Company NATURAL GAS CO., INC. Business P O BOX 1547 123 Road Dr. Town ST 12345 SUITE# 1234 Town, ST 12345 6/23/2014 $257.98 Business 6/23/2014 123 S Road HWY 123 172016-8 6/09/2014 Town ST 12345 $257.98 02CS 4/30 5/28 3117.0 3259.0 142.0 Meter # C204508 142.0 232.99 Pipe Replacement Pgm SNR Comm 3.27 RESEARCH & DEVELOPMENT TARIFF .03 3.00% Rate Increase County Co Sc Tax on 236.29 7.09 6.00% State Tax on 243.38 14.60 Current Charges 257.98 Previous Amount Due 351.60 Payment Received 5/22 351.60CR Total Amount Due 257.98 1-877-123-4567 66.0 28 142.0 8:00am to 4:00pm 58.6 30 203.0 70.3 28 174.0

The file has one record per 58 lines. I can handle the fine parsing that will need to happen on the lines to pull out the variables but what I am having trouble wrapping my head around is a method for grabbing 58 lines at a time and then performing the necessary processes on each iteration. For example once I have the 58 lines read in I know that line 8-11 from position 1-39 contain the return address that I will be putting as "return1","return2","return3","return4" in the CSV file. This is going to happen on every record as well as each of the other pieces I need to parse.

I thought about just using a counter and resetting it after every 58 lines while looping through the entire file but that didn't seem like it would be the best solution. As I'm by no means an expert at perl I wanted to check here with you guys to see if anyone has a better place to start or some ideas on how to make this a bit more clean and efficient.

If you need any other information please let me know.

UPDATE: I found a control character other than newline in the data at the beginning of each record. I was able to use local $/ = "\014"; (was looking for newlines or double newlines and not this character) to pull the data into a variable one record at a time. I then split the data using my @lines = split /\n/, $record; into an array with one line per.

So now I believe I can pass the array to a subroutine, perform the checks and changes I need to on each of the lines, write the line out to a csv file and then move to the next record.


In reply to Parsing a Formatted Text File by Pharazon

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.