renef has asked for the wisdom of the Perl Monks concerning the following question:

Greetings to all,

I'm an xBase programmer seeking enlightenment through Perl. I have a rather tedious data munge to deal with and I thought that this could be an opportunity to attempt to solve it using Perl. As a programmer, I understand the concepts and the variables etc., but I have yet to apply all that I have read in the Camel book. So, I thought that I would approach my task as a homework assignment. I don't want anyone to help me do it. I need directions to help me get there on my own.

I have a 3 column label format saved as a text file (10,000+ records)

last [comma] first        last [comma] first        last [comma] first
address1                  address1                  address1
address2                  address2                  address2
city st [rt justify]zip   city st [rt justify]zip   city st [rt justify]zip

(I hope that this mess looks ok when you see it.)

The whole point is that I need to create a *single column* listing from this 3 column list. The only constants that I have to work with on this 3 column report are:

The comma that can be used to id the name line
consistent spacing between labels/columns
the zipcode is in a specific position throughout the report
The only problem is address2 that is not consistent and adds a line if there is data in that field

As an xBase program, this is already licked. I would *rather* have any of you send me to some relevant Pattern Matching Tutorials and Loops tutorials that could help me unravel this on a nice, quiet Sunday morning with no one home but me and the humming of my computers (and Barry White on the CD player)

Thanks for your help.


-Rene Ferrer

Replies are listed 'Best First'.
Re: Pattern Matching examples
by delirium (Chaplain) on Nov 23, 2003 at 16:54 UTC
    The unpack function and hashes are my first thoughts. You can use the consistent column spacing to your advantage by unpacking each line into an array. For example, if each column is 24 characters:

    @array = unpack('A24A24A24', $line);

    ...will return a three element array out of $line. Assuming the "last,first" names are unique, those could be the keys of your hash:

    @names    = unpack('A24A24A24', $line);
    @address1 = unpack('A24A24A24', $nextline);
    for (0..2) {
        $hash{$names[$_]}{address1} = $address1[$_];
    }

    That should get you started.

      But putting your data into a hash will destroy any sequence which your original data might have had and which you perhaps like to preserve when going from three columns to one column.
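One way to keep the original sequence is to push each record onto an array instead of keying a hash by name. The following is only a minimal sketch: the 24-character column width is the hypothetical value from the example above, and `add_row` and the record field names are made up for illustration.

```perl
use strict;
use warnings;

# Turn one name line and one address line (three labels each) into three
# records, appended to @$records in the order they are read.
# The 'A24' widths are an assumption carried over from the example above.
sub add_row {
    my ($records, $name_line, $addr_line) = @_;
    my @names    = unpack 'A24 A24 A24', $name_line;
    my @address1 = unpack 'A24 A24 A24', $addr_line;
    for my $i (0 .. 2) {
        push @$records, { name => $names[$i], address1 => $address1[$i] };
    }
    return;
}
```

Because the records live in an array, looping over it later returns them left to right, row by row, exactly as they appeared in the report. Duplicate names are also no longer a problem.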

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Yes, Thank you so much. I can work with this. I'll get back to you on the progress. Someone else replied that they want to see the xBase code. I'll try to copy it into a reply a bit later. I'm running to work... #!/usr/bin/ignition...
      I'm working on this and I've decided to stop trying to do the xBase part. The reasons are too numerous to list. It's just not getting me anywhere. I'm going to solve this with Perl and some brainpower. I transferred all files to my Linux laptop and turned off my *other* operating system.

      I have better tools now; I had underestimated the text-processing power I have. Anyway, here's the diagram that I've printed out for myself, with the fixed positions that I will be using.

      1234567890123456789012345678901234567890123456789012345678901234567890 +1234567890123456789012345678901234567890 | | | | | | | + | | | | aaaaaaaaaaaaaaaaa, aaaaaaaaaa aaaaaaaaaaaaa, aaaaaaaaaaaaaa + aaaaaaaaaaaa, aaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbb + bbbbbbbbbbbbbbbbbbbbbbbbbbbbb ccccccccccccccccccccccccccccc ccccccccccccccccccccccccccccc + ccccccccccccccccccccccccccccc ddddddddddd ee 99999-9999 dddddddd ee 99999-9999 + dddddddddd ee 99999-9999 | | | | + | | 2 21 39 58 + 75 94

      Obviously you need to see the text without word-wrap, but you get the gist of it. The cool thing is that address1(b) and address2(c) can be grabbed as a fixed-length string and then chomped(?) or TRIMed in xBase.

      The lastname(a) and firstname(a) fields need to be switched for the purposes of my new output needs.

      The city(d) and state(e) can also be grabbed as a string of fixed length because I'm not interested in sorting them here.

      Then I can jump directly to the zipcode position and grab 10 characters.

      The issue with the intermittent 2nd address will have to wait.

      There are *no* spaces between the labels. It's just one messy report.

      Step 1: Figure out how to handle each line
      Step 2: Learn to play with the array(s)
      Step 3: Start picking out data from each line.

      Then I'll do the loop and the output text file.

      Be back in a few hours (I'd better make more coffee)

      RF
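The plan above (cut each line at fixed positions, then swap last and first name) might be sketched roughly as follows. The start offsets are the 1-based columns 2, 39 and 75 from the diagram converted to 0-based; the 36-character label width, the 110-character line length, and the helper names `split_labels` and `swap_name` are assumptions for illustration, not measurements of the real report.

```perl
use strict;
use warnings;

# Assumed 0-based start offsets (diagram columns 2, 39 and 75).
my @starts = (1, 38, 74);
my $width  = 36;    # assumption: no label is wider than this

# Cut one report line into its three label cells.
sub split_labels {
    my ($line) = @_;
    chomp $line;
    $line = sprintf '%-110s', $line;    # pad so substr never runs past the end
    my @cells;
    for my $start (@starts) {
        my $cell = substr $line, $start, $width;
        $cell =~ s/\s+$//;              # the TRIM step: drop trailing spaces
        push @cells, $cell;
    }
    return @cells;
}

# "Last, First" becomes "First Last"; anything without a comma is returned as is.
sub swap_name {
    my ($name) = @_;
    my ($last, $first) = split /\s*,\s*/, $name, 2;
    return defined $first ? "$first $last" : $name;
}
```

Note that `chomp` only removes the trailing newline; trimming trailing spaces takes a substitution such as `s/\s+$//`, or the `A` template of `unpack`, which strips them automatically.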

Re: Pattern Matching examples
by duff (Parson) on Nov 23, 2003 at 17:44 UTC

    There's perlretut and perlrequick in the standard perl distribution these days; they might help. But also, you should look into unpack() and substr() if bits of your records can be nailed to specific locations on a line.

Re: Pattern Matching examples
by CountZero (Bishop) on Nov 23, 2003 at 19:33 UTC
    There is not much pattern matching to do in this task: only check whether the line just read in contains at least one comma (assuming that only the first line of each record contains commas), which will signal the "lastname, firstname" line. It is probably better/faster done with index than with a regex.

    As you already have it written in xBase, I will not go into the logic of the program itself.
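A minimal sketch of the comma check with `index`, under the same assumption that only name lines contain a comma (the helper name `is_name_line` is made up for illustration):

```perl
use strict;
use warnings;

# index returns the 0-based position of the first comma, or -1 if there is none.
sub is_name_line {
    my ($line) = @_;
    return index($line, ',') >= 0;
}
```

A street address or city that happens to contain a comma would be misread as a name line, so it is worth checking the data for that before relying on this test.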

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Pattern Matching examples
by ysth (Canon) on Nov 23, 2003 at 19:00 UTC
    Seems like you have two basic tasks:
    1. read a row of records
    2. split up the columns into three individual records
    The latter has been discussed. For the former, if you have blank lines only in between rows, look up what happens when $/ = "" (paragraph mode) in perldoc perlvar.
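
Paragraph mode could be tried out along these lines. This is only a sketch: it reads from an in-memory string rather than the real report, the sample data is invented, and `read_rows` is a hypothetical helper name. It also assumes, as stated above, that blank lines appear only between rows of labels.

```perl
use strict;
use warnings;

# Invented sample: two rows separated by a blank line; the second has an address2 line.
my $report = "Smith, John\n1 Main St\nAnytown ST 12345\n\n"
           . "Doe, Jane\n2 Oak Ave\nApt 3\nOthertown ST 54321\n";

# Return one array ref of lines per blank-line-separated row.
sub read_rows {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @rows;
    while (my $row = <$fh>) {
        chomp $row;                          # in paragraph mode this removes all trailing newlines
        push @rows, [ split /\n/, $row ];
    }
    close $fh;
    return @rows;
}

my @rows = read_rows($report);
printf "row %d has %d lines\n", $_ + 1, scalar @{ $rows[$_] } for 0 .. $#rows;
```

Each row arrives as a unit, so the optional address2 line simply shows up as one more element in that row's list instead of throwing off a line counter.
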
Re: Pattern Matching examples
by jdporter (Paladin) on Nov 24, 2003 at 00:26 UTC
    Just out of curiosity -- what does the xBase code look like?

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

      Hey! Yeah this is great. Thanks for such quick replies. I'll send over the xBase code as soon as I can. I'm looking up these other things as well.

      Perl is very cool. So far I'm picking things up quickly.

      I'll be back in a bit.