Pattern Matching examples

renef has asked for the wisdom of the Perl Monks concerning the following question:

Greetings to all,

I'm an xBase programmer seeking enlightenment through Perl. I have a rather tedious data munge to deal with and I thought that this could be an opportunity to attempt to solve it using Perl. As a programmer, I understand the concepts and the variables etc., but I have yet to apply all that I have read in the Camel book. So, I thought that I would approach my task as a homework assignment. I don't want anyone to help me do it. I need directions to help me get there on my own.

I have a 3 column label format saved as a text file (10,000+ records)

last [comma] first       last [comma] first       last [comma] first
address1                 address1                 address1
address2                 address2                 address2
city st [rt justify]zip  city st [rt justify]zip  city st [rt justify]zip

(I hope that this mess looks ok when you see it.)

The whole point is that I need to create a *single column* listing from this 3 column list. The only constants that I have to work with on this 3 column report are:

The comma that can be used to id the name line
consistent spacing between labels/columns
the zipcode is in a specific position throughout the report
The only problem is address2 that is not consistent and adds a line if there is data in that field

As an xBase program, this is already licked. I would *rather* have any of you send me to some relevant Pattern Matching Tutorials and Loops tutorials that could help me unravel this on a nice, quiet Sunday morning with no one home but me and the humming of my computers (and Barry White on the CD player)

Thanks for your help.

-Rene Ferrer


Replies are listed 'Best First'.
Re: Pattern Matching examples by delirium (Chaplain) on Nov 23, 2003 at 16:54 UTC
The unpack function and hashes are my first thoughts. You can use the consistent column spacing to your advantage by unpacking each line into an array. For example, if each column is 24 characters: `@array = unpack('A24A24A24', $line);` ...will return a three element array out of $line. Assuming the "last,first" names are unique, those could be the keys of your hash: `@names = unpack('A24A24A24', $line); @address1 = unpack('A24A24A24', $nextline); for (0..2) { $hash{$names[$_]}{address1} = $address1[$_]; }` That should get you started.
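A minimal runnable sketch of the unpack call from the reply above. The 24-character width is the example figure from that reply, not the real report layout, and the sample names and the helper `split_columns` are made up for illustration. The `A` format drops trailing spaces from each field, which covers the trimming step.

```perl
use strict;
use warnings;

# Split one fixed-width line into three 24-character columns.
# 'A24' takes 24 characters and strips trailing spaces from the result.
# (Width 24 is the example value from the thread, an assumption here.)
sub split_columns {
    my ($line) = @_;
    return unpack 'A24 A24 A24', $line;
}

# Made-up sample line, padded so each column is exactly 24 characters wide.
my $line = sprintf '%-24s%-24s%-24s', 'Smith, John', 'Doe, Jane', 'Brown, Bob';
my @cols = split_columns($line);
print join('|', @cols), "\n";
```

Whitespace inside the template is ignored, so `'A24 A24 A24'` and `'A24A24A24'` behave the same.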
Re: Re: Pattern Matching examples by CountZero (Bishop) on Nov 23, 2003 at 19:36 UTC
But putting your data into a hash will lose whatever order your original data had, which you may want to preserve when going from three columns to one column.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
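One way to keep the original order, sketched under the same assumptions as the earlier reply (24-character columns, a name line followed by an address line): collect each label as a small hash and push it onto an array, which stays in input order. The helper `row_to_records` and the field names are hypothetical.

```perl
use strict;
use warnings;

# Turn one row of labels (name line, then address1 line) into three records,
# returned in left-to-right order. Width 24 is the thread's example value.
sub row_to_records {
    my ($name_line, $addr_line) = @_;
    my @names = unpack 'A24 A24 A24', $name_line;
    my @addr1 = unpack 'A24 A24 A24', $addr_line;
    my @records;
    for my $i (0 .. 2) {
        push @records, { name => $names[$i], address1 => $addr1[$i] };
    }
    return @records;
}

my @all;    # an array keeps insertion order, unlike the keys of a hash
push @all, row_to_records(
    sprintf('%-24s%-24s%-24s', 'Smith, John', 'Doe, Jane', 'Brown, Bob'),
    sprintf('%-24s%-24s%-24s', '1 Main St', '2 Oak Ave', '3 Elm Rd'),
);
print "$_->{name}: $_->{address1}\n" for @all;
```

This also sidesteps the requirement that the "last,first" names be unique, since nothing is keyed on the name.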
Re: Re: Pattern Matching examples by Anonymous Monk on Nov 24, 2003 at 14:31 UTC
Yes, thank you so much. I can work with this. I'll get back to you on the progress. Someone else replied that they want to see the xBase code. I'll try to copy it into a reply a bit later. I'm running to work... #!/usr/bin/ignition...
Re: Re: Pattern Matching examples by Anonymous Monk on Nov 25, 2003 at 05:33 UTC
I'm working on this and I've decided to stop trying to do the xBase part. The reasons are too numerous to list. It's just not getting me anywhere. I'm going to solve this with Perl and some brainpower. I transferred all files to my Linux laptop and turned off my other operating system. I have better tools now; I had underestimated the text processing power I have. Anyway, here's the diagram that I've printed out for myself, with the fixed positions that I will be using.

12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
         |         |         |         |         |         |         |         |         |         |         |
 aaaaaaaaaaaaaaaaa, aaaaaaaaaa        aaaaaaaaaaaaa, aaaaaaaaaaaaaa       aaaaaaaaaaaa, aaaaaaaaaaaaaaa
 bbbbbbbbbbbbbbbbbbbbbbbbbbbbb        bbbbbbbbbbbbbbbbbbbbbbbbbbbbb       bbbbbbbbbbbbbbbbbbbbbbbbbbbbb
 ccccccccccccccccccccccccccccc        ccccccccccccccccccccccccccccc       ccccccccccccccccccccccccccccc
 ddddddddddd ee     99999-9999        dddddddd ee        99999-9999       dddddddddd ee      99999-9999
 |                  |                 |                  |                |                  |
 2                  21                39                 58               75                 94

Obviously you need to see the text without word-wrap, but you get the gist of it. The cool thing is that address1(b) and address2(c) can be grabbed as a fixed-length string and then chomped(?) or TRIMed in xBase. The lastname(a) and firstname(a) fields need to be switched for the purposes of my new output needs. The city(d) and state(e) can also be grabbed as a string of fixed length because I'm not interested in sorting them here. Then I can jump directly to the zipcode position and grab 10 characters. The issue with the intermittent 2nd address will have to wait. There are no spaces between the labels. It's just one messy report.

Step 1: Figure out how to handle each line
Step 2: Learn to play with the array(s)
Step 3: Start picking out data from each line.

Then I'll do the loop and the output text file. Be back in a few hours (I'd better make more coffee)

RF
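A rough sketch of Step 3 using substr with the positions quoted in the post: labels starting at columns 2, 39, and 75, and the zip starting at columns 21, 58, and 94. The label width of 29 is inferred from those numbers (zip start plus 10 characters), so treat it as an assumption. The helper names and sample data are hypothetical. substr counts from 0, so each 1-based column is reduced by one.

```perl
use strict;
use warnings;

# 1-based start columns from the post; width 29 is inferred, not confirmed.
my @LABEL_START = (2, 39, 75);
my $LABEL_WIDTH = 29;
my $ZIP_WIDTH   = 10;

# Return the three label-wide slices of one report line, trailing blanks removed.
sub label_fields {
    my ($line) = @_;
    my @fields;
    for my $col (@LABEL_START) {
        my $start = $col - 1;    # substr offsets are 0-based
        my $field = $start < length($line)
                  ? substr($line, $start, $LABEL_WIDTH)
                  : '';          # short line: this label is empty
        $field =~ s/\s+\z//;
        push @fields, $field;
    }
    return @fields;
}

# "last, first" becomes "first last"; text without a comma is returned unchanged.
sub swap_name {
    my ($name) = @_;
    return $name =~ /^\s*([^,]+?)\s*,\s*(.+?)\s*$/ ? "$2 $1" : $name;
}

# Split a city/state/zip field: the zip occupies the last 10 columns of the label.
sub split_city_line {
    my ($field) = @_;
    my $padded     = sprintf '%-*s', $LABEL_WIDTH, $field;
    my $city_state = substr $padded, 0, $LABEL_WIDTH - $ZIP_WIDTH;
    my $zip        = substr $padded, $LABEL_WIDTH - $ZIP_WIDTH, $ZIP_WIDTH;
    s/^\s+|\s+\z//g for $city_state, $zip;
    return ($city_state, $zip);
}

# Made-up name line laid out at columns 2, 39, and 75.
my $line = ' '
    . sprintf('%-29s', 'Doe, Jane')    . (' ' x 8)
    . sprintf('%-29s', 'Roe, Richard') . (' ' x 7)
    . sprintf('%-29s', 'Poe, Edgar');
print swap_name($_), "\n" for label_fields($line);
```

On the "chomped(?)" question: chomp only removes the trailing line ending, so trimming padding needs a substitution like the one above, or unpack's `A` format, which strips trailing spaces by itself.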
Re: Pattern Matching examples by duff (Parson) on Nov 23, 2003 at 17:44 UTC
There's perlretut and perlrequick in the standard perl distribution these days; they might help. But also, you should look into unpack() and substr() if bits of your records can be nailed to specific locations on a line.
PerlJam
Re: Pattern Matching examples by CountZero (Bishop) on Nov 23, 2003 at 19:33 UTC
There is not much pattern matching to do in this task: only check whether the line just read in contains at least one comma (assuming that only the first line of each record contains commas), which will signal the "lastname, firstname" line. It is probably better/faster done with `index` than with a regex. As you already have it written in xBase, I will not go into the logic of the program itself.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
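The comma check can be sketched with `index`, which returns -1 when the substring is absent. The grouping helper below is a hypothetical extension: it starts a new row of labels at each name line, so a row can hold three or four lines depending on address2. It relies on the assumption stated in the reply, that only name lines contain commas; an address such as "Apt 2, Bldg B" would break it.

```perl
use strict;
use warnings;

# True if the line contains at least one comma, i.e. looks like a name line.
sub is_name_line {
    my ($line) = @_;
    return index($line, ',') >= 0;
}

# Group report lines into rows of labels: each name line starts a new row.
sub group_rows {
    my @rows;
    for my $line (@_) {
        next unless $line =~ /\S/;                    # skip blank lines
        push @rows, [] if is_name_line($line) || !@rows;
        push @{ $rows[-1] }, $line;
    }
    return @rows;
}

# Made-up single-column sample; the second record has an address2 line.
my @rows = group_rows(
    'Doe, Jane', '1 Main St', 'Anytown NY 12345',
    '',
    'Roe, Richard', '2 Oak Ave', 'Apt 3', 'Othertown CA 90210',
);
printf "%s: %d lines\n", $_->[0], scalar @$_ for @rows;
```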
Re: Pattern Matching examples by ysth (Canon) on Nov 23, 2003 at 19:00 UTC
Seems like you have two basic tasks:

read a row of records
split up the columns into three individual records

The latter has been discussed. For the former, if you have blank lines only in between rows, look up what happens when `$/ = ""` (paragraph mode) in perldoc perlvar.
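A small sketch of paragraph mode, assuming the rows really are separated by empty lines as the reply supposes. With `$/` set to the empty string, each read returns one block of lines up to the next run of blank lines. The helper `read_rows` and the sample text are made up; an in-memory filehandle stands in for the report file.

```perl
use strict;
use warnings;

# Read blank-line-separated rows from a filehandle using paragraph mode.
sub read_rows {
    my ($fh) = @_;
    local $/ = "";     # paragraph mode: one or more empty lines end a record
    my @rows;
    while (my $para = <$fh>) {
        chomp $para;   # in paragraph mode this removes all trailing newlines
        push @rows, [ split /\n/, $para ];
    }
    return @rows;
}

# In-memory filehandle with made-up data; the second row has an extra line.
my $text = "Doe, Jane\n1 Main St\nAnytown NY 12345\n\n\n"
         . "Roe, Richard\n2 Oak Ave\nApt 3\nOthertown CA 90210\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @rows = read_rows($fh);
printf "row %d has %d lines\n", $_ + 1, scalar @{ $rows[$_] } for 0 .. $#rows;
```

Using `local` restores the normal line-at-a-time behaviour when the sub returns. A separator line that contains only spaces does not count as empty in this mode, so a report padded with blanks would need those lines cleaned first.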
Re: Pattern Matching examples by jdporter (Paladin) on Nov 24, 2003 at 00:26 UTC
Just out of curiosity -- what does the xBase code look like?
jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.
Re: Re: Pattern Matching examples by Anonymous Monk on Nov 24, 2003 at 14:28 UTC
Hey! Yeah, this is great. Thanks for such quick replies. I'll send over the xBase code as soon as I can. I'm looking up these other things as well. Perl is very cool. So far I'm picking things up quickly. I'll be back in a bit.