http://qs1969.pair.com?node_id=949303

koolgirl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Fellow Monks/Monkettes,

I have been working on a county site scraping project for about three months, and I'm exasperated in every way possible. I'm finally near the end, but I'm seriously stuck on pulling the data into the .csv files. I've been "in the basement" the whole time on this one (I don't think anyone's seen me in about 4 months oO), and to say I have code blindness would be the understatement of the year.

So, here's my problem: I'm collecting information about thousands of properties. Each property has a parcel number, which is the unique identifier I use for most of my code. Each parcel number, in turn, has a document history (title abstract, deed history) that I have to collect, and each document in that history has its own unique document number. However, as I began to collect the document information by individual document number, I saw that there have been some data entry mistakes, and that the information for each document has no record of the parcel number it belongs to. So now I have a file of all the info I need, one line of information per document, and no way to tell which parcel number each line is related to.

I do have a file listing each parcel number and all the document numbers which belong to it, so the only way I can think of to resolve the issue is to compare the files: match each doc number in one file to its corresponding info in the other, then put both sets of info together in a new file, this one including the parcel number. I cannot figure out how to do this. I read all through the Llama, the Camel and the Cookbook (what is that, a mountain goat? oO), and the only things similar to my situation were the technique of using Tie::File, which basically lets a file be operated on as an array but doesn't really cut it, and the comparing-two-files routine. But I need to match the files against each other and operate from there, not just compare them.

I haven't slept in about 3 or 4 days, so please don't kill me as I suspect the answer is very simple and basic *ducks to avoid flying keyboards*. I'm too tired to think, or apparently write code that makes sense, but my head will be chopped off, seriously, if I can't finish this by morning.

Does anyone have any ideas for solving this problem? Not looking for someone to do it for me, really, just a push in the right direction, I'm spinning my wheels here. I've tried attempting to tie the parcel and doc numbers together in the initial collection, which would be cumbersome anyhow, because it isn't even stripped at that point, but comparing the two files, then combining the data into a new output file seems to be the only way to go.

Bottom Line Coming Up In 3, 2, 1....

Basically, I need to keep track of each parcel number and every document number tied to it, but the two sets of data are in two separate files. I've made an example of the type of data each file holds, where it is related, and the output file I want to create by combining each document's data with its parcel number. I cannot seem to even think of how to do that at this point.

FILE1
    parcel# 12345 doc num 123 doc num 456 doc num 789
    parcel# 67890 doc num 342 doc num 657 doc num 876

FILE2
    doc num 342 data data data data data data data data
    doc num 657 data data data data data data data data
    doc num 876 data data data data data data data data
    doc num 123 data data data data data data data data
    doc num 456 data data data data data data data data
    doc num 789 data data data data data data data data

So that's an example of the structure of the data that each file holds and how it matches, and this is what I'm trying to output as a combination from matching each set up:
FILE3
    parcel# 12345 doc num 123 data data data data data data data data
    parcel# 12345 doc num 456 data data data data data data data data
    parcel# 12345 doc num 789 data data data data data data data data
    parcel# 67890 doc num 342 data data data data data data data data
    parcel# 67890 doc num 657 data data data data data data data data
    parcel# 67890 doc num 876 data data data data data data data data
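
For illustration only, here is a minimal sketch of one way the match-up described above could be done with a hash lookup. It rests on assumptions about the real files that the example data only hints at: each FILE2 record is a single line starting with `doc num <number>`, and FILE1 lists each `parcel#` followed by its `doc num` entries, either on the same line or on the lines after it. The sub names `read_doc_lines` and `join_parcels` are made up for this sketch, and the in-memory sample data stands in for the real files.

```perl
use strict;
use warnings;

# Build a lookup table from FILE2: document number => full line for that document.
# Assumption: every FILE2 line starts with "doc num <number>".
sub read_doc_lines {
    my ($fh) = @_;
    my %line_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /^\s*(doc num\s+(\S+).*)$/;
        $line_for{$2} = $1;
    }
    return \%line_for;
}

# Walk FILE1 in order and return one combined line per document.
# Works whether "parcel# N" shares a line with its "doc num M" entries
# or sits on a line of its own above them.
sub join_parcels {
    my ( $fh, $line_for ) = @_;
    my ( $parcel, @out );
    while ( my $line = <$fh> ) {
        while ( $line =~ /(parcel#|doc num)\s*(\S+)/g ) {
            my ( $kind, $id ) = ( $1, $2 );
            if ( $kind eq 'parcel#' ) {
                $parcel = $id;    # remember which parcel the next docs belong to
                next;
            }
            if ( !defined $parcel ) {
                warn "doc num $id appears before any parcel#, skipping\n";
                next;
            }
            if ( !exists $line_for->{$id} ) {
                warn "doc num $id (parcel# $parcel) has no record in FILE2\n";
                next;
            }
            push @out, "parcel# $parcel $line_for->{$id}";
        }
    }
    return @out;
}

# Sample data shaped like the example files (shortened "data" columns).
my $file1 = "parcel# 12345 doc num 123 doc num 456 doc num 789\n"
          . "parcel# 67890 doc num 342 doc num 657 doc num 876\n";
my $file2 = join '', map { "doc num $_ data data data\n" } qw(342 657 876 123 456 789);

# In-memory filehandles keep the sketch self-contained.
open my $docs_fh, '<', \$file2 or die "Cannot open sample FILE2: $!";
my $line_for = read_doc_lines($docs_fh);
close $docs_fh;

open my $parcels_fh, '<', \$file1 or die "Cannot open sample FILE1: $!";
my @combined = join_parcels( $parcels_fh, $line_for );
close $parcels_fh;

print "$_\n" for @combined;
```

With real data, the two `open` calls would take the actual file names instead of the scalar references, and the `print` would go to a third filehandle opened for writing. Hashing FILE2 and then walking FILE1 keeps the output in parcel order, as in the FILE3 example; the warnings are there to surface document numbers affected by the data entry mistakes rather than dropping them silently.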

Sorry, I am a total brain-dead zombie, so I hope I made sense. If anyone has any ideas, please help me; I am officially not able to compute. oO