Limo has asked for the wisdom of the Perl Monks concerning the following question:

Following up on my earlier post Merging Files: A Different Twist...

I have a program which takes a structured format file and reports back data contained in specified fields.

eg: exfields.pl file.gz field1,field2,field3
which produces the following output:
steve 37 ma
jeff 35 ca
ben 30 tn
I am trying to write a program which will merge 2 such files. file_1.gz and file_2.gz each contain different data associated with "steve", "jeff", and "ben", and I want to merge the output of both files and display the end result to the user.
output of: exfields.pl file_1.gz field1,field2:
steve 37 ma
jeff 35 ca
ben 30 tn
output of: exfields.pl file_2.gz field1:
steve guitar
jeff bass
ben drums
after both files are merged, this needs to become:
steve 37 ma guitar
jeff 35 ca bass
ben 30 tn drums
The first approach I considered (yes, I'm very new to Perl) was creating 2 separate arrays and then joining them, only to realize that this would not accomplish my task. I have tried all day, until now, to get this to work:
merge.pl file1 field,field file2 field
which extracts:
file1 field,field
and passes it through a subroutine that would produce a hash containing:
%hash_1 =
steve => '37' AND 'ma'
jeff => '35' AND 'ca'
ben => '30' AND 'tn'
then extracts:
file2 field
and passes it through the same subroutine, which would produce a hash containing:
%hash_2 =
steve => 'guitar'
jeff => 'bass'
ben => 'drums'
Oh yeah, I am NOT trying to code this program as it appears; rather, I am only trying to demonstrate what I am trying to do. I have the extraction steps that I referred to already coded. I also know how to produce %hash_2. My question is: how could I code the subroutine to do what I am trying to do? Before that, am I correct in thinking that the above examples of hashes will produce my desired result? I have read several references to "hashes with keys that contain multiple values" and "references to arrays", but I can't seem to use the examples effectively. Please help, while I still have hair on my head.
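For illustration, here is a minimal sketch of the "hash whose values are references to arrays" idea mentioned above. It is not the code from merge.pl: the arrays @file1_lines and @file2_lines are hypothetical stand-ins for the output of the two exfields.pl runs, and the subroutine name add_records is made up for this example. It assumes one record per line, with the name as the first whitespace-separated field.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the output of the two exfields.pl runs.
my @file1_lines = ("steve 37 ma", "jeff 35 ca", "ben 30 tn");
my @file2_lines = ("steve guitar", "jeff bass", "ben drums");

# Add records to a hash of array references: name => [ value, value, ... ].
# Calling it again with another file appends to the entries already present.
# $order records the first-seen order of names, because hashes are unordered.
sub add_records {
    my ($merged, $order, @lines) = @_;
    for my $line (@lines) {
        my ($name, @values) = split ' ', $line;
        next unless defined $name;                          # skip blank lines
        push @$order, $name unless exists $merged->{$name}; # first time we see this name
        push @{ $merged->{$name} }, @values;                # append this file's values
    }
    return;
}

my (%merged, @order);
add_records(\%merged, \@order, @file1_lines);
add_records(\%merged, \@order, @file2_lines);

# One output line per name: the name followed by all of its collected values.
my @report = map { join ' ', $_, @{ $merged{$_} } } @order;
print "$_\n" for @report;
```

With the sample data this prints the three merged lines shown in the question. A single hash is enough here, since the same subroutine can be applied to each file in turn instead of building %hash_1 and %hash_2 separately.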

Replies are listed 'Best First'.
RE: Merging 2 Formatted Files
by moen (Hermit) on Sep 20, 2000 at 12:39 UTC
    This can be done with something like this:
    foreach my $first (@first_file) {                  # each line of the first file
        next unless $first =~ /\b(\w+)\b(.*)/;         # split the record into name and data
        my ($key, $data) = ($1, $2);                   # save them before the next match overwrites $1 and $2
        foreach my $second (@second_file) {            # scan the whole second file for each line of the first
            next unless $second =~ /\b(\w+)\b(.*)/;    # same regex, to get the key and data again
            if ($1 eq $key) {                          # compare keys from both files
                $merge{$key} = "$data$2";              # if they match, merge into the hash
            }
        }
    }
    It will merge those entries that match; otherwise it will discard the data. I'm kinda new to Perl, so bear with me :o)
Re: Merging 2 Formatted Files
by fundflow (Chaplain) on Sep 20, 2000 at 17:04 UTC
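    If the files are not big, you can do this (untested):
    while (<FIRSTFILE>) {
        chomp;
        ($key, $rest) = split ' ', $_, 2;   # the key, then everything after it
        $list{$key} = $rest;
    }
    while (<SECONDFILE>) {
        chomp;
        ($key, $rest) = split ' ', $_, 2;
        $list{$key} .= " " . $rest;         # append the second file's data
    }
    for (sort keys %list) {
        print "$_ $list{$_}\n";             # the key followed by its merged data
    }
    This will give you all the info on each id in one hash entry.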

    If the files are big, you can sort them (using unix' sort command) and then merge them (which is trivial to code). On a second pass you just pick all consecutive lines which have the same ID.
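A minimal sketch of that second pass, under stated assumptions: @sorted_lines is a hypothetical stand-in for the lines of both files after they have been sorted together by ID (for example with something like GNU sort's `-s -k1,1`, so that lines from the first file stay ahead of lines from the second for the same ID). A real script would read the lines one at a time from a filehandle instead of an array, so that only one line has to be held in memory at once.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the sorted, combined lines of both files.
my @sorted_lines = (
    "ben 30 tn",   "ben drums",
    "jeff 35 ca",  "jeff bass",
    "steve 37 ma", "steve guitar",
);

my @joined;             # one merged line per ID
my $current_id = '';
for my $line (@sorted_lines) {
    my ($id, $rest) = split ' ', $line, 2;   # the ID, then everything after it
    next unless defined $id;                 # skip blank lines
    $rest = '' unless defined $rest;         # a line holding only an ID
    if (@joined && $id eq $current_id) {
        $joined[-1] .= " $rest";             # same ID as the previous line: extend it
    } else {
        push @joined, "$id $rest";           # new ID: start a new merged line
        $current_id = $id;
    }
}
print "$_\n" for @joined;
```

Because the input is sorted, the merged lines come out in ID order (ben, jeff, steve) and not in the order of the original files.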

Re: Merging 2 Formatted Files
by chromatic (Archbishop) on Sep 20, 2000 at 22:26 UTC
    This sort of operation is what relational databases are good at. Conceptually, you would have two tables. The first one contains names, ages, and state abbreviations. The second one contains names and instrument names.

    To get all of the data, you'd do a SQL JOIN on the name fields. It might end up something like 'SELECT * FROM location JOIN instrument ON location.name = instrument.name'. (Untested and off the top of my head.)

    If you'll be doing this sort of thing often, I would highly recommend a relational database, like MySQL or PostgreSQL. If you're looking for a way of doing the same thing without a real database, DBD::RAM is a pretty cool module that'll simulate it for you.

    That said, fundflow's example is an easy, simple hack (in a good way) that will do only what you've described and nothing more.

    I wouldn't bother trying to hack in more complexity, though, as this is a case where there's an excellent, flexible, already-coded solution. (Besides that, learning a bit about SQL and DBI will make you a more valuable programmer.)

Re: Merging 2 Formatted Files
by Limo (Scribe) on Sep 20, 2000 at 22:52 UTC
    That's what I wanted to do in the first place, but our trusty sys admin can't seem to find time to re-install a broken compiler, which DBI requires! Thanks for the help! Much appreciated.