file merge problem

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmongers,
Can anybody assist with this issue?
I have two input files which I merge into a third file, after which I discard the input files.
The format of the two input files was:

word decimal
word decimal
word decimal

The code I have does this:

open F1,"F1.dat" or die "Can't open F1.dat: $!\n";
open F2,"F2.dat" or die "Can't open F2.dat: $!\n";

   my (%hash1,%hash2);
   while (<F1>)
   {
      /(\w*) (\d*)/ ;
      $hash1{$1} = $2 ;
   }
   close F1;

   while (<F2>)
   {
      /(\w*) (\d*)/ ;
      $hash2{$1} = $2 ;
   }
   close F2;
   
   open MERGED,">merge.dat" or die "Can't open merge.dat:$!\n";
   foreach (sort keys %hash2)
   {
      if (defined($hash1{$_}))
      {
         print MERGED "$_ $hash1{$_}\n" ;
      } else {
         print MERGED "$_ $hash2{$_}\n" ;
      }
    }
    close MERGED;

This works fine. However, the format of the input files has now changed so that there are extra words, i.e.

word decimal word....

The first two fields of each record will always be of the format "word decimal", but there could be an indeterminate number of words after the decimal.
I'm unsure how to adapt the code so that it writes field 3 onwards to the merged file.
Any thoughts welcome.


Replies are listed 'Best First'.
Re: file merge problem by davorg (Chancellor) on Dec 09, 2005 at 12:51 UTC
   while (<F1>)
   {
      /(\w*) (\d*)/ ;
      $hash1{$1} = $2 ;
   }

This won't solve your problem, but it's worth pointing out that you should never use $1, $2, etc. without checking that the match succeeded. If the match failed then you'll get the values from the previous successful match.

   while (<F1>)
   {
      if (/(\w*) (\d*)/)
      {
         $hash1{$1} = $2;
      } else {
         warn "Invalid input line $.: $_\n";
      }
   }

-- <http://dave.org.uk>
"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
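As a minimal sketch of the same advice, the checked match can be wrapped in a helper that collects the rejected lines instead of only warning about them. The name `parse_checked` and the sample lines in the usage note are made up for illustration; the pattern is slightly stricter than the one in the question (`+` rather than `*`), which is an assumption about what a valid record looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "word decimal ..." lines, keeping the good
# records and reporting the ones that did not match, so that stale
# $1/$2 values are never used.
sub parse_checked {
    my @lines = @_;
    my (%hash, @rejected);
    for my $line (@lines) {
        if ($line =~ /^(\w+) (\d+)/) {
            $hash{$1} = $2;          # safe: the match just succeeded
        }
        else {
            push @rejected, $line;   # caller decides what to do with these
        }
    }
    return (\%hash, \@rejected);
}
```

Calling `parse_checked("apple 1\n", "oops\n")` would store `apple` and hand `"oops\n"` back in the rejected list rather than silently reusing the previous captures.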
Re^2: file merge problem by Anonymous Monk on Dec 09, 2005 at 13:17 UTC
Thanks. This is not something I'd thought of, but it explains some weird results I've seen.
Re: file merge problem by Samy_rio (Vicar) on Dec 09, 2005 at 12:49 UTC
Hi,

If I understood your question correctly, this will help you.

   open F1,">>F1.dat" or die "Can't open F1.dat: $!\n"; # Open F1 file in append mode
   open F2,"F2.dat" or die "Can't open F2.dat: $!\n";
   while(<F2>)
   {
      print F1 $_;
   }
   close(F1);
   close(F2);

(Untested)

Regards,
Velusamy R.

eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@\|6%,53!-9@2~j';
Re^2: file merge problem by Anonymous Monk on Dec 09, 2005 at 13:14 UTC
My code creates the new file so that I don't get duplicates, and it also deals with orphaned records, i.e. those that live in one file but not the other. Hence this doesn't help. Thanks anyway.
Re^3: file merge problem by ptum (Priest) on Dec 09, 2005 at 14:25 UTC
You might want to look again at Samy_rio's solution above -- with a little tweaking, you could address the duplicate issue. I don't think that you will have a problem with orphaned records using the proposed file concatenation. If you open the first file, step through it and create a lookup hash, then close and re-open the first file for appending and the second file for reading, you can simply check each line of the second file to see if it is already in your lookup hash before appending it to the first file.

You don't tell us what impact the additional words have in terms of the logic of your program -- if they are simply extra fields that you want to bring forward, then you can simply adjust your existing regular expression from (\d*) to (\d+.*) to include everything to the end of the line, something like this:

   if (/^(\w*) (\d+.*)$/)
   {
      unless (exists($lookup{$1}) && ($lookup{$1} eq $2))
      {
         print F1 $_;
      }
   } else {
      warn "Invalid input line $.: $_\n";
   }

If the additional words in the line have some impact on your program's logic (e.g., if you need to parse them off and possibly create new records in your resulting file) then you need to tell us. Hope that helps. :)

Update: added 'exists' logic to check in lookup hash.

No good deed goes unpunished. -- (attributed to) Oscar Wilde
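The lookup-then-append steps described above could be sketched as follows. This is only one reading of the suggestion, under two stated assumptions: a record is identified by its first word alone, and the first file's version of a record wins. The sub name `append_new_records` is hypothetical, and the sketch also assumes the first file ends with a newline.

```perl
use strict;
use warnings;

# Sketch of the lookup-then-append idea. Assumption: a record is
# identified by its first word alone, and the first file wins.
sub append_new_records {
    my ($first, $second) = @_;    # file names, e.g. F1.dat and F2.dat

    # Pass 1: remember every key already present in the first file.
    my %lookup;
    open my $in1, '<', $first or die "Can't open $first: $!\n";
    while (my $line = <$in1>) {
        $lookup{$1} = 1 if $line =~ /^(\w+) \d+/;
    }
    close $in1;

    # Pass 2: append only those records whose key is not yet known.
    open my $out, '>>', $first  or die "Can't append to $first: $!\n";
    open my $in2, '<',  $second or die "Can't open $second: $!\n";
    my $added = 0;
    while (my $line = <$in2>) {
        if ($line =~ /^(\w+) \d+/) {
            next if exists $lookup{$1};
            $lookup{$1} = 1;      # also guards against repeats within $second
            print {$out} $line;   # the whole line, extra words included
            $added++;
        }
        else {
            warn "Invalid input line $.: $line";
        }
    }
    close $in2;
    close $out or die "Can't close $first: $!\n";
    return $added;                # number of records appended
}
```

Because the whole line is printed, any words after the decimal are carried forward without the pattern having to capture them.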
Re^4: file merge problem by Anonymous Monk on Dec 09, 2005 at 14:56 UTC
Re^5: file merge problem by ptum (Priest) on Dec 09, 2005 at 17:24 UTC
Re: file merge problem by injunjoel (Priest) on Dec 09, 2005 at 22:41 UTC
Greetings,

Just a thought.

   #!/usr/bin/perl -w
   use strict;
   my %d = map{
         /^(\w+) /;
         $1, $_;
      }grep{
         chomp;
         /^\w+ \d+/;
      }map{
         local @ARGV = ($_);
         <>;
      }<F*.dat>;
   open(MRG, ">merge.dat") or die "Oops! There was a problem: $!";
   print MRG $d{$_}."\n" for(sort keys %d);
   close MRG;

This assumes your input filenames match the pattern "F*.dat" for the glob to work on.

BTW: we are Monks not Mongers :}

-InjunJoel
"I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: file merge problem by Delusional (Beadle) on Dec 09, 2005 at 13:06 UTC
Haven't tested anything, but the first thing that comes to mind would be a loop to check for $3, $4, and so on, and stuff them into the hash as needed. Something along the lines of:

   if ($$loop ne $NULL)
   {
      $hash1{$$loop} = $$loop+1 ;
   }

This assumes you define $loop and increment it twice. Again, I didn't test, so I may be completely wrong with my thoughts, but this might get you going in the right direction. The loop would allow for as many possibilities as Perl hashes out in /(\w*) (\d*)/ ;
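A regex with two capture groups only ever fills $1 and $2, so another way to reach field 3 onwards is to split the line in list context and let an array soak up whatever follows. The following is a small sketch of that alternative, not the approach proposed above; `parse_record` is a made-up name, and treating the second field as digits only is an assumption carried over from the question's pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record into its key, its decimal,
# and a list of however many extra words follow.
sub parse_record {
    my ($line) = @_;
    # split ' ' ignores leading whitespace and the trailing newline.
    my ($word, $decimal, @extra) = split ' ', $line;
    # Reject lines without a numeric second field.
    return unless defined $decimal && $decimal =~ /^\d+$/;
    return ($word, $decimal, @extra);
}
```

A caller could then store `[$decimal, @extra]` under `$word`, or simply rejoin the fields with single spaces when writing the merged file.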
Re: file merge problem by thundergnat (Deacon) on Dec 09, 2005 at 14:52 UTC
Since you are merging the hashes, do you really need two of them? Also, it seems that the only critical part of the line is the word at the beginning, so you can just split that off separately and save the rest of the line in a single variable.

   use warnings;
   use strict;

   my %filehash;
   my @files = qw(F1.dat F2.dat);

   for my $filename (@files)
   {
      open my $filehandle, '<', $filename or die "Can't open $filename $!\n";
      while (<$filehandle>)
      {
         my ( $word, $remainder ) = split ' ', $_, 2;
         $filehash{$word} = $remainder;
      }
      close $filehandle;
   }

   open my $merged, '>', 'merge.dat' or die "Can't open merge.dat:$!\n";
   for ( sort keys %filehash )
   {
      print $merged "$_ $filehash{$_}";
   }

Update: changed split parameter to ignore any initial whitespace
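The single-hash approach above lets a later file overwrite an earlier one. If the first file should win instead, as F1 did in the original code, a variation along these lines might work. It is a sketch under that assumption; `merge_records` is a hypothetical name, and it takes lists of lines rather than file names so that it is easy to try out.

```perl
use strict;
use warnings;

# Sketch: merge records from several sources, keyed on the first word.
# Assumption: sources are listed in priority order, so the first
# source to mention a key wins.
sub merge_records {
    my @sources = @_;    # each element is a reference to an array of lines
    my %record;
    for my $lines (@sources) {
        for my $line (@$lines) {
            chomp(my $copy = $line);
            # Limit 2: first word, then the untouched rest of the line.
            my ($word, $rest) = split ' ', $copy, 2;
            next unless defined $word;          # skip blank lines
            $rest = '' unless defined $rest;    # a line holding only a key
            $record{$word} = $rest unless exists $record{$word};
        }
    }
    # Return the merged lines sorted by key, ready to print.
    return map { $record{$_} eq '' ? "$_\n" : "$_ $record{$_}\n" }
           sort keys %record;
}
```

The limit of 2 passed to split is what keeps an arbitrary number of trailing words intact: everything after the first word stays in one string.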