karey3341 has asked for the wisdom of the Perl Monks concerning the following question:


If I have two input files:

file1:
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1

file2:
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3

How can I merge these two files to get the output below?

output file:
0: 0,21 4,19 5,6 11,2
1: 1,3
2: 1,3 2,5 6,4
3: 0,1 2,2 3,2
5: 3,5 6,11 8,16 9,4

I have the following code to read an input file:

while (<FILE1>) {
    next unless s/^(.*?):\s*//;
    $word = $1;
    for $field (split) {
        ($site, $count) = split /,/, $field;
        $file1{$word}{$site} = $count;
    }
}

Replies are listed 'Best First'.
Re: merge two files
by GrandFather (Saint) on Apr 17, 2009 at 04:01 UTC

    You are heading in the right direction. If you wrap the essence of your code in a for loop that iterates over the files you need to process, and change the hash access from an assignment to a +=, then you've done the input part. Generating the output is then a matter of two nested loops. Consider:

use strict;
use warnings;

my $file1 = <<END_FILE1;
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1
END_FILE1

my $file2 = <<END_FILE2;
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3
END_FILE2

my %sites;

for my $source (\$file1, \$file2) {
    open my $inFile, '<', $source or die "Failed to open $source: $!\n";

    while (my $line = <$inFile>) {
        chomp $line;
        next if ! ($line =~ s/^([^:]+):\s*//);

        my $word = $1;
        for my $field (split ' ', $line) {
            my ($site, $count) = split /,/, $field;
            $sites{$word}{$site} += $count;
        }
    }
}

for my $word (sort {$a <=> $b} keys %sites) {
    print "$word: ";
    my $wordSites = $sites{$word};
    for my $site (sort {$a <=> $b} keys %$wordSites) {
        print "$site,$wordSites->{$site} ";
    }
    print "\n";
}

    Prints:

0: 0,21 4,19 5,6 11,2
1: 1,3
2: 1,3 2,5 6,4
3: 0,1 2,2 3,2
5: 3,5 6,11 8,16 9,4

    There are a few things you ought to do to save yourself time in the future. First off, always use strictures (use strict; use warnings;). Use the three-parameter version of open (you didn't, did you?) and always test the result of file opens. Use lexical file handles (open my $infile ...).
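
    For instance, a minimal sketch of that open idiom; the file name 'data.txt' is only a placeholder for illustration:

# Three-argument open with a lexical handle, and the result checked.
# 'data.txt' is a hypothetical file name used only for this example.
open my $infile, '<', 'data.txt'
    or die "Failed to open data.txt: $!\n";

while (my $line = <$infile>) {
    # process $line here ...
}

close $infile;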

    Note the indentation and bracketing style I've used in my sample. It is pretty common in Perl circles and is probably worth adopting. There is a really good tool called Perl Tidy that is well worth getting if you are at all interested in generating consistently formatted code.


    True laziness is hard work
Re: merge two files
by graff (Chancellor) on Apr 17, 2009 at 03:46 UTC
    Please put <code> and </code> around your perl snippet and data samples.

    You have a good start on reading the first file. Now you just need to use the same hash when reading the second file, and do:

    $file1{$word}{$site} += $count;
    instead of just the straight "=" assignment. That will make sure that when the second file contains a "word" and "site" combination that was also in the first file, you'll be adding together the two "count" values.

    You should also start with use strict; and it will make sense to have your file reading logic in a subroutine that you call for each file, just to avoid repeating code unnecessarily:

use strict;

my %data;
my @filenames = ...;    # @ARGV or literal strings or whatever

for my $fname ( @filenames ) {
    open my $input, "<", $fname or die "$fname: $!";
    load_data( $input, \%data );
}

# doing stuff with %data is left as an exercise...

sub load_data {
    my ( $fh, $href ) = @_;
    while ( <$fh> ) {
        next unless ( s/^(.*?):\s*// );
        my $word = $1;
        for my $field ( split ) {
            my ( $site, $count ) = split /,/, $field;
            $$href{$word}{$site} += $count;
        }
    }
}
    (not tested)
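
    As one possible sketch of the output step left as an exercise above, assuming the same %data layout (counts summed into $data{$word}{$site}); this is not from graff's post:

# Print the merged data, sorted numerically by "word" and then by "site".
for my $word ( sort { $a <=> $b } keys %data ) {
    my @fields;
    for my $site ( sort { $a <=> $b } keys %{ $data{$word} } ) {
        push @fields, "$site,$data{$word}{$site}";
    }
    print "$word: @fields\n";    # fields are joined with spaces via $"
}
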
Re: merge two files
by Utilitarian (Vicar) on Apr 17, 2009 at 09:26 UTC
    Update: I should take my own advice; I misread your intention entirely. I didn't realise that the "y co-ordinate" needed to be summed.

    Look at the structure of your data. You want to sort a list of numbers where each list is associated with a unique key. What data structure do you think you need to put the data into?
    Something like:
    %records{$number_before_colon => @list_of_numbers_separated_by_commas}
    So read each file, splitting each line into an index and a list of values, and add that list to a hash keyed on the index.

while (<INFILE>) {
    chomp;
    ($index, @record) = split /[:,]/;
    push @{$records{$index}}, @record;
}
    When you need to do data conversion, half the battle is examining the relationship between the data structures. Now you can access the data in a sorted order:
for $index (sort { $a <=> $b } keys %records) {
    print $OUTFILE "$index: ",
        join(",", sort { $a <=> $b } @{$records{$index}}), "\n";
}
    Wrapping it up
#!/usr/bin/perl
use strict;
use warnings;

my @files = qw(temp.data temp1.data temp2.data);
my ($file, $index, %records, $INFILE, $OUTFILE);

for $file (@files) {
    open($INFILE, "<", $file) || die "Failed to open $file: $!";
    while (<$INFILE>) {
        my @record;
        chomp;
        ($index, @record) = split /[:,]/;
        push @{$records{$index}}, @record;
    }
    close $INFILE;
}

open($OUTFILE, ">", "newfile.data") || die "Failed to open newfile.data: $!";
for $index (sort { $a <=> $b } keys %records) {
    print $OUTFILE "$index: ",
        join(",", sort { $a <=> $b } @{$records{$index}}), "\n";
}
close $OUTFILE;
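
    Given the update above, here is one hypothetical adjustment (not part of the original reply) that sums the per-site counts instead of collecting every pair, assuming the same input format and file names:

#!/usr/bin/perl
# Sketch: accumulate counts in a hash of hashes keyed by index and then by
# site, so repeated index/site combinations across files are summed.
use strict;
use warnings;

my @files = qw(temp.data temp1.data temp2.data);
my %records;

for my $file (@files) {
    open my $INFILE, "<", $file or die "Failed to open $file: $!";
    while (<$INFILE>) {
        next unless s/^(\d+):\s*//;
        my $index = $1;
        for my $pair (split) {
            my ($site, $count) = split /,/, $pair;
            $records{$index}{$site} += $count;    # sum, rather than push
        }
    }
    close $INFILE;
}

open my $OUTFILE, ">", "newfile.data" or die "Failed to open newfile.data: $!";
for my $index (sort { $a <=> $b } keys %records) {
    print $OUTFILE "$index: ",
        join(" ", map { "$_,$records{$index}{$_}" }
                  sort { $a <=> $b } keys %{ $records{$index} }),
        "\n";
}
close $OUTFILE;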