karey3341 has asked for the wisdom of the Perl Monks concerning the following question:


If I have two input files:

file1:
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1

file2:
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3

How can I merge these two files to get the output below?

output file:
0: 0,21 4,19 5,6 11,2
1: 1,3
2: 1,3 2,5 6,4
3: 0,1 2,2 3,2
5: 3,5 6,11 8,16 9,4

I have the following code to read an input file:

while (<FILE1>) {
    next unless s/^(.*?):\s*//;
    $word = $1;
    for $field (split) {
        ($site, $count) = split /,/, $field;
        $file1{$word}{$site} = $count;
    }
}

Replies are listed 'Best First'.
Re: merge two files
by GrandFather (Saint) on Apr 17, 2009 at 04:01 UTC

    You are heading in the right direction. If you wrap the essence of your code in a for loop that iterates over the files you need to process, and change the hash access from an assignment to a +=, then you've done the input part. Generating the output is then a matter of two nested loops. Consider:

use strict;
use warnings;

my $file1 = <<END_FILE1;
0: 0,11 5,6 11,2
1: 1,3
3: 0,1 2,2 3,2
5: 3,5 6,1 8,16 9,1
END_FILE1

my $file2 = <<END_FILE2;
0: 0,10 4,19
2: 1,3 2,5 6,4
5: 6,10 9,3
END_FILE2

my %sites;

for my $source (\$file1, \$file2) {
    open my $inFile, '<', $source or die "Failed to open $source: $!\n";

    while (my $line = <$inFile>) {
        chomp $line;
        next if ! ($line =~ s/^([^:]+):\s*//);

        my $word = $1;
        for my $field (split ' ', $line) {
            my ($site, $count) = split /,/, $field;
            $sites{$word}{$site} += $count;
        }
    }
}

for my $word (sort {$a <=> $b} keys %sites) {
    print "$word: ";
    my $wordSites = $sites{$word};
    for my $site (sort {$a <=> $b} keys %$wordSites) {
        print "$site,$wordSites->{$site} ";
    }
    print "\n";
}

    Prints:

0: 0,21 4,19 5,6 11,2
1: 1,3
2: 1,3 2,5 6,4
3: 0,1 2,2 3,2
5: 3,5 6,11 8,16 9,4

    There are a few things you ought to do to save yourself time in the future. First off, always use strictures (use strict; use warnings;). Use the three-parameter version of open (you didn't, did you?) and always test the result of file opens. Use lexical file handles (open my $infile ...).
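
    For instance, a minimal sketch of that open idiom; the file name 'data.txt' is only a placeholder for illustration:

# Three-argument open with a lexical handle, and the result checked.
# 'data.txt' is a hypothetical file name used only for this example.
open my $infile, '<', 'data.txt'
    or die "Failed to open data.txt: $!\n";

while (my $line = <$infile>) {
    # process $line here ...
}

close $infile;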

    Note the indentation and bracketing style I've used in my sample. It is pretty common in Perl circles and is probably worth adopting. There is a really good tool called Perl Tidy that is well worth getting if you are at all interested in generating consistently formatted code.


    True laziness is hard work
Re: merge two files
by graff (Chancellor) on Apr 17, 2009 at 03:46 UTC
    Please put <code> and </code> around your perl snippet and data samples.

    You have a good start on reading the first file. Now you just need to use the same hash when reading the second file, and do:

    $file1{$word}{$site} += $count;
    instead of just the straight "=" assignment. That will make sure that when the second file contains a "word" and "site" combination that was also in the first file, you'll be adding together the two "count" values.

    You should also start with use strict; and it will make sense to have your file reading logic in a subroutine that you call for each file, just to avoid repeating code unnecessarily:

use strict;

my %data;
my @filenames = ...;    # @ARGV or literal strings or whatever

for my $fname ( @filenames ) {
    open my $input, "<", $fname or die "$fname: $!";
    load_data( $input, \%data );
}

# doing stuff with %data is left as an exercise...

sub load_data {
    my ( $fh, $href ) = @_;
    while ( <$fh> ) {
        next unless ( s/^(.*?):\s*// );
        my $word = $1;
        for my $field ( split ) {
            my ( $site, $count ) = split /,/, $field;
            $$href{$word}{$site} += $count;
        }
    }
}
    (not tested)
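
    As one possible sketch of the output step left as an exercise above, assuming the same %data layout (counts summed into $data{$word}{$site}); this is not from graff's post:

# Print the merged data, sorted numerically by "word" and then by "site".
for my $word ( sort { $a <=> $b } keys %data ) {
    my @fields;
    for my $site ( sort { $a <=> $b } keys %{ $data{$word} } ) {
        push @fields, "$site,$data{$word}{$site}";
    }
    print "$word: @fields\n";    # fields are joined with spaces via $"
}
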
Re: merge two files
by Utilitarian (Vicar) on Apr 17, 2009 at 09:26 UTC
    Update: I should take my own advice; I misread your intention entirely. I didn't realise that the "y co-ordinate" needed to be summed.

    Look at the structure of your data. You want to sort a list of numbers where each list is associated with a unique key. What data structure do you think you need to put the data into?
    Something like:
    %records{$number_before_colon => @list_of_numbers_separated_by_commas}
    So read each file, splitting each line into an index and a list of values, and add that list to a hash keyed on the index.

while (<INFILE>) {
    chomp;
    ($index, @record) = split /[:,]/;
    push @{$records{$index}}, @record;
}
    When you need to do data conversion, half the battle is examining the relationship between the data structures. Now you can access the data in a sorted order:
for $index (sort { $a <=> $b } keys %records) {
    print $OUTFILE "$index: ",
        join(",", sort { $a <=> $b } @{$records{$index}}), "\n";
}
    Wrapping it up
#!/usr/bin/perl
use strict;
use warnings;

my @files = qw(temp.data temp1.data temp2.data);
my ($file, $index, %records, $INFILE, $OUTFILE);

for $file (@files) {
    open($INFILE, "<", $file) || die "Failed to open $file: $!";
    while (<$INFILE>) {
        my @record;
        chomp;
        ($index, @record) = split /[:,]/;
        push @{$records{$index}}, @record;
    }
    close $INFILE;
}

open($OUTFILE, ">", "newfile.data") || die "Failed to open newfile.data: $!";
for $index (sort { $a <=> $b } keys %records) {
    print $OUTFILE "$index: ",
        join(",", sort { $a <=> $b } @{$records{$index}}), "\n";
}
close $OUTFILE;
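
    Given the update above, here is one hypothetical adjustment (not part of the original reply) that sums the per-site counts instead of collecting every pair, assuming the same input format and file names:

#!/usr/bin/perl
# Sketch: accumulate counts in a hash of hashes keyed by index and then by
# site, so repeated index/site combinations across files are summed.
use strict;
use warnings;

my @files = qw(temp.data temp1.data temp2.data);
my %records;

for my $file (@files) {
    open my $INFILE, "<", $file or die "Failed to open $file: $!";
    while (<$INFILE>) {
        next unless s/^(\d+):\s*//;
        my $index = $1;
        for my $pair (split) {
            my ($site, $count) = split /,/, $pair;
            $records{$index}{$site} += $count;    # sum, rather than push
        }
    }
    close $INFILE;
}

open my $OUTFILE, ">", "newfile.data" or die "Failed to open newfile.data: $!";
for my $index (sort { $a <=> $b } keys %records) {
    print $OUTFILE "$index: ",
        join(" ", map { "$_,$records{$index}{$_}" }
                  sort { $a <=> $b } keys %{ $records{$index} }),
        "\n";
}
close $OUTFILE;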