in reply to Re: Hash w/ multiple values + merging
in thread Hash w/ multiple values + merging

The following code assumes that a header line is provided for each data set, as shown in the sample data. A hash is built containing the merged data from both files, then only those records containing data for all columns are printed.

#!/usr/bin/perl
use strict;
use warnings;

my $data1 = <<DATA1;
trip value1 value2
ATG adsad dsf
CTG 23432 2342
TTA 24312 144
CTT 452 5fw
ATA rff sgsh
DATA1

my $data2 = <<DATA2;
trip value3
ATG asdas
CCG asdadd
TTA 24
CAT 45
DATA2

my %data;
my @columnNames;

open my $in, '<', \$data1;
push @columnNames, parseFile (\%data, $in);
close $in;

open $in, '<', \$data2;
push @columnNames, parseFile (\%data, $in);
close $in;

my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

printf $format, '', @columnNames;

for my $key (sort keys %data) {
    next if keys %{$data{$key}} != @columnNames;
    printf $format, $key, @{$data{$key}}{@columnNames};
}

sub parseFile {
    my ($dataRef, $inFile) = @_;
    my $header = <$inFile>;
    my ($keyColumn, @columns) = map {chomp; split} $header;

    while (defined (my $line = <$inFile>)) {
        chomp $line;
        my ($key, @data) = split /\s+/, $line;
        @{$dataRef->{$key}}{@columns} = @data;
    }

    return @columns;
}

Prints:

          value1    value2    value3
ATG       adsad     dsf       asdas
TTA       24312     144       24

Note that strictures are used. Always use strictures (use strict; use warnings;). The three parameter version of open is used with lexical file handles.

@{$data{$key}}{@columnNames} and @{$dataRef->{$key}}{@columns} are hash slices - they access a list of hash values. The first case returns the list of values to be printed for a row. The second case is used to assign the list of column values to a record.
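
As a minimal sketch of both kinds of slice described above (the %record hash and its values are made up for illustration and are not part of the original script):

```perl
use strict;
use warnings;

# Hypothetical record and column list, for illustration only.
my %record;
my @columns = qw(value1 value2);

# Assign through a slice: each name in @columns gets the matching value.
@record{@columns} = ('adsad', 'dsf');

# The same kind of assignment through a hash reference, as parseFile does.
my $recordRef = \%record;
@{$recordRef}{'value3'} = ('asdas');

# Read through a slice: values come back in the order the keys are listed.
my @row = @{$recordRef}{qw(value1 value2 value3)};
print "@row\n";    # adsad dsf asdas
```

The leading @ is what makes these slices: the expression yields a list of values rather than a single value.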

Note that parseFile doesn't check that the data column names for the current file differ from those of any previous file, nor that the key column name (assumed to be the first) is the same. Those are all things that can be fixed if you need them to be.
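
One way those checks might be added is sketched below. The name parseFileChecked and its two extra parameters ($seenRef, a hash of column names already used, and $expectedKey, the key column name from the first file) are assumptions made for this example and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical variant of parseFile with the two missing checks added.
sub parseFileChecked {
    my ($dataRef, $inFile, $seenRef, $expectedKey) = @_;

    my $header = <$inFile>;
    die "Missing header line\n" if !defined $header;
    chomp $header;
    my ($keyColumn, @columns) = split ' ', $header;

    # The key column must have the same name in every file.
    die "Key column '$keyColumn' does not match '$expectedKey'\n"
        if defined $expectedKey && $keyColumn ne $expectedKey;

    # A data column name may be used by one file only.
    for my $column (@columns) {
        die "Duplicate column name '$column'\n" if $seenRef->{$column}++;
    }

    while (defined(my $line = <$inFile>)) {
        chomp $line;
        my ($key, @data) = split ' ', $line;
        next if !defined $key;    # skip blank lines
        @{$dataRef->{$key}}{@columns} = @data;
    }

    return ($keyColumn, @columns);
}

# Usage with in-memory "files"; the second one reuses value1 on purpose.
my (%data, %seen);
my $file1 = "trip value1 value2\nATG adsad dsf\n";
my $file2 = "trip value1\nATG asdas\n";

open my $in, '<', \$file1 or die "Unable to open string: $!";
my ($keyName, @columnNames) = parseFileChecked(\%data, $in, \%seen);
close $in;

open $in, '<', \$file2 or die "Unable to open string: $!";
my $ok = eval { parseFileChecked(\%data, $in, \%seen, $keyName); 1 };
close $in;
print "Second file rejected: $@" if !$ok;
```

Dying on the first problem keeps the sketch short; a real script might prefer to warn and skip the offending column instead.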


True laziness is hard work

Re^3: Hash w/ multiple values + merging
by sophix (Sexton) on Feb 08, 2010 at 00:03 UTC
    Thank you very much for this script! I would like to ask for some possible modifications. - I am not familiar with the open structure in this script. I would like to convert it to a familiar one (open(FILE1, "$ARGV[0]") etc.) but I could not do it. I tried the following:
    my $data1 = $ARGV[0]; my $data2 = $ARGV[1]; my $data3 = $ARGV[2];
    - Second, I failed at printing out once again. I used this one: print Data::Dumper->Dump([\%data],['MERGED HASH']),"\n"; How can I print the merged hash into an output file? I thought of, again, the familiar structure, but it did not work.
    open(FILE3, ">$ARGV[2]"); {print Data::Dumper->Dump([\%data],['MERGED HASH']),"\n";}
    Is it the reference again?

      To avoid using "real" files for demonstration purposes I used Perl's facility for using a string as a file by passing a reference to the string into the open. To open a real file instead you should:

      open my $in, '<', $fileName or die "Unable to open $fileName: $!";

      Please follow my advice and use strictures (use strict; use warnings; - see The strictures, according to Seuss), the three parameter form of open and lexical file handles (the 'my $in' bit in my sample code). These tips will save you time in the future!
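
      For reference, the string-as-file facility mentioned above can be tried on its own with a sketch like this one (the contents of $text are invented for the example):

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample text standing in for the contents of a data file.
      my $text = "trip value3\nATG asdas\nTTA 24\n";

      # Passing a reference to a scalar makes open read from the string.
      open my $in, '<', \$text or die "Unable to open in-memory file: $!";
      my @lines = <$in>;
      close $in;

      print "Read ", scalar(@lines), " lines\n";    # Read 3 lines
      ```

      Swapping \$text for a file name is the only change needed to read a real file with the same code.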

      If you open an output file ($out) before the loop in my sample code you can change the print to:

      printf $out $format, $key, @{$data{$key}}{@columnNames};

      to print to the output file instead of STDOUT. Note that for testing and sample code using STDOUT is often much more convenient!
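
      Put together, the output part might look something like the sketch below. The %data contents and the file name merged.txt are placeholders for the example; in the real script %data and @columnNames come from parseFile.

      ```perl
      use strict;
      use warnings;

      # Hypothetical merged data and output file name, for illustration only.
      my %data = (
          ATG => {value1 => 'adsad', value2 => 'dsf', value3 => 'asdas'},
          CTG => {value1 => '23432', value2 => '2342'},    # no value3: skipped
          TTA => {value1 => '24312', value2 => '144', value3 => '24'},
      );
      my @columnNames = qw(value1 value2 value3);
      my $outName     = 'merged.txt';

      my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

      open my $out, '>', $outName or die "Unable to open $outName: $!";

      # Header row first; the key column is left blank as in the sample output.
      printf $out $format, '', @columnNames;

      for my $key (sort keys %data) {
          next if keys %{$data{$key}} != @columnNames;
          printf $out $format, $key, @{$data{$key}}{@columnNames};
      }

      close $out or die "Unable to close $outName: $!";
      ```

      Note that there is no comma between $out and $format: the file handle is not part of the argument list.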


      True laziness is hard work
        My mistake. It now works beautifully. May I ask for a last favor, though? I want to keep the header (the first line of the first file). I looked at the code to find where the first line is skipped, but I could not figure it out. So the working code written by GrandFather:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $data1 = "/PRBB/input.txt";
        my $data2 = "/PRBB/input2.txt";
        my $data3 = "/PRBB/output.txt";
        my %data;
        my @columnNames;

        #open my $in, '<', \$data1;
        open my $in, '<', $data1 or die "Unable to open $data1: $!";
        push @columnNames, parseFile (\%data, $in);
        close $in;

        #open $in, '<', \$data2;
        open my $in2, '<', $data2 or die "Unable to open $data2: $!";
        push @columnNames, parseFile (\%data, $in2);
        close $in2;

        my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

        open my $out, '>', $data3 or die "Unable to open $data3: $!";

        for my $key (sort keys %data) {
            next if keys %{$data{$key}} != @columnNames;
            printf $out $format, $key, @{$data{$key}}{@columnNames};
        }

        sub parseFile {
            my ($dataRef, $inFile) = @_;
            my $header = <$inFile>;
            my ($keyColumn, @columns) = map {chomp; split} $header;

            while (defined (my $line = <$inFile>)) {
                chomp $line;
                my ($key, @data) = split /\s+/, $line;
                @{$dataRef->{$key}}{@columns} = @data;
            }

            return @columns;
        }
        Thanks a lot, GrandFather. Now I get the following errors, and nothing is printed to the file:
        "my" variable $in masks earlier declaration in same scope (line 19)
        Use of uninitialized value $key in hash element (line 26)
        Use of uninitialized value $key in printf (line 26)
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $data1 = "/DATA/input.txt";
        my $data2 = "/DATA/input2.txt";
        my $data3 = "/DATA/output.txt";
        my %data;
        my @columnNames;
        my $key;

        open my $in, '<', $data1 or die "Unable to open $data1: $!";    # [line 19]
        push @columnNames, parseFile (\%data, $in);
        close $in;

        open my $in, '<', $data2 or die "Unable to open $data2: $!";
        push @columnNames, parseFile (\%data, $in);
        close $in;

        my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

        open my $out, '>', $data3 or die "Unable to open $data3: $!";
        printf $out $format, $key, @{$data{$key}}{@columnNames};    # (line 26)

        for my $key (sort keys %data) {
            next if keys %{$data{$key}} != @columnNames;
            printf $format, $key, @{$data{$key}}{@columnNames};
        }

        sub parseFile {
            my ($dataRef, $inFile) = @_;
            my $header = <$inFile>;
            my ($keyColumn, @columns) = map {chomp; split} $header;

            while (defined (my $line = <$inFile>)) {
                chomp $line;
                my ($key, @data) = split /\s+/, $line;
                @{$dataRef->{$key}}{@columns} = @data;
            }

            return @columns;
        }
        It prints out the output on dos-screen, though. (but not into file)