Re^2: Hash w/ multiple values + merging

The following code assumes that each data set starts with a header line, as shown in the sample data. A hash is built containing the merged data from both files, then only those records containing data for all columns are printed.

#!/usr/bin/perl
use strict;
use warnings;

my $data1 = <<DATA1;
trip    value1    value2
ATG    adsad    dsf
CTG    23432    2342
TTA    24312    144
CTT    452    5fw
ATA    rff    sgsh
DATA1
my $data2 = <<DATA2;
trip    value3
ATG    asdas
CCG    asdadd
TTA    24
CAT    45
DATA2

my %data;
my @columnNames;

open my $in, '<', \$data1;
push @columnNames, parseFile (\%data, $in);
close $in;

open $in, '<', \$data2;
push @columnNames, parseFile (\%data, $in);
close $in;

my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

printf $format, '', @columnNames;

for my $key (sort keys %data) {
    next if keys %{$data{$key}} != @columnNames;
    
    printf $format, $key, @{$data{$key}}{@columnNames};
}


sub parseFile {
    my ($dataRef, $inFile) = @_;
    my $header = <$inFile>;
    my ($keyColumn, @columns) = map {chomp; split} $header;

    while (defined (my $line = <$inFile>)) {
        chomp $line;

        my ($key, @data) = split /\s+/, $line;
        @{$dataRef->{$key}}{@columns} = @data;
    }
    
    return @columns;
}

Prints:

          value1    value2    value3    
ATG       adsad     dsf       asdas     
TTA       24312     144       24

Note that strictures are used. Always use strictures (use strict; use warnings;). The three parameter version of open is used with lexical file handles.

@{$data{$key}}{@columnNames} and @{$dataRef->{$key}}{@columns} are hash slices - they access a list of hash values. The first case returns the list of values to be printed for a row. The second case is used to assign the list of column values to a record.
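
As a minimal sketch of the two slice forms described above (the record contents and variable names here are illustrative, not taken from the real input files):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical column list, for illustration only.
my @columns = ('value1', 'value2');

# Assigning through a hash slice sets several keys in one statement.
my %record;
@record{@columns} = ('adsad', 'dsf');

# The same syntax works through a reference, as in parseFile.
# The inner hash is created on demand (autovivified).
my %data;
@{$data{ATG}}{@columns} = ('adsad', 'dsf');

# Reading through a slice returns the values in the order of the keys given.
my @row = @{$data{ATG}}{@columns};
print join(' ', @row), "\n";
```

The leading sigil is @ rather than % or $ because the expression yields a list of values, even though the thing being indexed is a hash.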

Note that parseFile doesn't check that the data column names for the current file differ from those of any previous file, nor that the key column name (assumed to be the first) is the same in every file. Those are all things that can be fixed if you need them to be.
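
One way those checks might be added is sketched below. parseFileChecked, %seen and $keyName are hypothetical names introduced for this illustration and are not part of the original script; the sketch also assumes Perl 5.10 or later for the //= operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of parseFile that adds the two missing checks.
# %$seenRef records column names from earlier files; $$keyNameRef holds the
# key column name taken from the first file. Both parameters are assumptions.
sub parseFileChecked {
    my ($dataRef, $inFile, $seenRef, $keyNameRef) = @_;
    my $header = <$inFile>;
    die "Missing header line\n" if !defined $header;
    chomp $header;
    my ($keyColumn, @columns) = split ' ', $header;
    die "Empty header line\n" if !defined $keyColumn;

    # The key column must have the same name in every file.
    $$keyNameRef //= $keyColumn;
    die "Key column '$keyColumn' does not match '$$keyNameRef'\n"
        if $keyColumn ne $$keyNameRef;

    # Data column names must not repeat across files.
    for my $column (@columns) {
        die "Duplicate column name '$column'\n" if $seenRef->{$column}++;
    }

    while (defined (my $line = <$inFile>)) {
        chomp $line;
        my ($key, @data) = split ' ', $line;
        next if !defined $key;    # skip blank lines
        @{$dataRef->{$key}}{@columns} = @data;
    }

    return @columns;
}

my (%data, %seen, $keyName);
my $file1 = "trip value1\nATG adsad\n";
my $file2 = "trip value1\nATG asdas\n";    # repeats the column name value1

open my $in, '<', \$file1 or die "Unable to open in-memory file: $!";
my @columns = parseFileChecked (\%data, $in, \%seen, \$keyName);
close $in;

open $in, '<', \$file2 or die "Unable to open in-memory file: $!";
my $ok = eval {parseFileChecked (\%data, $in, \%seen, \$keyName); 1};
close $in;
print "Second file rejected: $@" if !$ok;
```

Whether a duplicate column should be fatal, or should be renamed or overwritten instead, depends on the data; dying is just the simplest policy to show.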

True laziness is hard work

Re^3: Hash w/ multiple values + merging by sophix (Sexton) on Feb 08, 2010 at 00:03 UTC
Thank you very much for this script! I would like to ask for some possible modifications.

- I am not familiar with the open structure in this script. I would like to convert it to a familiar one (open(FILE1, "$ARGV[0]") etc.) but I could not do it. I tried the following:

`my $data1 = $ARGV[0]; my $data2 = $ARGV[1]; my $data3 = $ARGV[2];`

- Second, I failed at printing out once again. I used this one:

`print Data::Dumper->Dump([\%data],['MERGED HASH']),"\n";`

How can I print the merged hash into an output file? I thought of, again, the familiar structure, but it did not work.

`open(FILE3, ">$ARGV[2]"); {print Data::Dumper->Dump([\%data],['MERGED HASH']),"\n";}`

Is it the reference again?
Re^4: Hash w/ multiple values + merging by GrandFather (Saint) on Feb 08, 2010 at 00:18 UTC
To avoid using "real" files for demonstration purposes I used Perl's facility for using a string as a file by passing a reference to the string into the open. To open a real file instead you should:

`open my $in, '<', $fileName or die "Unable to open $fileName: $!";`

Please follow my advice and use strictures (use strict; use warnings; - see The strictures, according to Seuss), the three parameter form of open and lexical file handles (the 'my $in' bit in my sample code). These tips will save you time in the future!

If you open an output file (`$out`) before the loop in my sample code you can change the print to:

`printf $out $format, $key, @{$data{$key}}{@columnNames};`

to print to the output file instead of STDOUT. Note that for testing and sample code using STDOUT is often much more convenient!

True laziness is hard work
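
Putting that advice together, a minimal sketch of the output side might look like the following. The merged data is hard-coded here to keep the example self-contained, and a temporary file stands in for the real output path; both are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical merged data, shaped like %data after both parseFile calls.
my @columnNames = qw(value1 value2 value3);
my %data = (
    ATG => {value1 => 'adsad', value2 => 'dsf', value3 => 'asdas'},
    CCG => {value3 => 'asdadd'},
    TTA => {value1 => '24312', value2 => '144', value3 => '24'},
);

# A temporary file stands in for the real output path (an assumption, so
# that running the sketch does not overwrite anything).
my ($tmpHandle, $outName) = tempfile (UNLINK => 1);
close $tmpHandle;

my $format = ('%-9s ' x (@columnNames + 1)) . "\n";

# Open the output file once, before the loop.
open my $out, '>', $outName or die "Unable to open $outName: $!";
printf $out $format, '', @columnNames;    # header row

for my $key (sort keys %data) {
    next if keys %{$data{$key}} != @columnNames;    # skip incomplete records
    printf $out $format, $key, @{$data{$key}}{@columnNames};
}

close $out or die "Unable to close $outName: $!";
```

Passing the file handle as the first argument to printf, with no comma after it, is what redirects the output; omit it and the same line prints to STDOUT.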
Re^5: Hash w/ multiple values + merging by sophix (Sexton) on Feb 08, 2010 at 01:08 UTC
My mistake. It now works beautifully. May I ask for a last favor, though? I want to keep the header -- first line in the first file. I looked at the code to see if I can find where to skip reading the first line, but I could not figure it out. So the working code written by GrandFather:

#!/usr/bin/perl
use strict;
use warnings;

my $data1 = "/PRBB/input.txt";
my $data2 = "/PRBB/input2.txt";
my $data3 = "/PRBB/output.txt";

my %data;
my @columnNames;

#open my $in, '<', \$data1;
open my $in, '<', $data1 or die "Unable to open $data1: $!";
push @columnNames, parseFile (\%data, $in);
close $in;

#open $in, '<', \$data2;
open my $in2, '<', $data2 or die "Unable to open $data2: $!";
push @columnNames, parseFile (\%data, $in2);
close $in2;

my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

open my $out, '>', $data3 or die "Unable to open $data3: $!";

for my $key (sort keys %data) {
    next if keys %{$data{$key}} != @columnNames;
    printf $out $format, $key, @{$data{$key}}{@columnNames};
}

sub parseFile {
    my ($dataRef, $inFile) = @_;
    my $header = <$inFile>;
    my ($keyColumn, @columns) = map {chomp; split} $header;

    while (defined (my $line = <$inFile>)) {
        chomp $line;
        my ($key, @data) = split /\s+/, $line;
        @{$dataRef->{$key}}{@columns} = @data;
    }

    return @columns;
}
Re^6: Hash w/ multiple values + merging by GrandFather (Saint) on Feb 08, 2010 at 01:17 UTC
Re^7: Hash w/ multiple values + merging by sophix (Sexton) on Feb 08, 2010 at 01:40 UTC
Re^7: Hash w/ multiple values + merging by sophix (Sexton) on Feb 08, 2010 at 02:52 UTC
Re^5: Hash w/ multiple values + merging by sophix (Sexton) on Feb 08, 2010 at 00:41 UTC
Thanks a lot, GrandFather. Now I get the following errors, and nothing is printed to the file.

"my" variable $in masks earlier declaration in same scope (line 19)
use of uninitialized value $key in hash element (line 26)
use of uninitialized value $key in printf (line 26)

#!/usr/bin/perl
use strict;
use warnings;

my $data1 = "/DATA/input.txt";
my $data2 = "/DATA/input2.txt";
my $data3 = "/DATA/output.txt";

my %data;
my @columnNames;
my $key;

open my $in, '<', $data1 or die "Unable to open $data1: $!";    # line 19
push @columnNames, parseFile (\%data, $in);
close $in;

open my $in, '<', $data2 or die "Unable to open $data2: $!";
push @columnNames, parseFile (\%data, $in);
close $in;

my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

open my $out, '>', $data3 or die "Unable to open $data3: $!";
printf $out $format, $key, @{$data{$key}}{@columnNames};    # line 26

for my $key (sort keys %data) {
    next if keys %{$data{$key}} != @columnNames;
    printf $format, $key, @{$data{$key}}{@columnNames};
}

sub parseFile {
    my ($dataRef, $inFile) = @_;
    my $header = <$inFile>;
    my ($keyColumn, @columns) = map {chomp; split} $header;

    while (defined (my $line = <$inFile>)) {
        chomp $line;
        my ($key, @data) = split /\s+/, $line;
        @{$dataRef->{$key}}{@columns} = @data;
    }

    return @columns;
}

It does print the output to the DOS screen, though (but not into the file).