
The following code assumes that each data set starts with a header line, as shown in the sample data. It builds a hash containing the merged data from both files, then prints only those records that contain data for every column.

#!/usr/bin/perl
use strict;
use warnings;

my $data1 = <<DATA1;
trip    value1    value2
ATG    adsad    dsf
CTG    23432    2342
TTA    24312    144
CTT    452    5fw
ATA    rff    sgsh
DATA1
my $data2 = <<DATA2;
trip    value3
ATG    asdas
CCG    asdadd
TTA    24
CAT    45
DATA2

my %data;
my @columnNames;

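# Opening a reference to a scalar reads the string as an in-memory file
open my $in, '<', \$data1 or die "Can't open data1: $!";
push @columnNames, parseFile (\%data, $in);
close $in;

open $in, '<', \$data2 or die "Can't open data2: $!";
push @columnNames, parseFile (\%data, $in);
close $in;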

my $format = (('%-9s ') x (@columnNames + 1)) . "\n";

printf $format, '', @columnNames;

for my $key (sort keys %data) {
    next if keys %{$data{$key}} != @columnNames;
    
    printf $format, $key, @{$data{$key}}{@columnNames};
}


sub parseFile {
    my ($dataRef, $inFile) = @_;
    my $header = <$inFile>;
    my ($keyColumn, @columns) = map {chomp; split} $header;

    while (defined (my $line = <$inFile>)) {
        chomp $line;

        my ($key, @data) = split /\s+/, $line;
        @{$dataRef->{$key}}{@columns} = @data;
    }
    
    return @columns;
}

Prints:

          value1    value2    value3    
ATG       adsad     dsf       asdas     
TTA       24312     144       24

Note that strictures are used. Always use strictures (use strict; use warnings;). The three-parameter version of open is used with lexical file handles.
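
As a minimal sketch of that open idiom on its own: the string in $text is made-up sample content, and the or die check is my addition for illustration. Passing a reference to a scalar as the third argument makes open read the string as if it were a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text standing in for a real file's contents.
my $text = "trip value1\nATG adsad\n";

# Three-argument open: handle, mode, target. A reference to a scalar as the
# target opens an in-memory file handle on that string.
open my $in, '<', \$text or die "Can't open in-memory file: $!";

my @lines = <$in>;    # list context reads all remaining lines
close $in;

chomp @lines;
print scalar (@lines), " lines read; first is '$lines[0]'\n";
```

To read a real file instead, replace \$text with the file name; the rest of the code stays the same.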

@{$data{$key}}{@columnNames} and @{$dataRef->{$key}}{@columns} are hash slices - they access a list of hash values. The first case returns the list of values to be printed for a row. The second case is used to assign the list of column values to a record.
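
A stripped-down sketch of the two slice forms may help. The record contents here are sample values chosen for illustration, not the full data set above.

```perl
use strict;
use warnings;

my %data = (ATG => {value1 => 'adsad', value2 => 'dsf'});
my @columns = qw(value1 value2);

# Reading through a slice: fetch several values of the inner hash at once,
# in the order given by @columns.
my @row = @{$data{ATG}}{@columns};

# Assigning through a slice: set several values at once. The TTA record
# does not exist yet and is created on the fly (autovivification).
@{$data{TTA}}{@columns} = (24312, 144);

print "@row\n";
print join (' ', map {"$_=$data{TTA}{$_}"} @columns), "\n";
```

The leading @ says a list is wanted, and the trailing {...} says the thing being indexed is a hash.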

Note that parseFile doesn't check that the data column names for the current file differ from those of any previous file, nor that the key column name (assumed to be the first) is the same. Those are all things that can be fixed if you need them to be.
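
One way those checks might look, as a sketch only: checkHeader and the %seen bookkeeping hash are hypothetical names that are not part of the code above, and the //= operator assumes Perl 5.10 or later. parseFile could call something like this with the names it gets from splitting the header line.

```perl
use strict;
use warnings;

# Hypothetical helper: validate one file's header against the headers seen
# so far. $seen is a hash ref holding the key column name and the set of
# data column names already used.
sub checkHeader {
    my ($seen, $keyColumn, @columns) = @_;

    # The first file establishes the key column name; later files must match.
    $seen->{key} //= $keyColumn;
    die "Key column '$keyColumn' does not match '$seen->{key}'\n"
        if $keyColumn ne $seen->{key};

    # Each data column name may be used only once across all files.
    for my $column (@columns) {
        die "Duplicate data column '$column'\n" if $seen->{columns}{$column}++;
    }

    return @columns;
}

my %seen;
checkHeader (\%seen, qw(trip value1 value2));
checkHeader (\%seen, qw(trip value3));

# A repeated data column name is rejected.
my $ok = eval {checkHeader (\%seen, qw(trip value1)); 1};
print $ok ? "accepted\n" : "rejected: $@";
```

Dying on a bad header is just one policy. Warning and skipping the file, or renaming the clashing column, would work too, depending on what you need.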

True laziness is hard work

In reply to Re^2: Hash w/ multiple values + merging by GrandFather
in thread Hash w/ multiple values + merging by sophix
