bkish11 has asked for the wisdom of the Perl Monks concerning the following question:

Hello gurus,

I want to read two files, match their keys, and append the values for a key in the second file to the values for the matching key in the first file.

File one

Bbbb,3333,4444

Cdd,3444,3444

Zddd,345,456

File two

Bbbb_rf1,23,45

Cdd_rf2,24.45

Zddd,34,456.4

I would like the result to be

Bbbb,3333,4444,23,45

So I have created two hashes of arrays:

open (P1, "< /tmp/p1");
while (<PROJ>) {
    next unless s/^(.*?),\s*//;
    $pr1{$1} = [ split ];
}
open (S1, "< /tmp/s1");
while (<S1>) {
    next unless s/^(.*?),\s*//;
    $si1{$1} = [ split ];
}
for my $key1 (keys %pr1 ) {
    for my $key (keys %si1 ) {
        if (($pr1{$key1} = $sil{$key._ref1}) || $sil{$key})
            print "$key: $key1\n";

I am stuck at the comparison and am not sure how to go further and merge the values into the first hash or into a new hash. Please help me.

Replies are listed 'Best First'.
Re: hash help
by GrandFather (Saint) on Jun 28, 2009 at 20:37 UTC

    First off a few general coding tips:

    Always use the three-argument form of open, always check the result, and always use lexical file handles. Taking all that into account, an open statement becomes:

    open my $file1In, '<', $file1 or die "Failed to open $file1: $!";

    Always use strictures (use strict; use warnings;). The test if (($pr1{$key1} = $sil{$key._ref1}) ... may be a copy error, but it is wrong on two counts. It assigns with = where a comparison was intended (Perl's numeric equality operator is ==, and assignment within an if expression is a really bad idea, btw), and you actually want a string compare, not a numeric compare, so you should be using eq.
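    As a small illustration of the eq versus == point, here is a minimal sketch. The helper name keys_match is hypothetical and the key values are borrowed from the sample data; it only shows why a numeric compare is the wrong tool for string keys.

```perl
use strict;
use warnings;

# Hypothetical helper: true when two hash keys are the same string.
sub keys_match {
    my ($left, $right) = @_;
    return $left eq $right;
}

print "eq: Bbbb vs Bbbb match\n" if keys_match('Bbbb', 'Bbbb');
print "eq: Bbbb vs Cdd differ\n" unless keys_match('Bbbb', 'Cdd');

# With ==, both non-numeric strings are converted to 0, so unrelated keys
# compare equal. Warnings are silenced here only to keep the demo quiet.
{
    no warnings 'numeric';
    my ($left, $right) = ('Bbbb', 'Cdd');
    print "==: Bbbb vs Cdd wrongly match\n" if $left == $right;
}
```

    With warnings left on, the numeric compare would also report that the arguments are not numeric, which is one more reason to keep strictures enabled.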

    Something like the following may be what you want:

    use strict;
    use warnings;

    my $file1 = <<END_FILE1;
    Bbbb,3333,4444
    Cdd,3444,3444
    Zddd,345,456
    END_FILE1

    my $file2 = <<END_FILE2;
    Bbbb_rf1,23,45
    Cdd_rf2,24.45
    Zddd,34,456.4
    END_FILE2

    my %concat;

    open my $file1In, '<', \$file1 or die "Failed to open file1: $!";
    while (<$file1In>) {
        chomp;
        my ($key, @tail) = split ',';
        next if ! @tail;
        $concat{$key} = \@tail;
    }
    close $file1In;

    open my $file2In, '<', \$file2 or die "Failed to open file2: $!";
    while (<$file2In>) {
        chomp;
        my ($key, @tail) = split ',';
        next if ! @tail or $key !~ s/_(\w*)$// or $1 ne 'rf1';
        push @{$concat{$key}}, @tail;
        print join (',', $key, @{$concat{$key}}), "\n";
    }
    close $file2In;

    Prints:

    Bbbb,3333,4444,23,45

    True laziness is hard work

      I appreciate your help. If I want to take all the entries (the whole file), how would I do it?

        Remove or $1 ne 'rf1' from the test in the second while loop. If that doesn't do what you want, then show us an example of the output you do want (assuming the original input data).
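        Note that with the remaining test, a second-file key that has no suffix at all (Zddd in the sample data) is still skipped, because the substitution fails for it. If such keys should be merged too, one possible sketch follows. The helper name merge_lines and the _rf<digits> suffix pattern are assumptions based on the sample data, not something the question confirms.

```perl
use strict;
use warnings;

# Hypothetical helper: merge comma-separated records into a hash of arrays.
# Keys are normalised by removing an optional "_rf<digits>" suffix
# (an assumed pattern), so keys without a suffix are merged as well.
sub merge_lines {
    my ($concat, @lines) = @_;
    for my $line (@lines) {
        chomp $line;
        my ($key, @tail) = split /,/, $line;
        next if !defined $key or !@tail;    # skip blank or value-less lines
        $key =~ s/_rf\d+$//;
        push @{ $concat->{$key} }, @tail;
    }
    return $concat;
}

my %concat;
merge_lines(\%concat, "Bbbb,3333,4444\n", "Cdd,3444,3444\n", "Zddd,345,456\n");
merge_lines(\%concat, "Bbbb_rf1,23,45\n", "Cdd_rf2,24.45\n", "Zddd,34,456.4\n");

print join(',', $_, @{ $concat{$_} }), "\n" for sort keys %concat;
```

        Because every source goes through the same routine, the order in which the files are read only affects the order of the values within each record.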


        True laziness is hard work
Re: hash help
by mzedeler (Pilgrim) on Jun 28, 2009 at 18:00 UTC

    Since you stuff everything in hashes, you can use them to look up.

    my %result;
    for my $key (keys %pr1, keys %si1) {
        my @values;
        push @values, @{$pr1{$key}} if $pr1{$key};
        push @values, @{$si1{$key}} if $si1{$key};
        $result{$key} = join ', ', @values;
    }

    The code above is not complete, but it should point you in the right direction. Please also consider clearer variable names; names like $pr1 are an indicator of bad design.
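    One way the lookup might be completed is sketched below. The hash contents are hypothetical stand-ins for what the question's loops would build, the _rf<digits> suffix pattern is an assumption taken from the sample data, and the first file is treated as the master list of keys.

```perl
use strict;
use warnings;

# Hypothetical contents of the two hashes of arrays from the question.
my %pr1 = (Bbbb => ['3333', '4444'], Cdd => ['3444', '3444'], Zddd => ['345', '456']);
my %si1 = (Bbbb_rf1 => ['23', '45'], Cdd_rf2 => ['24.45'], Zddd => ['34', '456.4']);

# Re-key the second hash with the assumed "_rf<digits>" suffix removed,
# so that its keys can be looked up directly from the first hash's keys.
my %si1_by_base;
for my $key (keys %si1) {
    (my $base = $key) =~ s/_rf\d+$//;
    push @{ $si1_by_base{$base} }, @{ $si1{$key} };
}

# Walk the first hash only and append any matching values from the second.
my %result;
for my $key (keys %pr1) {
    my @values = @{ $pr1{$key} };
    push @values, @{ $si1_by_base{$key} } if $si1_by_base{$key};
    $result{$key} = join ',', @values;
}

print "$_,$result{$_}\n" for sort keys %result;
```

    Keys that appear only in the second hash are dropped by this version; iterating over the keys of both hashes instead would keep them.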

      I will take your suggestions. My script is at a very early stage and gathers data from different files. I chose a hash of arrays so that in future I can manipulate the data more flexibly. I am still learning Perl and I appreciate your help. If I am wrong, please correct me.

      I have many similar keys, like bbbb_rf1 and ccc_12, in multiple files. I should be able to gather all the data and merge it into a single table. You are always welcome to suggest better directions.

Re: hash help
by Bloodnok (Vicar) on Jun 28, 2009 at 21:13 UTC
    ...depends on what you want to do with the result, but if it's simply to get a conjoined result from 2 input files, then personally I'd KISS (Keep It Simple Stupid) and use join on the CLI (as I've previously said elsewhere).

    A user level that continues to overstate my experience :-))
Re: hash help
by Marshall (Canon) on Jun 29, 2009 at 19:01 UTC
    I think this looks pretty close...
    #!/usr/bin/perl -w
    use strict;

    my @files = qw(file1.dat file2.dat);
    my %hash;
    foreach my $file (@files) {
        open (FILE, "<", $file) || die "can't open $file $!";
        while (<FILE>) {
            next if /^\s*$/;
            chomp;
            my ($name,@tokens) = split(/,/,$_);
            $name =~ s/_.*$//;   #delete trailing _blah in name
            push @{$hash{$name}},@tokens;
        }
    }
    foreach my $name (sort keys %hash) {
        print "$name @{$hash{$name}}\n";
    }
    __END__
    prints:
    Bbbb 3333 4444 23 45
    Cdd 3444 3444 24.45
    Zddd 345 456 34 456.4

    file1.dat:
    Bbbb,3333,4444
    Cdd,3444,3444
    Zddd,345,456

    file2.dat
    Bbbb_rf1,23,45
    Cdd_rf2,24.45
    Zddd,34,456.4

      I appreciate your help; it worked to some extent, thanks. Maybe I haven't explained it too well.

      Some of the input keys are being trimmed and concatenated together. In the first file (P1) the first field is unique, and in the second file (S1) the keys carry an extension such as "_ref1", "_ref3" or "_ref4".

      Input File One P1

      bbb,2,3

      aaa_1,4,5

      ccc_1,5,6

      Input file Two s1

      bbb_ref1,5

      aaa_1_ref3,8

      ccc_1_ref2,6

      Input file three S2 and S3

      bbb_ref1,10

      aaa_1_ref3,9

      ccc_1_ref2,11

      The result will be

      bbb,2,3,5,10

      aaa_1,4,5,8,9

      ccc_1,5,6,6,11

      I apologize for the confusion

        All that is required is a very slight "tweak" to the $name regex statement (if name ends in _refX, then that part is deleted, otherwise not).
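        To see what the tweak changes, here is a minimal sketch using names from the sample input. The helper name base_name is hypothetical; the substitution itself is the one described above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing "_ref<digits>" from a record name,
# leaving any other underscores in the name alone.
sub base_name {
    my ($name) = @_;
    $name =~ s/_ref\d+$//;
    return $name;
}

for my $name (qw(bbb bbb_ref1 aaa_1 aaa_1_ref3 ccc_1_ref2)) {
    printf "%-12s -> %s\n", $name, base_name($name);
}
```

        The earlier pattern s/_.*$// would have cut aaa_1 down to aaa, which is why keys from the first file were being trimmed; anchoring on _ref followed by digits keeps aaa_1 intact while still reducing aaa_1_ref3 to aaa_1.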

        I suppose you are new to Perl. Note that one of the true "Powerhitter" features of Perl is that lists can be processed with a total absence of indices (no [$i] stuff). The only "if" statement is pretty much optional provided that you have a good data file to work with, as it only skips completely blank lines. Play with the code. You will also notice that the order of the files doesn't matter (there is no special case for the first file).

        Have fun and happy Perling!

        #!/usr/bin/perl -w
        use strict;

        my @files = qw(file1.dat file2.dat file3.dat);
        my %hash;
        foreach my $file (@files) {
            open (FILE, "<", $file) || die "can't open $file $!";
            while (<FILE>) {
                next if /^\s*$/;   #simply skips blank lines
                chomp;
                my ($name,@tokens) = split(/,/,$_);
                $name =~ s/_ref\d+$//;
                push @{$hash{$name}},@tokens;
            }
        }
        foreach my $name (sort keys %hash) {
            print "$name @{$hash{$name}}\n";
        }
        __END__
        prints:
        aaa_1 4 5 8 9
        bbb 2 3 5 10
        ccc_1 5 6 6 11
Re: hash help
by bichonfrise74 (Vicar) on Jun 29, 2009 at 17:18 UTC
    Another possible solution?
    #!/usr/bin/perl
    use strict;

    my $file_1 = <<EOF;
    Bbbb,3333,4444
    Cdd,3444,3444
    Zddd,345,456
    EOF

    my %records;
    open( my $fh, "<", \$file_1 ) or die "error opening file.";
    while( <$fh> ) {
        chomp;
        my @cols = split( "," );
        $records{$cols[0]} = "$cols[1],$cols[2]";
    }
    close( $fh );

    while (<DATA>) {
        my @cols = split( "," );
        $cols[0] =~ s/_\w+//;
        print "$cols[0],$records{$cols[0]},$cols[1],$cols[2]";
    }

    __DATA__
    Bbbb_rf1,23,45
    Cdd_rf2,24,45
    Zddd,34,456.4