
Hello ray15, and welcome to the Monastery!

Here’s a solution using Text::CSV_XS:

File “1.csv”

fragment,id,index
accb,10,A
bbc,11,B
ccd,12,C

File “2.csv”

fragment,id,index
bbc,14,E
ccd,15,D
llk,11,B
kks,12,C

Script in file “main.pl”

#!perl
use strict;
use warnings;
use List::MoreUtils 'uniq';
use Text::CSV_XS;

my %files = (file1 => '1.csv', file2 => '2.csv');
my %hashes;
my $csv   = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );  # auto_diag: warn if a malformed line ends the read loop early

for my $file (keys %files)
{
    open(my $in, '<', $files{$file})
        or die "Cannot open file '$files{$file}' for reading: $!";

    <$in>;      # Discard column headings

    while (my $row = $csv->getline($in))
    {
        my $key = shift @$row;
        $hashes{$file}{$key} = [ @$row ];
    }

    close $in
        or die "Cannot close file '$files{$file}': $!";
}

separator_line();
print join("\t", qw(frag id1 file1 id2 file2)), "\n";
separator_line();

my   @keys;
push @keys, keys %$_ for values %hashes;
     @keys = uniq @keys;

for my $fragment (sort @keys)
{
    my $f1 = exists $hashes{file1}{$fragment} ? 1 : 0;
    my $f2 = exists $hashes{file2}{$fragment} ? 1 : 0;

    printf "%s\t%s\t%s\t%s\t%s\n",
            $fragment,
            $f1 ? $hashes{file1}{$fragment}->[0] : '',
            $f1,
            $f2 ? $hashes{file2}{$fragment}->[0] : '',
            $f2;
}

separator_line();

sub separator_line
{
    print '-' x 37, "\n";
}

Output:

13:06 >perl main.pl
-------------------------------------
frag    id1     file1   id2     file2
-------------------------------------
accb    10      1               0
bbc     11      1       14      1
ccd     12      1       15      1
kks             0       12      1
llk             0       11      1
-------------------------------------

13:07 >
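If you would rather not install List::MoreUtils just for uniq, the union of fragment keys can also be collected with a plain hash: each key is stored once, however many files contain it. This is an alternative I'm suggesting, not part of the script above, and it uses a hard-coded %hashes (copied from the sample data) in place of the CSV files so that it runs on its own. Recent versions of the core List::Util module (1.45 and later) also export a uniq function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded stand-in for the %hashes that main.pl builds from 1.csv and 2.csv.
my %hashes = (
    file1 => { accb => [10, 'A'], bbc => [11, 'B'], ccd => [12, 'C'] },
    file2 => { bbc  => [14, 'E'], ccd => [15, 'D'], llk => [11, 'B'], kks => [12, 'C'] },
);

# Union of fragment keys via a hash slice instead of uniq():
# assigning to @seen{...} creates one entry per distinct key.
my %seen;
@seen{ keys %$_ } = () for values %hashes;
my @keys = sort keys %seen;

print "@keys\n";   # accb bbc ccd kks llk
```

Because %seen already discards duplicates, the separate `push` / `uniq` steps collapse into one line.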

Note: I do not try to access $hashes{file1}{$fragment}->[0] until I have confirmed that $hashes{file1}{$fragment} exists in the hash. This avoids autovivification, which is a useful Perl feature but is not wanted here: merely reading ->[0] on a missing fragment would silently create $hashes{file1}{$fragment} as an empty array reference. (See e.g. Uri Guttman’s tutorial for the gory details.)
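Here is a small, self-contained illustration of that point. The data is made up to mirror %hashes, and 'kks' stands in for a fragment that is missing from file1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up data mirroring %hashes: only 'bbc' exists under file1.
my %hashes = ( file1 => { bbc => [ 11, 'B' ] } );

# Guarded read: exists() is tested first, so nothing is created.
my $id = exists $hashes{file1}{kks} ? $hashes{file1}{kks}[0] : '';
print 'after guarded read:   ',
      (exists $hashes{file1}{kks} ? 'kks created' : 'kks absent'), "\n";

# Unguarded read: dereferencing the missing element autovivifies it
# as an empty array reference, even though we only *read* from it.
my $id2 = $hashes{file1}{kks}[0];   # $id2 is undef
print 'after unguarded read: ',
      (exists $hashes{file1}{kks} ? 'kks created' : 'kks absent'), "\n";
```

This prints "kks absent" and then "kks created". In main.pl the unguarded version would quietly add empty entries to %hashes, which is exactly what the exists checks prevent.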

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

In reply to Re: comparing csv files in perl by Athanasius
in thread comparing csv files in perl by ray15
