Hello ray15, and welcome to the Monastery!

Here’s a solution using Text::CSV_XS:

File “1.csv”

fragment,id,index
accb,10,A
bbc,11,B
ccd,12,C

File “2.csv”

fragment,id,index
bbc,14,E
ccd,15,D
llk,11,B
kks,12,C

Script in file “main.pl”

#!perl

use strict;
use warnings;
use List::MoreUtils 'uniq';
use Text::CSV_XS;

my %files = (file1 => '1.csv', file2 => '2.csv');
my %hashes;
my $csv = Text::CSV_XS->new( { binary => 1 } );

for my $file (keys %files)
{
    open(my $in, '<', $files{$file})
        or die "Cannot open file '$files{$file}' for reading: $!";

    <$in>;    # Discard column headings

    while (my $row = $csv->getline($in))
    {
        my $key = shift @$row;
        $hashes{$file}{$key} = [ @$row ];
    }

    close $in or die "Cannot close file '$files{$file}': $!";
}

separator_line();
print join("\t", qw(frag id1 file1 id2 file2)), "\n";
separator_line();

my @keys;
push @keys, keys %$_ for values %hashes;
@keys = uniq @keys;

for my $fragment (sort @keys)
{
    my $f1 = exists $hashes{file1}{$fragment} ? 1 : 0;
    my $f2 = exists $hashes{file2}{$fragment} ? 1 : 0;

    printf "%s\t%s\t%s\t%s\t%s\n",
        $fragment,
        $f1 ? $hashes{file1}{$fragment}->[0] : '', $f1,
        $f2 ? $hashes{file2}{$fragment}->[0] : '', $f2;
}

separator_line();

sub separator_line
{
    print '-' x 37, "\n";
}

Output:

13:06 >perl main.pl
-------------------------------------
frag    id1     file1   id2     file2
-------------------------------------
accb    10      1               0
bbc     11      1       14      1
ccd     12      1       15      1
kks             0       12      1
llk             0       11      1
-------------------------------------
13:07 >

Note: I do not try to access $hashes{file1}{$fragment}->[0] until I have confirmed that $hashes{file1}{$fragment} already exists in the hash. This is to avoid autovivification, which is a great Perl feature but is not wanted in this case. (See e.g. Uri Guttman’s tutorial for the gory details.)
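To make the autovivification point concrete, here is a minimal sketch. The data and the two helper subs (unguarded_lookup and guarded_lookup) are made up for illustration and are not part of the script above; they only mimic the shape of %hashes.

```perl
#!perl
use strict;
use warnings;

# Hypothetical data, shaped like %hashes in the script above.
my %hashes = (file1 => { accb => [10, 'A'] });

# Unguarded read (illustrative helper): dereferencing a missing entry
# makes Perl create that entry as an empty array reference.
sub unguarded_lookup
{
    my ($data, $file, $fragment) = @_;
    return $data->{$file}{$fragment}->[0];
}

# Guarded read (illustrative helper): test with exists first, and only
# dereference when the entry is really there.
sub guarded_lookup
{
    my ($data, $file, $fragment) = @_;
    return exists $data->{$file}{$fragment}
         ? $data->{$file}{$fragment}->[0]
         : '';
}

guarded_lookup(\%hashes, 'file1', 'llk');
print "after guarded read:   ",
      (exists $hashes{file1}{llk} ? 'exists' : 'absent'), "\n";

unguarded_lookup(\%hashes, 'file1', 'llk');
print "after unguarded read: ",
      (exists $hashes{file1}{llk} ? 'exists' : 'absent'), "\n";
```

Run as written, this should report "absent" after the guarded read and "exists" after the unguarded one. In the main script, an entry created that way would show up as a spurious fragment in any later exists test or keys listing.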

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,


In reply to Re: comparing csv files in perl by Athanasius
in thread comparing csv files in perl by ray15
