
Your question is barely readable, your code does not compile (you still have file3 all over the code), and you are declaring the use of a Text::CSV module without making use of it anywhere in the code. This has homework written all over it.

Still, a solution could help someone with a similar problem. Let's define the problem: we have a few files, organized in whitespace-separated columns. The first column is the ID, the second is an item name, and the rest are of no interest to us. We make two assumptions about the input: no ID is equal to 0 (0 is reserved as the "not present" marker in the report), and each item name appears on at most one line per file.
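
Both assumptions are easy to verify before building the report. The following is only a minimal sketch: check_lines is a hypothetical helper that is not part of the script in this post, and it takes the lines of one file as a plain list so that it can be tried without any input files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a list of
# human-readable problems found in the lines of one input file.
# An empty list means both assumptions hold for that file.
sub check_lines {
    my @lines = @_;
    my (%seen, @problems);
    for my $line (@lines) {
        my ($id, $item) = split ' ', $line;
        next unless defined $item;                 # skip blank or short lines
        push @problems, "ID 0 used for $item"  if $id eq '0';
        push @problems, "duplicate item $item" if $seen{$item}++;
    }
    return @problems;
}

# Made-up sample lines: one reserved ID and one repeated item name.
my @problems = check_lines("1 aaa x", "0 bbb x", "5 aaa x");
print "$_\n" for @problems;
```

In a real run you would pass the lines read from each file and stop if the returned list is not empty.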

The input:

 
### test1.csv ###
1  aaa  ignored_field
3  ccc ignored_field
4  ddd ignored_field

### test2.csv ###
11 aaa ignored_field
22 bbb ignored_field
44 ddd   ignored_field

### test3.csv ###
333 ccc ignored_field
555 eee ignored_field
666  fff   ignored_field
777  ggg   ignored_field

We want to produce a report listing every item together with the ID under which it is stored in each file. If a file does not contain an item, a 0 is shown in place of the ID.

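#!/usr/bin/perl
use v5.14;      # enables strict, say and state
use warnings;

my %data;       # item name => tab-separated IDs, one column per file

# Number of tab-separated columns already stored in a row string.
sub cnt_fields {
    my @fields = split "\t", $_[0] // '';
    return scalar @fields;
}

# With arguments: read one file and add its column to every row.
# Without arguments: return the number of files processed so far.
sub gather_file {
    state $count = 0;
    return $count unless @_;

    my ($filename, $d) = @_;
    open my $fh, "<", $filename or die "Cannot open $filename: $!";
    while (<$fh>) {
        chomp;
        my @field = split;
        next unless @field >= 2;    # skip blank or malformed lines
        # Pad with zeros for the earlier files that lacked this item.
        my $offset = $count - cnt_fields($d->{$field[1]});
        $d->{$field[1]} .= "0\t" x $offset;
        $d->{$field[1]} .= "$field[0]\t";
    }
    close $fh;

    # Items absent from this file get a 0 in its column.
    for my $key (keys %$d) {
        if (cnt_fields($d->{$key}) <= $count) {
            $d->{$key} .= "0\t";
        }
    }
    $count++;
}

for (1..3) {
    gather_file("test$_.csv", \%data);
}

my $header = "Key\t";
for (1..gather_file()) {
    $header .= "f$_\t";
}
say $header;

for my $key (sort keys %data) {
    say "$key\t$data{$key}";
}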

The output:

$ ./report.pl 
Key    f1    f2    f3
aaa    1    11    0    
bbb    0    22    0    
ccc    3    0    333    
ddd    4    44    0    
eee    0    0    555    
fff    0    0    666    
ggg    0    0    777
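
For comparison, here is an alternative sketch that is not the approach used in the script above: instead of padding a tab-separated string, it keeps the IDs in a nested hash of the form item => { file label => ID } and fills in the 0 placeholder only when printing. The name report_rows and the labels f1..f3 are assumptions made for illustration, and the sample data is a hand-typed subset of the input shown earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: build the report rows from a nested hash
# { item => { file_label => id } }. Returns the header row followed by
# one row per item, sorted by item name.
sub report_rows {
    my ($data, @labels) = @_;
    my @rows = join "\t", 'Key', @labels;
    for my $item (sort keys %$data) {
        # A missing entry for a file becomes the placeholder 0.
        push @rows, join "\t", $item, map { $data->{$item}{$_} // 0 } @labels;
    }
    return @rows;
}

# Hand-typed subset of the sample input, keyed by item and file label.
my %data = (
    aaa => { f1 => 1, f2 => 11 },
    ccc => { f1 => 3, f3 => 333 },
);
print "$_\n" for report_rows(\%data, qw(f1 f2 f3));
```

Because a missing ID is represented by an absent hash key rather than by counting columns in a string, this variant leaves no trailing tab at the end of each row, and the column order is fixed by the list of labels you pass in.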

regards,
Luke Jefferson

In reply to Re: comparing csv files in perl by blindluke
in thread comparing csv files in perl by ray15
