Re: Aggregating the column based on the common column values

Welcome to the Monastery.

Your "undefined value as an ARRAY reference" errors come from code like '@{$k{$_}{Added}}'. None of the input data you've shown has a type of 'Added' or 'Deleted', so those keys won't exist; the types you do have are 'added' and 'deleted' (hash keys are case-sensitive). Even with the case corrected, not every user has both of those types so, again, some keys won't exist. You can get around that by changing '@{$k{$_}{$type}}' to '@{$k{$_}{$type} || []}': see my code below for examples of this.
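To make the guard concrete, here's a minimal sketch. The %k contents are made up for illustration (a user with only an 'added' list); your real hash will obviously differ.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical data: user 'cdf' has an 'added' list but no 'deleted' list.
my %k = (cdf => {added => ['deploy']});

# Unguarded: dereferencing the missing 'deleted' entry dies under strict.
# The eval traps the error so the script can carry on and report it.
my $unguarded_ok = eval { my @roles = @{$k{cdf}{deleted}}; 1 };
print "Unguarded dereference failed: $@" unless $unguarded_ok;

# Guarded: a missing entry falls back to an empty anonymous array.
my @deleted = @{$k{cdf}{deleted} || []};
my @added   = @{$k{cdf}{added}   || []};

print 'Deleted count: ', scalar(@deleted), "\n";
print "Added: @added\n";
```

The '|| []' costs almost nothing and means join() simply sees an empty list for users who lack one of the types.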

You're using string comparison tests ('ne') against arrays (e.g. @{...} ne ''). Don't do that! You only need to test like this: 'if (@{...})'. In a boolean context, an array evaluates to its element count: an empty array gives 0, which is FALSE; anything else is TRUE: see "perlsyn: Truth and Falsehood".
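A short sketch of that test, again with made-up data (one populated list, one empty list):

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical data: 'added' has two roles, 'deleted' is an empty list.
my %k = (abc => {added => ['admin', 'deploy'], deleted => []});

my %has_roles;

for my $type (qw{added deleted}) {
    # No string comparison needed: the array itself is the condition.
    if (@{$k{abc}{$type}}) {
        $has_roles{$type} = 1;
        print "abc has '$type' roles: @{$k{abc}{$type}}\n";
    }
    else {
        $has_roles{$type} = 0;
        print "abc has no '$type' roles\n";
    }
}
```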

Your input data is a pipe-separated file (with odd bits of whitespace floating around in it). Unless you're doing this for reasons of your own (education, entertainment, whatever), there's really no value in reinventing this wheel. It's already been coded for you: Text::CSV. I show an example of how you might use this, below.

Your example input shows only one date throughout: I suspect that's a poor example and your real data has different dates (but do correct me if I'm wrong). In my code below, I've shown one way you might deal with different dates; and I've changed the input so every "Role" action has a different date. What I have there may be unsuitable for you but, as you don't even address the "Date" field in the code you've shown, perhaps it might provide a pointer or two.

In the code example below, I've used the filehandles \*DATA and \*STDOUT. You're going to need to create filehandles to real files. I recommend you stop using the 2-argument form of open with global, package variables and, instead, use the 3-argument form with lexical filehandles. Text::CSV has example code; open also has example code with substantial discussion.
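Here's a rough sketch of the 3-argument open with lexical filehandles. The file names are hypothetical and live in a scratch directory purely so the sketch runs on its own; substitute your real paths.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp qw{tempdir};

# Hypothetical file names in a temporary directory (removed on exit).
my $dir      = tempdir(CLEANUP => 1);
my $in_file  = "$dir/roles.psv";
my $out_file = "$dir/report.psv";

# Seed a small input file so there is something to read.
open my $seed_fh, '>', $in_file
    or die "Can't open '$in_file' for writing: $!";
print {$seed_fh} "abc|admin|added|01072015\n", "cdf|deploy|added|06072015\n";
close $seed_fh or die "Can't close '$in_file': $!";

# 3-argument form: the mode is separate from the file name,
# and the filehandles are lexical variables, not package globals.
open my $in_fh, '<', $in_file
    or die "Can't open '$in_file' for reading: $!";
open my $out_fh, '>', $out_file
    or die "Can't open '$out_file' for writing: $!";

my $lines_copied = 0;

while (my $line = <$in_fh>) {
    print {$out_fh} $line;
    ++$lines_copied;
}

close $in_fh  or die "Can't close '$in_file': $!";
close $out_fh or die "Can't close '$out_file': $!";

print "Copied $lines_copied lines\n";
```

Handles opened like this can be passed to Text::CSV in place of \*DATA and \*STDOUT, i.e. '$psv->getline($in_fh)' and '$psv->print($out_fh, ...)'.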

Here's the example code I keep talking about:

#!/usr/bin/env perl

use strict;
use warnings;

use Text::CSV;

my $psv = Text::CSV::->new({sep_char => '|', allow_whitespace => 1})
    or die 'Text::CSV problem: ', Text::CSV::->error_diag();

my %data_for;

# Group each user's roles (and the matching dates) by action type.
while (my $row = $psv->getline(\*DATA)) {
    my ($user, $role, $type, $date) = @$row;
    push @{$data_for{$user}{$type}}, $role;
    push @{$data_for{$user}{dates}{$type}}, $date;
}

$psv->eol("\n");

# One row per user: user, added roles, deleted roles, dates (added first, then deleted).
for (sort keys %data_for) {
    $psv->print(\*STDOUT, [
        $_,
        join(',', @{$data_for{$_}{added} || []}),
        join(',', @{$data_for{$_}{deleted} || []}),
        join(',',
            @{$data_for{$_}{dates}{added} || []},
            @{$data_for{$_}{dates}{deleted} || []}
        ),
    ]);
}
    
__DATA__
abc|admin |added | 01072015
abc|developer |deleted |02072015
abc|deploy |added |03072015
xyz |admin |deleted |04072015
xyz| deploy|deleted|05072015
cdf|deploy|added|06072015

Here's the output that script produces:

abc|admin,deploy|developer|01072015,03072015,02072015
cdf|deploy||06072015
xyz||admin,deploy|04072015,05072015

Update: I originally wrote that the input was tab-separated; no idea why; obviously it's pipe (|) separated. I've corrected that within the text. I've also changed all (4) instances of $tsv to $psv and retested the code: output remains the same. Apologies if that caused any confusion. Changes were made 23 minutes after original posting.

-- Ken
