hello_beginner has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to get the output below by reading rows from a table. I need to join the column 3 values for each unique column 1 value.

Input data:

A1 text1 NY Jan 01
A2 text2 LN Feb 02
A3 text3 SG Mar 03
A2 text2 NY Feb 02
A1 text1 SG Jan 01
A1 text1 AUS Jan 01

Expected result

A1 text1 NY:SG:AUS Jan 01
A2 text2 LN:NY Feb 02
A3 text3 SG Mar 03

For each unique column 1 value, columns 2, 4 and 5 are always the same.

use warnings;
use strict;

my %hash;
while (<DATA>) {
    chomp;
    my ($key, $col2, $col3) = split;
    push @{ $hash{$key}{$col2} }, $col3;
}
for my $k (keys %hash) {
    for (keys %{ $hash{$k} }) {
        my $cols3 = join ':', @{ $hash{$k}{$_} };
        print "$k $_ $cols3\n";
    }
}
__DATA__
A1 text1 NY Jan 01
A2 text2 LN Feb 02
A3 text3 SG Mar 03
A2 text2 NY Feb 02
A1 text1 SG Jan 01
A1 text1 AUS Jan 01

Replies are listed 'Best First'.
Joining columns
by hippo (Archbishop) on Sep 09, 2016 at 21:52 UTC

    This question is surprisingly similar to join column 3 for unique values, which you posted on Monday. Do the answers in that thread not provide the information you require?

Re: Perl help
by kevbot (Vicar) on Sep 10, 2016 at 03:32 UTC
    Hi hello_beginner,

    I agree with hippo, this question is very similar to join column 3 for unique values. In fact, a few changes to the code I posted in Re: join column 3 for unique values gives the results that you require.

    If you have your data in data.txt

    A1 text1 NY Jan 01
    A2 text2 LN Feb 02
    A3 text3 SG Mar 03
    A2 text2 NY Feb 02
    A1 text1 SG Jan 01
    A1 text1 AUS Jan 01
    This script,
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Table;

    my $t  = Data::Table::fromCSV('data.txt', 0, undef, {delimiter => ' '});
    my $mt = $t->melt(['col1', 'col2', 'col4', 'col5']);
    my $ct = $mt->cast(['col1', 'col2', 'col4', 'col5'],
        'variable', Data::Table::STRING, 'value', \&col_join);
    $ct->sort('col1', 1, 0);
    $ct->reorder([ 'col1', 'col2', 'col3', 'col4', 'col5' ]);
    print $ct->csv(0, {delimiter => " "});
    exit;

    sub col_join {
        my @data = @_;
        my $joined = join(":", @data);
        return($joined);
    }
    will give this output
    A1 text1 NY:SG:AUS Jan 01
    A2 text2 LN:NY Feb 02
    A3 text3 SG Mar 03

    If you often work on problems like these, I recommend reading Data::Table and the Data::Table Cookbook (especially Section 8.1: Reshaping - melting and casting for table statistics).

    As mentioned by stevieb, please take a closer look at How do I post a question effectively? especially the Select an informative title section.

      Thanks for your help. I am reading the data from a database, not a CSV file.

        Then just adapt the SQLite code I gave you at: Re: join column 3 for unique values.
        Read the whole previous thread carefully; it contains many ideas that apply to this problem, which is almost the same one.
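
        For illustration, here is a minimal sketch of that adaptation using DBI with DBD::SQLite. The database file (data.db), the table name (mytable) and its column names are assumptions here, so substitute your actual schema:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use DBI;

        # Hypothetical database, table and column names; adjust to your schema.
        my $dbh = DBI->connect('dbi:SQLite:dbname=data.db', '', '',
            { RaiseError => 1 });

        my $sth = $dbh->prepare('SELECT col1, col2, col3, col4, col5 FROM mytable');
        $sth->execute;

        my %data;
        my @order;    # col1 values in first-seen order

        while (my ($c1, $c2, $c3, $c4, $c5) = $sth->fetchrow_array) {
            if (exists $data{$c1}) {
                $data{$c1}[2] .= ':' . $c3;    # append this row's col3 value
            }
            else {
                $data{$c1} = [ $c1, $c2, $c3, $c4, $c5 ];
                push @order, $c1;
            }
        }

        $dbh->disconnect;

        print "@{ $data{$_} }\n" for @order;

        The joining logic is the same hash-of-arrays idea used elsewhere in this thread; only the source of the rows changes.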
Re: Perl help
by kcott (Archbishop) on Sep 10, 2016 at 14:54 UTC

    G'day hello_beginner,

    I think you have an overly complex data structure: a Hash of Hashes of Arrays (HoHoA). I've used a Hash of Arrays (HoA): the keys are col1 values (e.g. A1, A2, etc.); the values are arrayrefs containing all the elements returned by split.

    [In the code below: UNIQ refers to your A1, A2 & A3; CODE refers to your NY, SG, AUS, etc. codes.]

    #!/usr/bin/env perl -l
    use strict;
    use warnings;
    use autodie;

    use constant {
        UNIQ   => 0,
        CODE   => 2,
        JOINER => ':',
    };

    my %data;

    while (<DATA>) {
        my @cols = split;
        if (exists $data{$cols[UNIQ]}) {
            $data{$cols[UNIQ]}->[CODE] .= JOINER . $cols[CODE];
        }
        else {
            $data{$cols[UNIQ]} = [ @cols ];
        }
    }

    print "@{$data{$_}}" for sort keys %data;

    __DATA__
    A1 text1 NY Jan 01
    A2 text2 LN Feb 02
    A3 text3 SG Mar 03
    A2 text2 NY Feb 02
    A1 text1 SG Jan 01
    A1 text1 AUS Jan 01

    Which outputs:

    A1 text1 NY:SG:AUS Jan 01
    A2 text2 LN:NY Feb 02
    A3 text3 SG Mar 03

    See also: "perldsc -- Perl Data Structures Cookbook".

    — Ken

Re: Perl help
by neilwatson (Priest) on Sep 09, 2016 at 18:47 UTC

    In this line: print "$k $_ $cols3\n"; what do you expect $_ to be? Also, always post both your expected and your actual results.
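
    For clarity, here is a minimal sketch of the OP's print loop with the inner loop variable named explicitly instead of relying on $_; it makes obvious that $_ held the col2 key:

    for my $k (sort keys %hash) {
        for my $col2 (sort keys %{ $hash{$k} }) {
            my $cols3 = join ':', @{ $hash{$k}{$col2} };
            print "$k $col2 $cols3\n";
        }
    }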

    Neil Watson
    watson-wilson.ca

Re: Perl help
by stevieb (Canon) on Sep 09, 2016 at 18:18 UTC