join column 3 for unique values

hello_beginner has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: join column 3 for unique values by stevieb (Canon) on Sep 05, 2016 at 14:51 UTC
I'd stash everything into a data structure (HoHoA, ie. hash of hashes of arrays), which makes sure everything gets aggregated properly, then print it out. `use warnings; use strict; my %hash; while (<DATA>){ chomp; my ($key, $val, $colour) = split; push @{ $hash{$key}{$val} }, $colour; } for my $k (keys %hash){ for (keys %{ $hash{$k} }){ my $colours = join ':', @{ $hash{$k}{$_} }; print "$k $_ $colours\n"; } } __DATA__ A1 abc yellow B1 xyz green A2 cde red A1 abc green A2 cde yellow A1 abc blue` [download] Output: `A1 abc yellow:green:blue A2 cde red:yellow B1 xyz green` [download]	[reply] [d/l] [select]
Re^2: join column 3 for unique values by hello_beginner (Novice) on Sep 05, 2016 at 15:04 UTC
Great, this works. If there is duplicate value for A1, like below Yellow, it reapts. we need i to print only once </> __DATA__ A1 abc yellow A1 abc yellow A2 cde yellow A1 abc blue Output: A1 abc yellow:yellow:blue ---only yellow:blue A2 cde yellow B1 xyz green	[reply]
Re^3: join column 3 for unique values by stevieb (Canon) on Sep 05, 2016 at 15:30 UTC
You'll need to show a little effort here now. Hint: check the array if the colour already exists before pushing it onto itself, perldoc -f grep, or use a hash for the colours as well, if the order doesn't matter. update: Also, please see How do I post a question effectively? for the use of `<code></code>` tags (they are for code and data), and use the Preview button to ensure all of your tags are closed etc.	[reply] [d/l]
Re: join column 3 for unique values by Marshall (Canon) on Sep 06, 2016 at 00:03 UTC
I tried a different approach which produces: `A1 abc blue:green:yellow A2 cde red:yellow B1 xyz green` [download] Most of this code is in making the SQLite DB. My "SQL kung-foo" is pretty weak. I think better SQL would be superior. But I got the unique values of Column 1 using the DISTINCT SQL keyword. Then I used those unique values in a SELECT statement loop to get unique rows which filters out the duplicate col3 values, then I printed the results in the same loop. I suppose that there are many variations on this theme. You could just get the unique rows in a single SQL statement and then use Perl to organize the printout by Column 1. I decided to go ahead and post this ~~"weak"~~ solution because it has a few examples of how to work with a reference to an Array of Array, which is one of the ways that you can access the results from the DBI. Update: I went ahead and coded my own suggestion of using one SELECT statement for the distinct rows and using Perl to organize the printout. This works much better and produces the same result: `# Ok, with SELECT DISTINCT * and using Perl for Printout $sth= $dbh->prepare ('SELECT DISTINCT * from test'); $sth -> execute; my $ref_distinct_rows = $sth->fetchall_arrayref; my %result; foreach my $row (@$ref_distinct_rows) { push (@{$result{"$row->[1] $row->[2]"}}, $row->[3]); } foreach my $key (sort keys %result) { print "$key ",join (':',sort @{$result{$key}}),"\n"; }` [download] I think the post, Re: join column 3 for unique values from NetWallah++ is the best direction to go. Try to get the DB to do as much as possible. SQLite doesn't have stored procedures so I couldn't do that. However Sybase does and you should consider that. If the SQL is "right", then there will be only one Perl loop that just prints the results from the DB. ==end update #!usr/bin/perl use warnings; use strict; use Data::Dumper; use DBI qw(:sql_types); my $dbfile = "test1171182.sqlite"; if (-e $dbfile) {unlink $dbfile or die "Delete of $dbfile failed! $!\n +";} my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","",{RaiseError = +> 1}) or die "Couldn't connect to database: " . DBI->errstr; $dbh->do ("CREATE TABLE test ( id integer PRIMARY KEY AUTOINCREMENT, col1 varchar(10) DEFAULT '', col2 varchar(10) DEFAULT '', col3 varchar(10) DEFAULT '' ); "); $dbh->do("BEGIN"); #a single transaction makes this MUCH faster import_data(); $dbh->do("COMMIT"); ### Find unique values for Column 1 my $sth = $dbh->prepare('SELECT DISTINCT col1 FROM test'); $sth->execute; my $ref_unique_col1 = $sth->fetchall_arrayref; #ref to Array of + Arrays my @unique_col1 = sort map{$_->[0]}@$ref_unique_col1; #first col from +AoA $sth = $dbh->prepare('SELECT DISTINCT * FROM test WHERE col1 IS ?'); ### Use unique values for Column one to get AoA of unique col3 values foreach my $col1 (@unique_col1) { $sth->execute($col1); my $ref_result = $sth->fetchall_arrayref; my @col3 = map{$_->[3]}@$ref_result; #get entire column # print Dumper $ref_result; #turn on for debugging... print "$ref_result->[0][1] $ref_result->[0][2] "; print join (':',sort @col3),"\n"; } sub import_data { my $add = $dbh->prepare("INSERT INTO test (col1, col2, col3) VALUES(?,?,?)"); while (my $line = <DATA>) { next if $line =~ /^\s*$/; #skip blank lines my ($col1, $col2, $col3) = split ' ', $line; $add->execute($col1, $col2, $col3); } } =Above Prints: A1 abc blue:green:yellow A2 cde red:yellow B1 xyz green =cut __DATA__ A1 abc yellow B1 xyz green A2 cde red A1 abc green A2 cde yellow A1 abc blue [download]	[reply] [d/l] [select]
Re: join column 3 for unique values by NetWallah (Canon) on Sep 05, 2016 at 18:33 UTC
It is likely much more efficient to have the database to this for you. Even though sybase lacks the "group concat" functionality, you can get the equivalent by following these steps. ...it is unhealthy to remain near things that are in the process of blowing up. man page for WARP, by Larry Wall	[reply]
Re: join column 3 for unique values by kevbot (Vicar) on Sep 06, 2016 at 02:12 UTC
Hi hello_beginner, Here is another way to do it with Data::Table. Space-delimited data is in data.txt `A1 abc yellow B1 xyz green A2 cde red A1 abc green A2 cde yellow A1 abc blue` [download] Running this script, `#!/usr/bin/env perl use strict; use warnings; use Data::Table; my $t = Data::Table::fromCSV('data.txt', 0, undef,{delimiter => ' '}); my $mt = $t->melt(['col1', 'col2']); my $ct = $mt->cast(['col1', 'col2'], 'variable', Data::Table::STRING, +'value', \&col_join); $ct->sort('col1', 1, 0); print $ct->csv(0, {delimiter => " "}); exit; sub col_join { my @data = @_; my $joined = join(":", @data); return($joined); }` [download] gives the following output, `A1 abc yellow:green:blue A2 cde red:yellow B1 xyz green` [download]	[reply] [d/l] [select]
Re^2: join column 3 for unique values by hello_beginner (Novice) on Sep 06, 2016 at 04:25 UTC
How do I print column 4 and col 5 data, if there are more then 3 columns</p	[reply]
Re^3: join column 3 for unique values by kevbot (Vicar) on Sep 06, 2016 at 05:17 UTC
I'm not sure that I understand your request. The script in my earlier reply (Re: join column 3 for unique values) will work with additional columns in the data file. For example, if data.txt contains this `A1 abc yellow 1 abc B1 xyz green 2 def A2 cde red 3 ghi A1 abc green 4 jkl A2 cde yellow 5 mno A1 abc blue 6 pqr` [download] the output will be `A1 abc yellow:green:blue 1:4:6 abc:jkl:pqr A2 cde red:yellow 3:5 ghi:mno B1 xyz green 2 def` [download]	[reply] [d/l] [select]
Re: join column 3 values of unique values in column 1 by BillKSmith (Monsignor) on Sep 05, 2016 at 14:01 UTC
hello_beginner, Let me try to be helpful, without doing your homework for you. Store your data in a hash of arrays (use column 1 as the hash key). Show us what you have done and how it failed. I see that your output contains field 2 data which is not mentioned in your spec. - Explain. You can improve the appearance of your post by enclosing your input and output in code tags. Bill	[reply]
Re^2: join column 3 values of unique values in column 1 by hello_beginner (Novice) on Sep 05, 2016 at 14:13 UTC
reading from table below data __DATA__ A1 abc yellow B1 xyz green A2 cde red A1 abc green A2 cde yellow A1 abc blue OUTPUT A1 abc Yellow:green:blue B1 xyz green A2 cde red:yellow	[reply]
Re^3: join column 3 values of unique values in column 1 by BillKSmith (Monsignor) on Sep 06, 2016 at 12:58 UTC
You still have not addressed my important question about field 2. Can we be certain that all input entries which have the same field 1 will have the same field 2? If not, how do you handle it? Report an error? Make separate output lines for each field 1/field 2 combination? Arbitrarily choose one of the field 2 values for the output? Bill	[reply]
Re^4: join column 3 values of unique values in column 1 by Anonymous Monk on Sep 06, 2016 at 13:34 UTC
Re: join column 3 for unique values by Anonymous Monk on Sep 05, 2016 at 13:31 UTC
`#!/usr/bin/perl # http://perlmonks.org/?node_id=1171182 use strict; use warnings; $_ = join '', <DATA>; 1 while s/^(\S+ \S+ ).\K\n((?:.\n))\1(.\n)/:$3$2/m; print; __DATA__ A1 abc yellow B1 xyz green A2 cde red A1 abc green A2 cde yellow A1 abc blue` [download]	[reply] [d/l]
Re^2: join column 3 for unique values by hello_beginner (Novice) on Sep 05, 2016 at 14:03 UTC
while ( my @row = $sth>fetchrow_array ) { s/^(\S+ \S+ ).\K\n((?:.\n))\1(.\n)/:$3$2/m; print ; Tried this, but did not get expected result .. please help	[reply]
Re^3: join column 3 for unique values by Anonymous Monk on Sep 05, 2016 at 21:48 UTC
Since I don't have sybase, I'll have to set up a test environment in SQLite instead. I had figured you already had the data in memory, but I guess not. #!/usr/bin/perl # http://perlmonks.org/?node_id=1171182 use strict; use warnings; use DBI; my $dbfile = '/tmp/db.1171182'; unlink $dbfile; my $db = DBI->connect_cached( "DBI:SQLite(RaiseError=>1,PrintError=>0):$dbfile"); $db->do('create table testing (one text, two text, color text)'); $db->begin_work; while( <DATA> ) { $db->do('insert into testing (one, two, color) values (?, ?, ?)', {}, split ' '); } $db->commit; my $sth = $db->prepare('select one, two, color from testing'); $sth->execute; $_ = ''; while( my @row = $sth->fetchrow_array ) { $_ .= "@row\n"; } 1 while s/^(.\n)((?:.\n))\K\1//m; 1 while s/^(\S+ \S+ ).\K\n((?:.\n)?)\1(.*\n)/:$3$2/m; print; __DATA__ A1 abc yellow A1 abc yellow B1 xyz green A2 cde red A1 abc green A2 cde yellow A1 abc blue [download]	[reply] [d/l]