in reply to Counting PDL vectors in a PDL matrix
Since using uniqvec leads to subsequent manual looping over the original and perhaps a bit too much additional math, and since it looks like we are out of luck with an elegant vectorized solution anyway, let's find the unique lines directly. The @keys don't need to be stored: they can be extracted again using @index, or get_dataref can be called on the severed $uniq and the resulting long string split into equal-length chunks. I didn't investigate which would be more efficient. If the rows are very large, md5( $$ref ) could perhaps be used as the hash key instead. I'm not sure appending the counts to the data is a good idea, but that's only in the final line anyway.
All this assumes that the order of the unique lines should be preserved, and that your data are already efficiently packed into a large piddle (otherwise, if the matrix is built line by line, the lookup table can be constructed more easily in the same loop).
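For the line-by-line case, here is a minimal plain-Perl sketch, assuming the input arrives as whitespace-separated text lines (the @lines array is hypothetical). The lookup table and the counts are filled in the same loop that collects the rows, so no slicing or get_dataref is needed.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical line-oriented input, e.g. lines read from a text file.
my @lines = ( '0 1 2', '3 4 5', '6 7 8', '0 1 2', '0 1 2', '6 7 8' );

my ( @uniq, @keys, %count );
for my $line (@lines) {
    # numify so that e.g. "1.0" and "1" end up with the same key
    my @row = map { 0 + $_ } split ' ', $line;
    my $key = join ',', @row;
    next if $count{$key}++;    # duplicate: only the count changes
    push @uniq, \@row;         # first occurrence, original order kept
    push @keys, $key;
}
my @counts = @count{@keys};

say join( ' ', @$_ ) for @uniq;
say "@counts";
```

With PDL loaded, pdl( \@uniq ) and pdl( \@counts ) should then give the same $uniq and $counts as the get_dataref-based code.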
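```perl
use strict;
use warnings;
use feature 'say';
use PDL;

my $x = pdl [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
    [0, 1, 2],
    [0, 1, 2],
    [6, 7, 8],
];

my @index;    # row number of each first occurrence
my @keys;     # raw row bytes, in first-seen order
my %count;    # raw row bytes => number of occurrences

for ( 0 .. $x-> dim( 1 ) - 1 ) {
    # the raw bytes of row $_ serve as the hash key
    my $ref = $x-> slice( [], $_ )-> get_dataref;
    next if $count{ $$ref } ++;    # already seen: only bump the count
    push @index, $_;
    push @keys, $$ref;
}

my $uniq   = $x-> dice( 'X', pdl \@index );    # all columns, unique rows only
my $counts = pdl @count{ @keys };

say $uniq;
say $counts;
say $uniq-> append( $counts-> transpose );     # counts as an extra column
```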
Output:
```
>perl pdl180220.pl

[
 [0 1 2]
 [3 4 5]
 [6 7 8]
]

[3 1 2]

[
 [0 1 2 3]
 [3 4 5 1]
 [6 7 8 2]
]
```
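For very large rows, the md5( $$ref ) idea keeps the hash keys at a fixed 16 bytes. Below is a minimal PDL-free sketch of it, assuming the rows are plain Perl array refs (a hypothetical stand-in for the piddle rows). pack 'd*' produces native doubles, which should be the same byte layout get_dataref returns for a row of a double piddle. The trade-off is that an md5 collision would merge two distinct rows: astronomically unlikely, but not impossible.

```perl
use strict;
use warnings;
use feature 'say';
use Digest::MD5 qw( md5 );

# Hypothetical input: rows as plain array refs instead of piddle rows.
my @rows = ( [0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 1, 2], [0, 1, 2], [6, 7, 8] );

my ( @index, @digests, %count );
for my $i ( 0 .. $#rows ) {
    # pack 'd*' gives native doubles (as in a double piddle's data);
    # md5 reduces that byte string to a 16-byte binary key.
    my $digest = md5( pack 'd*', @{ $rows[$i] } );
    next if $count{$digest}++;    # seen before: only bump the count
    push @index,   $i;            # first occurrence: remember its row
    push @digests, $digest;
}
my @counts = @count{@digests};

say "@index";     # rows kept, in original order
say "@counts";
```

The @index array plays the same role as in the code above: it can be handed to dice to pull the unique rows out of the original piddle.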
Re^2: Counting PDL vectors in a PDL matrix
by mxb (Pilgrim) on Feb 21, 2018 at 15:25 UTC
by etj (Priest) on May 07, 2022 at 23:08 UTC