in reply to Counting PDL vectors in a PDL matrix

Since using uniqvec leads to subsequent manual looping over the original and perhaps a bit too much additional math, and since it looks like we are out of luck with an elegant vectorized solution anyway, let's find the unique lines directly. The @keys don't need to be stored: they can be extracted again using @index, or get_dataref can even be called on a severed $uniq and the resulting large string split into equal chunks. I didn't investigate which would be more efficient. If the data are very large, then maybe md5( $$ref ) could be used as the hash key instead. I'm not sure appending the counts to the data is a good idea, but that happens only in the final line anyway.

All this assumes that the order of unique lines should be preserved, and that your data are already efficiently stored in a large piddle (otherwise, if the matrix is built line by line, the lookup table can more easily be constructed in the same loop).

use strict;
use warnings;
use feature 'say';
use PDL;

my $x = pdl [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
    [0, 1, 2],
    [0, 1, 2],
    [6, 7, 8],
];

my @index;    # row numbers of first occurrences, in order
my @keys;     # raw bytes of each unique row, in the same order
my %count;    # raw row bytes => number of occurrences

for ( 0 .. $x-> dim( 1 ) - 1 ) {
    # packed data of row $_ serves as the hash key
    my $ref = $x-> slice( [], $_ )-> get_dataref;
    next if $count{ $$ref } ++;    # seen before: only count it
    push @index, $_;
    push @keys, $$ref;
}

my $uniq = $x-> dice( 'X', pdl \@index );
my $counts = pdl @count{ @keys };

say $uniq;
say $counts;
say $uniq-> append( $counts-> transpose );

Output:

>perl pdl180220.pl

[
 [0 1 2]
 [3 4 5]
 [6 7 8]
]

[3 1 2]

[
 [0 1 2 3]
 [3 4 5 1]
 [6 7 8 2]
]
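As for the md5( $$ref ) idea mentioned above, here is a rough sketch of how it might look. It assumes the core Digest::MD5 module and the same sample data; the variable names simply follow the script above. The hash keys then stay at 16 bytes however long the rows are, at the cost of one digest per row and a (theoretical) chance of collisions. I haven't measured whether it actually pays off.

```perl
use strict;
use warnings;
use feature 'say';
use PDL;
use Digest::MD5 'md5';

my $x = pdl [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
    [0, 1, 2],
    [0, 1, 2],
    [6, 7, 8],
];

my @index;    # row numbers of first occurrences
my @keys;     # md5 digests of the unique rows, in first-seen order
my %count;    # digest => number of occurrences

for my $i ( 0 .. $x-> dim( 1 ) - 1 ) {
    my $ref = $x-> slice( [], $i )-> get_dataref;
    my $key = md5( $$ref );        # 16-byte binary digest of the row's bytes
    next if $count{ $key } ++;     # digest seen before: only count it
    push @index, $i;
    push @keys, $key;
}

my $uniq   = $x-> dice( 'X', pdl \@index );
my $counts = pdl @count{ @keys };

say $uniq;
say $counts;
```

Rows are still compared by their raw bytes, only indirectly through the digest, so the same caveats as with the plain $$ref keys apply.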

Re^2: Counting PDL vectors in a PDL matrix
by mxb (Pilgrim) on Feb 21, 2018 at 15:25 UTC

    Hi,

    Thanks for the quick replies, all. Thanks to choroba for linking to uniqvec, which I must have missed while reading the documentation.

    Thanks to vr for the good example code, this clears things up a bit. I came to the conclusion that my whole approach was incorrect (which I kind of expected when learning PDL).

    I am building up the matrix vector by vector, so if we consider each vector as a single entity (which is what I am doing), then rather than a 2D matrix of vectors, in essence I have a 1D vector of entities. Therefore, I am choosing to build the lookup table during creation.

    Many thanks all.
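
The "build the lookup table during creation" approach from this reply could look roughly like the sketch below. It is only an illustration under assumptions: @incoming is a hypothetical stand-in for wherever the vectors really come from, and the join-based string key is a simple choice that suits integer rows (floating-point rows may need a more careful key, e.g. via pack).

```perl
use strict;
use warnings;
use feature 'say';
use PDL;

# Hypothetical input: vectors arriving one at a time.
my @incoming = (
    [0, 1, 2], [3, 4, 5], [6, 7, 8],
    [0, 1, 2], [0, 1, 2], [6, 7, 8],
);

my @uniq_rows;    # first occurrence of each vector, in arrival order
my @keys;         # string keys of the unique vectors, same order
my %count;        # key => number of occurrences

for my $row ( @incoming ) {
    my $key = join ',', @$row;     # assumed key format, fine for integers
    next if $count{ $key } ++;     # already known: only count it
    push @uniq_rows, $row;
    push @keys, $key;
}

# Build the piddles once, after all vectors have been seen.
my $uniq   = pdl \@uniq_rows;
my $counts = pdl @count{ @keys };

say $uniq;
say $counts;
```

This way the full matrix with duplicates never has to exist as a piddle; only the unique vectors and their counts are kept.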