geogpx has asked for the wisdom of the Perl Monks concerning the following question:

The output I got was not what I expected. So, to debug, I sorted the data and printed it:

@$points = sort { $a->[3] <=> $b->[3] } @$points;
foreach (@$points) {
    print " $_->[3] $_->[0] $_->[1] $_->[2] \n";
}

gives me

1338020418 33.514447422 9.142337666 16.479736
1338020431 33.514425964 9.142852650 16.960449
1338020431 33.514425964 9.142852650 16.960449
1338020446 33.514318676 9.143496380 16.960449
1338020446 33.514318676 9.143496380 16.960449
1338020446 33.514318676 9.143496380 16.960449
1338020459 33.514211388 9.144140110 16.479736
1338020479 33.514125557 9.145019875 14.557007
1338020479 33.514125557 9.145019875 14.557007
1338020484 33.514104099 9.145234451 14.557007
1338020484 33.514104099 9.145234451 14.557007

And yes, there are duplicates that need to be eliminated. I added the following:

my @unique = uniq @$points;

but then my data is gone; only empty fields are printed. What is a good and fast way to delete duplicates from this data?

Re: What is the fastest way to delete duplicates from multi dimensional array ?
by Corion (Patriarch) on May 29, 2012 at 13:42 UTC
Re: What is the fastest way to delete duplicates from multi dimensional array ?
by CountZero (Bishop) on May 29, 2012 at 14:40 UTC
    Join all fields together using a delimiter that does not occur in the data (e.g. '|'), then use the joined strings as hash keys and split the keys again on that delimiter. Fast as greased lightning.
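
A minimal sketch of the join/hash/split idea described above, using core Perl only. It assumes the row layout from the question (index 3 is the value the original code sorts on) and that '|' never appears in a field; the sample rows are copied from the posted data. Because hash keys come back in arbitrary order, the sketch re-sorts afterwards.

```perl
use strict;
use warnings;

# Three rows from the posted data; fields are kept as strings so that
# trailing zeros (e.g. 9.142852650) survive the round trip.
my $points = [
    [ '33.514447422', '9.142337666', '16.479736', '1338020418' ],
    [ '33.514425964', '9.142852650', '16.960449', '1338020431' ],
    [ '33.514425964', '9.142852650', '16.960449', '1338020431' ],
];

# Join each row into one string key; identical rows collapse onto one key.
my %seen;
$seen{ join '|', @$_ } = 1 for @$points;

# Split the keys back into rows. Hash order is arbitrary, so sort again
# on index 3, as the original code does.
my @unique = sort { $a->[3] <=> $b->[3] }
             map  { [ split /\|/, $_ ] } keys %seen;

print "@{$_}[3,0,1,2]\n" for @unique;
```

Note that after the round trip every field is a plain string, which is harmless for printing and for numeric comparison with <=>.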

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: What is the fastest way to delete duplicates from multi dimensional array ?
by Tanktalus (Canon) on May 29, 2012 at 19:09 UTC

    Side note. Instead of your sort call, consider using Sort::Key's nkeysort_inplace:

    use Sort::Key qw(nkeysort_inplace);
    nkeysort_inplace { $_->[3] } @$points;
    Read Sort::Key's docs to find out how to sort on multiple keys if you want to secondarily sort on other values of your points.
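
For comparison, a secondary sort key can also be expressed with Perl's built-in sort, without Sort::Key. This is only a sketch of that swapped-in technique, not the Sort::Key multi-key interface; the choice of index 0 as the tie-breaker is an assumption, and the third row below is a hypothetical variation of the posted data, invented to create a tie on index 3.

```perl
use strict;
use warnings;

my $points = [
    [ '33.514425964', '9.142852650', '16.960449', '1338020431' ],
    [ '33.514447422', '9.142337666', '16.479736', '1338020418' ],
    # Hypothetical row (not in the posted data): same index-3 value as
    # the first row, so the secondary key decides their order.
    [ '33.514318676', '9.143496380', '16.960449', '1338020431' ],
];

# Primary key: index 3, ascending. When <=> returns 0 (a tie), "or"
# falls through to the secondary comparison on index 0.
my @sorted = sort {
    $a->[3] <=> $b->[3]
        or
    $a->[0] <=> $b->[0]
} @$points;

print "@{$_}[3,0,1,2]\n" for @sorted;
```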

    The uniq function doesn't work for you because it compares its arguments as strings. You have a list of references (to arrays), not a list of simple scalars, and each reference stringifies to something like ARRAY(0x...), which differs for every array even when the contents are identical, so nothing is recognized as a duplicate. You will pretty much have to roll your own here. I'm sure there are ways to cheat, there always are, but it's probably more work than warranted. Something like setting up your points as objects that overload the q[""] operator to return whatever you want to determine uniqueness on might work, but I'm not sure about that. :-)
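
A rough sketch of the overloading idea, under stated assumptions: the class name Point and its constructor are hypothetical, invented here for illustration. To avoid depending on how any particular uniq implementation treats overloaded objects, the sketch deduplicates with a hand-rolled %seen hash keyed on the stringified object.

```perl
use strict;
use warnings;

{
    # Hypothetical class; the name and constructor are not from the thread.
    package Point;

    # Stringify a point as its comma-joined fields, so two points with
    # equal contents produce the same string.
    use overload
        '""'     => sub { join ',', @{ $_[0] } },
        fallback => 1;

    sub new {
        my ($class, @fields) = @_;
        return bless [@fields], $class;
    }
}

my @points = map { Point->new(@$_) } (
    [ '33.514425964', '9.142852650', '16.960449', '1338020431' ],
    [ '33.514425964', '9.142852650', '16.960449', '1338020431' ],
    [ '33.514318676', '9.143496380', '16.960449', '1338020446' ],
);

# "$_" invokes the overloaded stringification; grep keeps the first
# object seen for each distinct string and preserves input order.
my %seen;
my @unique = grep { !$seen{"$_"}++ } @points;

print "$_\n" for @unique;
```

Without the overload, "$_" would yield something like Point=ARRAY(0x...), which is distinct per object, and no duplicates would be removed.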

Re: What is the fastest way to delete duplicates from multi dimensional array ?
by kcott (Archbishop) on May 30, 2012 at 00:19 UTC

    To achieve this, you can change

    my @unique = uniq @$points;

    to

    my %seen;
    my @unique = grep { ! $seen{join(q{,}, @$_)}++ } @$points;

    Tested on the commandline using your posted data:

    $ perl -Mstrict -Mwarnings -e '
        my ($points, %seen, @unique);

        # read your posted data
        while (<>) {
            push @$points => [split];
        }

        # remove duplicates
        @unique = grep { ! $seen{join(q{,}, @$_)}++ } @$points;

        # print result
        print qq{@{$_}[0..3]\n} for @unique;
    '
    1338020418 33.514447422 9.142337666 16.479736
    1338020431 33.514425964 9.142852650 16.960449
    1338020431 33.514425964 9.142852650 16.960449
    1338020446 33.514318676 9.143496380 16.960449
    1338020446 33.514318676 9.143496380 16.960449
    1338020446 33.514318676 9.143496380 16.960449
    1338020459 33.514211388 9.144140110 16.479736
    1338020479 33.514125557 9.145019875 14.557007
    1338020479 33.514125557 9.145019875 14.557007
    1338020484 33.514104099 9.145234451 14.557007
    1338020484 33.514104099 9.145234451 14.557007

    1338020418 33.514447422 9.142337666 16.479736
    1338020431 33.514425964 9.142852650 16.960449
    1338020446 33.514318676 9.143496380 16.960449
    1338020459 33.514211388 9.144140110 16.479736
    1338020479 33.514125557 9.145019875 14.557007
    1338020484 33.514104099 9.145234451 14.557007

    As a side issue, note the use of the array slice on the last line. Your

    foreach (@$points) { print " $_->[3] $_->[0] $_->[1] $_->[2] \n"; }

    could have been written as:

    print " @{$_}[3,0,1,2] \n" for @$points;

    -- Ken