Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I wrote two functions to clean arrays (delete '' and undef elements): one which doesn't preserve the order, and another which preserves the order.

Unfortunately the following doesn't work:

#!/usr/bin/perl
use strict;
use warnings;

my @array = (3,2,1,1,1,2,3,4,6,"",5,"","",9,"","");

my @clean_array     = clean(@array);
my @clean_ord_array = clean_ord(@array);

print "\nclean_array:\n";
map { print "elem: $_\n"; } @clean_array;

print "\nclean_ord_array:\n";
map { print "elem: $_\n"; } @clean_ord_array;

# clean, don't preserve order
sub clean {
    my @clean = @_;
    my @seen;
    my @cleaned;
    my %seen;
    my $i;

    for ( $i = 0 ; $i <= $#clean ; $i++ ) {
        unless ( ( defined $clean[$i] ) && ( $clean[$i] ne '' ) ) {
            splice @clean, $i, 1;
        }
    }

    @seen{@clean} = ();
    @cleaned = sort keys %seen;

    if (wantarray) {
        return @cleaned;
    }
    else {
        if ( defined($/) ) {
            return join "$/", @cleaned;
        }
        else {
            return join "\n", @cleaned;
        }
    }
}

# clean, preserve order
sub clean_ord {
    my @clean = @_;
    my $clean;
    my @seen;
    my @cleaned;
    my %seen;
    my $i;

    for ( $i = 0 ; $i <= $#clean ; $i++ ) {
        unless ( ( defined $clean[$i] ) && ( $clean[$i] ne '' ) ) {
            splice @clean, $i, 1;
        }
    }

    foreach $clean (@clean) {
        unless ( exists $seen{$clean} ) {
            push ( @cleaned, $clean );
            $seen{$clean} = 1;
        }
    }

    if (wantarray) {
        return @cleaned;
    }
    else {
        if ( defined($/) ) {
            return join "$/", @cleaned;
        }
        else {
            return join "\n", @cleaned;
        }
    }
}

The output is:

clean_array:
elem: 
elem: 1
elem: 2
elem: 3
elem: 4
elem: 5
elem: 6
elem: 9

clean_ord_array:
elem: 3
elem: 2
elem: 1
elem: 4
elem: 6
elem: 5
elem: 
elem: 9

But should be:

clean_array:
elem: 1
elem: 2
elem: 3
elem: 4
elem: 5
elem: 6
elem: 9

clean_ord_array:
elem: 3
elem: 2
elem: 1
elem: 4
elem: 6
elem: 5
elem: 9

What am I doing wrong?

Thanks in advance, ReelBigFish

2003-05-02 edit ybiC: <readmore>

Replies are listed 'Best First'.
Re: Array Cleaning
by crenz (Priest) on May 02, 2003 at 16:55 UTC

    Maybe I'm missing something, but why are you not using grep? E.g.

    @array = (3, 2, 1, "", 2, undef);
    @cleanarray = grep { defined $_ && $_ ne "" } @array;
    @sortedcleanarray = sort grep { defined $_ && $_ ne "" } @array;

    Of course, this will not remove duplicates. From your post I am not sure whether you want that or not (your "should be" output has the duplicates removed).
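A minimal sketch of how the grep approach might be extended to drop duplicates as well. It assumes a helper hash that records values already seen; the names clean_unique and %seen are illustrative and not from the post above.

```perl
use strict;
use warnings;

# Keep defined, non-empty elements, first occurrence only (order preserved).
# clean_unique and %seen are illustrative names, not from the original post.
sub clean_unique {
    my %seen;
    return grep { defined $_ && $_ ne '' && !$seen{$_}++ } @_;
}

my @array = (3, 2, 1, "", 2, undef);
my @cleanarray       = clean_unique(@array);
my @sortedcleanarray = sort { $a <=> $b } @cleanarray;

print "@cleanarray\n";        # 3 2 1
print "@sortedcleanarray\n";  # 1 2 3
```

The `!$seen{$_}++` test is true only the first time a value is met, so one pass both filters and deduplicates.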

      Yes, I forgot to mention the duplicate entries. They should also be removed.

      Another problem with grep is that it is extremely slow with big arrays (at least that's what I heard :))

        1. Never guess wildly at what may or may not be slow.
        2. Don't concern yourself with performance unless it is a problem.
        3. When you need performance, profile your code to find out where the problem is.
        4. If you really really need a very very fast program then why are you using Perl?
        And some quotes from the collection on my home node:
        More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity.
        — William A. Wulf, A Case Against the GOTO
        Rules of Optimization:
        Rule 1: Don't do it.
        Rule 2 (for experts only): Don't do it yet.
        — Michael A. Jackson

        Makeshifts last the longest.

        Well, I find it hard to believe that a built-in perl function that has been optimised and is written in C should be slower than walking through the array yourself (i.e. doing the exact same thing the built-in is supposed to do). It is contrary to my own experience, at least. Plus I find myself too lazy to write and debug my own code when I can use a built-in :).

        If you really have reason to believe what you've heard, why not just quickly whip up a benchmark?
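One way such a benchmark might look, using the core Benchmark module. The data set, its size, and the sub names with_grep and with_loop are assumptions made for illustration; results will vary with Perl version and input, so treat this as a starting point rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Test data: 10_000 elements, roughly a third of them empty strings.
my @data = map { $_ % 3 ? $_ : '' } 1 .. 10_000;

# Filter with the built-in grep.
sub with_grep {
    return grep { defined $_ && $_ ne '' } @_;
}

# Filter with a hand-written loop that pushes the kept elements.
sub with_loop {
    my @keep;
    for my $elem (@_) {
        push @keep, $elem if defined $elem && $elem ne '';
    }
    return @keep;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'grep' => sub { my @r = with_grep(@data) },
    'loop' => sub { my @r = with_loop(@data) },
} );
```

Both subs return the same list, so the table compares only the cost of the filtering technique.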

Re: Array Cleaning
by chromatic (Archbishop) on May 02, 2003 at 17:13 UTC

    No one has yet explained why this doesn't work as you expect.

    Consider an array of three values:

    my @to_clean = ( '', 1, 2 );

    We'll loop through the array with indices. We'll make sure that the element at that position is defined, true, and not the empty string:

    for my $i ( 0 .. $#to_clean ) {
        next if defined $to_clean[ $i ] and $to_clean[ $i ];
        next unless $to_clean[ $i ] eq '';
    }

    Here's where it gets tricky. By splicing, you're changing the original array.

    splice( @to_clean, $i, 1 );

    After the first iteration, we'll splice out the empty string at index zero. The array will then contain 1 and 2. Unfortunately, $i will be 1 on the next loop iteration, so the element that has just moved into position 0 is never checked.

    You could mess with redo. You could push the elements you want to keep on a secondary array. It's unlikely you'll come up with anything simpler or faster than grep though.

    Update: This lets undef through, but it catches zero. It's just an example, though.
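A small sketch contrasting the two behaviours described above: splicing while counting upwards, and pushing the keepers onto a secondary array. The sub names clean_splice and clean_push and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Buggy: splicing while counting upwards skips the element that slides
# into the slot just vacated.
sub clean_splice {
    my @clean = @_;
    for ( my $i = 0 ; $i <= $#clean ; $i++ ) {
        splice @clean, $i, 1
            unless defined $clean[$i] && $clean[$i] ne '';
    }
    return @clean;
}

# Alternative: leave the input alone and push the keepers onto a second array.
sub clean_push {
    my @keep;
    for my $elem (@_) {
        push @keep, $elem if defined $elem && $elem ne '';
    }
    return @keep;
}

my @to_clean = ( '', '', 1, 2 );
print scalar( clean_splice(@to_clean) ), "\n";   # 3: one '' survives
print scalar( clean_push(@to_clean) ),   "\n";   # 2
```

With two adjacent empty strings, the second one moves to index 0 after the first splice and is never looked at, which is why the splice version leaves it behind.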

      We'll make sure that the element at that position is defined, true, and not the empty string:
      for my $i ( 0 .. $#to_clean ) {
          next if defined $to_clean[ $i ] and $to_clean[ $i ];
          next unless $to_clean[ $i ] eq '';
      }

      If you are testing for a truth value as above, why is there a need to test separately for an empty string? As an empty string evaluates to a false value, the second "next" statement is of no use.

Re: Array Cleaning
by artist (Parson) on May 02, 2003 at 17:01 UTC
    Here is the simple solution in terms of what you want to achieve.
    my %X;
    @clean_ord_array = grep { !defined $X{$_} and $X{$_} = 1 } grep /\d+/, @array;
    print join( "\n", @clean_ord_array ), "\n";
    # parenthesize so the trailing "\n" is not fed to the numeric sort
    print join( "\n", sort { $a <=> $b } @clean_ord_array ), "\n";
    artist

      Thanks, this works great.

      How would I additionally remove all whitespace from every element (s/\s//g)?

        It doesn't look like you have whitespace in the numbers. But just in case, you can put your substitution in front of @array using map.
        my %X;
        # s///g returns a count, so hand back $_ explicitly; note this also modifies @array in place
        @clean_ord_array = grep { !defined $X{$_} and $X{$_} = 1 } grep /\d+/, map { s/\s//g; $_ } @array;
        artist
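A hedged sketch of one way to strip whitespace, drop empty and undefined elements, and remove duplicates in a single pass without touching the caller's array. The name strip_and_clean and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

# strip_and_clean is an illustrative name, not from the thread.
# Works on copies, so the caller's array is left untouched.
sub strip_and_clean {
    my %seen;
    my @out;
    for my $elem (@_) {
        next unless defined $elem;
        ( my $copy = $elem ) =~ s/\s+//g;     # strip all whitespace from a copy
        next if $copy eq '' or $seen{$copy}++;
        push @out, $copy;
    }
    return @out;
}

my @array = ( ' 3', '2 ', "1\t", '', undef, '   ', '3', '1 0' );
my @clean_ord_array = strip_and_clean(@array);
print join( "\n", @clean_ord_array ), "\n";   # 3, 2, 1, 10 on separate lines
```

Stripping before the duplicate check means ' 3' and '3' count as the same value, and an element consisting only of whitespace is discarded as empty.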
Re: Array Cleaning
by shemp (Deacon) on May 02, 2003 at 17:12 UTC
    Other posts have given you code to do what you need, here is why your code fails.
    Consider an array that has the following:
    my @array = ("", "");
    Now hand trace the part of clean() that splices out the blank elements.

    The loop will iterate from 0 to 1.
    Element 0 gets spliced out, leaving the array with a single element, "", at index 0. Your loop iterator then gets incremented to 1, but the blank element that was at position 1 is now at position 0 as a result of the splice. That element never gets a chance to be examined.

    To summarize, splice()ing out array elements changes the indexes of subsequent elements, so your loop iterator won't point at what it was intended to.
    As a "band-aid" to your function, if you change the unless() action as follows:
    splice @clean, $i, 1; $i--;
    your function should work as you want it to.

    But, others have offered better solutions.
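For reference, a sketch of the band-aid applied to the splice loop from the question. The sub name clean_fixed and the sample data are illustrative.

```perl
use strict;
use warnings;

# The original loop with the band-aid applied: after a splice, step the
# index back so the element that moved into slot $i is examined too.
sub clean_fixed {
    my @clean = @_;
    for ( my $i = 0 ; $i <= $#clean ; $i++ ) {
        unless ( defined $clean[$i] && $clean[$i] ne '' ) {
            splice @clean, $i, 1;
            $i--;
        }
    }
    return @clean;
}

my @result = clean_fixed( "", "", 5, "", "", 9 );
print "@result\n";   # 5 9
```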
Re: Array Cleaning
by Necos (Friar) on May 02, 2003 at 20:35 UTC
    There is one more way to deal with the splicing problem: run through the array backwards. It sounds weird, but it does work. If you build up a list of the indexes that you want to splice out, process it in reverse (using pop), and then splice, the indices won't change (because the array is shrinking from the end), and everything is fine. I just so happened to get bitten by something similar.

    Consider the following (untested) snippet:
    my $splice_index = 0;
    my @splice_list;
    for (@clean) {
        if (!defined($_) || length($_) == 0) { #undef or zero-length string
            push(@splice_list,$splice_index);
            $splice_index++;
            next;
        }
        $splice_index++;
    }
    #now we have our splice list built up. so, we take care of the elements that need to go.
    while (@splice_list) {
        my $index = pop(@splice_list);
        splice(@clean,$index,1);
    }
    print "$_\n" for @clean; #should produce the correct results
    @clean = map {s/\s+//g; $_ } @clean; #i try to avoid map in void context, except for in my sig ^_~
    print "$_\t" for @clean;
    print "\n";
    I actually used something like this in a module I'm writing. I hope it's of some use to you. YMMV.
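A shorter variant of the same backwards idea, sketched without the intermediate index list: iterate over the indices in reverse and splice as you go. The sub name clean_reverse is made up for this example.

```perl
use strict;
use warnings;

# Walk the indices from the end towards the start; removing element $i
# only shifts elements at higher indices, which were already checked.
sub clean_reverse {
    my @clean = @_;
    for my $i ( reverse 0 .. $#clean ) {
        splice @clean, $i, 1
            unless defined $clean[$i] && length $clean[$i];
    }
    return @clean;
}

my @result = clean_reverse( 3, "", "", 9, undef, "" );
print "@result\n";   # 3 9
```

The index list is computed once before the loop starts, so shrinking the array inside the loop does not disturb the iteration.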

    Theodore Charles III
    Network Administrator
    Los Angeles Senior High
    email->secon_kun@hotmail.com
    perl -e "map{print++$_}split//,Mdbnr;"
Re: Array Cleaning
by Anonymous Monk on May 02, 2003 at 18:29 UTC

    Thanks for your answers. It works great. But how would I also remove all whitespace (s/\s//g) from every array element?

    ReelBigFish

Re: Array Cleaning
by Anonymous Monk on May 02, 2003 at 17:25 UTC
    Why not just map the defined array values to a hash as keys and then sort numerically? Takes care of duplicates, the sort is fast, and order is retained...not sure how it scales with input size compared with grep though.
    @array = sort {$a <=> $b} keys (%hash = map { $_ => 1 } @array);
      nevermind. idea is ok but untested code sucks. my bad. wallowing in shame. Forget I said anything. :)
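A sketch of how the hash idea could be made to work, assuming the hash is built in its own statement (keys needs a real hash, not an assignment expression) and that undef and empty strings are filtered out first. The variable name @sorted_unique is illustrative. Note that this yields sorted output; the original order is not retained.

```perl
use strict;
use warnings;

my @array = (3, 2, 1, 1, "", 2, undef, 10);

# Build the hash first, skipping undef and empty strings so they never
# become keys.
my %hash = map { $_ => 1 } grep { defined $_ && $_ ne '' } @array;

# Hash keys come back in no particular order, so sort numerically.
my @sorted_unique = sort { $a <=> $b } keys %hash;

print "@sorted_unique\n";   # 1 2 3 10
```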