Array Cleaning

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I wrote two functions to clean arrays (delete '' and undef elements): one which doesn't preserve the order, and another which preserves the order.

Unfortunately the following doesn't work:

#!/usr/bin/perl

use strict;
use warnings;

my @array = (3,2,1,1,1,2,3,4,6,"",5,"","",9,"","");

my @clean_array = clean(@array);
my @clean_ord_array = clean_ord(@array);

print "\nclean_array:\n";
map { print "elem: $_\n"; } @clean_array;

print "\nclean_ord_array:\n";
map { print "elem: $_\n"; } @clean_ord_array;

# clean, don't preserve order
sub clean {
    my @clean = @_;
    my @seen;
    my @cleaned;
    my %seen;
    my $i;

    for ( $i = 0 ; $i <= $#clean ; $i++ ) {
        unless ( ( defined $clean[$i] ) && ( $clean[$i] ne '' ) ) {
            splice @clean, $i, 1;
        }
    }

    @seen{@clean} = ();
    @cleaned = sort keys %seen;

    if (wantarray) {
        return @cleaned;
    }
    else {
        if ( defined($/) ) {
            return join "$/", @cleaned;
        }
        else {
            return join "\n", @cleaned;
        }
    }
}

# clean, preserve order
sub clean_ord {
    my @clean = @_;
    my $clean;
    my @seen;
    my @cleaned;
    my %seen;
    my $i;

    for ( $i = 0 ; $i <= $#clean ; $i++ ) {
        unless ( ( defined $clean[$i] ) && ( $clean[$i] ne '' ) ) {
            splice @clean, $i, 1;
        }
    }

    foreach $clean (@clean) {
        unless ( exists $seen{$clean} ) {
            push ( @cleaned, $clean );
            $seen{$clean} = 1;
        }
    }

    if (wantarray) {
        return @cleaned;
    }
    else {
        if ( defined($/) ) {
            return join "$/", @cleaned;
        }
        else {
            return join "\n", @cleaned;
        }
    }
}

The output is:

 
clean_array:
elem:
elem: 1
elem: 2
elem: 3
elem: 4
elem: 5
elem: 6
elem: 9
 
clean_ord_array:
elem: 3
elem: 2
elem: 1
elem: 4
elem: 6
elem: 5
elem:
elem: 9

But should be:

 
clean_array:
elem: 1
elem: 2
elem: 3
elem: 4
elem: 5
elem: 6
elem: 9
 
clean_ord_array:
elem: 3
elem: 2
elem: 1
elem: 4
elem: 6
elem: 5
elem: 9

What am I doing wrong?

Thanks in advance, ReelBigFish

2003-05-02 edit ybiC: <readmore>


Replies are listed 'Best First'.
Re: Array Cleaning by crenz (Priest) on May 02, 2003 at 16:55 UTC
Maybe I'm missing something, but why are you not using `grep`? E.g.

`@array = (3, 2, 1, "", 2, undef);`
`@cleanarray = grep { defined $_ && $_ ne "" } @array;`
`@sortedcleanarray = sort grep { defined $_ && $_ ne "" } @array;`

Of course, this will not remove duplicates. From your post I am not sure whether you want that or not (your "should be" output has the duplicates removed).
Re: Re: Array Cleaning by Anonymous Monk on May 02, 2003 at 17:01 UTC
Yes, I forgot to mention the duplicate entries. They should also be removed. Another problem with grep is that it is extremely slow with big arrays (at least I heard that :))
Re^3: Array Cleaning ("grep is slow") by Aristotle (Chancellor) on May 03, 2003 at 03:01 UTC
Never guess wildly at what may or may not be slow.
Don't concern yourself with performance unless it is a problem.
When you need performance, profile your code to find out where the problem is.
If you really really need a very very fast program then why are you using Perl?

And some quotes from the collection on my home node:

"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." (William A. Wulf, A Case Against the GOTO)

"Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet." (Michael A. Jackson)

Makeshifts last the longest.
Re: Re: Re: Array Cleaning by crenz (Priest) on May 02, 2003 at 17:11 UTC
Well, I find it hard to believe that a built-in perl function that has been optimised and is written in C should be slower than walking through the array yourself (i.e. doing the exact same thing the built-in is supposed to do). It is contrary to my own experience, at least. Plus I find myself too lazy to write and debug my own code when I can use a built-in :). If you really have reason to believe what you've heard, why not just quickly whip up a benchmark?
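
A benchmark along the lines suggested here could be sketched with the core Benchmark module. The sub names `with_grep` and `with_splice` and the test data are made up for illustration. Timings depend on the machine and the Perl version, so none are shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10,000 elements, every third one an empty string.
my @data = map { $_ % 3 ? $_ : '' } 1 .. 10_000;

# Filter with the built-in grep.
sub with_grep {
    return grep { defined $_ && $_ ne '' } @_;
}

# Filter with a hand-written splice loop, walking backwards so that
# removing an element never shifts one that is still to be checked.
sub with_splice {
    my @copy = @_;
    for my $i ( reverse 0 .. $#copy ) {
        splice @copy, $i, 1
          unless defined $copy[$i] && $copy[$i] ne '';
    }
    return @copy;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    grep   => sub { my @r = with_grep(@data) },
    splice => sub { my @r = with_splice(@data) },
} );
```

Both subs return the same list, so the table compares only how long each approach takes to produce it.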
Re: Array Cleaning by chromatic (Archbishop) on May 02, 2003 at 17:13 UTC
No one has yet explained why this doesn't work as you expect. Consider an array of three values: `my @to_clean = ( '', 1, 2 );`

We'll loop through the array with indices. We'll make sure that the element at that position is defined, true, and not the empty string: `for my $i ( 0 .. $#to_clean ) { next if defined $to_clean[ $i ] and $to_clean[ $i ]; next unless $to_clean[ $i ] eq ''; }`

Here's where it gets tricky. By splicing, you're changing the original array. `splice( @to_clean, $i, 1 );`

After the first iteration, we'll have spliced out the empty string at index zero. The array will then contain 1 and 2, at indices 0 and 1. Unfortunately, `$i` will be 1 on the next loop iteration, so the loop never checks the element that has moved to index 0. You could mess with `redo`. You could push the elements you want to keep onto a secondary array. It's unlikely you'll come up with anything simpler or faster than `grep`, though.

Update: This lets undef through, but it catches zero. It's just an example, though.
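
The skipping described above can be made visible with a short trace. This is only a sketch with made-up data: it uses the same C-style loop as the original question and records which index and value each iteration actually inspects.

```perl
use strict;
use warnings;

my @to_clean = ( '', '', 1, 2 );
my @visited;    # one "index=[value]" entry per element the loop looks at

for ( my $i = 0 ; $i <= $#to_clean ; $i++ ) {
    push @visited, sprintf( '%d=[%s]', $i, $to_clean[$i] );

    # Removing element $i moves every later element down by one,
    # but $i still advances, so the element that slides into slot $i
    # is never inspected.
    splice @to_clean, $i, 1 if $to_clean[$i] eq '';
}

print "visited: @visited\n";
print "left:    ", join( ',', map { sprintf '[%s]', $_ } @to_clean ), "\n";
```

The loop inspects three elements (the first empty string, then 1, then 2), and the second empty string is still in the array when the loop ends.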
Re: Re: Array Cleaning by parv (Parson) on May 03, 2003 at 06:42 UTC
"We'll make sure that the element at that position is defined, true, and not the empty string: `for my $i ( 0 .. $#to_clean ) { next if defined $to_clean[ $i ] and $to_clean[ $i ]; next unless $to_clean[ $i ] eq ''; }`"

If testing for a truth value as above, why is there a need to test separately for an empty string? Since an empty string evaluates to a false value, the second `next` statement is of no use.
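
One practical difference between the two kinds of test is worth seeing: a plain truth test also discards the value 0, while an explicit `defined` and `ne ''` test keeps it. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @values = ( 0, '', 1, undef, '0', 2 );

# Truth test: drops undef and '', but also 0 and '0'.
my @by_truth = grep { $_ } @values;

# Explicit test: drops only undef and ''.
my @by_test = grep { defined $_ && $_ ne '' } @values;

print "truth test:    @by_truth\n";    # 1 2
print "explicit test: @by_test\n";     # 0 1 0 2
```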
Re: Array Cleaning by artist (Parson) on May 02, 2003 at 17:01 UTC
Here is a simple solution in terms of what you want to achieve.

`my %X;`
`@clean_ord_array = grep { !defined $X{$_} and $X{$_} = 1 } grep /\d+/, @array;`
`print join( "\n", @clean_ord_array ), "\n";`
`print join( "\n", sort { $a <=> $b } @clean_ord_array ), "\n";`

artist
Re: Re: Array Cleaning by Anonymous Monk on May 02, 2003 at 18:20 UTC
Thanks, this works great. How do I additionally remove all whitespace from every element (`s/\s//g`)?
Re: Re: Re: Array Cleaning by artist (Parson) on May 02, 2003 at 18:29 UTC
It doesn't look like you have whitespace in the numbers. Just in case, though, you can apply your substitution to each element with `map` before the filters run. The `map` block has to return the modified string, because `s///` itself returns only the number of substitutions made.

`my %X;`
`@clean_ord_array = grep { !defined $X{$_} and $X{$_} = 1 } grep /\d+/, map { my $s = $_; $s =~ s/\s//g; $s } @array;`

artist
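
Putting the pieces of this sub-thread together, the whole pipeline (strip whitespace, drop empty and undefined elements, remove duplicates while keeping first-seen order) might be sketched as one sub. The name `clean_ws_ord` and the sample data are made up for illustration, and the whitespace is removed from copies so that the caller's array is left untouched.

```perl
use strict;
use warnings;

sub clean_ws_ord {
    my %seen;
    return
      grep { !$seen{$_}++ }                        # keep the first occurrence only
      grep { $_ ne '' }                            # drop strings that are now empty
      map  { my $s = $_; $s =~ s/\s//g; $s }       # strip whitespace from a copy
      grep { defined $_ }                          # drop undef before using the value
      @_;
}

my @array = ( 3, ' 2', "1 ", 1, "", undef, " ", 2, "4 2" );
my @clean = clean_ws_ord(@array);
print "elem: $_\n" for @clean;    # 3, 2, 1, 42
```

The stages read bottom to top: each `grep` or `map` receives the list produced by the line below it.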
Re: Array Cleaning by shemp (Deacon) on May 02, 2003 at 17:12 UTC
Other posts have given you code to do what you need; here is why your code fails. Consider an array that has the following: `my @array = ("", "");`

Now hand-trace the part of clean() that splices out the blank elements. The loop is meant to iterate from 0 to 1. Element 0 gets spliced out, leaving the array with a single element, "", at index 0. Your loop iterator then gets incremented to 1, but the blank element that was at position 1 is now at position 0 as a result of the splice. That element never gets a chance to be examined.

To summarize, splice()ing out array elements changes the indexes of subsequent elements, so your loop iterator won't indicate what it was intended to. As a "band-aid" to your function, if you change the unless() action as follows: `splice @clean, $i, 1; $i--;` your function should work as you want it to. But others have offered better solutions.
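
Applied to the loop from the original question, the band-aid looks roughly like this. It is a sketch with made-up data, not a recommendation over `grep`.

```perl
use strict;
use warnings;

my @clean = ( 3, "", "", 9, undef, "", 5, "" );

for ( my $i = 0 ; $i <= $#clean ; $i++ ) {
    unless ( defined $clean[$i] && $clean[$i] ne '' ) {
        splice @clean, $i, 1;
        $i--;    # re-examine this slot: the next element has moved into it
    }
}

print "elem: $_\n" for @clean;    # 3, 9, 5
```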
Re: Array Cleaning by Necos (Friar) on May 02, 2003 at 20:35 UTC
There is one more way to deal with the splicing problem: run through the array backwards. It sounds weird, but it does work. If you build up a list of the indexes that you want to splice out, process it in reverse (using pop), and then splice, the indices won't change (because the array is shrinking from the end), and everything is fine. I just so happened to get bitten by something similar. Consider the following (untested) snippet:

my $splice_index = 0;
my @splice_list;
for (@clean) {
    if (length($_) == 0) { #zero-length string
        push(@splice_list, $splice_index);
        $splice_index++;
        next;
    }
    $splice_index++;
}
#now we have our splice list built up. so, we take care of the elements that need to go.
while (@splice_list) {
    my $index = pop(@splice_list);
    splice(@clean, $index, 1);
}
print "$_\n" for @clean; #should produce the correct results
@clean = map { s/\s+//g; $_ } @clean; #i try to avoid map in void context, except for in my sig ^_~
print "$_\t" for @clean;
print "\n";

I actually used something like this in a module I'm writing. I hope it's of some use to you. YMMV.

Theodore Charles III
Network Administrator
Los Angeles Senior High
email->secon_kun@hotmail.com
`perl -e "map{print++$_}split//,Mdbnr;"`
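
The same idea can be written more compactly by walking the indices from the end, without building a separate splice list. A sketch with made-up data:

```perl
use strict;
use warnings;

my @clean = ( "", 3, "", "", 9, undef, 5, "" );

# Highest index first: removing element $i only moves elements above $i,
# and those have already been checked.
for my $i ( reverse 0 .. $#clean ) {
    splice @clean, $i, 1
      unless defined $clean[$i] && length $clean[$i];
}

print "elem: $_\n" for @clean;    # 3, 9, 5
```

Because the index list is built once before the loop starts, shrinking the array inside the loop does not affect which indices are visited.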
Re: Array Cleaning by Anonymous Monk on May 02, 2003 at 18:29 UTC
Thanks for your answers. It works great. But how would I also remove all whitespace (`s/\s//g`) from every array element? ReelBigFish
Re: Array Cleaning by Anonymous Monk on May 02, 2003 at 17:25 UTC
Why not just map the defined array values to a hash as keys and then sort numerically? Takes care of duplicates, the sort is fast, and order is retained...not sure how it scales with input size compared with grep though. `@array = sort {$a <=> $b} keys (%hash = map { $_ => 1 } @array);`
Re: Re: Array Cleaning by Anonymous Monk on May 02, 2003 at 17:42 UTC
nevermind. idea is ok but untested code sucks. my bad. wallowing in shame. Forget I said anything. :)
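
For reference, a working variant of the hash idea might look like the sketch below. Note that it returns the values in numeric order rather than in their original order, and it assumes every remaining element is numeric. The sample array is the one from the original question.

```perl
use strict;
use warnings;

my @array = ( 3, 2, 1, 1, 1, 2, 3, 4, 6, "", 5, "", "", 9, "", "" );

# Use a hash slice to collect the non-empty values as keys;
# duplicate values collapse into a single key.
my %seen;
@seen{ grep { defined $_ && $_ ne '' } @array } = ();

# Hash keys come back in no particular order, so sort them numerically.
my @sorted = sort { $a <=> $b } keys %seen;

print "elem: $_\n" for @sorted;    # 1, 2, 3, 4, 5, 6, 9
```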