Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I am using the following code to try to determine whether a column has any data other than 0.

#!/perl/bin/perl.exe -w
use strict;

my @array = ();
my @row   = ();
my %hash  = ();

if ( @ARGV != 1 ) {
    print "\nUsage: <File Name> \n\n";
    exit(0);
}

my $file = $ARGV[0];
open (FILE, $file) || die "ERROR: Unable to open $file :$!\n";
@array = <FILE>;
my $size = @array;

for (0..$size) {
    @row = (split /\;/, $array[$_]);
    $hash{$_} = \@row;
    foreach (@row) {
        print $_, "\n";
    }
    @row = ();
}
I now have a hash of row data. How can I determine if an entire column consists of zero data?

Replies are listed 'Best First'.
Re: Finding Empty Columns
by demerphq (Chancellor) on Feb 14, 2003 at 20:49 UTC

    I am using the following code to try and determine if a column has any data other then 0

    I'll assume you mean 0 or empty.

    use strict;
    use warnings;
    use Data::Dumper;

    my @rows;
    my @empty;
    while (<>) {
        chomp;
        my @row=split /;/;
        push @rows,\@row;
        $empty[$_]||=$row[$_] for 0..$#row;
    }
    # a column is empty when no true value was ever recorded for it
    print "Empty columns:",join(", ",grep { !$empty[$_] } 0..$#empty),"\n";
    print Data::Dumper->Dump([\@rows],['*rows']);

    HTH

    ---
    demerphq


      This seems to work, but can you explain to me what is actually going on?

      Thanks

        use strict;
        use warnings;
        use Data::Dumper;

        my @rows;   # Store the rows here
        my @empty;  # If we see a true value we put it here
        while (<>) {                  # read a line at a time
            chomp;                    # lose the newline
            my @row=split /;/;        # split the line by ;
            push @rows,\@row;         # store the row
            $empty[$_]||=$row[$_]     # Equivalent to saying
                                      # $empty[$_] or ($empty[$_]=$row[$_])
                for 0..$#row;         # for each column
        }
        # a column is empty when its flag is still false
        print "Empty columns:",join(", ",grep { !$empty[$_] } 0..$#empty),"\n";
        print Data::Dumper->Dump([\@rows],['*rows']);

        I suppose @empty is a misnomer. It probably should be called @has_data, and the code would be clearer. The basic idea is to maintain an array of flags, where each flag tells us whether we have seen a value we care about in that column. Since you said "0", which is FALSE in Perl (along with undef and ""), the ||= operator comes in handy. It basically says "unless the left side is TRUE, assign it the value on the right side". And since anything but the aforementioned values is true, it's an elegant way to set the flags.
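A minimal sketch of the same flag idea, under two assumptions that are not in the post above: the flag array is renamed @has_data as suggested, and cells are compared numerically, so that a string such as "0.000000" (which is TRUE in Perl's boolean sense) still counts as zero. The helper name columns_with_data and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one flag per column, true if any row
# holds a numerically non-zero value in that column.
sub columns_with_data {
    my @lines = @_;
    my @has_data;
    for my $line (@lines) {
        chomp $line;
        my @row = split /;/, $line;
        for my $i (0 .. $#row) {
            # Assumption: an empty cell is treated as zero.
            my $cell = length($row[$i]) ? $row[$i] : 0;
            $has_data[$i] ||= ($cell != 0) ? 1 : 0;
        }
    }
    return @has_data;
}

# Made-up sample data: columns 0 and 2 hold only zero values.
my @lines    = ("0.000000;1.5;0\n", "0.000000;0;0\n", "0;2;0.0\n");
my @has_data = columns_with_data(@lines);
my @empty    = grep { !$has_data[$_] } 0 .. $#has_data;
print "Empty columns: @empty\n";
```

The numeric comparison will warn on non-numeric cells under `use warnings`, so this only suits data that is known to be numeric.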

        ---
        demerphq


Re: Finding Empty Columns
by l2kashe (Deacon) on Feb 14, 2003 at 20:00 UTC
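    You can check whether the split returned any fields by testing the array in boolean context.

    snip...
    my @row = split(/;/, $array[$_]);
    if ( @row ) {
        $hash{$_} = \@row;
    }
    else {
        print "Empty line at: $_\n";
    }
    snip...

    Note that defined @row is deprecated and is a fatal error as of Perl 5.22, and exists applies only to individual hash or array elements, not to a whole array.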

    Some comments:
    1) Since you are not returning slices from your split, there is no need for the parentheses surrounding it.
    2) Within the split itself, are you afraid that a semicolon will be interpreted? If so, don't be: within the regular expression it is treated as a literal character. If, on the other hand, you are matching a literal backslash, you need to change the regex to /\\;/, as the backslash is a metacharacter.
    3) Since you appear to be using the line number as a hash index, you could try something along the lines of the following, which uses $. to keep track of which line of the file you are on (note that it begins counting at 1, not 0 like normal). This trick will also reduce the memory you use while running this chunk of code, because the file is read one line at a time.
    open(IN, "$file") || die "Cant open $file\nReason: $!\n";
    while (<IN>) {
        chomp();
        my @row = split(/;/);
        if ( @row ) {    # true when the split returned at least one field
            $hash{$.} = \@row;
        }
        else {
            print "Empty line in $file at line: $.\n";
        }
    }
    4) Since I my'd @row inside of the while loop, @row starts out empty on every iteration, so you can drop the redundant line @row = (); at the end of your loop.
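Points 3 and 4 can be sketched as follows. To keep the example self-contained it reads from an in-memory string rather than a real file, and it tests the array directly instead of using defined; the sample data and the %rows_by_line name are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample input held in a string so the sketch runs without a file.
my $data = "1;0;3\n\n4;0;6\n";
open my $in, '<', \$data or die "Can't open in-memory data: $!\n";

my %rows_by_line;
while (my $line = <$in>) {
    chomp $line;
    my @row = split /;/, $line;      # a fresh @row on every iteration
    if (@row) {
        $rows_by_line{$.} = \@row;   # $. is the current line number, from 1
    }
    else {
        print "Empty line at line: $.\n";
    }
}
close $in;

print "Stored lines: ",
      join(", ", sort { $a <=> $b } keys %rows_by_line), "\n";
```

Because each iteration stores a reference to its own lexical @row, the hash entries do not end up sharing one array.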

    Just some pointers :)

    /* And the Creator, against his better judgement, wrote man.c */
Re: Finding Empty Columns
by jdporter (Paladin) on Feb 14, 2003 at 20:12 UTC
    How can I determine if the entire column consists of zero data?
    Three things:
    1. Which column?
    2. Why not let perl open the file for you?
    3. Wouldn't it be better to make an array of records, rather than a hash of records?
    my @rows = map { chomp; [ split /;/ ] } <>;
    # now you have an array of arrays (rows of columns)

    my $column = 1; # which column to check for 0 in?
    if ( grep { $_->[$column] ne "0" } @rows ) {
        print "Column $column does NOT consist of all 0's.\n";
    }

    # you could extract the contents of just that column, if you need to:
    my @column1 = map { $_->[1] } @rows;

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

      This is my problem. I have approximately 240 columns of data.
      I need to determine which columns consist of "0.000000", and disregard them before printing them out.

      Thanks

        So, for example, if column 1 is 0.0 in all rows, then don't print out that column?

        Maybe this will do the trick: (assumes all rows have equal number of cells)
        my @rows = map { chomp; [ split /;/ ] } <>;
        # now you have an array of arrays (rows of columns)

        my @non_zero_columns = grep {
            my $column = $_;
            grep { $_->[$column] != 0 } @rows
        } 0 .. $#{$rows[0]};

        for ( @rows ) {
            print "@{$_}[ @non_zero_columns ]\n";
        }
        However, that's highly sub-optimal if there are few zero fields in the table.
        This would be better:
        my %zero_columns;
        @zero_columns{ 0 .. $#{$rows[0]} } = (); # initially, all columns might have 0.

        # eliminate any columns that have non-0 in any row:
        for ( my $r = 0; $r <= $#rows && keys %zero_columns; $r++ ) {
            for my $i ( keys %zero_columns ) {
                $rows[$r][$i] == 0 or delete $zero_columns{$i};
            }
        }

        # invert the set:
        my @non_zero_columns = grep { ! exists $zero_columns{$_} } 0 .. $#{$rows[0]};
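Putting the elimination approach together with the printing step might look like the sketch below. The helper name non_zero_columns and the sample rows are assumptions for illustration, and the code assumes, as above, that every row has the same number of cells and that all cells are numeric.

```perl
use strict;
use warnings;

# Hypothetical helper: given a reference to an array of rows, return the
# indexes of the columns that contain at least one non-zero value.
sub non_zero_columns {
    my ($rows) = @_;
    return () unless @$rows;
    my $last = $#{ $rows->[0] };
    my %zero_columns = map { $_ => 1 } 0 .. $last;   # candidates so far
    for my $row (@$rows) {
        last unless %zero_columns;                   # nothing left to rule out
        for my $i (keys %zero_columns) {
            delete $zero_columns{$i} if $row->[$i] != 0;
        }
    }
    return grep { !exists $zero_columns{$_} } 0 .. $last;
}

# Made-up sample rows: column 0 is 0.000000 in every row.
my @rows;
push @rows, [ split /;/ ]
    for "0.000000;1.5;0.000000", "0.000000;0.000000;2.25";

my @keep = non_zero_columns(\@rows);
print join(";", @{$_}[@keep]), "\n" for @rows;
```

The numeric comparison is what lets "0.000000" count as zero here; a string comparison against "0" would not match it.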

        jdporter
        The 6th Rule of Perl Club is -- There is no Rule #6.