How could I simplify this redundant-column-removing code?

rubystallion has asked for the wisdom of the Perl Monks concerning the following question:


Re: How could I simplify this redundant-column-removing code?
by kennethk (Abbot) on Jun 17, 2015 at 16:09 UTC

If you are just incrementing by 1, use foreach loops instead of C-style loops -- fewer moving parts:
```
for my $i (0 .. @F-1) {
```
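A minimal sketch contrasting the two loop styles, assuming a hypothetical three-element record in @F. Both loops visit the same indices, but the foreach version lets the range do the initialise/test/increment bookkeeping, and `0 .. @F-1` is the same range as `0 .. $#F`.

```perl
use strict;
use warnings;

my @F = ('a=1', 'b=2', 'c=3');    # hypothetical sample record

# C-style loop: three moving parts (init, test, increment)
my @c_style;
for (my $i = 0; $i < @F; $i++) {
    push @c_style, $i;
}

# Foreach loop over a range: the range does the bookkeeping
my @foreach_style;
for my $i (0 .. @F-1) {    # same range as 0 .. $#F
    push @foreach_style, $i;
}

print "@c_style\n";        # prints 0 1 2
print "@foreach_style\n";  # prints 0 1 2
```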
Avoid punctuation variables if you can, unless this is your own toy and you are super comfy with them. Rather than testing $., just test whether @F1 has been populated yet:
```
    @F1 = @F if !@F1;
```
Note that you can't use a logical compound assignment operator such as ||= here, because these operators impose scalar context.
In contrast to the advice about punctuation variables above, you should be using $_ in this case, because $line is so highly localized:
```
while (<DATA>) {
    chomp;
    my @F = split '&';
```
+@F1 is a no-op: unary plus does not impose numeric context. To get the element count that way you would need binary addition, 0+@F1, but you don't even need that, because numeric comparison operators like != already apply scalar context to their operands.
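A minimal sketch of the context rules just described, using two hypothetical records. Unary plus leaves @F1 flattened into its elements, binary addition yields the element count, and != compares the two counts directly with no help.

```perl
use strict;
use warnings;

my @F1 = ('a=1', 'b=2', 'c=3');   # hypothetical first record
my @F  = ('a=1', 'b=9', 'c=3');   # hypothetical current record

# Unary plus changes nothing: in list context @F1 still flattens
my @flattened = +@F1;              # ('a=1', 'b=2', 'c=3')

# Binary addition imposes numeric (scalar) context: the element count
my $count = 0 + @F1;               # 3

# != already imposes scalar context, so this compares 3 with 3
my $mismatch = @F1 != @F;

print "flattened: @flattened\n";   # prints flattened: a=1 b=2 c=3
print "count: $count\n";           # prints count: 3
print $mismatch ? "NF mismatch\n" : "same NF\n";   # prints same NF
```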
Your algorithm gets simpler and allows using a hash if you track which terms to delete rather than which ones to keep. I'm assuming that you don't have repeated keys.

```
#!/usr/bin/env perl -w
use v5.014;
my %seen;
my $count;
my @recs;
while (<DATA>) {
    chomp;
    my @F = split '&';
    $count //= @F;
    die "NF mismatch" if @F != $count;
    $seen{$_}++ for @F;
    push @recs, \@F;
}

for my $rec (@recs) {
    say join "\t",
        grep $seen{$_} != @recs, # Doesn't show up in every record
        @$rec
        ;
}

__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
```

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.


Re: How could I simplify this redundant-column-removing code?
by Athanasius (Archbishop) on Jun 17, 2015 at 15:47 UTC

Hello rubystallion,

Your approach looks about right to me. The only thing that concerns me is that in the second (non-nested) for loop you have to test whether each member of @F1 is defined for every element of every record. The following reduces the number of tests by using an array slice whose list of indices (@keep) is calculated only once:

```
use 5.020;      # includes strictures
use warnings;

my (@F1, @recs);

while (my $line = <DATA>)
{
    chomp $line;
    my @F = split '&', $line;
    @F1 = @F if $. == 1;
    die "NF mismatch" unless @F1 == @F;
    push @recs, \@F;

    for my $i (0 .. $#F)
    {
        next    unless defined $F1[$i];
        $F1[$i] = undef unless $F1[$i] eq $F[$i];
    }
}

my @keep;
defined $F1[$_] || push @keep, $_ for (0 .. $#F1);

print join("\t", @$_[ @keep ]), "\n" for @recs;

__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
```

Output:

```
 1:46 >perl 1276_SoPW.pl
b=1     c=1     d=2     f=3
b=2     c=3     d=2     f=4
b=2     c=5     d=1     f=5

 1:46 >
```

Is there any way to ... make it easier for me to write something like this bug-free the first time?

If only! ;-)

Anyway, hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re: How could I simplify this redundant-column-removing code?
by BrowserUk (Patriarch) on Jun 17, 2015 at 19:28 UTC

My version:

```
#! perl -sw
use strict;

my $pos = tell DATA;
my %tally; ++$tally{ $_ } for map split( '&' ), <DATA>;
my $lines = $.;
seek DATA, $pos, 0;

print join '&', grep{ $tally{ $_ } != $lines } split '&' while <DATA>

__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
```

Produces:

```
C:\test>junk
b=1&c=1&d=2&f=3
b=2&c=3&d=2&f=4
b=2&c=5&d=1&f=5
```

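For a real file you wouldn't need the tell. DATA starts partway through the script file (just after __DATA__), so its starting position has to be saved; a real file can simply be rewound with seek $fh, 0, 0; before the second pass.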
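A minimal sketch of that two-pass approach on an ordinary file, assuming the path is passed as the first command-line argument; the subroutine name strip_common_terms is hypothetical. It counts lines itself instead of relying on $., and chomps each line so a trailing newline never ends up inside the last term. Like the replies above, it assumes no term repeats within a single line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the file's lines with every term
# that appears on all lines removed.
sub strip_common_terms {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";

    # First pass: tally every term and count the lines
    my (%tally, $lines);
    while (my $line = <$fh>) {
        chomp $line;
        $lines++;
        $tally{$_}++ for split /&/, $line;
    }

    # Rewind to the start of the file for the second pass
    seek $fh, 0, 0 or die "Can't rewind $path: $!";

    # Second pass: keep only terms missing from at least one line
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        push @out, join '&', grep { $tally{$_} != $lines } split /&/, $line;
    }
    close $fh;
    return @out;
}

if (@ARGV) {
    print "$_\n" for strip_common_terms($ARGV[0]);
}
```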

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this

In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked


Re: How could I simplify this redundant-column-removing code?
by kcott (Archbishop) on Jun 17, 2015 at 18:55 UTC

G'day rubystallion,

Welcome to the Monastery.

The approach I took was to read the spec, grab the DATA and write the code. I didn't spend much time looking at your code initially, although I have commented on it further down in this post. Here's what I came up with:

```
#!/usr/bin/env perl -l

use strict;
use warnings;
no warnings 'uninitialized';

my @key_value_pairs;

# Capture key-value pairs from original query strings
while (<DATA>) {
    chomp;
    push @key_value_pairs, { map { (split /=/)[0,1] } split /&/ };
}

# Remove common key-value pairs
KEY: for my $key (keys %{$key_value_pairs[0]}) {
    for my $i (1 .. $#key_value_pairs) {
        next KEY unless $key_value_pairs[0]{$key} eq $key_value_pairs[$i]{$key};
    }
    delete $key_value_pairs[$_]{$key} for 0 .. $#key_value_pairs;
}

# Recreate query strings without common key-value pairs
for my $kvp (@key_value_pairs) {
    print join '&', map { join '=', $_, $kvp->{$_} } sort keys %$kvp;
}

__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
```

Output:

```
b=1&c=1&d=2&f=3
b=2&c=3&d=2&f=4
b=2&c=5&d=1&f=5
```

From the comments embedded in the code, you can see three distinct steps: capture all the initial data; remove the common data; recreate the query strings with what's left.

As you indicated (i.e. "in my head it's very simple"), this was fairly straightforward:

- split on '&' and then on '='
- only delete if all equality tests are TRUE
- join with '=' and then with '&'
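The middle step ("only delete if all equality tests are TRUE") can also be written without the labelled next KEY loop. This is a minimal sketch that swaps in all from the core List::Util module (present in List::Util 1.33 and later, so it needs a newer Perl than the code above); the three records are hypothetical, shaped like the hashes built in the capture step. Using // '' makes a missing or undef value compare as '', which is what eq does anyway, minus the warning.

```perl
use strict;
use warnings;
use List::Util qw(all);

# Hypothetical parsed records: one hash of key => value per query string
my @key_value_pairs = (
    { a => 1, b => 1, c => 1, e => '' },
    { a => 1, b => 2, c => 3, e => '' },
    { a => 1, b => 2, c => 5, e => '' },
);

# A key is common if every record has the same value as the first record.
# keys() builds its list up front, so deleting inside the loop is safe.
my $first = $key_value_pairs[0];
for my $key (keys %$first) {
    my $value = $first->{$key} // '';
    next unless all { ($_->{$key} // '') eq $value } @key_value_pairs;
    delete $_->{$key} for @key_value_pairs;
}

# Recreate the query strings from what is left
for my $kvp (@key_value_pairs) {
    print join('&', map { "$_=$kvp->{$_}" } sort keys %$kvp), "\n";
}
```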

"Is there any way to make the code significantly simpler or make it easier for me to write something like this bug-free the first time?"

That's a little difficult to answer without knowing what you did on your first three attempts.

- Perhaps a lack of an initial design?
- Perhaps you had problems with poorly named variables? I certainly did! As soon as I saw your first runtime statement (my @F1;), I realised I was going to have to read more code to find out what F1 represented.
- Did you somehow get caught up with "use v5.020;"? Although I tested my code under v5.22.0, I suspect it'll run on any Perl5 released this century.

A couple of notes on command switches: