Re: How could I simplify this redundant-column-removing code?
by kennethk (Abbot) on Jun 17, 2015 at 16:09 UTC
|
A couple suggestions:
- If you are just incrementing by 1, use Foreach Loops instead of C-style loops -- fewer moving parts:
for my $i (0 .. @F-1) {
- Avoid punctuation variables if you can, unless this is your toy and you are super comfy with them. Rather than testing $., just test if @F1 is initialized:
@F1 = @F if !@F1;
Note that you shouldn't use logical compound assignment operators because they have scalar context.
- Contrasting the above, you should be using $_ in this case because $line is so highly localized.
while (<DATA>) {
chomp;
my @F = split '&';
- +@F1 is a no-op. Numification requires two arguments, so you'd need to write 0+@F1, but you don't even need to do that because logical operators like != also apply scalar context to their arguments.
- Your algorithm gets simpler and allows using a hash if you track which terms to delete rather than which ones to keep. I'm assuming that you don't have repeated keys.
So I might write that as:
#!/usr/bin/env perl -w
use v5.014;
my %seen;
my $count;
my @recs;
while (<DATA>) {
chomp;
my @F = split '&';
$count //= @F;
die "NF mismatch" if @F != $count;
$seen{$_}++ for @F;
push @recs, \@F;
}
for my $rec (@recs) {
say join "\t",
grep $seen{$_} != @recs, # Doesn't show up in every record
@$rec
;
}
__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
| [reply] [d/l] [select] |
Re: How could I simplify this redundant-column-removing code?
by Athanasius (Archbishop) on Jun 17, 2015 at 15:47 UTC
|
Hello rubystallion,
Your approach looks about right to me. The only thing that concerns me is that in the second (non-nested) for loop you have to test whether each member of @F1 is defined for every element of every record. The following reduces the number of tests by deploying an array slice, which is calculated only once:
use 5.020; # includes strictures
use warnings;
my (@F1, @recs);
while (my $line = <DATA>)
{
chomp $line;
my @F = split '&', $line;
@F1 = @F if $. == 1;
die "NF mismatch" unless @F1 == @F;
push @recs, \@F;
for my $i (0 .. $#F)
{
next unless defined $F1[$i];
$F1[$i] = undef unless $F1[$i] eq $F[$i];
}
}
my @keep;
defined $F1[$_] || push @keep, $_ for (0 .. $#F1);
print join("\t", @$_[ @keep ]), "\n" for @recs;
__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
Output:
1:46 >perl 1276_SoPW.pl
b=1 c=1 d=2 f=3
b=2 c=3 d=2 f=4
b=2 c=5 d=1 f=5
1:46 >
Is there any way to ... make it easier for me to write something like this bug-free the first time?
If only! ;-)
Anyway, hope that helps,
| [reply] [d/l] [select] |
Re: How could I simplify this redundant-column-removing code?
by BrowserUk (Patriarch) on Jun 17, 2015 at 19:28 UTC
|
#! perl -sw
use strict;
my $pos = tell DATA;
my %tally; ++$tally{ $_ } for map split( '&' ), <DATA>;
my $lines = $.;
seek DATA, $pos, 0;
print join '&', grep{ $tally{ $_ } != $lines } split '&' while <DATA>
__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
Produces: C:\test>junk
b=1&c=1&d=2&f=3
b=2&c=3&d=2&f=4
b=2&c=5&d=1&f=5
For a real file, you wouldn't need the tell, just rewind the file with seek $fh, 0, 0; for the second pass.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
| [reply] [d/l] [select] |
Re: How could I simplify this redundant-column-removing code?
by kcott (Archbishop) on Jun 17, 2015 at 18:55 UTC
|
G'day rubystallion,
Welcome to the Monastery.
The approach I took was to read the spec, grab the DATA and write the code.
I didn't spend a lot of time looking at your code initially; although, I have commented on it further down in my post.
Here's what I came up with:
#!/usr/bin/env perl -l
use strict;
use warnings;
no warnings 'uninitialized';
my @key_value_pairs;
# Capture key-value pairs from original query strings
while (<DATA>) {
chomp;
push @key_value_pairs, { map { (split /=/)[0,1] } split /&/ };
}
# Remove common key-value pairs
KEY: for my $key (keys %{$key_value_pairs[0]}) {
for my $i (1 .. $#key_value_pairs) {
next KEY unless $key_value_pairs[0]{$key} eq $key_value_pairs[
+$i]{$key};
}
delete $key_value_pairs[$_]{$key} for 0 .. $#key_value_pairs;
}
# Recreate query strings without common key-value pairs
for my $kvp (@key_value_pairs) {
print join '&', map { join '=', $_, $kvp->{$_} } sort keys %$kvp;
}
__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
Output:
b=1&c=1&d=2&f=3
b=2&c=3&d=2&f=4
b=2&c=5&d=1&f=5
From the comments embedded in the code, you can see three distinct steps:
capture all the initial data; remove the common data; recreate the query strings with what's left.
As you indicated (i.e. "in my head it's very simple") this was fairly straightforward:
- split on '&' and then on '='
- only delete if all equality tests are TRUE
- join with '=' and then with '&'
"Is there any way to make the code significantly simpler or make it easier for me to write something like this bug-free the first time?"
That's a little difficult to answer without knowing what you did on your first three attempts.
-
Perhaps a lack of an initial design?
-
Perhaps you had problems with poorly named variables? I certainly did! As soon as I saw your first runtime statement (my @F1;), I realised I was going to have to read more code to find out what F1 represented.
-
Did you somehow get caught up with "use v5.020;"?
Although I tested my code under v5.22.0, I suspect it'll run on any Perl5 released this century.
A couple of notes on command switches:
-
By using the -l command switch, my print works like say (no need to add "\n").
-
The -w command switch is a poor choice.
The warnings pragma is better; the documentation explains why.
And, of course, if anything else in my code needs further explanation, just ask.
| [reply] [d/l] [select] |
|
|
Hi Ken,
Thanks for the suggestions. You're right, poorly named variables (or poor commenting) might have been the reason for one bug I had. Also good to know the advantages of the warnings pragma. I still needed a few minutes to get my head around your solution, but if I get more practise with nested commands and nested data structures this will hopefully become fairly straightforward for me, too.
| [reply] |
Re: How could I simplify this redundant-column-removing code?
by pme (Monsignor) on Jun 17, 2015 at 16:03 UTC
|
#!/usr/bin/perl -w
use strict;
my %F1;
my @recs;
while (my $line = <DATA>) {
chomp $line;
my @F = split '&', $line;
if ($. == 1) {
$F1{$_} = 0 for (@F);
}
die "NF mismatch" if keys %F1 != @F;
push @recs, \@F;
foreach (@F) {
$F1{$_}++ if exists $F1{$_};
}
}
for my $rec (@recs) {
for my $field (@$rec) {
print "$field\t" unless exists $F1{$field} and $F1{$field} ==
+ $.;
}
print "\n";
}
__DATA__
a=1&b=1&c=1&d=2&e=&f=3
a=1&b=2&c=3&d=2&e=&f=4
a=1&b=2&c=5&d=1&e=&f=5
Output:
b=1 c=1 d=2 f=3
b=2 c=3 d=2 f=4
b=2 c=5 d=1 f=5
| [reply] [d/l] [select] |
Re: How could I simplify this redundant-column-removing code?
by aaron_baugher (Curate) on Jun 17, 2015 at 18:07 UTC
|
This is kind of an interesting problem. If I were being handed this task, my first two questions would be:
- Are the fields always in the same order? (in query strings, that's usually not guaranteed)
- Does each key appear in every line?
For a real-world task, I'd assume both answers are 'no,' so I'd have to be prepared to handle missing keys (including keys not appearing in the first line), and keys out of order. Given all that, I'd store the keys and values of each line in an array of hashes (an array to maintain the order of the lines). I'd also have a hash for keeping track of all the keys, and another hash for tracking which keys have the same value throughout so they can be dropped from the output. (There may be a clever way to make one hash do both those things, but it didn't occur to me.) This is what I ended up with. It drops the key/value pairs that are always the same (a & e), regardless of order or whether a key is sometimes missing.
One note: I tell the inner split to always produce 2 fields, because otherwise it'll produce undef where there's no value, which messes up the nifty map-to-hash.
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
use Data::Printer;
my @l; # array of hashes, to hold the keys and values for each line
my %a; # hash to keep track of all keys
my %c; # hash to track whether a key changes value
# if a key is still in the hash when the loop finishes,
# that means it had the same value throughout and should
# be ignored
while(<DATA>){
chomp;
my %h = map { split /=/,$_,2 } split '&'; # split string into keys
+/values
push @l, \%h;
if( $. == 1 ){ # on the first line, load into %c
%c = %h;
} else { # on other lines, check values
for my $k (keys %h){
delete $c{$k} unless exists $c{$k} and $c{$k} eq $h{$k}; #
+ remove if different
$a{$k}=1; # keep track of key existing
}
}
}
# remove consistent keys from output
delete $a{$_} for keys %c;
for my $l (@l){
for my $k (sort keys %a){
printf "%-8s", $l->{$k} ? "$k=$l->{$k}" : ' ';
}
say '';
}
# I've mixed up the data a bit to reflect my open requirements.
# a & e should still be dropped from the output.
__DATA__
a=1&b=1&f=3&c=1&d=2&e=
a=1&b=2&c=3&e=&f=4
b=2&a=1&c=5&d=1&e=&f=5
Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.
| [reply] [d/l] |
Re: How could I simplify this redundant-column-removing code?
by rubystallion (Novice) on Jun 18, 2015 at 03:47 UTC
|
Wow, of all my "first posts" since I started using the internet in 2000, this must be the one with the largest number of helpful replies in a single day. Glad to have found such a lively community!
All replies had some useful suggestions. Just for reference: In bash I would have written something like this, which of course has horrible performance, but is very close to pseudocode and therefore hard to get wrong. Some of the solutions posted above are just as readable, but faster and more extensible. Thanks a lot everyone, I couldn't figure that out myself.
for i in {1..7};do if (( $(cut -d '&' -f $i query.txt|uniq|wc -l) != 1 ));then f="$f,$i";fi;done;cut -d '&' -f "${f#,}" query.txt
| [reply] [d/l] |