Re: A perverse use of grep
by dragonchild (Archbishop) on Jun 15, 2005 at 03:00 UTC
|
You were right in the code smell, but wrong in your fix. It's poor style because someone's trying to be cool and do everything on one line. Instead, the code should really be something like:
my @lines;
{ local $_;
while ( <$in> )
{
# Remove leading and trailing whitespace
s/^\s+//; s/\s+$//;
# Remove comments from the first # to the end of the line
s/#.*//;
push @lines, $_ if length $_;
}
}
Are you paying for your whitespace or something??
Update: Fixed bug as per brian_d_foy's comment.
Update: Changed per demerphq's comment.
Update: Added bare-block to localize $_ per itub's comment. I don't think this is completely necessary, but it's good to be anal.
- In general, if you think something isn't in Perl, try it out, because it usually is. :-)
- "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
| [reply] [d/l] |
|
|
| [reply] |
|
|
Well, it is not "my" code; I inherited it and have
to maintain it. Despite my dubious past as a golfer,
I never play golf on production code. ;-)
I like your approach, BTW.
| [reply] |
|
|
while ( defined( $_ = <$in> ) )
should be written
while (<$in>) {
---
$world=~s/war/peace/g
| [reply] [d/l] [select] |
|
|
I would also recommend localizing $_, because while doesn't do it automatically:
sub whatever {
local $_;
while(<$in>) {
# do stuff
}
}
Even if the loop is not in a sub, localizing $_ can prevent nasty bugs if you decided to wrap the loop in a sub later. | [reply] [d/l] [select] |
|
|
You're right ... I originally had while (defined( my $line = <$in> )) and never got rid of the expanded form when I went back to $_. Updated.
- In general, if you think something isn't in Perl, try it out, because it usually is. :-)
- "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
| [reply] [d/l] |
modifying $_ in map (was Re: A perverse use of grep)
by merlyn (Sage) on Jun 15, 2005 at 10:35 UTC
|
my @lines = map { s/^\s+//; s/#.*//; s/\s+$//; length($_) ? $_ : () }
+<$in>;
One style rule I strongly recommend is:
Never modify $_ in a map.
If you think of map as transforming one list to another, modifying $_ makes it "unclean", because it also edits the input list! This can lead to subtle bugs, and is best avoided always, rather than permitted occasionally, such as in this code.
So, I'd rewrite that trivially as:
my @lines = map { local $_ = $_; s/^\s+//; s/#.*//; s/\s+$//; length($
+_) ? $_ : () } <$in>;
| [reply] [d/l] [select] |
|
|
funny that I just read that section of "Effective Perl" last night (while sitting on my toilet:-). To summarize: use grep to select, use map to transform, in either case, don't change $_. It's interesting to note that the OP is a combination of select and transform.
| [reply] |
Re: A perverse use of grep
by Errto (Vicar) on Jun 15, 2005 at 02:55 UTC
|
I don't so much object to the destructive grep as to the fact that it's using a substitution for what should be a negated pattern match:
my @lines = grep { s/^\s+//; s/\s+$//; length > 0 && !/^#/ } <$in>;
But this is all aesthetics anyway, | [reply] [d/l] |
|
|
keep this part # part strip this part
Update: That wasn't an intended feature. I guess I'm just too used to doing that since it's so easy to handle. :)
--
brian d foy <brian@stonehenge.com>
| [reply] [d/l] |
|
|
True, but the "spec" for the OP didn't say that. It just said to ignore lines starting with # :)
| [reply] |
|
|
Yes, Errto is right, the super simple config file format
specified for this project only supports comment lines
where the first non-white space char is #.
| [reply] [d/l] |
Re: A perverse use of grep
by brian_d_foy (Abbot) on Jun 15, 2005 at 03:01 UTC
|
I don't get to excited about using fewer operators. I let map() work on the data and grep() pick out which elements it wants.
my @lines = grep length(), map { s/#.*//; s/^\s+|\s+$//g; $_ } <$in>;
Update japhy's right that that s/// with the alternation is slow. It's even in the perlfaq, and I know that because I updated that answer recently. That's what I get for golfing the line to not break with the code formatter thinger. :)
--
brian d foy <brian@stonehenge.com>
| [reply] [d/l] |
|
|
Don't use s/^\s+|\s+$//g. It tries BOTH branches at EVERY chunk of whitespace. It's awfully slow compared to s/^\s+//, s/\s+$//.
| [reply] [d/l] [select] |
|
|
What is actually slower? Checking each whitespace on a line for both conditions or parsing the entire line a third time? I guess it depends on how long the line is compared to how many whitespaces there are. If you had a standard text file where each line started at the margin and ended with a CR/LF or just LF I would say that parsing that line a third time would be slower then checking each whitespace against both conditions but this is just off the hip. I have no benchmarks to prove it and am really just asking in the first place.
Edit:
OK well it appears that japhy's suggestion is considerably faster. I would not have thought that it would make that much of a difference but it does appear to be considerable.
Time taken on brian 42 wallclock secs (41.70 usr 0.23 sys + 0.00 cusr 0.00 csys = 41.93 CPU) seconds
Time taken on japhy 17 wallclock secs (16.52 usr 0.21 sys + 0.00 cusr 0.00 csys = 16.73 CPU) seconds
Here is the code I used to benchmark both these methods.
#!/usr/bin/perl -w
use strict;
use Benchmark;
my $file = shift;
my ($start, $end);
$start = new Benchmark;
brian($file);
$end = new Benchmark;
calc($start, $end, 'brian');
$start = new Benchmark;
japhy($file);
$end = new Benchmark;
calc($start, $end, 'japhy');
sub calc {
my ($start, $end, $test) = @_;
my $diff = timediff($end, $start);
print "Time taken on ", $test, " ",timestr($diff, 'all'), " seconds
+\n";
}
sub brian {
my $file = shift;
open(my $in, '<', $file) or die "error: open $file: $!";
for (1..1000) {
seek($in, 0, 0);
my @lines = grep length(), map { s/#.*//; s/^\s+|\s+$//g; $_ } <
+$in>;
}
close($in);
}
sub japhy {
my $file = shift;
open(my $in, '<', $file) or die "error: open $file: $!";
for (1..1000) {
seek($in, 0, 0);
my @lines = grep length(), map { s/#.*//; s/^\s+//g; s/\s$//g; $
+_ } <$in>;
}
close($in);
}
| [reply] [d/l] |
Re: A perverse use of grep
by wfsp (Abbot) on Jun 15, 2005 at 09:04 UTC
|
while (<$f>){
chomp;
s/^\s+//;
s/\s+$//;
next unless length;
next if /^#/;
push @lines, $_;
}
| [reply] [d/l] |
|
|
| [reply] [d/l] [select] |
Re: A perverse use of grep (filtering)
by tye (Sage) on Jun 15, 2005 at 20:05 UTC
|
Yes, Perl has grep and map but is missing this idiom, which is why I added it to Algorithm::Loops as Filter(). But I still use grep to remove selected lines, which makes the map w/ "local $_= $_;" version appealing.
use Algorithm::Loops qw( Filter );
my @lines = grep defined, Filter {
s/^\s+//;
if( /^#/ ) {
undef $_;
} else {
s/\s+$//;
}
} <$in>;
It isn't terribly hard to do "filtering" directly with grep or map, but it is terribly easy to only get filtering almost right with them and the difference is usually subtle but can still lead to real problems.
A pragma to make the aliases that map and grep provide in $_ always be read-only would be something I would use... I wonder if Perl6 lets you specify "read-only aliases" vs. "read-write copies" vs. "read-write aliases" for map / grep like I know it will for function arguments.
A concise alternative:
s/^\s+//, s/\s+$//
for my @lines= grep ! /^\s*#/, <$in>;
| [reply] [d/l] [select] |