A perverse use of grep

eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: A perverse use of grep by dragonchild (Archbishop) on Jun 15, 2005 at 03:00 UTC
You were right in the code smell, but wrong in your fix. It's poor style because someone's trying to be cool and do everything on one line. Instead, the code should really be something like: `my @lines; { local $_; while ( <$in> ) { # Remove leading and trailing whitespace s/^\s+//; s/\s+$//; # Remove comments from the first # to the end of the line s/#.//; push @lines, $_ if length $_; } }` [download] Are you paying for your whitespace or something?? Update:* Fixed bug as per brian_d_foy's comment. Update: Changed per demerphq's comment. Update: Added bare-block to localize $_ per itub's comment. I don't think this is completely necessary, but it's good to be anal. In general, if you think something isn't in Perl, try it out, because it usually is. :-) "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"	[reply] [d/l]
Re^2: A perverse use of grep by brian_d_foy (Abbot) on Jun 15, 2005 at 03:05 UTC
Your code reproduces the bug of ignoring the line that is the bare '0'. You don't want to test the value of the line, but its length. -- brian d foy <brian@stonehenge.com>	[reply]
Re^2: A perverse use of grep by eyepopslikeamosquito (Archbishop) on Jun 15, 2005 at 03:16 UTC
Well, it is not "my" code; I inherited it and have to maintain it. Despite my dubious past as a golfer, I never play golf on production code. ;-) I like your approach, BTW.	[reply]
Re^2: A perverse use of grep by demerphq (Chancellor) on Jun 15, 2005 at 09:44 UTC
IMO the expanded style is bad form: `while ( defined( $_ = <$in> ) )` [download] should be written `while (<$in>) {` [download] --- $world=~s/war/peace/g	[reply] [d/l] [select]
Re^3: A perverse use of grep by itub (Priest) on Jun 15, 2005 at 17:20 UTC
I would also recommend localizing `$_`, because `while` doesn't do it automatically: `sub whatever { local $_; while(<$in>) { # do stuff } }` [download] Even if the loop is not in a sub, localizing $_ can prevent nasty bugs if you decided to wrap the loop in a sub later.	[reply] [d/l] [select]
Re^3: A perverse use of grep by dragonchild (Archbishop) on Jun 15, 2005 at 12:01 UTC
You're right ... I originally had `while (defined( my $line = <$in> ))` and never got rid of the expanded form when I went back to $_. Updated. In general, if you think something isn't in Perl, try it out, because it usually is. :-) "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"	[reply] [d/l]
modifying $_ in map (was Re: A perverse use of grep) by merlyn (Sage) on Jun 15, 2005 at 10:35 UTC
`my @lines = map { s/^\s+//; s/#.//; s/\s+$//; length($_) ? $_ : () } +<$in>;` [download] One style rule I strongly recommend is: Never modify $_ in a map.* If you think of map as transforming one list to another, modifying $_ makes it "unclean", because it also edits the input list! This can lead to subtle bugs, and is best avoided always, rather than permitted occasionally, such as in this code. So, I'd rewrite that trivially as: `my @lines = map { local $_ = $_; s/^\s+//; s/#.*//; s/\s+$//; length($ +_) ? $_ : () } <$in>;` [download] -- Randal L. Schwartz, Perl hacker Be sure to read my standard disclaimer if this is a reply.	[reply] [d/l] [select]
Re: modifying $_ in map (was Re: A perverse use of grep) by johnnywang (Priest) on Jun 15, 2005 at 17:29 UTC
funny that I just read that section of "Effective Perl" last night (while sitting on my toilet:-). To summarize: use grep to select, use map to transform, in either case, don't change $_. It's interesting to note that the OP is a combination of select and transform.	[reply]
Re: A perverse use of grep by Errto (Vicar) on Jun 15, 2005 at 02:55 UTC
I don't so much object to the destructive grep as to the fact that it's using a substitution for what should be a negated pattern match: `my @lines = grep { s/^\s+//; s/\s+$//; length > 0 && !/^#/ } <$in>;` [download] But this is all aesthetics anyway,	[reply] [d/l]
Re^2: A perverse use of grep by brian_d_foy (Abbot) on Jun 15, 2005 at 03:04 UTC
Your code fails to strip comments off lines it will keep, though. :) `keep this part # part strip this part` [download] Update: That wasn't an intended feature. I guess I'm just too used to doing that since it's so easy to handle. :) -- brian d foy <brian@stonehenge.com>	[reply] [d/l]
Re^3: A perverse use of grep by Errto (Vicar) on Jun 15, 2005 at 03:13 UTC
True, but the "spec" for the OP didn't say that. It just said to ignore lines starting with # :)	[reply]
Re^3: A perverse use of grep by eyepopslikeamosquito (Archbishop) on Jun 15, 2005 at 03:26 UTC
Yes, Errto is right, the super simple config file format specified for this project only supports comment lines where the first non-white space char is `#`.	[reply] [d/l]
Re: A perverse use of grep by brian_d_foy (Abbot) on Jun 15, 2005 at 03:01 UTC
I don't get to excited about using fewer operators. I let map() work on the data and grep() pick out which elements it wants. `my @lines = grep length(), map { s/#.//; s/^\s+\|\s+$//g; $_ } <$in>;` [download] Update* japhy's right that that s/// with the alternation is slow. It's even in the perlfaq, and I know that because I updated that answer recently. That's what I get for golfing the line to not break with the code formatter thinger. :) -- brian d foy <brian@stonehenge.com>	[reply] [d/l]
Re^2: A perverse use of grep by japhy (Canon) on Jun 15, 2005 at 04:37 UTC
Don't use `s/^\s+\|\s+$//g`. It tries BOTH branches at EVERY chunk of whitespace. It's awfully slow compared to `s/^\s+//, s/\s+$//`. Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart	[reply] [d/l] [select]
Re^3: A perverse use of grep by Elijah (Hermit) on Jun 15, 2005 at 05:29 UTC
What is actually slower? Checking each whitespace on a line for both conditions or parsing the entire line a third time? I guess it depends on how long the line is compared to how many whitespaces there are. If you had a standard text file where each line started at the margin and ended with a CR/LF or just LF I would say that parsing that line a third time would be slower then checking each whitespace against both conditions but this is just off the hip. I have no benchmarks to prove it and am really just asking in the first place. Edit: OK well it appears that japhy's suggestion is considerably faster. I would not have thought that it would make that much of a difference but it does appear to be considerable. Time taken on brian 42 wallclock secs (41.70 usr 0.23 sys + 0.00 cusr 0.00 csys = 41.93 CPU) seconds Time taken on japhy 17 wallclock secs (16.52 usr 0.21 sys + 0.00 cusr 0.00 csys = 16.73 CPU) seconds Here is the code I used to benchmark both these methods. #!/usr/bin/perl -w use strict; use Benchmark; my $file = shift; my ($start, $end); $start = new Benchmark; brian($file); $end = new Benchmark; calc($start, $end, 'brian'); $start = new Benchmark; japhy($file); $end = new Benchmark; calc($start, $end, 'japhy'); sub calc { my ($start, $end, $test) = @_; my $diff = timediff($end, $start); print "Time taken on ", $test, " ",timestr($diff, 'all'), " seconds +\n"; } sub brian { my $file = shift; open(my $in, '<', $file) or die "error: open $file: $!"; for (1..1000) { seek($in, 0, 0); my @lines = grep length(), map { s/#.//; s/^\s+\|\s+$//g; $_ } < +$in>; } close($in); } sub japhy { my $file = shift; open(my $in, '<', $file) or die "error: open $file: $!"; for (1..1000) { seek($in, 0, 0); my @lines = grep length(), map { s/#.//; s/^\s+//g; s/\s$//g; $ +_ } <$in>; } close($in); } [download] www.perlskripts.com	[reply] [d/l]
Re: A perverse use of grep by wfsp (Abbot) on Jun 15, 2005 at 09:04 UTC
How would you do it? On 8 lines instead of 1 :-( `while (<$f>){ chomp; s/^\s+//; s/\s+$//; next unless length; next if /^#/; push @lines, $_; }` [download]	[reply] [d/l]
Re^2: A perverse use of grep by QM (Parson) on Jun 15, 2005 at 20:41 UTC
Don't do all that work, and then throw it away -- put the `next if /^#/;` first: `while (<$f>){ next if /^#/; chomp; s/^\s+//; s/\s+$//; next unless length; push @lines, $_; }` [download] -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l] [select]
Re: A perverse use of grep (filtering) by tye (Sage) on Jun 15, 2005 at 20:05 UTC
Yes, Perl has grep and map but is missing this idiom, which is why I added it to Algorithm::Loops as Filter(). But I still use grep to remove selected lines, which makes the map w/ "local $_= $_;" version appealing. `use Algorithm::Loops qw( Filter ); my @lines = grep defined, Filter { s/^\s+//; if( /^#/ ) { undef $_; } else { s/\s+$//; } } <$in>;` [download] It isn't terribly hard to do "filtering" directly with grep or map, but it is terribly easy to only get filtering almost right with them and the difference is usually subtle but can still lead to real problems. A pragma to make the aliases that map and grep provide in $_ always be read-only would be something I would use... I wonder if Perl6 lets you specify "read-only aliases" vs. "read-write copies" vs. "read-write aliases" for map / grep like I know it will for function arguments. A concise alternative: `s/^\s+//, s/\s+$// for my @lines= grep ! /^\s*#/, <$in>;` [download] - tye	[reply] [d/l] [select]