eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

I noticed this code today:

# Read a text file, stripping leading and trailing whitespace # and ignoring any lines starting with '#' my $file = shift; open(my $in, '<', $file) or die "error: open $file: $!"; my @lines = grep { s/^\s+//; s/#.*//; s/\s+$//; $_ } <$in>; close($in); for my $x (@lines) { print "x='$x'\n" }
It caught my eye because it seems stylistically dubious in that grep should be used to select from a list, while here it's being used to modify the list. It also contains a subtle bug in that lines consisting of a bald zero (0) will be erroneously discarded. Now, I could fix this bug with:
my @lines = grep { s/^\s+//; s/#.*//; s/\s+$//; length($_) } <$in>;
or:
my @lines = map { s/^\s+//; s/#.*//; s/\s+$//; length($_) ? $_ : () } +<$in>;
though I'm not especially happy with either of these. How would you do it?

Replies are listed 'Best First'.
Re: A perverse use of grep
by dragonchild (Archbishop) on Jun 15, 2005 at 03:00 UTC
    You were right in the code smell, but wrong in your fix. It's poor style because someone's trying to be cool and do everything on one line. Instead, the code should really be something like:
    my @lines; { local $_; while ( <$in> ) { # Remove leading and trailing whitespace s/^\s+//; s/\s+$//; # Remove comments from the first # to the end of the line s/#.*//; push @lines, $_ if length $_; } }
    Are you paying for your whitespace or something??

    Update: Fixed bug as per brian_d_foy's comment.

    Update: Changed per demerphq's comment.

    Update: Added bare-block to localize $_ per itub's comment. I don't think this is completely necessary, but it's good to be anal.


    • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
    • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"

      Your code reproduces the bug of ignoring the line that is the bare '0'. You don't want to test the value of the line, but its length.

      --
      brian d foy <brian@stonehenge.com>

      Well, it is not "my" code; I inherited it and have to maintain it. Despite my dubious past as a golfer, I never play golf on production code. ;-) I like your approach, BTW.

      IMO the expanded style is bad form:

      while ( defined( $_ = <$in> ) )

      should be written

      while (<$in>) {
      ---
      $world=~s/war/peace/g

        I would also recommend localizing $_, because while doesn't do it automatically:
        sub whatever { local $_; while(<$in>) { # do stuff } }

        Even if the loop is not in a sub, localizing $_ can prevent nasty bugs if you decided to wrap the loop in a sub later.

        You're right ... I originally had while (defined( my $line = <$in> )) and never got rid of the expanded form when I went back to $_. Updated.

        • In general, if you think something isn't in Perl, try it out, because it usually is. :-)
        • "What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?"
modifying $_ in map (was Re: A perverse use of grep)
by merlyn (Sage) on Jun 15, 2005 at 10:35 UTC
    my @lines = map { s/^\s+//; s/#.*//; s/\s+$//; length($_) ? $_ : () } +<$in>;
    One style rule I strongly recommend is:
    Never modify $_ in a map.
    If you think of map as transforming one list to another, modifying $_ makes it "unclean", because it also edits the input list! This can lead to subtle bugs, and is best avoided always, rather than permitted occasionally, such as in this code.

    So, I'd rewrite that trivially as:

    my @lines = map { local $_ = $_; s/^\s+//; s/#.*//; s/\s+$//; length($ +_) ? $_ : () } <$in>;

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      funny that I just read that section of "Effective Perl" last night (while sitting on my toilet:-). To summarize: use grep to select, use map to transform, in either case, don't change $_. It's interesting to note that the OP is a combination of select and transform.
Re: A perverse use of grep
by Errto (Vicar) on Jun 15, 2005 at 02:55 UTC
    I don't so much object to the destructive grep as to the fact that it's using a substitution for what should be a negated pattern match:
    my @lines = grep { s/^\s+//; s/\s+$//; length > 0 && !/^#/ } <$in>;
    But this is all aesthetics anyway,

      Your code fails to strip comments off lines it will keep, though. :)

      keep this part # part strip this part

      Update: That wasn't an intended feature. I guess I'm just too used to doing that since it's so easy to handle. :)

      --
      brian d foy <brian@stonehenge.com>
        True, but the "spec" for the OP didn't say that. It just said to ignore lines starting with # :)
        Yes, Errto is right, the super simple config file format specified for this project only supports comment lines where the first non-white space char is #.
Re: A perverse use of grep
by brian_d_foy (Abbot) on Jun 15, 2005 at 03:01 UTC

    I don't get to excited about using fewer operators. I let map() work on the data and grep() pick out which elements it wants.

    my @lines = grep length(), map { s/#.*//; s/^\s+|\s+$//g; $_ } <$in>;

    Update japhy's right that that s/// with the alternation is slow. It's even in the perlfaq, and I know that because I updated that answer recently. That's what I get for golfing the line to not break with the code formatter thinger. :)

    --
    brian d foy <brian@stonehenge.com>
      Don't use s/^\s+|\s+$//g. It tries BOTH branches at EVERY chunk of whitespace. It's awfully slow compared to s/^\s+//, s/\s+$//.

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
      How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
        What is actually slower? Checking each whitespace on a line for both conditions or parsing the entire line a third time? I guess it depends on how long the line is compared to how many whitespaces there are. If you had a standard text file where each line started at the margin and ended with a CR/LF or just LF I would say that parsing that line a third time would be slower then checking each whitespace against both conditions but this is just off the hip. I have no benchmarks to prove it and am really just asking in the first place.

        Edit:
        OK well it appears that japhy's suggestion is considerably faster. I would not have thought that it would make that much of a difference but it does appear to be considerable.
        Time taken on brian 42 wallclock secs (41.70 usr 0.23 sys + 0.00 cusr 0.00 csys = 41.93 CPU) seconds
        Time taken on japhy 17 wallclock secs (16.52 usr 0.21 sys + 0.00 cusr 0.00 csys = 16.73 CPU) seconds

        Here is the code I used to benchmark both these methods.
        #!/usr/bin/perl -w use strict; use Benchmark; my $file = shift; my ($start, $end); $start = new Benchmark; brian($file); $end = new Benchmark; calc($start, $end, 'brian'); $start = new Benchmark; japhy($file); $end = new Benchmark; calc($start, $end, 'japhy'); sub calc { my ($start, $end, $test) = @_; my $diff = timediff($end, $start); print "Time taken on ", $test, " ",timestr($diff, 'all'), " seconds +\n"; } sub brian { my $file = shift; open(my $in, '<', $file) or die "error: open $file: $!"; for (1..1000) { seek($in, 0, 0); my @lines = grep length(), map { s/#.*//; s/^\s+|\s+$//g; $_ } < +$in>; } close($in); } sub japhy { my $file = shift; open(my $in, '<', $file) or die "error: open $file: $!"; for (1..1000) { seek($in, 0, 0); my @lines = grep length(), map { s/#.*//; s/^\s+//g; s/\s$//g; $ +_ } <$in>; } close($in); }
Re: A perverse use of grep
by wfsp (Abbot) on Jun 15, 2005 at 09:04 UTC
    How would you do it?

    On 8 lines instead of 1 :-(

    while (<$f>){ chomp; s/^\s+//; s/\s+$//; next unless length; next if /^#/; push @lines, $_; }

      Don't do all that work, and then throw it away -- put the next if /^#/; first:
      while (<$f>){ next if /^#/; chomp; s/^\s+//; s/\s+$//; next unless length; push @lines, $_; }

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: A perverse use of grep (filtering)
by tye (Sage) on Jun 15, 2005 at 20:05 UTC

    Yes, Perl has grep and map but is missing this idiom, which is why I added it to Algorithm::Loops as Filter(). But I still use grep to remove selected lines, which makes the map w/ "local $_= $_;" version appealing.

    use Algorithm::Loops qw( Filter ); my @lines = grep defined, Filter { s/^\s+//; if( /^#/ ) { undef $_; } else { s/\s+$//; } } <$in>;

    It isn't terribly hard to do "filtering" directly with grep or map, but it is terribly easy to only get filtering almost right with them and the difference is usually subtle but can still lead to real problems.

    A pragma to make the aliases that map and grep provide in $_ always be read-only would be something I would use... I wonder if Perl6 lets you specify "read-only aliases" vs. "read-write copies" vs. "read-write aliases" for map / grep like I know it will for function arguments.

    A concise alternative:

    s/^\s+//, s/\s+$// for my @lines= grep ! /^\s*#/, <$in>;

    - tye