Re: parsing comments in newline-delimited files as lists

Why make your life more difficult than it needs to be? I prefer the straightforward approach: First deal with comments by making them whitespace, then deal with extraneous whitespace:

while (<DATA>) {
  chomp;
  s/#.*$/ /;  # first replace any comments with space

  # skip this line if it consists only of whitespace (or nothing)
  next if /^\s*$/;

  # at this point we KNOW there is at least some non-blank on the line
  # which is also non-comment.
  # Just grab everything sans leading/trailing whitespace
  my ($non_blank) = /^\s*(.*\S)\s*$/;
  push @enabled_lines, $non_blank if $non_blank;  # 'if' may not be necessary
}

Of course this does not account for escaped comment characters, since I thought it would obscure the simplicity (I leave it as an exercise for the reader).
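
For anyone who wants to try the exercise, here is a minimal sketch rather than a definitive answer. It assumes a backslash directly before '#' is the only escape convention in play (so an escaped backslash followed by a real comment is not handled), and the helper name strip_comment is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: remove an unescaped '#' comment, then unescape '\#'.
# Assumption: '\#' is the only escape sequence that matters here.
sub strip_comment {
    my ($line) = @_;
    $line =~ s/(?<!\\)#.*$//;   # drop from the first '#' not preceded by a backslash
    $line =~ s/\\#/#/g;         # turn each escaped '\#' back into a literal '#'
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace (newline included)
    return $line;
}

my @enabled_lines;
for my $raw ("foo\\#bar   # real comment\n", "   # only a comment\n", "plain\n") {
    my $clean = strip_comment($raw);
    push @enabled_lines, $clean if length $clean;   # 'length' keeps a literal "0"
}
```

The negative lookbehind is what keeps the escaped '#' from being treated as the start of a comment.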

dmm

You can give a man a fish and feed him for a day ...
Or, you can teach him to fish and feed him for a lifetime

Re: Re: parsing comments in newline-delimited files as lists by innerfire (Novice) on Dec 27, 2001 at 10:54 UTC
You don't need to turn comments into whitespace, and you can get more exact. I've already built a reputation on pedantry--just ask footpad...

while (<DATA>) {
  chomp;
  s/#.*$//;
  next if /^\s*$/;
  /([^\s]+)/;
  push(@enabled_lines, $1);  # the if was not necessary :)
}

"Of course this does not account for escaped comment characters, since I thought it would obscure the simplicity (I leave it as an exercise for the reader)."

The original poster specified that the URLs are URL-encoded, in which '#' appears as '%23', so there's nothing more to do.

http://www.nodewarrior.org/chris/
Re: Re: Re: parsing comments in newline-delimited files as lists by merlyn (Sage) on Dec 27, 2001 at 19:51 UTC
You pushed my "`$1` used outside the context of a conditional" button here. That'd be failed in a code review if I were running the show. And yes, after I stared at the code for a minute or so, I can see that the assertion from the previous line ensures that there's always a match. But in that case, why not make the match, the match!

while (<DATA>) {
  chomp;
  s/#.*$//;
  next unless /([^\s]+)/;
  push(@enabled_lines, $1);
}

There: it's now clear to me that we can't get to the push unless the match succeeds. I'd let this stand in a code review, but if I was looking for further optimization, I'd just keep pressing forward for more clarity:

while (<DATA>) {
  chomp;
  s/#.*$//;
  push(@enabled_lines, $1) if /([^\s]+)/;
}

Nicer. Tighter. Dare I say, "faster" as well? But I see some equivalences that are down in the "nice" category (first was "must", second was "want", now "nice"):

while (<DATA>) {
  chomp;
  s/#.*$//;
  push @enabled_lines, $1 if /(\S+)/;
}

There. Clean, maintainable, pretty. I don't know if this does what the original poster wanted, but I didn't change the meaning at all from the node to which I'm replying.

-- Randal L. Schwartz, Perl hacker
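
To make the `$1` concern concrete, here is a small sketch (the sub names are invented for illustration). A failed match does not reset `$1`, so reading it without checking can silently hand back the capture from an earlier, unrelated match:

```perl
use strict;
use warnings;

# Hypothetical helper: reads $1 WITHOUT checking whether the match succeeded.
sub first_word_unchecked {
    my ($line) = @_;
    "sentinel" =~ /(\w+)/;   # some earlier successful match sets $1
    $line =~ /(\S+)/;        # if this fails, $1 is left untouched
    my $word = $1;
    return $word;
}

# Guarded form: $1 is only read inside the conditional on the match itself.
sub first_word_checked {
    my ($line) = @_;
    if ($line =~ /(\S+)/) {
        my $word = $1;
        return $word;
    }
    return undef;
}
```

Called with a whitespace-only string, the unchecked version returns the stale "sentinel" capture, while the checked version returns undef.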
Re: Re: Re: Re: parsing comments in newline-delimited files as lists by Anonymous Monk on Dec 28, 2001 at 01:21 UTC
Without taking this code twisting too far, I think this is even nicer:

while (<DATA>) {
  s/\s*#.*//;
  push @enabled_lines, /\s*(.+)/;
}
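
The trick here is that a match in list context returns its captures, or an empty list on failure, so `push` adds nothing for lines that don't match. A hedged sketch of that idiom follows; the `collect` sub is hypothetical, and it uses `(\S+)` rather than `(.+)` because, without a `chomp`, `/\s*(.+)/` can still succeed on a whitespace-only line by backtracking and capturing a single space:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first non-whitespace run of each non-comment line.
sub collect {
    my @enabled_lines;
    for (@_) {
        my $line = $_;            # copy, so the caller's strings are not modified
        $line =~ s/\s*#.*//;      # strip the comment and any whitespace before it
        # List context: the match yields ($1) on success and () on failure,
        # so a failed match pushes nothing at all.
        push @enabled_lines, $line =~ /(\S+)/;
    }
    return @enabled_lines;
}

my @got = collect("http://a.example/  # c\n", "# only\n", "   \n", "http://b.example/\n");
```

Here @got ends up holding just the two URLs; the comment-only and blank lines contribute nothing.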
Re(3): parsing comments in newline-delimited files as lists by dmmiller2k (Chaplain) on Dec 27, 2001 at 19:37 UTC
True, comments can be eliminated completely (along with any preceding whitespace), to wit: `s/\s*#.*$//; # first remove any comments`

But the expression, `/([^\s]+)/;` is incorrect. Even the shorter equivalent, `/(\S+)/;` is incorrect: if the line contains more than one word, this will only match the first one; you are explicitly disallowing embedded whitespace. We need to match from the first non-whitespace character to the last non-whitespace character and should include all intervening characters (including embedded whitespace).

Perhaps your point regarding the final `if` has merit, but assuming we are dealing with files of up to several thousand lines (not, say, millions), the performance hit should be nearly negligible.

dmm
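
A quick sketch of the difference being argued, with made-up sub names: `(\S+)` stops at the first embedded space, whereas `(\S(?:.*\S)?)` spans from the first non-whitespace character to the last one.

```perl
use strict;
use warnings;

# Hypothetical helper: capture only the first run of non-whitespace.
sub first_token {
    my ($line) = @_;
    my ($token) = $line =~ /(\S+)/;
    return $token;                  # undef if the line is blank
}

# Hypothetical helper: capture from the first to the last non-whitespace
# character, keeping any embedded whitespace in between.
sub trimmed_span {
    my ($line) = @_;
    my ($span) = $line =~ /(\S(?:.*\S)?)/;
    return $span;                   # undef if the line is blank
}
```

For "  foo bar  baz  \n", first_token gives "foo" and trimmed_span gives "foo bar  baz". Which one is right depends on whether the data can contain embedded whitespace at all.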
Re: Re(3): parsing comments in newline-delimited files as lists by innerfire (Novice) on Dec 28, 2001 at 00:00 UTC
You don't need to remove whitespace preceding the comment, because it will be removed from the final result by the regex matching only non-whitespace characters. I originally had the first regex the way you have it.

"is incorrect: if the line contains more than one word, this will only match the first one; you are explicitly disallowing embedded whitespace. We need to match from the first non-whitespace character to the last non-whitespace character and should include all intervening characters (including embedded whitespace)."

No, we don't. Re-read the original post: we need to match URL-encoded URLs. Encoded URLs have no whitespace characters. My code is correct.

http://www.nodewarrior.org/chris/
Re(5): parsing comments in newline-delimited files as lists by dmmiller2k (Chaplain) on Dec 28, 2001 at 00:34 UTC