Re: Re: parsing comments in newline-delimited files as lists

Replies are listed 'Best First'.
Re: Re: Re: parsing comments in newline-delimited files as lists by merlyn (Sage) on Dec 27, 2001 at 19:51 UTC
You pushed my "`$1` used outside the context of a conditional" button here. That'd be failed in a code review if I were running the show. And yes, after I stared at the code for a minute or so, I can see that the assertion from the previous line ensures that there's always a match. But in that case, why not make the match, the match! `while (<DATA>) { chomp; s/#.$//; next unless /([^\s]+)/; push(@enabled_lines, $1); }` [download] There: it's now clear to me that we can't get to the push unless the match succeeds. I'd let this stand in a code review, but if I was looking for further optimization, I'd just keep pressing forward for more clarity: `while (<DATA>) { chomp; s/#.$//; push(@enabled_lines, $1) if /([^\s]+)/; }` [download] Nicer. Tighter. Dare I say, "faster" as well? But I see some equivalances that are down in the "nice" category (first was "must", second was "want", now "nice"): `while (<DATA>) { chomp; s/#.*$//; push @enabled_lines, $1 if /(\S+)/; }` [download] There. Clean, maintainable, pretty. I don't know if this does what the original poster wanted, but I didn't change the meaning at all from the node to which I'm replying. -- Randal L. Schwartz, Perl hacker	[reply] [d/l] [select]
Re: Re: Re: Re: parsing comments in newline-delimited files as lists by Anonymous Monk on Dec 28, 2001 at 01:21 UTC
Without taking this code twisting too far, I think this is even nicer: `while (<DATA>) { s/\s#.//; push @enabled_lines, /\s*(.+)/; }` [download]	[reply] [d/l]
Re(3): parsing comments in newline-delimited files as lists by dmmiller2k (Chaplain) on Dec 27, 2001 at 19:37 UTC
True, comments can be eliminated completely (along with any preceding whitespace), to wit: `s/\s#.$//; # first remove any comments` [download] But the expression, `/([^\s]+)/;` [download] is incorrect. Even the shorter equivalent, `/(\S+)/;` [download] is incorrect: if the line contains more than one word, this will only match the first one; you are explicitly disallowing embedded whitespace. We need to match from the first non-whitespace character to the last non-whitespace character and should include all intervening characters (including embedded whitespace). Perhaps your point regarding the final `if` has merit, but assuming we are dealing with files of up to several thousand lines (not, say millions), the performance hit should be nearly negligible. dmm You can give a man a fish and feed him for a day ... Or, you can teach him to fish and feed him for a lifetime	[reply] [d/l] [select]
Re: Re(3): parsing comments in newline-delimited files as lists by innerfire (Novice) on Dec 28, 2001 at 00:00 UTC
You don't need to remove whitespace preceding the comment, because it will be removed from the final result by the regex matching only non-whitespace characters. I originally had the first regex the way you have it. is incorrect: if the line contains more than one word, this will only match the first one; you are explicitly disallowing embedded whitespace. We need to match from the first non-whitespace character to the last non-whitespace character and should include all intervening characters (including embedded whitespace). No, we don't. Re-read the original post: we need to match URL-encoded URLs. Encoded URLs have no whitespace characters. My code is correct. http://www.nodewarrior.org/chris/	[reply]
Re(5): parsing comments in newline-delimited files as lists by dmmiller2k (Chaplain) on Dec 28, 2001 at 00:34 UTC
Your first point is true, I did change it a bit to more generally fit the semantics of "files that typically have comments" (e.g., programs). That may have been unnecessarily trying to generalize it. As for your second point, you're absolutely right, I did not go back and re-read the original post, mea culpa. However, in data-driven situations (aren't they all?), I find it unnecessarily wasteful NOT to code defensively. Perhaps today you happen to have URL-encoded URLs in a file. Perhaps tomorrow we'll discover we can save some time reusing this code with a file containing, say, first and last names (or something)? You're coupling the code way too closely to the (presumed) format of the input data for my comfort. I would always prefer `/^\s(.+)\s$/` over `/(\S+)/` (since the former matches things that the latter matches, but not vice versa). And, while I've got your attention, let me point out merlyn's post, regarding the use of $1 outside the context of a conditional. In fact, merlyn pretty much nailed it; anything further on this is probably beating a dead horse.	[reply] [d/l] [select]