joining lines efficiency?

perlperlperl has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: joining lines efficiency? by CountZero (Bishop) on Jun 10, 2013 at 06:17 UTC
Matching C-style comments is very tricky; see the FAQ entry "How do I use a regular expression to strip C style comments from a file?". Or go easy on yourself and use File::Comments (but it says about C: "Implemented with regular expressions, only works for easy cases until real parsers are employed").

CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics
Re: joining lines efficiency? (slurp) by Anonymous Monk on Jun 10, 2013 at 01:47 UTC
Well, it's more efficient to slurp the whole file into one string than to split the file into lines and then join the lines again. One way is File::Slurp; the other is `my $wholefile = do { local $/; scalar readline $filehandle };`
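For illustration, here is a minimal sketch of the slurp idiom followed by a naive substitution. The sample text and the in-memory filehandle are made up for the example, and the regex is the simple non-greedy kind that only works for easy cases (it knows nothing about string literals).

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for a real file.
my $c_source = "int a; /* first\n   comment */ int b;\n/* second */ int c;\n";
open my $filehandle, '<', \$c_source or die "Cannot open in-memory file: $!";

# Slurp: with $/ undefined, readline returns the rest of the file as one string.
my $wholefile = do { local $/; scalar readline $filehandle };
close $filehandle;

# Naive strip: non-greedy match, and /s lets "." cross newlines.
(my $stripped = $wholefile) =~ s{/\*.*?\*/}{}gs;

print $stripped;
```

The whole file lives in memory as a single scalar, so no join is needed before the substitution runs.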
Re: joining lines efficiency? by hbm (Hermit) on Jun 10, 2013 at 02:32 UTC
Another option is the flip-flop operator: read the file line by line, and don't print while between an opening expression and a closing expression. The general idea is `print unless /<expr1>/ .. /<expr2>/`, but here the slash and star need to be escaped: `while (<$fh>) { print unless m{/[*]} .. m{[*]/}; }`
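A small sketch of the flip-flop approach, wrapped in a sub so it can be tried on a list of lines. The name `strip_comment_lines` and the sample lines are invented for the example. Note that it drops whole lines, so any code sharing a line with a comment delimiter is lost too.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that are not part of a /* ... */ block.
# Caveat: the flip-flop keeps its state between calls, so an unterminated
# comment in one call would carry over into the next.
sub strip_comment_lines {
    my (@lines) = @_;
    my @kept;
    for (@lines) {
        # True from a line containing "/*" through the next line containing
        # "*/"; with "..", both tests may succeed on the same line.
        push @kept, $_ unless m{/[*]} .. m{[*]/};
    }
    return @kept;
}

my @sample = (
    "int a;\n",
    "/* start\n",
    "   middle\n",
    "   end */\n",
    "int b;\n",
    "/* one line */\n",
    "int c;\n",
);
print strip_comment_lines(@sample);
```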
Re^2: joining lines efficiency? by hdb (Monsignor) on Jun 10, 2013 at 05:36 UTC
While your proposal needs less memory, it can go wrong as well. If a comment starts or ends on a line that also contains code outside the comment, the whole line is removed, code included. The regex also needs to be more sophisticated so that it ignores "/*" when the characters appear inside a quoted string.
Re^3: joining lines efficiency? by AnomalousMonk (Archbishop) on Jun 10, 2013 at 16:16 UTC
Quoting hdb: "The regex also needs to ... ignore `/*` ie should the characters be quoted."

It should also handle stuff like this, which compiles and runs just fine:

    #include <stdio.h>
    #include <assert.h>

    void main (int argc, char ** argv) {
        int x = 4;
        int y = 2;
        int * p = &y;
        assert(x/*p == 12345 /* p points to y */);
        printf("everything looks just fine \n");
    }
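One common way to cope with quoted delimiters is to let a single regex match string and character literals first and put them back unchanged, so a `/*` inside quotes is never seen as a comment start. The sketch below is only that: the sub name `strip_c_comments` is made up, it ignores `//` comments, trigraphs and line continuations, and it is not the regex from the FAQ. Each comment is replaced by one space, which is how a C compiler treats it.

```perl
use strict;
use warnings;

# Hypothetical helper: strips /* ... */ comments from a string of C source
# while leaving string and character literals untouched.
sub strip_c_comments {
    my ($src) = @_;
    $src =~ s{
        ( "(?:\\.|[^"\\])*"      # double-quoted string literal, kept as-is
        | '(?:\\.|[^'\\])*'      # character literal, kept as-is
        )
      | /\*.*?\*/                # block comment, replaced by one space
    }{ defined $1 ? $1 : ' ' }gsex;
    return $src;
}

print strip_c_comments(qq{char *s = "/* not a comment */"; /* real */ int x;\n});
print strip_c_comments("assert(x/*p == 12345 /* p points to y */);\n");
```

For the second line the result is `assert(x );`, which matches what the compiler sees: everything from the first `/*` to the first `*/` is a comment.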
Re: joining lines efficiency? by JockoHelios (Scribe) on Jun 10, 2013 at 02:34 UTC
I've been working with large text data files recently. I do most of my test-bed Perl work on a 10-year-old WinXP PC with 1 GB RAM. The largest single file I've processed in one gulp was over 85 MB; the old XP handled it. My scripts pull it all in with code like `@TextArray = <TEXTFILE>;`

The RegEx substitution you mention should work fine if it can handle a single string that long. I've never tried it that way, so I can't vouch for it.

From what I've been doing, I'd suggest reading in the whole file, as you mentioned and as indicated above. Then process each line in a foreach loop, copying lines into a separate array if they aren't part of the multi-line comments you don't want. You would use a variable, perhaps $IsCommentLine, to start and stop the omission of lines. Set the variable to true when the "/*" is found, and set it to false when the "*/" is found. When the variable is false, copy the line into the separate array. When the variable is true, don't copy the line. Everything between the delimiters gets omitted, because the variable stays true until the "*/" is found. Like your RegEx idea would do, but line by line instead.

Dyslexics Untie !!!
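The flag idea above can be sketched roughly like this. The sub name and sample data are invented, and the sketch assumes that the lines carrying the delimiters are themselves dropped, which the description leaves open. As with any whole-line approach, code sharing a line with a delimiter is lost.

```perl
use strict;
use warnings;

# Hypothetical helper: copies lines into a new array unless they fall inside
# a /* ... */ block. Delimiter lines are treated as comment lines (assumption).
sub copy_non_comment_lines {
    my (@text_array) = @_;
    my @kept;
    my $is_comment_line = 0;
    foreach my $line (@text_array) {
        # Start omitting when the opening delimiter is seen.
        $is_comment_line = 1 if index($line, '/*') >= 0;
        push @kept, $line unless $is_comment_line;
        # Stop omitting after the line that holds the closing delimiter.
        $is_comment_line = 0 if index($line, '*/') >= 0;
    }
    return @kept;
}

my @text_array = (
    "int a;\n",
    "/* start\n",
    "   middle\n",
    "   end */\n",
    "int b;\n",
);
print copy_non_comment_lines(@text_array);
```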