Hi Marshall,
You're right, I didn't mention that because of what the FAQ says: "That might not matter to you, though". That attitude reminds me of the last time I read the Camel: I realized that for a large part of the book no mention is made of performance at all. I found that quite enlightening, along the lines of "until performance becomes an issue, don't worry about it" (a friendlier version of "premature optimization is the root of all evil"). I'm not advocating ignoring performance entirely, but rather worrying less about it when one can afford to. Personally, I prefer the one-regex solution for its brevity, and so far I've usually been in a position where the performance difference doesn't matter. Of course, that can be a luxury, so thank you for mentioning the issue in case it does matter to the OP :-)
Regards, -- Hauke D
Hi Hauke D,
Your comments are well taken. I certainly wouldn't say that there is anything wrong at all with using the single regex for trimming the lines. A large amount of the Perl that I write involves processing text files which come from various sources, sometimes cut-n-paste amalgamations generated by users. I can't think of any example where the "line trim" code or the "skip blank line" code, e.g. next if /^\s*$/; played any significant performance role at all. Leaving a blank line in the file is so common that I almost always add that fast regex (it's fast because of the anchors) to get rid of non-data lines.
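As a rough illustration of that kind of line cleanup, here is a minimal sketch. The helper name data_lines and the sample input are made up for this example; the two substitutions are the "separate" trimming style, and the single-regex form s/^\s+|\s+$//g would give the same result here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the trimmed, non-blank lines of a text block.
sub data_lines {
    my ($text) = @_;
    my @kept;
    for my $raw (split /\n/, $text) {
        my $line = $raw;              # work on a copy of the line
        next if $line =~ /^\s*$/;     # skip empty or whitespace-only lines
        $line =~ s/^\s+//;            # trim leading whitespace
        $line =~ s/\s+$//;            # trim trailing whitespace
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample input with padding and blank lines.
my @lines = data_lines("  alpha  \n\n   \t\nbeta\n");
print "$_\n" for @lines;              # prints "alpha" then "beta"
```

In a real script the same three statements would typically sit at the top of a while (<$fh>) loop, operating on $_ directly.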
Performance can be a very, very application-specific thing. I wrote one program that took 4 hours to run, and I got complaints about the run time. I asked, "How many times per year do you run this program?" Answer: 4 times per year. I had used algorithms that made it easy for me to develop, debug and track down any questionable decision(s), and that came as close as I could to guaranteeing a correct result. Accuracy was my main goal. I never got an answer about why 4 hours mattered. New management ordered it to be recoded, with the goal of making it much faster at the expense of perhaps a 10% error rate. So the new version runs much faster, but makes more mistakes. Which is better? It depends upon what you want.
Algorithms and data structures make a lot more difference than this simple "what do I do with this single line" question, although I will admit to beating on a single critical regex line for an entire week to squeeze some more performance out of it. If you do it a million times, it can matter a lot.
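If it ever does matter, the two trimming styles discussed in this thread can be measured rather than guessed at. The following is only a sketch using the core Benchmark module; the sub names and the sample string are invented for illustration, and the numbers will vary with the Perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample string; real data may behave differently.
my $sample = "   some text with padding   ";

# Single substitution with alternation.
sub trim_one_regex {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Two separate anchored substitutions.
sub trim_two_regex {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    one_regex => sub { trim_one_regex($sample) },
    two_regex => sub { trim_two_regex($sample) },
});
```

A difference that shows up here only matters if the trimming really is the hot spot, which is why profiling the whole program first is usually the better starting point.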