I assume this belongs in "Meditations," deep though it ain't. :) Hope the forced line-breaks at about 80 characters don't make a mess on folks' screens...

This is intended as a (useful, I hope) pep-talk to others like me--newcomers to Perl.
The punchline: when the bishops, popes, and minor and major deities here stress
"there's more than one way to do it," and when they post a surprising range of
solutions when answering someone's questions, take the hint: try different solutions
yourself. Maybe lots of them. In time the payoff could well be markedly improved performance.

Case in point: yesterday I needed a script to extract only certain lines from a
plaintext file exported from a database (comma-separated values). If field 4
contained certain text, the record in question should be printed; otherwise,
skip the record and read the next one.

My first thought was to use split to create an array of each record's fields,
then compare the contents of the array's fourth element with the arg(s)
the user had provided on the command line (the script could test for
any of several strings the user might provide as filters). All lines would
have to be read; there's no predicting how (or if) these databases have been
sorted before they're saved to CSV format.
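For anyone who wants to see the split idea spelled out, here is a minimal sketch, not the actual script. The sub name field4_matches_split and the sample records are made up for illustration, and it assumes plain CSV with no quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper: true if field 4 of a CSV line contains any filter string.
# Assumes simple CSV: no quoted fields, no commas inside a field.
sub field4_matches_split {
    my ($line, @filters) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    return 0 unless defined $fields[3];   # fewer than four fields: skip record
    for my $filter (@filters) {
        return 1 if index($fields[3], $filter) >= 0;   # plain substring test
    }
    return 0;
}

# Made-up sample records standing in for the exported file.
my @lines = (
    "1,Smith,NY,widgets,10\n",
    "2,Jones,CA,gadgets,5\n",
    "3,Brown,TX,widgets deluxe,7\n",
);

# In a real script the filters would come from @ARGV.
my @kept = grep { field4_matches_split($_, 'widgets') } @lines;
print @kept;
```

With a real file, the grep would become a while loop over a filehandle, printing each line that passes the test.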

It all seemed very straightforward. The script, reading a 13,576-line file,
produced the desired results in about 3.5 seconds. I wasn't about to give myself
a Hero Of The People medal for that, but I could live with it. I figured I
was done. No, wait . . .

It occurred to me: what if I were to get the contents of field 4 by using
a regular expression, instead? The RE was a bit unpleasant-looking.
This sort of thing: /^[^,]+,[^,]*,   etc. etc.
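A sketch of the regex idea follows. The pattern here is my own compact stand-in, not the exact expression from the script (which isn't shown in full above); the sub name field4_matches_regex and the sample records are likewise made up, and plain CSV with no quoted fields is assumed.

```perl
use strict;
use warnings;

# Skip three comma-terminated fields, then capture the fourth.
# An illustrative pattern, not the one from the original script.
my $field4_re = qr/^(?:[^,]*,){3}([^,]*)/;

# Hypothetical helper: true if field 4 of a CSV line contains any filter string.
sub field4_matches_regex {
    my ($line, @filters) = @_;
    chomp $line;
    return 0 unless $line =~ $field4_re;   # fewer than four fields: skip record
    my $field4 = $1;
    for my $filter (@filters) {
        return 1 if index($field4, $filter) >= 0;   # plain substring test
    }
    return 0;
}

# Made-up sample records standing in for the exported file.
my @lines = (
    "1,Smith,NY,widgets,10\n",
    "2,Jones,CA,gadgets,5\n",
    "3,Brown,TX,widgets deluxe,7\n",
);

my @kept = grep { field4_matches_regex($_, 'widgets') } @lines;
print @kept;
```

The point of the pattern is that matching can stop once field 4 has been captured, while split with no limit breaks up the whole record.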

I looked at it a while and thought: Ridiculous. Long-ish regular expression--
big performance hit; "split" must be faster.

W r o n g.  The routine using the RE ran in about 3/4 of a second--roughly 4.5
times faster. Surprised the hell out of me. And here I'd almost dumped the
second approach as "obviously" less efficient and "therefore" slower.

So my lesson for the day, reduced to one unscientific-sounding bromide,
was: Assume nothing; try stuff. Nirvana awaits (or, if not that, then
possible improvements in execution speed :).
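One way to "try stuff" without a stopwatch is the core Benchmark module. The sketch below is only an illustration: the records are synthetic, the sub names count_with_split and count_with_regex are invented, and the regex is a stand-in rather than the one from my script. Which approach wins may well depend on the Perl version and the data, so run it rather than trusting anyone's guess.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic records (an assumption): every third one has "widgets" in field 4.
my @lines = map {
    join ',', $_, "name$_", 'x', ($_ % 3 ? 'gadgets' : 'widgets'), 'tail'
} 1 .. 13_576;
my $want = 'widgets';

# Count matching records by splitting each line into fields.
sub count_with_split {
    my $hits = 0;
    for my $line (@lines) {
        my @fields = split /,/, $line;
        $hits++ if defined $fields[3] && index($fields[3], $want) >= 0;
    }
    return $hits;
}

# Count matching records by capturing field 4 with a regex.
sub count_with_regex {
    my $hits = 0;
    for my $line (@lines) {
        $hits++ if $line =~ /^(?:[^,]*,){3}([^,]*)/ && index($1, $want) >= 0;
    }
    return $hits;
}

# Sanity check first: a fast wrong answer is no bargain.
die "approaches disagree\n" unless count_with_split() == count_with_regex();

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    split => \&count_with_split,
    regex => \&count_with_regex,
});
```

cmpthese prints a small table of rates and relative percentages for the two subs.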


In reply to From one beginner to others . . . by greenhorn
