The reason for the multiple greps.

The reason is irrelevant. The point is only that you are looping through data multiple times. If you want it to be fast, do your work on one pass through the data if it is at all possible. In your case, it is possible. You could even roll at least some of the work into your following foreach loop. In your process_it() sub you loop through the data a minimum of four(!!!) times. Five sometimes. Lots of loops and speed don't mix.
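To make the one-pass point concrete, here is a minimal sketch. The sample lines and the three patterns are invented for illustration, since the real process_it() is not shown here; only the shape of the loop matters.

```perl
use strict;
use warnings;

# Hypothetical input: the real data and patterns are assumptions.
my @lines = (
    'from=<a@example.com> id=1',
    'to=<b@example.com> id=1',
    'status=sent id=1',
    'noise',
);

# Multiple passes: every grep walks the whole list again.
my @from_multi   = grep { /^from=/ }   @lines;
my @to_multi     = grep { /^to=/ }     @lines;
my @status_multi = grep { /^status=/ } @lines;

# One pass: look at each line once and file it where it belongs.
my (@from, @to, @status);
for my $line (@lines) {
    if    ($line =~ /^from=/)   { push @from,   $line }
    elsif ($line =~ /^to=/)     { push @to,     $line }
    elsif ($line =~ /^status=/) { push @status, $line }
}
```

Both versions end up with the same three lists, but the second touches each line only once. The per-line work you do in the later foreach could probably go inside the same loop.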

On the whole conditional against $id, it really is just style. If I am only doing one thing based on truth, I inline it; if not, I use the braces.

Well, some of it is a matter of style. I threw you off by using unless in that manner. The code

if (!$id || $id =~ /^(\s+|)$/)
isn't particularly good, for a couple of reasons. First, the !$id does not express what you are trying to say. Granted, you probably aren't going to have a message ID which evaluates to '0', but if you did... oops. Also, it probably isn't optimizing anything: if the ID consists only of spaces, both clauses have to be checked. Using if (/\S/) is much clearer and might well be a little more efficient in the long run too.
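A small sketch of where the two tests disagree. The sub names is_blank_old and is_blank_new are made up for this example, and the defined check is my own addition so that an undef ID does not trigger a warning.

```perl
use strict;
use warnings;

# The original test: true when $id is considered "missing".
sub is_blank_old {
    my ($id) = @_;
    return (!$id || $id =~ /^(\s+|)$/) ? 1 : 0;
}

# The suggested test, inverted: blank means "contains no non-whitespace".
# The defined() guard is an assumption added for safety under warnings.
sub is_blank_new {
    my ($id) = @_;
    return (defined $id && $id =~ /\S/) ? 0 : 1;
}

# Compare both tests on an empty ID, a whitespace ID, '0', and a normal ID.
for my $id ('', '   ', '0', 'abc123') {
    printf "%-8s old=%d new=%d\n", "'$id'", is_blank_old($id), is_blank_new($id);
}
```

The two agree on the empty string, on whitespace, and on an ordinary ID. They differ only on '0', which the old test throws away as blank and the \S test keeps.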

Oh yeah, in terms of tossing the data away: if I don't, I run out of memory. I need to process only one file, extract the relevant data, and then clean out my %data, or I simply don't have any memory left.

Declare %data with my just inside your foreach $file loop. Don't undef it one key at a time.
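Something along these lines, as a sketch only. The file names and parse_file() are hypothetical stand-ins for whatever you use to fill %data; the point is just where the my goes.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: the real file list and parser are assumptions.
my @files = ('a.log', 'b.log');
sub parse_file {
    my ($file) = @_;
    return map { ("$file:$_" => 1) } 1 .. 3;   # three fake records per file
}

my @key_counts;
foreach my $file (@files) {
    my %data = parse_file($file);        # a fresh, empty hash for every file
    push @key_counts, scalar keys %data; # stand-in for "extract the relevant data"
}   # %data goes out of scope here, so nothing carries over to the next file
```

Because %data is lexical to the loop body, each iteration starts with an empty hash and there is nothing to clean up by hand. Perl can then reuse that memory for the next file, though it will not necessarily hand it back to the operating system.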

It's amazing how your paradigm shifts along with the size of your data set :P

Oh, I don't know... I've dealt with biggish datasets in the tens and hundreds of gigs. It's true there are some unique logistical considerations and some shortcuts you can't take but the basics of writing efficient code stay the same. The real "paradigm shift" comes when you have to break your problem down such that it can be distributed to multiple machines.

-sauoq
"My two cents aren't worth a dime.";

In reply to Re: Re: Re: The need for speed by sauoq
in thread The need for speed by l2kashe
