Based on this part of your description:

I first get a list of unique things I'm interested in, similar to a product id(list contains about 7000 unique items). Then, I iterate the big log file one time using regex to find lines I'm interested in for gathering additional data about each product id. I write those lines out to a few new (smaller) files. Then, I loop through the product ID list one time, and execute several different grep commands on the new smaller files I created.

I can't really tell: (a) how many different primary input files you have, or (b) how many times you read each primary input file from beginning to end. If you are reading a really big file many many times in order to get matches on some number of different patterns, then you might be able to speed things up by doing a slightly more complicated set of matches on a single pass over the data.

RonW gave you a really useful suggestion: PerlIO::gzip -- I second that. Read the gzip data directly into your perl script so you can use regex matches on each (uncompressed) line, because (1) Perl gives you a lot more power in matching things efficiently and flexibly, (2) you can use the matches directly in your script to build useful data structures, and (3) you save some overhead time by not launching sub-shells to run unix commands.


In reply to Re: Best way to search large files in Perl by graff
in thread Best way to search large files in Perl by ccmadd

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.