ActiveState has Devel::NYTProf in its ppm4 repository: http://code.activestate.com/ppm/Devel-NYTProf/. Once it's installed, cd into the target script's directory and run your script under the profiler: perl -d:NYTProf some_perl.pl input_file.txt. When it completes, generate and open the report from that same directory: nytprofhtml --open. You should get a browser window with more useful information than you can shake a stick at.

My optimized regex should help, but only as an optimization of the exact regex you provided. It's tricky to implement, and tricky to maintain as your needs continue to evolve. A better solution would be to use threads or to fork processes; BrowserUk has already offered some suggestions on how you might implement such a strategy. The beauty of that approach is that you don't have to concern yourself quite as much with how efficient the regular expressions themselves are, because you're processing several files in parallel.
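
To make the forking idea concrete, here is a minimal sketch using only core Perl (fork and wait). It is not BrowserUk's code, just one way it could look. The names count_matches and run_in_parallel, the /ERROR/ pattern, and the worker limit of 4 are all placeholders I made up for illustration; substitute your own per-file work. Keep in mind that on Windows builds of Perl, fork is emulated with threads, so your mileage may vary there.

```perl
use strict;
use warnings;

# Hypothetical per-file job: count the lines matching a placeholder pattern.
sub count_matches {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /ERROR/;    # placeholder regex, swap in your own
    }
    close $fh;
    return $count;
}

# Run $job on each file in its own child process, with at most
# $max_workers children alive at once. Returns the number of children
# that exited with status 0.
sub run_in_parallel {
    my ($files, $max_workers, $job) = @_;
    my ($running, $ok) = (0, 0);

    # Block until one child finishes, then record how it exited.
    my $reap = sub {
        my $pid = wait();
        if ($pid == -1) { $running = 0; return; }    # no children left
        $running--;
        $ok++ if $? == 0;
    };

    for my $file (@$files) {
        $reap->() while $running >= $max_workers;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: do the work, then exit so it never falls back into the loop.
            my $done = eval { $job->($file); 1 };
            warn $@ unless $done;
            exit($done ? 0 : 1);
        }
        $running++;    # parent: one more child in flight
    }

    $reap->() while $running > 0;
    return $ok;
}

# Usage: perl parallel_sketch.pl file1.txt file2.txt ...
unless (caller) {
    my $ok = run_in_parallel(\@ARGV, 4, sub {
        my ($file) = @_;
        printf "%s: %d matching lines\n", $file, count_matches($file);
    });
    print "$ok of ", scalar(@ARGV), " files processed\n";
}
```

Each child prints its own result because a forked child can't hand a Perl value back to the parent directly; if you need the counts collected in one place, have the children write to a pipe or to per-file output files.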

If you end up with a ton of data every day that has to get chewed through before tomorrow, you might look into a MapReduce strategy, such as with Hadoop.


Dave


In reply to Re^3: Help with speeding up regex by davido
in thread Help with speeding up regex by eversuhoshin
