Are you sure it's the contents of the loop that are slow, and not the reading of the ZIP file? Try
Devel::NYTProf to find the actual bottleneck.
Also there are some things you can improve within the loop before investigating parallelism.
For one you can look up $categories{$k}->{traces} once outside the whole loop.
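A minimal sketch of hoisting that lookup, assuming a data layout that is only guessed from the expression above: the contents of %categories, the key in $k and the sample lines are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data; the real %categories in your script will differ.
my %categories = (
    errors => { traces => [ qr/ERROR/, qr/FATAL/ ] },
);
my @lines = ("INFO start", "ERROR disk full", "FATAL crash");
my $k     = 'errors';

# Do the hash lookup once, before the loop, instead of once per line.
my $traces = $categories{$k}->{traces};

my @hits;
for my $line (@lines) {
    for my $re (@$traces) {
        if ($line =~ $re) {
            push @hits, $line;
            last;    # one matching pattern is enough for this line
        }
    }
}

print "$_\n" for @hits;
```

The saving per iteration is small, so this only matters if the profiler shows the loop body to be hot.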
It might also be much faster to join all those regexes into a single regex and match it once per line, instead of iterating over the regexes (I don't know whether that works in your case).
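One way to combine the patterns, as a rough sketch: the patterns and sample lines here are invented, and it assumes you only need to know whether any pattern matched, not which one.

```perl
use strict;
use warnings;

# Hypothetical patterns standing in for the ones in your traces list.
my @patterns = (qr/ERROR/, qr/FATAL/, qr/timeout \d+/);

# A qr// object stringifies with its own non-capturing group and flags,
# so joining the pieces with '|' yields a valid alternation.
my $alternation = join '|', @patterns;
my $combined    = qr/$alternation/;

my @lines = ("INFO start", "ERROR disk full", "timeout 30 reached", "timeout soon");

# A single match per line replaces the inner loop over the patterns.
my @hits = grep { $_ =~ $combined } @lines;

print "$_\n" for @hits;
```

If you need to know which pattern matched, this approach needs more work (for example capture groups), and it may not be a win at all; profile both variants.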
Also you seem to read the whole file into memory first, and then iterate over it - that's rather inefficient. Instead use
OUTER: while(my $line = <GZIP>) {
...
to read it line by line.
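For illustration, here is a small self-contained version of that loop. It reads from an in-memory filehandle as a stand-in for your GZIP handle, and the patterns and data are made up.

```perl
use strict;
use warnings;

# In-memory filehandle as a stand-in for the real GZIP handle.
my $data = "INFO start\nERROR disk full\nINFO done\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# Hypothetical patterns; yours come from the categories hash.
my @traces = (qr/ERROR/, qr/FATAL/);
my @hits;

# Only one line is held in memory at a time.
OUTER: while (my $line = <$fh>) {
    chomp $line;
    for my $re (@traces) {
        if ($line =~ $re) {
            push @hits, $line;
            next OUTER;    # go straight to the next input line
        }
    }
}
close $fh;

print "$_\n" for @hits;
```

The loop label lets the inner loop skip directly to the next line once a pattern has matched.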
Parallelization is usually a lot of trouble, so try the conventional optimization wisdom first.
(Update: removed one comment that's not applicable; added hint about memory usage).