Ok, now I understand the performance requirements better.

Doubling the performance from 60K to 120K lines/sec with your current single process would be possible, albeit with some C code. But that still wouldn't do all that you want. I predict that I could code $singleline=~s/((\S+)\s?)/$count{$2}++ ? '' : $1/eg; much more efficiently in ASM than in C, because there are certain instructions that are difficult for the C compiler to even use. If this were an embedded hardware board application, it would be worth the effort. But here, I think not! I believe you are better served with a pure Perl application.
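For anyone following along, here is a minimal sketch of what that substitution does to one line. The wrapper sub dedupe_line is a hypothetical name of mine, and resetting %count for every line is my assumption; your real code may well keep %count across lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): keeps the first
# occurrence of each whitespace-separated word and drops later repeats.
sub dedupe_line {
    my ($singleline) = @_;
    my %count;    # assumption: duplicates are counted per line
    # $1 is the word plus one optional trailing whitespace char, $2 the word.
    # The first sighting keeps $1; every later sighting is replaced by ''.
    $singleline =~ s/((\S+)\s?)/$count{$2}++ ? '' : $1/eg;
    return $singleline;
}

print dedupe_line("a b a c b d"), "\n";    # prints "a b c d"
```

Note that when the last word of a line is a repeat, the whitespace that followed the previous word is left behind, and a trailing newline would be swallowed along with the repeated word. So it is probably safest to chomp before calling this.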

I think you are on the right track in distributing this incoming "firehose of data" among multiple processing entities. Right now it appears that you are thinking about one program with multiple threads. I would be thinking of multiple instances of a single-threaded process with a "router" process. Let the OS assign these processes to different machine cores. I don't see any requirement for these processes to communicate with each other or share information. A further consideration could be how easy it is to just add another machine when the load increases.
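A rough sketch of the router idea, under several assumptions of mine: plain fork and pipe, round-robin distribution, per-line de-duplication, and one output file per worker so that the workers never have to coordinate. The names run_router and dedupe_line and the output file naming are hypothetical. In real use the router would read STDIN; the demo reads an in-memory string so it can be run as is.

```perl
use strict;
use warnings;
use POSIX ();
use File::Temp qw(tempdir);

# Assumption: duplicates are counted per line, as in the one-liner above.
sub dedupe_line {
    my ($line) = @_;
    my %count;
    $line =~ s/((\S+)\s?)/$count{$2}++ ? '' : $1/eg;
    return $line;
}

# Hypothetical router: forks $n_workers single-threaded workers and deals
# the lines of $in out to them round-robin. Worker $i writes its results
# to "$out_prefix.$i". Returns the list of output file names.
sub run_router {
    my ($in, $n_workers, $out_prefix) = @_;
    my @workers;
    for my $i (0 .. $n_workers - 1) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Worker: close every inherited write end so that EOF arrives
            # as soon as the router closes its own copy.
            close $writer;
            close $_->{fh} for @workers;
            open(my $out, '>', "$out_prefix.$i") or die "open failed: $!";
            while (my $line = <$reader>) {
                chomp $line;
                print {$out} dedupe_line($line), "\n";
            }
            close $out;
            POSIX::_exit(0);    # skip END blocks inherited from the router
        }
        close $reader;
        push @workers, { pid => $pid, fh => $writer, file => "$out_prefix.$i" };
    }
    # Router: no processing here, just hand each line to the next worker.
    my $next = 0;
    while (my $line = <$in>) {
        print { $workers[ $next++ % @workers ]{fh} } $line;
    }
    for my $w (@workers) {
        close $w->{fh};          # flushes and signals EOF to the worker
        waitpid $w->{pid}, 0;
    }
    return map { $_->{file} } @workers;
}

# Demo with an in-memory "log"; substitute \*STDIN for the real stream.
my $dir  = tempdir(CLEANUP => 1);
my $data = "a b a\nx y x y\nfoo foo bar\nq r s\n";
open(my $in, '<', \$data) or die "open failed: $!";
for my $file (run_router($in, 2, "$dir/worker")) {
    open(my $fh, '<', $file) or die "open failed: $!";
    print "$file: $_" while <$fh>;
}
```

Round-robin keeps the router trivial, but the original line order is only preserved within each worker's file. If order across the whole stream matters, the router would have to tag each line with a sequence number, which is beyond this sketch.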

Leave your final "print" in the benchmark. That does all the work, and it does go to STDOUT; the result just gets redirected to the "bit bucket".

I am still curious as to what this analysis program does with this massive amount of data. It seems that some kind of "front-end" to this thing might be possible: extract perhaps a time window, or perhaps all data from Server X, from the main log file, and then analyze that in non-real time. It seems to me that the processing power needed for super fast concatenation of lines and the compression of the data by 30% due to "dupes" must be minuscule compared to the overall effort of the analysis program. Aside from reducing the storage required, it is not clear how much this will help the "final end result".


In reply to Re^5: Multi-CPU when reading STDIN and small tasks by Marshall
in thread Multi-CPU when reading STDIN and small tasks by bspencer
