My machine is not maxed out. It's a dual-processor system with its filesystem on a hardware RAID 0, and plenty of RAM. Theoretically I could use a DB, but that raises the question: having read the data and sorted it into relevant chunks, do I A) send the data to a DB of some sort, to later reopen it, reread it into memory somehow, and then duly process it, or B) simply process it now? And if I were to spread the computation across hosts, I'm still dealing with finding a sane way of splitting the data, sending it out to the other hosts, polling to see when they are done (or waiting for them to finish), and then pulling all the data back together and correlating it. All of which adds to the run time.
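For what it's worth, that split/send/poll/gather loop can be sketched in a few lines. This is only an illustration of the coordination steps, simulated with local threads standing in for remote hosts; the chunking scheme and the `count_events` worker are assumptions, not anyone's actual code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from collections import Counter

def count_events(chunk):
    """Stand-in for the per-host work: tally events by (msgid, event)."""
    counts = Counter()
    for msgid, event in chunk:
        counts[(msgid, event)] += 1
    return counts

def scatter_gather(records, n_hosts=4):
    # 1) split the data into roughly even chunks
    chunks = [records[i::n_hosts] for i in range(n_hosts)]
    totals = Counter()
    # 2) send each chunk out, 3) poll for completion, 4) merge results
    with ThreadPoolExecutor(max_workers=n_hosts) as pool:
        futures = [pool.submit(count_events, c) for c in chunks]
        for fut in as_completed(futures):
            totals.update(fut.result())
    return totals
```

Even in this toy form, steps 1-4 are pure overhead compared to just processing the records where they sit, which is the point being made above.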
I have also realized that not all the events for a msgid will occur within a given logfile, due to rotation, but I'm not even going to go into maintaining state across files for a very small percentage of the actual data. As you can see in the code, I've been allowed to punt somewhat on absolute accuracy; it may drive my inner anal-retentive geek crazy, but such is life.
I guess I should step back and say I'm not attacking your points so much as stating that I've already considered them and didn't think the benefits outweighed the costs in terms of code logic and run time. No offense intended in any way, shape, or form. I should also add that the system doing the processing is not one of the MTAs. It is a completely separate host, which is doing just about absolutely nothing aside from sshd. It's also a FreeBSD box, if that makes any difference.
/* And the Creator, against his better judgement, wrote man.c */
Having done processing of flat files the way you describe, I can tell you that it's far, far easier, and often faster, to use a DB. Throughput is normally better, and having the data queryable in an easy way gives you the opportunity to analyze it in ways you may not have thought of, or thought of but discarded as infeasible because of processing concerns.
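As a minimal sketch of what "queryable in an easy way" buys you, here's the idea with SQLite (the table layout and the parsed fields are assumptions, not the poster's actual schema):

```python
import sqlite3

def load_events(rows):
    """Load parsed log records into an in-memory SQLite DB; return the connection."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (msgid TEXT, event TEXT, host TEXT)")
    db.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    db.commit()
    return db

def events_per_msgid(db):
    # An ad-hoc question becomes one query instead of another pass over the flat files.
    return dict(db.execute(
        "SELECT msgid, COUNT(*) FROM events GROUP BY msgid"))
```

Any new question ("which host saw the most deferrals?") is then a new `SELECT`, not a new parsing pass.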
OK, well, I'd suggest you try running all of the processes in parallel, to get an idea of how much 'spare time' you have. You know, process data in one process while some of the others are reading through the data files.
Then maybe try running just three at a time, and so forth.
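The "three at a time" throttle is easy to express with a worker pool capped at three. A sketch under the assumption that each job is a function handling one logfile; `process_file` here is just a placeholder for the real work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # Placeholder for the real per-logfile work; returns something mergeable.
    return (path, "done")

def run_throttled(paths, max_workers=3):
    # At most max_workers jobs run concurrently; the rest queue up behind them.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_file, paths))
```

Varying `max_workers` (all files at once, then three, then two) and timing each run is a quick way to find the sweet spot the parent post is after.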
--t. alex
Life is short: get busy!