in reply to How can I maximize performance for my router netflow analyzer? (was: Perl Performance Question)

If there is one thing you can say today, it is "disk is cheap".

I'd just have the collector write the data to disk files with sequence numbers or dates in their names. So either set a maximum file size of 400MB (for example) and just go to the next sequence number when you hit that, or go to a new file every hour or N minutes.
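Untested sketch of the collector side (the spool directory, the 400MB cutoff, and reading flow records from STDIN are all just assumptions; substitute however your collector actually receives flows):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $spool_dir = '/var/spool/netflow';    # assumed location
    my $max_bytes = 400 * 1024 * 1024;       # rotate at ~400MB
    my ($seq, $fh, $written) = (0, undef, 0);

    # Close the current spool file (if any) and open the next one,
    # named with a timestamp and a zero-padded sequence number.
    sub open_next {
        close $fh if $fh;
        $seq++;
        my $name = sprintf "%s/flows.%d.%06d", $spool_dir, time(), $seq;
        open $fh, '>', $name or die "open $name: $!";
        $written = 0;
    }

    open_next();
    while (defined(my $record = <STDIN>)) {   # one flow record per line (assumption)
        print {$fh} $record;
        $written += length $record;
        open_next() if $written >= $max_bytes;   # or rotate every N minutes instead
    }
    close $fh if $fh;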

Then have a separate process that extracts data from these files, summarizes it, stores the results somewhere more permanent, and finally deletes each file once it is sure that both it and the file writer are done with it. (Or make deletion a separate step, so you can recover if you find a bug in the analysis, or purge unanalyzed files if the backlog gets really, really huge.)
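Untested sketch of the matching "processor" side (again, the file layout and the placeholder summarize/store routines are just assumptions; the only real point is that it never touches the file the collector is still writing):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $spool_dir = '/var/spool/netflow';

    # Placeholder analysis: just count flows per file.  Replace these
    # with your real summarization and permanent storage.
    sub summarize { my ($sum, $line) = @_; $sum->{flows}++ }
    sub store     { my ($sum, $file) = @_; print "$file: $sum->{flows} flows\n" }

    # The collector only ever appends to the newest file, so everything
    # older than that is safe to read and then delete.
    my @files = sort glob "$spool_dir/flows.*";
    pop @files;    # skip the newest file; the collector may still be writing it

    for my $file (@files) {
        open my $in, '<', $file or die "open $file: $!";
        my %summary;
        while (my $line = <$in>) {
            chomp $line;
            summarize(\%summary, $line);
        }
        close $in;
        store(\%summary, $file);
        unlink $file or warn "unlink $file: $!";
    }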

I'd think that any other scheme is going to be pretty vulnerable to loss of data.

        - tye (but my friends call me "Tye")

Replies are listed 'Best First'.
Re: (tye)Re: Perl Performance Question
by IkomaAndy (Acolyte) on Jun 13, 2001 at 20:14 UTC
    This is something I was thinking of. Hopefully, if the "processor" can't keep up with the "gatherer," it would be able to make up for lost time during non-peak times.
      I implemented this "disk queue" system, and found it very slow, even using a memory-based /tmp filesystem (on Solaris 2.6). I could only process about 200 flows per second, as opposed to the ~1,000 per second that should be gathered. That's a pretty extreme backlog; I don't know if I would even be able to consume it during off-peak times. I didn't really see a slowdown in gathering while the "processor" was running, though, which was something I had wondered about when writing to disk.