in reply to Email Thresholding

There are a couple of simple ways to go about this. There are plenty of complicated ways. Sometimes you can get 80% or 90% of what you want with a lot less effort if you just barely tweak your spec.

If you're okay with two emails in some cases rather than one, it's easy to get to a point where that edge case is just allowed to happen. Instead of worrying about "the past hour", work with "this current clock hour". "One per hour" in this case means per clock hour rather than sliding sixty-minute windows. Append events when they happen to an events log. Open a new log every hour, maybe with the filename format of yyyy-mm-dd-hh.log. If your log already exists before you've opened it to write out this event, your mail for that hour should have already been sent. Don't send mail in that case. Feel free to go ahead and append the event, though, as this gives a great diagnostic tool. The oldest logs can be cleaned up weekly or monthly for space concerns, although don't underestimate the power of gzip on text files. This might get you an email from 00:59 and another at 01:02 but you'll not get a third until 02:00 with this method.
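
A minimal sketch of that per-clock-hour check, under some assumptions: `log_event` and its arguments are names I made up for illustration, and the directory is expected to exist already. It reports whether this call created the hour's log, which is the signal to send mail.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: append $event to the log for the current clock hour.
# Returns true if this call created the log, i.e. this is the first event
# of the hour and the caller should send mail.
sub log_event {
    my ($dir, $event, $now) = @_;
    $now //= time;    # overridable so the logic can be tested

    # One file per clock hour, named yyyy-mm-dd-hh.log
    my $file  = "$dir/" . strftime('%Y-%m-%d-%H', localtime $now) . '.log';
    my $first = !-e $file;

    # Always append, even when no mail goes out, to keep the diagnostic trail.
    open my $fh, '>>', $file or die "Cannot append to $file: $!";
    print {$fh} scalar(localtime $now), " $event\n";
    close $fh or die "Cannot close $file: $!";

    return $first;
}

# Usage, with send_mail() standing in for whatever mailer you already have:
#   send_mail($event) if log_event('/var/log/alerts', $event);
```

The existence test and the open are not atomic, so two overlapping runs could both think they were first. For a job that runs one copy at a time that is probably acceptable.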

Another option would be to append to an ongoing log and use something like logrotate to manage the size. A stat of the file for mtime could let you know if it's been written to in the past hour.
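
The mtime check could look roughly like this. `written_recently` is a hypothetical name, and the one-hour default is just the window from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $file was modified within the last
# $window seconds (default one hour). A missing file counts as not recent.
sub written_recently {
    my ($file, $window, $now) = @_;
    $window //= 3600;
    $now    //= time;

    my $mtime = (stat $file)[9];    # element 9 of stat() is the mtime
    return 0 unless defined $mtime;
    return ($now - $mtime) < $window ? 1 : 0;
}
```

Do the check before appending the new event, since the append itself updates the mtime. Note also that a steady trickle of events keeps the mtime fresh, so this variant stays quiet for as long as events keep arriving less than an hour apart.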

If you're already using a database, putting this information in a table with timestamps may be appropriate.

I'm curious about your prior assumptions. This sounds like you're sending an email per event when running 30 times per hour as it is. Why not collect all those events into a single email per run as a starting point? That gets you to a maximum of 30 emails per hour right off the bat, and gets all the alert information for the past two minutes into the initial contact.

You could gather all the information for this two-minute window into one text file, and combine that with the per-clock-hour stuff from above, appending or attaching all those per-run files for the hour to the one email.
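
As a rough illustration of folding one run's events into a single message body (the `build_digest` name and the layout of the text are my own invention, not anything from your code):

```perl
use strict;
use warnings;

# Hypothetical helper: turn the events found in one run into one body.
# Returns nothing when there are no events, so the caller can skip the mail.
sub build_digest {
    my (@events) = @_;
    return unless @events;

    my $count = @events;
    my $body  = sprintf "%d event%s in this run:\n",
        $count, $count == 1 ? '' : 's';
    $body .= "  - $_\n" for @events;
    return $body;
}
```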

This is really more of a problem space discussion than a Perl discussion. If you have some code you need help tweaking to any of the recommendations in the thread, show us what you have so far and I'm sure many of us would be happy to help.

Re^2: Email Thresholding
by bfdi533 (Friar) on Apr 02, 2015 at 17:09 UTC

    Thanks for your thoughts. I do have some Perl code that I can include once I can get to it.

    Agreed that this was intended more as a problem-space question, but with a Perl implementation since my code is in Perl.

    As to the specifics, my code runs every 3 minutes and checks the last 5 minutes' worth of logs from a DB. Every matching event is then logged to a different table in the DB. Once all events are gathered, the match table is run through row by row to generate the emails.

    I know, rather lazy, as I did not think this would blow up and spam me or my engineers. But now that it has, here we are.

    With the input gathered so far, I have a few thoughts:

    1. Run through the match table at the beginning of the script and store the last matches in a hash for easy lookup later.
    2. Aggregate the match table query for the loop to get a count of each match in the period, rather than every row.
    3. For each match in the loop, check the hash to see how long it has been since the last detection. If it has been more than an hour, send the email.

    Any additional thoughts?

      I would query the database with the time constraint of the last 60 minutes. If you're not timestamping your entries with a native DB timestamp, start doing that.

      I would consider how many varieties of alert I could have, and if that's three or four, I'd limit each type to one per hour rather than one overall.
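
      A sketch of the per-type limit, assuming you keep a hash of alert type to the epoch time of the last email for that type. The `types_due` name and the hash layout are hypothetical, and since your script exits between runs the hash would have to be loaded from the database each time.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: $last_sent maps alert type => epoch of last email.
      # Returns the types seen in this run that are due for a new email and
      # records them as sent at $now.
      sub types_due {
          my ($last_sent, $now, @seen) = @_;
          my %uniq = map { $_ => 1 } @seen;    # collapse repeats within a run

          my @due = grep {
              !exists $last_sent->{$_} || $now - $last_sent->{$_} >= 3600
          } sort keys %uniq;

          $last_sent->{$_} = $now for @due;
          return @due;
      }
      ```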

      For auditability you're going to want a record of the emails being sent anyway. Have a table where you record the email being sent. Select any sent for your class of alert (or for all if you go that route) from the last hour, by timestamp. If there are none, aggregate all the events from the last hour which you selected above, send an email, and insert your row into the email_sent table.
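
      Something along these lines, assuming `$dbh` is an already connected DBI handle and that the table is `email_sent(alert_class, sent_at)` with `sent_at` stored as epoch seconds. The column names and both sub names are made up for the example; adjust them to your schema and to your database's native timestamp type.

      ```perl
      use strict;
      use warnings;

      # True if an email for $class was recorded in the last hour.
      # Assumes a hypothetical table email_sent(alert_class, sent_at).
      sub email_sent_recently {
          my ($dbh, $class, $now) = @_;
          $now //= time;
          my ($count) = $dbh->selectrow_array(
              'SELECT COUNT(*) FROM email_sent WHERE alert_class = ? AND sent_at > ?',
              undef, $class, $now - 3600,
          );
          return $count ? 1 : 0;
      }

      # Record that an email for $class went out at $now.
      sub record_email_sent {
          my ($dbh, $class, $now) = @_;
          $now //= time;
          $dbh->do(
              'INSERT INTO email_sent (alert_class, sent_at) VALUES (?, ?)',
              undef, $class, $now,
          );
          return;
      }

      # Usage, with send_mail() standing in for your own mailer:
      #   unless (email_sent_recently($dbh, $class)) {
      #       send_mail($class, @events);
      #       record_email_sent($dbh, $class);
      #   }
      ```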

      The more we discuss this, the more it sounds like Nagios, Mon, Argus, Big Brother, Tripwire, or some other monitoring/IDS solution. You might be able to make a plugin to one of those or at least look to them for how to solve these issues.