johnprince1980 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a typical requirement: find users that have at least three occurrences in a log within an hour. Please guide me on how I can accomplish this.
[04/Jun/2013:13:06:13 -0600] conn=13570 op=14 msgId=13 - BIND dn="uid=xyz123,ou=People,o=xyz.com" method=128 version=3
[04/Jun/2013:15:06:13 -0600] conn=13570 op=14 msgId=15 - RESULT err=0 tag=101 nentries=48030 etime=139 SRCH=Q

Basically, we need to find any user (i.e. uid=xyz123) who gets "SRCH=Q" on a particular connection more than three times within an hour. As you can see in the logs, the lines are related through "conn=13570". In brief, here is the logic:
- Get the "SRCH=Q" occurrence.
- Get the associated conn #, go back and get the bind user.
- Carrying the bind user, search for "SRCH=Q" occurrences; if > 3, run the add group command.

Thanks, JPrince

Replies are listed 'Best First'.
Re: Pull users with multiple search
by smls (Friar) on Jun 07, 2013 at 06:31 UTC

    So at which part of implementing that logic do you get stuck?

    EDIT:

    In case you don't know where to even begin, here's one possible recipe for implementing this task:

    1. Define two hashes: %users and %searches
    2. Process the logfile line by line. For each line, use a regex to see if it matches the BIND or RESULT form, and extract the relevant fields ($conn, $uid, etc.) if it does. Also:
      1. If it is a BIND line:
        1. Add an entry to the %users hash, with $conn as the key and $uid as the value.
      2. If it is a RESULT line:
        1. Add relevant information (about the timestamp of the search) to the value of the %searches entry that belongs to the key $conn.
        2. Check the accumulated information in said hash value, for whether the condition of "three occurrences within an hour" has been met. If so, use the %users hash to look up the UID that belongs to the $conn in question and run the `add group` command for it.
        3. Remove information from said hash value that is no longer required.

    Of course, what exactly "add/check/remove relevant information" means in the three sub-steps for RESULT lines depends on the exact requirements of what "three occurrences within an hour" should mean. See hdb's answer for details.
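
    One way to sketch the recipe above, taking "three occurrences within an hour" to mean a sliding 60-minute window per connection. This is only an illustration under assumptions: the log is in chronological order, every line starts with a timestamp of the form shown in the question, and `find_heavy_users` and `to_epoch` are made-up helper names. Instead of running the add group command it just returns the UIDs that would qualify.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

# Matches the leading "[04/Jun/2013:13:06:13 -0600]" timestamp.
my $TS = qr{^\[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([+-])(\d\d)(\d\d)\]};

# Convert the captured timestamp fields to epoch seconds (UTC).
sub to_epoch {
    my ($d, $mon, $y, $h, $mi, $s, $sign, $oh, $om) = @_;
    my $offset = ($oh * 60 + $om) * 60;
    $offset = -$offset if $sign eq '-';
    return timegm($s, $mi, $h, $d, $MONTH{$mon}, $y) - $offset;
}

# Hypothetical helper: returns the sorted UIDs whose connection saw at least
# $threshold "SRCH=Q" results less than $window seconds apart.
sub find_heavy_users {
    my ($lines, $threshold, $window) = @_;
    my (%users, %searches, %flagged);
    for my $line (@$lines) {
        my @ts = $line =~ $TS or next;
        next unless exists $MONTH{ $ts[1] };
        my ($conn) = $line =~ /\bconn=(\d+)\b/ or next;

        if ($line =~ /\bBIND dn="uid=([^,"]+)/) {
            $users{$conn} = $1;                 # BIND: remember conn => uid
        }
        elsif ($line =~ /\bRESULT\b.*\bSRCH=Q\b/) {
            my $now  = to_epoch(@ts);
            my $seen = $searches{$conn} ||= [];
            push @$seen, $now;                  # add the new timestamp
            # Remove timestamps that fell out of the window.
            shift @$seen while $seen->[0] <= $now - $window;
            # Check the condition; a real script would run the command here.
            $flagged{ $users{$conn} // 'unknown' } = 1 if @$seen >= $threshold;
        }
    }
    return sort keys %flagged;
}
```

    Calling `find_heavy_users(\@lines, 3, 3600)` with the log lines in an array would then give the list of UIDs to act on. Each connection only ever keeps the timestamps of its last hour, which is what keeps the memory overhead small.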

    Also, this recipe assumes that the BIND line always comes before the corresponding RESULT lines, and that a little extra memory overhead is acceptable in order to optimize speed. If either of these assumptions does not hold, a better way to do it might be to do a first parsing run through the logfile for the RESULT lines only, and then a second one for only those BIND lines that are actually needed.
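
    The two-pass alternative could look roughly like the following. Treat it as a sketch: `heavy_users_two_pass` is a made-up name, the regexes are based only on the two sample lines in the question, and the time-window check is left out so that it simply counts over the whole file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of uid => number of SRCH=Q results,
# summed over connections that have at least $threshold such results.
sub heavy_users_two_pass {
    my ($path, $threshold) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    # Pass 1: count SRCH=Q results per connection; BIND lines are ignored.
    my %count;
    while (my $line = <$fh>) {
        next unless $line =~ /\bconn=(\d+)\b.*\bRESULT\b.*\bSRCH=Q\b/;
        $count{$1}++;
    }
    my %wanted = map { $_ => 1 } grep { $count{$_} >= $threshold } keys %count;

    # Pass 2: rewind and pick up BIND lines for the wanted connections only.
    seek $fh, 0, 0 or die "Cannot rewind $path: $!";
    my %uid_for;
    while (my $line = <$fh>) {
        next unless $line =~ /\bconn=(\d+)\b.*\bBIND dn="uid=([^,"]+)/;
        $uid_for{$1} = $2 if $wanted{$1};
    }
    close $fh;

    # Combine: connections without a BIND line are reported as 'unknown'.
    my %hits;
    $hits{ $uid_for{$_} // 'unknown' } += $count{$_} for keys %wanted;
    return \%hits;
}
```

    Only the per-connection counters and the UIDs of the connections that matter are kept in memory, and the order of BIND and RESULT lines in the file no longer matters.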

    For general help on how to parse a file and use regexes, see the links in Anonymous Monk's answer.

Re: Pull users with multiple search
by hdb (Monsignor) on Jun 07, 2013 at 06:46 UTC

    You already provided the basic recipe. I only see a few open questions in your spec:

    1. Does the one hour you specify count from the establishment of the connection?
    2. Is this any one hour or can one simplify to: 13:00:00 to 13:59:59 for example? The latter is much simpler than looking for an arbitrary period of 60 minutes.
    3. Can you read the log file into memory, or does the logic need to process it line by line?
    After that it is straightforward using a few regexes and hashes.
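
    To illustrate why the fixed clock hour in question 2 is the simpler case: the hour bucket can be taken straight from the timestamp text, without any date arithmetic. A small sketch, where `count_per_clock_hour` is a made-up name and the regex assumes the line layout from the sample:

```perl
use strict;
use warnings;

# Hypothetical helper: counts SRCH=Q results per connection and clock hour.
# Returns a hash ref of the form { conn => { "04/Jun/2013:13" => count } }.
sub count_per_clock_hour {
    my ($lines) = @_;
    my %count;
    for my $line (@$lines) {
        next unless $line =~ /\bSRCH=Q\b/;
        # Capture the timestamp up to the hour, plus the connection number.
        my ($hour, $conn) =
            $line =~ m{^\[(\d\d/\w{3}/\d{4}:\d\d):\d\d:\d\d [+-]\d{4}\] conn=(\d+)\b}
            or next;
        $count{$conn}{$hour}++;
    }
    return \%count;
}
```

    The price of the simplification is that two searches at 13:50 and 14:05 land in different buckets although they are only 15 minutes apart. An arbitrary 60-minute period needs the timestamps converted to seconds and compared against each other.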

Re: Pull users with multiple search
by Anonymous Monk on Jun 07, 2013 at 07:22 UTC
      Thanks for all your replies.

      I am not very familiar with Perl; however, while going through the internet, I felt that Perl would be the best tool to implement this.

      I really like the suggestion by 'smls'. It will really take a lot of effort to implement that.

      The hourly processing is because we do not want to scan the entire log every time; we can schedule a cron job to perform an hourly scan.

      If anyone can post some sample code, that would be great.

      Thanks

        So you are really looking for a programmer to do the job for you ... ;(

        Anyways: your requirement "last hour" can be ignored, as your log file only covers the last hour. So it is only

        1. Link user to connection.
        2. Count "SRCH=Q" per connection.
        If you are looking for help here it would be good to
        1. Provide some of your own attempts.
        2. Provide a bigger sample to let people test code.
        The following code is not production-ready, as it depends on a number of assumptions based on your limited sample.

        use strict;
        use warnings;

        my %user;
        my %conn;
        while (<DATA>) {
            my ($conn) = /conn=(\d+)\s/;
            my ($uid)  = /uid=(.*?),/;
            if ($uid) {
                $user{$conn} = $uid;
            }
            else {
                $conn{$conn}++;
            }
        }
        for my $key ( keys %conn ) {
            print $user{$key} // "Unknown user";
            print ": $conn{$key} times in logfile\n";
        }
        __DATA__
        [04/Jun/2013:13:06:13 -0600] conn=13570 op=14 msgId=13 - BIND dn="uid=xyz123,ou=People,o=xyz.com" method=128 version=3
        [04/Jun/2013:15:06:13 -0600] conn=13570 op=14 msgId=15 - RESULT err=0 tag=101 nentries=48030 etime=139 SRCH=Q
        [04/Jun/2013:15:06:13 -0600] conn=13570 op=14 msgId=15 - RESULT err=0 tag=101 nentries=48030 etime=139 SRCH=Q
        [04/Jun/2013:15:06:13 -0600] conn=13571 op=14 msgId=15 - RESULT err=0 tag=101 nentries=48030 etime=139 SRCH=Q