Re: How to Process Log file

You don't say what sort of "database" you are using to store the log information, but for just about every DB I know of, you don't need to sort the log data before moving it into the database, because the DB server/application can sort it for you.

But it is important to know how the log file is supposed to be related to the database: how is the database structured and used in order to store the log data?

Does the DB keep exactly one row in its log-data table for each distinct MAC address, or does it always add a new row for every log entry? That is, if a certain MAC address shows up more than once in the log (or if you have to move the log data into the DB every day/week/month, and it usually involves the same set of MAC addresses), are you updating an existing row if a given MAC has already been seen, or are you just inserting a new row every time?

As for processing the log file itself, if the format is really (and reliably) as shown in your example, with a blank line separating the distinct log records, then the best approach (IMO) is to set $/ (INPUT_RECORD_SEPARATOR) so that you read and process one whole record at a time -- something like this (not tested):

{
   local $/ = '';  # empty string sets input_record_separator to "paragraph mode"
                   # (blank line == end of input record)

   while (<LOGFILE>) {  # read a whole log record into $_
      my ($addr,$name) = (/address: (\S+) \((.*)\)/);
      my %numbers;
      for my $io (qw/Incoming Outgoing/) {
         for my $fact (qw/packets bytes IP/) {
             ($numbers{$io}{$fact}) = (/$io total.*?(\d+) $fact/);
         }
         ($numbers{$io}{Rate}) = (/([\d.]+) kbytes.. $io/i);
      }
      # now move $addr, $name and contents of %numbers into the database
   }
}

If a given MAC shows up more than once, and you need to keep only the latest set of log values in the DB, then you probably do want to keep a hash keyed by $addr -- either to hold everything for moving it into the database all at once (after reading the whole log), or else to hold just the known values of $addr, so you know when to do an update as opposed to an insert. (Or, for each MAC in the log, you need to query the DB first to see if that MAC is already present in the table.)
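Here is a rough sketch of the "hash keyed by $addr" idea, where a later record for the same MAC simply overwrites the earlier one. The sample records, the latest_by_addr sub and the fields it keeps are all made up for illustration, since I don't know your real log layout; the DB step is left out entirely.

```perl
use strict;
use warnings;

# Hypothetical sample data: the real log layout is not known, so these
# lines are assumptions chosen only to fit the regexes used below.
my $sample = <<'END';
MAC address: 00:11:22:33:44:55 (host-a)
Incoming total 10 packets, 1000 bytes

MAC address: 66:77:88:99:aa:bb (host-b)
Incoming total 5 packets, 500 bytes

MAC address: 00:11:22:33:44:55 (host-a)
Incoming total 20 packets, 2000 bytes
END

# Hypothetical helper: read records in paragraph mode and keep only the
# most recent values seen for each address.
sub latest_by_addr {
    my ($fh) = @_;
    my %latest;
    local $/ = '';    # paragraph mode: blank line ends a record
    while ( my $rec = <$fh> ) {
        my ( $addr, $name ) = $rec =~ /address: (\S+) \((.*)\)/
            or next;    # skip any record that has no address line
        my ($packets) = $rec =~ /Incoming total.*?(\d+) packets/;
        # a later record for the same address replaces the earlier one
        $latest{$addr} = { name => $name, packets => $packets };
    }
    return \%latest;
}

# Read from an in-memory filehandle so the sketch runs without a log file.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $latest = latest_by_addr($fh);
close $fh;

for my $addr ( sort keys %$latest ) {
    print "$addr $latest->{$addr}{name} $latest->{$addr}{packets}\n";
}
```

After the loop, each MAC appears once in the hash, carrying the values from its last record in the log, so you can walk the hash and do one insert or update per address.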
