Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to read an iptraf log file into a script. The data will then be processed and inserted into a database. The iptraf log file looks like this:
*** LAN traffic log, generated Thu Jul 14 17:49:44 2005

Ethernet address: 009096c7cf98 (ADSL Modem)
     Incoming total 297089 packets, 45056408 bytes; 293887 IP packets
     Outgoing total 401212 packets, 252051283 bytes; 398003 IP packets
     Average rates: 0.47 kbytes/s incoming, 2.63 kbytes/s outgoing
     Last 5-second rates: 0.00 kbytes/s incoming, 0.00 kbytes/s outgoing

Ethernet address: 0020ed7924c2 (Debian Router ext)
     Incoming total 401212 packets, 252051283 bytes; 398003 IP packets
     Outgoing total 297096 packets, 45056702 bytes; 293887 IP packets
     Average rates: 2.63 kbytes/s incoming, 0.47 kbytes/s outgoing
     Last 5-second rates: 0.00 kbytes/s incoming, 0.00 kbytes/s outgoing

I am reading the file using the following technique
my $LOGFILE = "iptraf.log";
open(LOGFILE, $LOGFILE) or die("Could not open log file.");
foreach my $line (<LOGFILE>) {
    #chomp($line);
}
I then want to organise and sort the data from the log file appropriately. This is where I require some help. How can I create a hash whose primary key is the MAC address, pointing to its corresponding name (e.g. Debian Router) and other details such as incoming total packets, outgoing, and so on?

    MAC ADDRESS ---> NAME         --> Debian Router
                     Incoming     --> 21311
                     Outgoing     --> 12112
                     Average Rate --> 4.0

I understand there is a bit of work in this script, but I would appreciate some heads-up on how to structure this hash for creating and writing. Or should I directly begin inserting the data into the database rather than reading it into an associative array first?
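To make the target structure concrete, here is the kind of nested hash I have in mind (values are made up for illustration):

    my %traffic = (
        '0020ed7924c2' => {
            name     => 'Debian Router',
            incoming => 21311,    # total incoming packets
            outgoing => 12112,    # total outgoing packets
            avg_rate => 4.0,      # kbytes/s
        },
        # ... one entry per MAC address ...
    );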

Replies are listed 'Best First'.
Re: How to Process Log file
by holli (Abbot) on Jul 14, 2005 at 12:18 UTC
    This demonstrates how to parse such a file and extract the relevant info with regexes. You may tweak those to match the fields you want.
    use strict;
    use warnings;
    use Data::Dumper;

    my ($mac, %data);
    while ( <DATA> ) {
        if ( /^Ethernet address: ([0-9a-f]{12}) \(([^\)]+)/ ) {
            $mac = $1;
            $data{$mac}{device} = $2;
            next;
        }
        if ( /(Incoming|Outgoing) total ([0-9]+) packets/ ) {
            $data{$mac}{$1} = $2;
            next;
        }
        if ( /Average rates: ([0-9\.]+) kbytes\/s incoming, ([0-9\.]+) kbytes\/s outgoing/ ) {
            $data{$mac}{incomingRate} = $1;
            $data{$mac}{outgoingRate} = $2;
        }
    }
    print Dumper(\%data);

    __DATA__
    Ethernet address: 009096c7cf98 (ADSL Modem)
         Incoming total 297089 packets, 45056408 bytes; 293887 IP packets
         Outgoing total 401212 packets, 252051283 bytes; 398003 IP packets
         Average rates: 0.47 kbytes/s incoming, 2.63 kbytes/s outgoing
         Last 5-second rates: 0.00 kbytes/s incoming, 0.00 kbytes/s outgoing
    Ethernet address: 0020ed7924c2 (Debian Router ext)
         Incoming total 401212 packets, 252051283 bytes; 398003 IP packets
         Outgoing total 297096 packets, 45056702 bytes; 293887 IP packets
         Average rates: 2.63 kbytes/s incoming, 0.47 kbytes/s outgoing
         Last 5-second rates: 0.00 kbytes/s incoming, 0.00 kbytes/s outgoing
    Output:
    $VAR1 = {
        '009096c7cf98' => {
            'outgoingRate' => '2.63',
            'incomingRate' => '0.47',
            'Outgoing' => '401212',
            'Incoming' => '297089',
            'device' => 'ADSL Modem'
        },
        '0020ed7924c2' => {
            'outgoingRate' => '0.47',
            'incomingRate' => '2.63',
            'Outgoing' => '297096',
            'Incoming' => '401212',
            'device' => 'Debian Router ext'
        }
    };
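    Once %data is built, iterating over it is straightforward. A minimal sketch, using the field names produced by the parser above:

    for my $mac (sort keys %data) {
        printf "%s (%s): %d packets in, %d packets out\n",
            $mac, $data{$mac}{device},
            $data{$mac}{Incoming}, $data{$mac}{Outgoing};
    }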


    holli, /regexed monk/
Re: How to Process Log file
by blazar (Canon) on Jul 14, 2005 at 12:02 UTC
    my $LOGFILE = "iptraf.log";
    open(LOGFILE, $LOGFILE) or die("Could not open log file.");
    foreach my $line (<LOGFILE>) {
        #chomp($line);
    }
    Depending on the actual structure of your script you may even use Perl's own automagic <>. Whatever, it is generally recommended to avoid slurping files all at once (and if you do slurp, possibly to use a suitable module, e.g. File::Slurp [1]), unless of course you have to. A foreach loop over <LOGFILE>, like yours, will slurp the whole file in at once. In any case, Perl 5's idiomatic loop for iterating over the lines of a file is
    while (<$handle>) {
        # ...
    }
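    Putting that together with a lexical filehandle and the three-argument form of open, a minimal sketch (the filename is taken from your post):

    use strict;
    use warnings;

    my $logfile = 'iptraf.log';
    open my $fh, '<', $logfile
        or die "Could not open '$logfile': $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        # process one line at a time here
    }
    close $fh;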
    I understand there is a bit of work in this script, but I would appreciate some heads-up on how to structure this hash for creating and writing. Or should I directly begin inserting the data into the database rather than reading it into an associative array first?
    This is up to you, and hopefully someone more experienced with databases than I am will give you more insightful advice. In any case, split is your friend.

    Update: on a second reading, the format of the log file is not simple enough that a plain split would suffice. Fortunately holli has shown you how to do it with an extensive code sample at Re: How to Process Log file.


    [1] I'm not really sure about the latter point. But many people recommend doing so.

Re: How to Process Log file
by graff (Chancellor) on Jul 15, 2005 at 02:16 UTC
    You don't say what sort of "database" you are using to store the log information, but for just about every DB I know of, you don't need to sort the log data before moving it into the database, because the DB server/application can sort it for you.

    But it is important to know how the log file is supposed to be related to the database: how is the database structured and used in order to store the log data?

    Does the DB keep exactly one row in its log-data table for each distinct MAC address, or does it always add a new row for every log entry? That is, if a certain MAC address shows up more than once in the log (or if you have to move the log data into the DB every day/week/month, and it usually involves the same set of MAC addresses), are you updating an existing row if a given MAC has already been seen, or are you just inserting a new row every time?

    As for processing the log file itself, if the format is really (and reliably) as shown in your example, with a blank line separating the distinct log records, then the best approach (IMO) is to set $/ (INPUT_RECORD_SEPARATOR) so that you read and process one whole record at a time -- something like this (not tested):

    {
        local $/ = '';  # empty string sets input_record_separator to "paragraph mode"
                        # (blank line == end of input record)

        while (<LOGFILE>) {    # read a whole log record into $_
            my ($addr, $name) = (/address: (\S+) \((.*)\)/);
            my %numbers;
            for my $io (qw/Incoming Outgoing/) {
                for my $fact (qw/packets bytes IP/) {
                    ($numbers{$io}{$fact}) = (/$io total.*?(\d+) $fact/);
                }
                ($numbers{$io}{Rate}) = (/([\d.]+) kbytes.. $io/i);
            }
            # now move $addr, $name and contents of %numbers into the database
        }
    }
    If a given MAC shows up more than once, and you need to keep only the latest set of log values in the DB, then you probably do want to keep a hash keyed by $addr -- either to hold everything for moving it into the database all at once (after reading the whole log), or else to hold just the known values of $addr, so you know when to do an update as opposed to an insert. (Or, for each MAC in the log, you need to query the DB first to see if that MAC is already present in the table.)
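    For the update-versus-insert decision, a minimal DBI sketch -- all table, column and connection names here are made up, so adjust them to your actual schema (this uses $addr and %numbers from the loop above):

    use DBI;

    # placeholder DSN and credentials
    my $dbh = DBI->connect('dbi:ODBC:traffic_dsn', 'user', 'password',
                           { RaiseError => 1 });

    my $check  = $dbh->prepare('SELECT 1 FROM lan_log WHERE mac = ?');
    my $update = $dbh->prepare(
        'UPDATE lan_log SET in_packets = ?, out_packets = ? WHERE mac = ?');
    my $insert = $dbh->prepare(
        'INSERT INTO lan_log (mac, in_packets, out_packets) VALUES (?, ?, ?)');

    $check->execute($addr);
    if ( $check->fetchrow_array ) {
        # MAC already present: update the existing row
        $update->execute($numbers{Incoming}{packets},
                         $numbers{Outgoing}{packets}, $addr);
    }
    else {
        # first time we see this MAC: insert a new row
        $insert->execute($addr, $numbers{Incoming}{packets},
                         $numbers{Outgoing}{packets});
    }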
Re: How to Process Log file
by Anonymous Monk on Jul 15, 2005 at 05:22 UTC
    Thank you for your great advice and code. I will be using a Microsoft SQL database (as this is already in place on the network). Once the data is imported into the database I will be looking at techniques for reporting. I have had experience with GD::Graph, but am currently researching Crystal Reports. Does anyone have any other suggestions for reporting techniques?