It's supposed to return the number of matches for each regex, basically just to count the logins for remote users, but it doesn't appear to be doing that at all. I am wondering if it would be easier to read in the files one by one using WWW::Mechanize; I can put them in a web directory easily enough, but then I am not sure how to get all the files in the directory that I want. I was trying to combine this code with some other code to accomplish something I need.

Example code 1:

    # Include the WWW::Mechanize module
    use OLE;
    use LWP;
    use WWW::Mechanize;
    use Mail::Sender;

    # url gets clobbered by @pagelist; latest.html was added to the end of the array
    $url = "http://operations/idslogs/latest.html";

    # Create a new instance of WWW::Mechanize.
    # Enabling autocheck checks each request to ensure it was successful,
    # producing an error if not.
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    # Retrieve the page.
    # This is the list of log files to process.
    @pagelist = qw{
        aps_2009-11-05_11-07-26-633.html aps_2009-11-10_11-30-44-204.html
        aps_2009-11-17_11-30-53-298.html aps_2009-11-24_11-31-02-401.html
        aps_2009-12-01_11-31-11-606.html aps_2009-12-08_11-31-20-751.html
        aps_2009-12-10_12-51-29-069.html aps_2009-12-10_12-57-31-621.html
        aps_2009-12-10_13-00-06-560.html aps_2009-12-10_13-09-05-531.html
        aps_2009-12-15_03-37-42-906.html aps_2009-12-17_15-50-58-140.html
        aps_2009-12-18_03-05-35-625.html aps_2009-12-19_03-06-39-703.html
        aps_2009-12-20_03-04-17-265.html aps_2009-12-21_03-05-41-125.html
        aps_2009-12-21_12-11-58-078.html aps_2009-12-22_03-07-48-265.html
        aps_2009-12-23_03-07-16-265.html aps_2009-12-23_17-00-05-997.html
        aps_2009-12-24_03-06-38-765.html aps_2009-12-25_03-06-42-734.html
        aps_2009-12-26_03-04-40-546.html aps_2009-12-27_03-05-33-125.html
        aps_2009-12-28_03-06-18-640.html aps_2009-12-29_03-06-48-937.html
        aps_2009-12-30_03-05-58-812.html aps_2009-12-31_03-05-24-000.html
        aps_2010-01-01_03-04-54-031.html aps_2010-01-02_03-05-29-421.html
        aps_2010-01-03_03-06-57-968.html aps_2010-01-04_03-06-08-046.html
        aps_2010-01-05_03-06-25-046.html aps_2010-01-06_03-07-26-953.html
        aps_2010-01-07_03-07-26-750.html aps_2010-01-08_03-08-19-859.html
        aps_2010-01-09_03-07-48-015.html aps_2010-01-10_03-07-46-906.html
        aps_2010-01-11_03-05-28-734.html aps_2010-01-12_03-07-37-265.html
        aps_2010-01-13_03-09-14-609.html aps_2010-01-14_03-07-46-328.html
        aps_2010-01-15_03-07-03-359.html aps_2010-01-16_03-06-19-421.html
        aps_2010-01-16_22-07-56-921.html aps_2010-01-17_03-06-47-812.html
        aps_2010-01-18_03-06-15-156.html aps_2010-01-19_03-06-50-250.html
        aps_2010-01-20_03-07-55-359.html aps_2010-01-21_03-09-13-843.html
        aps_2010-01-22_03-07-09-453.html aps_2010-01-23_03-06-24-343.html
        aps_2010-01-24_03-07-24-578.html aps_2010-01-25_03-08-38-812.html
        aps_2010-01-25_17-05-12-843.html aps_2010-01-26_03-07-15-750.html
        aps_2010-01-27_03-08-56-171.html aps_2010-01-28_03-07-54-078.html
        aps_2010-01-28_10-37-28-218.html aps_2010-01-29_03-04-43-703.html
        aps_2010-01-30_03-03-58-640.html aps_2010-01-31_01-41-31-125.html
        aps_2010-01-31_03-05-49-359.html aps_2010-02-01_03-05-57-890.html
        aps_2010-02-02_03-06-01-046.html aps_2010-02-03_03-06-34-828.html
        aps_2010-02-04_03-06-30-343.html aps_2010-02-05_03-04-44-218.html
        aps_2010-02-06_02-31-31-968.html aps_2010-02-06_03-16-57-750.html
        aps_2010-02-06_12-02-37-792.html aps_2010-02-07_03-06-34-718.html
        aps_2010-02-08_03-08-46-125.html latest.html
    };

    # Get the size of @pagelist, i.e. the number of elements.
    $pagelist = @pagelist;
    # Set $value to the number of array elements...
    $value = $pagelist;
    # ...minus one, the index of the last element.
    $value = $value - 1;
    # for each url
    foreach $paper (@pagelist) {
        $url = "http://operations/idslogs/$pagelist[$value]";
        #my $mechanize = WWW::Mechanize->new(autocheck => 1);
        $mechanize->get($url);

        # Assign the page content to $page
        my $page = $mechanize->content;

        # how will I count IPs? I also need times
        $match_count  += () = ($page =~ /Memphis/g);
        $actris_count += () = ($page =~ /ACTRIS/g);
        $sef_count    += () = ($page =~ /South East Florida/g);
    }    # closing the for loop here so the counts don't get cleared

    {
        print "Logins for Memphis $match_count\t";
        print "Logins for Actris $actris_count\t";
        print "Logins for SEF $sef_count\t";
    }
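One thing I am not sure about: inside the loop the URL is built from $pagelist[$value], which is always the last element, so I think every pass fetches the same page rather than $paper. Here is an untested sketch of what I think I actually want (same host and /idslogs/ path as above; the file list is shortened just for the sketch):

    use strict;
    use warnings;
    use WWW::Mechanize;

    # Shortened list for the sketch; the real script would use the full @pagelist above.
    my @pagelist = qw(aps_2010-02-07_03-06-34-718.html aps_2010-02-08_03-08-46-125.html latest.html);

    my $mech = WWW::Mechanize->new(autocheck => 1);
    my ($memphis, $actris, $sef) = (0, 0, 0);

    foreach my $paper (@pagelist) {
        # Fetch this particular log page, not the last one every time.
        $mech->get("http://operations/idslogs/$paper");
        my $page = $mech->content;

        # () = LIST in scalar context gives the number of matches.
        $memphis += () = $page =~ /Memphis/g;
        $actris  += () = $page =~ /ACTRIS/g;
        $sef     += () = $page =~ /South East Florida/g;
    }

    print "Logins for Memphis $memphis\n";
    print "Logins for Actris $actris\n";
    print "Logins for SEF $sef\n";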
However, I have an entire directory of these files to process and they are in HTML format, so I was hoping to change it to use readdir and glob them from there, but I must be doing it wrong. My log files contain the text I am looking for and there should be plenty of matches, but I am getting zero, so it must not be looking at the actual lines.

Code 2:

    # Include the WWW::Mechanize module
    use OLE;
    use LWP;
    use WWW::Mechanize;
    use Mail::Sender;

    $url = "http://someservername/TESTLOG.html";

    # Create a new instance of WWW::Mechanize.
    # Enabling autocheck checks each request to ensure it was successful,
    # producing an error if not.
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    # Retrieve the page
    $mechanize->get($url);

    # Assign the page content to $page
    my $page = $mechanize->content;

    my $ipcount = 0;

    my $match_count = () = ($page =~ /Memphis/g);
    print "Logins for Memphis $match_count\t";

    my $actris_count = () = ($page =~ /ACTRIS/g);
    print "Logins for Austin $actris_count\t";

    my $sef_count = () = ($page =~ /South East Florida/g);
    print "Logins for SEF $sef_count\t";
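If the web-directory route turns out to be the easier one, I am guessing the hard-coded @pagelist could be replaced by scraping the file names out of the directory's index page. A rough, untested sketch follows; it assumes http://operations/idslogs/ actually serves an index page with links to the .html logs:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(autocheck => 1);

    # Fetch the directory index and collect every link ending in .html
    $mech->get('http://operations/idslogs/');
    my @log_links = $mech->find_all_links(url_regex => qr/\.html$/);

    # Running totals for each remote site I care about.
    my %count = ('Memphis' => 0, 'ACTRIS' => 0, 'South East Florida' => 0);

    for my $link (@log_links) {
        $mech->get($link->url_abs);
        my $page = $mech->content;
        $count{$_} += () = $page =~ /\Q$_\E/g for keys %count;
    }

    printf "Logins for %s: %d\n", $_, $count{$_} for sort keys %count;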
Sample from a log file:

    2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent", and the session was renamed to "Memphis on servernamechangedtoprotecttheinnocent".

And the code that reads in the files for IDSLOGS:

    # Read in files for IDSLOGS.
    use OLE;
    use IO::File;

    $dir = shift || '.';
    opendir DIR, $dir or die "Can't open directory $dir: $!\n";

    while ($file = readdir DIR) {
        next if $file =~ /^\./;
        print $file;

        open(FH, "$file") or die "Can't open file $file: $!\n";
        my @file_lines = <FH>;
        close FH;

        foreach my $line (@file_lines) {
            # search for a match here like I do above,
            # but it doesn't work. Why not?
        }
    }
    print "processed $count lines";
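As far as I can tell, the main things I would need to change are putting the directory back onto the file name before opening (readdir returns bare names, so as written it only works if the script happens to run inside $dir) and actually incrementing $count. Here is an untested sketch of the local version I am aiming for, counting the same three patterns as above:

    use strict;
    use warnings;

    my $dir = shift || '.';
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";

    my ($count, $memphis, $actris, $sef) = (0, 0, 0, 0);

    while (my $file = readdir $dh) {
        next if $file =~ /^\./;           # skip . and .. and dotfiles
        next unless $file =~ /\.html$/;   # only look at the log pages

        # readdir gives bare file names, so prepend the directory when opening.
        open my $fh, '<', "$dir/$file" or die "Can't open $dir/$file: $!\n";
        while (my $line = <$fh>) {
            $count++;
            $memphis += () = $line =~ /Memphis/g;
            $actris  += () = $line =~ /ACTRIS/g;
            $sef     += () = $line =~ /South East Florida/g;
        }
        close $fh;
    }
    closedir $dh;

    print "processed $count lines\n";
    print "Logins for Memphis $memphis\n";
    print "Logins for Actris $actris\n";
    print "Logins for SEF $sef\n";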