It's supposed to return the number of matches for each regex, basically just to count the logins for remote users, but it doesn't appear to be doing that at all. I am wondering if it would be easier to read in the files one by one using WWW::Mechanize; I can put them in a web directory easily enough, but then I am not sure how to get all the files in the directory that I want. I was trying to combine this code with some other code to accomplish something I need.

Example code 1:

    # Include the WWW::Mechanize module
    use OLE;
    use LWP;
    use WWW::Mechanize;
    use Mail::Sender;

    # url gets clobbered by @pagelist; latest.html was added to the end of the array
    $url = "http://operations/idslogs/latest.html";

    # Create a new instance of WWW::Mechanize.
    # Enabling autocheck checks each request to ensure it was successful,
    # producing an error if not.
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    # Retrieve the page.
    # This is the list of log files to process.
    @pagelist = qw{
        aps_2009-11-05_11-07-26-633.html aps_2009-11-10_11-30-44-204.html
        aps_2009-11-17_11-30-53-298.html aps_2009-11-24_11-31-02-401.html
        aps_2009-12-01_11-31-11-606.html aps_2009-12-08_11-31-20-751.html
        aps_2009-12-10_12-51-29-069.html aps_2009-12-10_12-57-31-621.html
        aps_2009-12-10_13-00-06-560.html aps_2009-12-10_13-09-05-531.html
        aps_2009-12-15_03-37-42-906.html aps_2009-12-17_15-50-58-140.html
        aps_2009-12-18_03-05-35-625.html aps_2009-12-19_03-06-39-703.html
        aps_2009-12-20_03-04-17-265.html aps_2009-12-21_03-05-41-125.html
        aps_2009-12-21_12-11-58-078.html aps_2009-12-22_03-07-48-265.html
        aps_2009-12-23_03-07-16-265.html aps_2009-12-23_17-00-05-997.html
        aps_2009-12-24_03-06-38-765.html aps_2009-12-25_03-06-42-734.html
        aps_2009-12-26_03-04-40-546.html aps_2009-12-27_03-05-33-125.html
        aps_2009-12-28_03-06-18-640.html aps_2009-12-29_03-06-48-937.html
        aps_2009-12-30_03-05-58-812.html aps_2009-12-31_03-05-24-000.html
        aps_2010-01-01_03-04-54-031.html aps_2010-01-02_03-05-29-421.html
        aps_2010-01-03_03-06-57-968.html aps_2010-01-04_03-06-08-046.html
        aps_2010-01-05_03-06-25-046.html aps_2010-01-06_03-07-26-953.html
        aps_2010-01-07_03-07-26-750.html aps_2010-01-08_03-08-19-859.html
        aps_2010-01-09_03-07-48-015.html aps_2010-01-10_03-07-46-906.html
        aps_2010-01-11_03-05-28-734.html aps_2010-01-12_03-07-37-265.html
        aps_2010-01-13_03-09-14-609.html aps_2010-01-14_03-07-46-328.html
        aps_2010-01-15_03-07-03-359.html aps_2010-01-16_03-06-19-421.html
        aps_2010-01-16_22-07-56-921.html aps_2010-01-17_03-06-47-812.html
        aps_2010-01-18_03-06-15-156.html aps_2010-01-19_03-06-50-250.html
        aps_2010-01-20_03-07-55-359.html aps_2010-01-21_03-09-13-843.html
        aps_2010-01-22_03-07-09-453.html aps_2010-01-23_03-06-24-343.html
        aps_2010-01-24_03-07-24-578.html aps_2010-01-25_03-08-38-812.html
        aps_2010-01-25_17-05-12-843.html aps_2010-01-26_03-07-15-750.html
        aps_2010-01-27_03-08-56-171.html aps_2010-01-28_03-07-54-078.html
        aps_2010-01-28_10-37-28-218.html aps_2010-01-29_03-04-43-703.html
        aps_2010-01-30_03-03-58-640.html aps_2010-01-31_01-41-31-125.html
        aps_2010-01-31_03-05-49-359.html aps_2010-02-01_03-05-57-890.html
        aps_2010-02-02_03-06-01-046.html aps_2010-02-03_03-06-34-828.html
        aps_2010-02-04_03-06-30-343.html aps_2010-02-05_03-04-44-218.html
        aps_2010-02-06_02-31-31-968.html aps_2010-02-06_03-16-57-750.html
        aps_2010-02-06_12-02-37-792.html aps_2010-02-07_03-06-34-718.html
        aps_2010-02-08_03-08-46-125.html latest.html
    };

    # Get the size of @pagelist, i.e. the number of elements.
    $pagelist = @pagelist;
    # Set $value to the number of array elements...
    $value = $pagelist;
    # ...minus one, the index of the last element.
    $value = $value - 1;
    # for each url
    foreach $paper (@pagelist) {
        $url = "http://operations/idslogs/$pagelist[$value]";
        #my $mechanize = WWW::Mechanize->new(autocheck => 1);
        $mechanize->get($url);

        # Assign the page content to $page
        my $page = $mechanize->content;

        # how will I count IPs? I also need times
        $match_count  += () = ($page =~ /Memphis/g);
        $actris_count += () = ($page =~ /ACTRIS/g);
        $sef_count    += () = ($page =~ /South East Florida/g);
    }    # closing the for loop here so the counts don't get cleared

    {
        print "Logins for Memphis $match_count\t";
        print "Logins for Actris $actris_count\t";
        print "Logins for SEF $sef_count\t";
    }
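One thing I am not sure about: inside the loop the URL is built from $pagelist[$value], which is always the last element, so I think every pass fetches the same page rather than $paper. Here is an untested sketch of what I think I actually want (same host and /idslogs/ path as above; the file list is shortened just for the sketch):

    use strict;
    use warnings;
    use WWW::Mechanize;

    # Shortened list for the sketch; the real script would use the full @pagelist above.
    my @pagelist = qw(aps_2010-02-07_03-06-34-718.html aps_2010-02-08_03-08-46-125.html latest.html);

    my $mech = WWW::Mechanize->new(autocheck => 1);
    my ($memphis, $actris, $sef) = (0, 0, 0);

    foreach my $paper (@pagelist) {
        # Fetch this particular log page, not the last one every time.
        $mech->get("http://operations/idslogs/$paper");
        my $page = $mech->content;

        # () = LIST in scalar context gives the number of matches.
        $memphis += () = $page =~ /Memphis/g;
        $actris  += () = $page =~ /ACTRIS/g;
        $sef     += () = $page =~ /South East Florida/g;
    }

    print "Logins for Memphis $memphis\n";
    print "Logins for Actris $actris\n";
    print "Logins for SEF $sef\n";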
However, I have an entire directory of these files to process and they are in HTML format, so I was hoping to change it to use readdir and glob them from there, but I must be doing it wrong. My log files contain the text I am looking for and there should be plenty of matches, but I am getting zero, so it must not be looking at the actual lines.

Code 2:

    # Include the WWW::Mechanize module
    use OLE;
    use LWP;
    use WWW::Mechanize;
    use Mail::Sender;

    $url = "http://someservername/TESTLOG.html";

    # Create a new instance of WWW::Mechanize.
    # Enabling autocheck checks each request to ensure it was successful,
    # producing an error if not.
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    # Retrieve the page
    $mechanize->get($url);

    # Assign the page content to $page
    my $page = $mechanize->content;

    my $ipcount = 0;

    my $match_count = () = ($page =~ /Memphis/g);
    print "Logins for Memphis $match_count\t";

    my $actris_count = () = ($page =~ /ACTRIS/g);
    print "Logins for Austin $actris_count\t";

    my $sef_count = () = ($page =~ /South East Florida/g);
    print "Logins for SEF $sef_count\t";
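If the web-directory route turns out to be the easier one, I am guessing the hard-coded @pagelist could be replaced by scraping the file names out of the directory's index page. A rough, untested sketch follows; it assumes http://operations/idslogs/ actually serves an index page with links to the .html logs:

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(autocheck => 1);

    # Fetch the directory index and collect every link ending in .html
    $mech->get('http://operations/idslogs/');
    my @log_links = $mech->find_all_links(url_regex => qr/\.html$/);

    # Running totals for each remote site I care about.
    my %count = ('Memphis' => 0, 'ACTRIS' => 0, 'South East Florida' => 0);

    for my $link (@log_links) {
        $mech->get($link->url_abs);
        my $page = $mech->content;
        $count{$_} += () = $page =~ /\Q$_\E/g for keys %count;
    }

    printf "Logins for %s: %d\n", $_, $count{$_} for sort keys %count;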
Sample from a log file:

    2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent", and the session was renamed to "Memphis on servernamechangedtoprotecttheinnocent".

And the code that reads in the files for IDSLOGS:

    # Read in files for IDSLOGS.
    use OLE;
    use IO::File;

    $dir = shift || '.';
    opendir DIR, $dir or die "Can't open directory $dir: $!\n";

    while ($file = readdir DIR) {
        next if $file =~ /^\./;
        print $file;

        open(FH, "$file") or die "Can't open file $file: $!\n";
        my @file_lines = <FH>;
        close FH;

        foreach my $line (@file_lines) {
            # search for a match here like I do above,
            # but it doesn't work. Why not?
        }
    }
    print "processed $count lines";
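As far as I can tell, the main things I would need to change are putting the directory back onto the file name before opening (readdir returns bare names, so as written it only works if the script happens to run inside $dir) and actually incrementing $count. Here is an untested sketch of the local version I am aiming for, counting the same three patterns as above:

    use strict;
    use warnings;

    my $dir = shift || '.';
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";

    my ($count, $memphis, $actris, $sef) = (0, 0, 0, 0);

    while (my $file = readdir $dh) {
        next if $file =~ /^\./;           # skip . and .. and dotfiles
        next unless $file =~ /\.html$/;   # only look at the log pages

        # readdir gives bare file names, so prepend the directory when opening.
        open my $fh, '<', "$dir/$file" or die "Can't open $dir/$file: $!\n";
        while (my $line = <$fh>) {
            $count++;
            $memphis += () = $line =~ /Memphis/g;
            $actris  += () = $line =~ /ACTRIS/g;
            $sef     += () = $line =~ /South East Florida/g;
        }
        close $fh;
    }
    closedir $dh;

    print "processed $count lines\n";
    print "Logins for Memphis $memphis\n";
    print "Logins for Actris $actris\n";
    print "Logins for SEF $sef\n";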