It's supposed to return the number of matches for each regex, basically just to count the logins for remote users, but it doesn't appear to be doing that at all. I'm wondering if it would be easier to read the files in one by one using WWW::Mechanize; I can put them in a web directory easily enough, but then I'm not sure how to get all the files in the directory that I want. I was trying to combine this code with some other code to accomplish what I need.

Example code 1:

```perl
# Include the WWW::Mechanize module.
# (use OLE, use LWP, and use Mail::Sender from the original are not
# needed for this part and have been dropped.)
use strict;
use warnings;
use WWW::Mechanize;

# Create a new instance of WWW::Mechanize.
# Enabling autocheck checks each request to ensure it was successful,
# producing an error if not.
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# This is the list of logfiles to process (latest.html added at the end).
my @pagelist = qw{
    aps_2009-11-05_11-07-26-633.html aps_2009-11-10_11-30-44-204.html
    aps_2009-11-17_11-30-53-298.html aps_2009-11-24_11-31-02-401.html
    aps_2009-12-01_11-31-11-606.html aps_2009-12-08_11-31-20-751.html
    aps_2009-12-10_12-51-29-069.html aps_2009-12-10_12-57-31-621.html
    aps_2009-12-10_13-00-06-560.html aps_2009-12-10_13-09-05-531.html
    aps_2009-12-15_03-37-42-906.html aps_2009-12-17_15-50-58-140.html
    aps_2009-12-18_03-05-35-625.html aps_2009-12-19_03-06-39-703.html
    aps_2009-12-20_03-04-17-265.html aps_2009-12-21_03-05-41-125.html
    aps_2009-12-21_12-11-58-078.html aps_2009-12-22_03-07-48-265.html
    aps_2009-12-23_03-07-16-265.html aps_2009-12-23_17-00-05-997.html
    aps_2009-12-24_03-06-38-765.html aps_2009-12-25_03-06-42-734.html
    aps_2009-12-26_03-04-40-546.html aps_2009-12-27_03-05-33-125.html
    aps_2009-12-28_03-06-18-640.html aps_2009-12-29_03-06-48-937.html
    aps_2009-12-30_03-05-58-812.html aps_2009-12-31_03-05-24-000.html
    aps_2010-01-01_03-04-54-031.html aps_2010-01-02_03-05-29-421.html
    aps_2010-01-03_03-06-57-968.html aps_2010-01-04_03-06-08-046.html
    aps_2010-01-05_03-06-25-046.html aps_2010-01-06_03-07-26-953.html
    aps_2010-01-07_03-07-26-750.html aps_2010-01-08_03-08-19-859.html
    aps_2010-01-09_03-07-48-015.html aps_2010-01-10_03-07-46-906.html
    aps_2010-01-11_03-05-28-734.html aps_2010-01-12_03-07-37-265.html
    aps_2010-01-13_03-09-14-609.html aps_2010-01-14_03-07-46-328.html
    aps_2010-01-15_03-07-03-359.html aps_2010-01-16_03-06-19-421.html
    aps_2010-01-16_22-07-56-921.html aps_2010-01-17_03-06-47-812.html
    aps_2010-01-18_03-06-15-156.html aps_2010-01-19_03-06-50-250.html
    aps_2010-01-20_03-07-55-359.html aps_2010-01-21_03-09-13-843.html
    aps_2010-01-22_03-07-09-453.html aps_2010-01-23_03-06-24-343.html
    aps_2010-01-24_03-07-24-578.html aps_2010-01-25_03-08-38-812.html
    aps_2010-01-25_17-05-12-843.html aps_2010-01-26_03-07-15-750.html
    aps_2010-01-27_03-08-56-171.html aps_2010-01-28_03-07-54-078.html
    aps_2010-01-28_10-37-28-218.html aps_2010-01-29_03-04-43-703.html
    aps_2010-01-30_03-03-58-640.html aps_2010-01-31_01-41-31-125.html
    aps_2010-01-31_03-05-49-359.html aps_2010-02-01_03-05-57-890.html
    aps_2010-02-02_03-06-01-046.html aps_2010-02-03_03-06-34-828.html
    aps_2010-02-04_03-06-30-343.html aps_2010-02-05_03-04-44-218.html
    aps_2010-02-06_02-31-31-968.html aps_2010-02-06_03-16-57-750.html
    aps_2010-02-06_12-02-37-792.html aps_2010-02-07_03-06-34-718.html
    aps_2010-02-08_03-08-46-125.html latest.html
};

my ($match_count, $actris_count, $sef_count) = (0, 0, 0);

foreach my $paper (@pagelist) {
    # Bug in the original: the URL was built from $pagelist[$value],
    # which always pointed at the last element (latest.html), so the
    # same page was fetched on every pass.  Build it from the loop
    # variable instead, and the index bookkeeping goes away entirely.
    my $url = "http://operations/idslogs/$paper";
    $mechanize->get($url);

    # Assign the page content to $page.
    my $page = $mechanize->content;

    # How will I count IPs? I also need times.
    # Accumulate the counts across files so they don't get cleared.
    $match_count  += () = $page =~ /Memphis/g;
    $actris_count += () = $page =~ /ACTRIS/g;
    $sef_count    += () = $page =~ /South East Florida/g;
}

print "Logins for Memphis $match_count\t";
print "Logins for Actris $actris_count\t";
print "Logins for SEF $sef_count\n";
```
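On the "also need times" comment: as a sketch only, assuming the log lines really do follow the format of the sample shown later in the post (`2010-02-14 15:23:37.992 Memphis on host, aps (1500) User Memphis logged on ...`), a capturing regex can pull out the timestamp and the user name in one pass. The hash names and the example line here are illustrative, and the pattern would need adjusting to the real log format.

```perl
use strict;
use warnings;

# Hypothetical line in the format shown in the sample log below.
my $line = '2010-02-14 15:23:37.992 Memphis on somehost, aps (1500) '
         . 'User Memphis logged on to session "Logon372 on somehost"';

my (%logins, %times);

# Capture the leading timestamp and the name that follows "User".
if ($line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?User (\w+) logged on/) {
    my ($when, $user) = ($1, $2);
    $logins{$user}++;                     # count logins per user
    push @{ $times{$user} }, $when;       # remember when each happened
}

print "$_: $logins{$_} login(s), last at $times{$_}[-1]\n" for keys %logins;
```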
However, I have an entire directory of these files to process, and they are in HTML format, so I was hoping to change it to use readdir and glob them from there, but I must be doing it wrong. My logfiles contain the text I am looking for and there should be plenty of matches, but I am getting zero, so it must not be looking at the actual lines.

Code 2:

```perl
# Include the WWW::Mechanize module.
use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://someservername/TESTLOG.html";

# Create a new instance of WWW::Mechanize.
# Enabling autocheck checks each request to ensure it was successful,
# producing an error if not.
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# Retrieve the page and assign its content to $page.
$mechanize->get($url);
my $page = $mechanize->content;

my $match_count = () = $page =~ /Memphis/g;
print "Logins for Memphis $match_count\t";

my $actris_count = () = $page =~ /ACTRIS/g;
print "Logins for Austin $actris_count\t";

my $sef_count = () = $page =~ /South East Florida/g;
print "Logins for SEF $sef_count\n";
```
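For what it's worth, the counting lines in the code above rely on Perl's list-assignment ("countof") idiom: `m//g` in list context returns one element per match, assigning that list to the empty list `()` and then evaluating the whole assignment in scalar context yields the number of matches. A tiny standalone check of the idiom (the string is made up):

```perl
use strict;
use warnings;

my $page = "Memphis x Memphis y Memphis";

# /g in list context returns one element per match; assigning that
# list to () and reading the assignment in scalar context gives the
# number of matches.
my $n = () = $page =~ /Memphis/g;
print "$n\n";    # prints 3

# Note: plain `my $n = $page =~ /Memphis/g;` would NOT count matches;
# in scalar context m//g just returns true/false for the next match.
```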
Sample from the log file:

2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent", and the session was renamed to "Memphis on servernamechangedtoprotecttheinnocent".

Code 3:

```perl
# Read in files for IDSLOGS.
use strict;
use warnings;

my $dir = shift || '.';
opendir my $dh, $dir or die "Can't open directory $dir: $!\n";

my $count = 0;
while (my $file = readdir $dh) {
    next if $file =~ /^\./;
    print "$file\n";

    # Likely reason the matching never ran: readdir returns bare
    # filenames with no path, so open "$file" only works when $dir
    # happens to be the current directory.  Prepend the directory.
    open my $fh, '<', "$dir/$file" or die "Can't open file $dir/$file: $!\n";
    my @file_lines = <$fh>;
    close $fh;

    foreach my $line (@file_lines) {
        $count++;    # the original never incremented $count
        # Search for a match here like in the code above, e.g.:
        # $match_count += () = $line =~ /Memphis/g;
    }
}
closedir $dh;
print "processed $count lines\n";
```
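Since the files are already local, there is no need to publish them to a web directory and fetch them back with WWW::Mechanize. A sketch of the glob-based approach, assuming the logs sit in a directory passed on the command line and end in `.html`; `count_logins` is an illustrative helper name, not anything from the original code. `glob` returns full `"$dir/name"` paths, which sidesteps the readdir path problem entirely.

```perl
use strict;
use warnings;

# Count logins for each site across every .html log in $dir.
sub count_logins {
    my ($dir) = @_;
    my %counts = (Memphis => 0, ACTRIS => 0, 'South East Florida' => 0);

    # Unlike readdir, glob yields paths that include the directory.
    for my $path (glob "$dir/*.html") {
        open my $fh, '<', $path or die "Can't open $path: $!\n";
        my $page = do { local $/; <$fh> };    # slurp the whole file
        close $fh;

        # Same countof idiom as above, accumulated per site.
        $counts{$_} += () = $page =~ /\Q$_\E/g for keys %counts;
    }
    return %counts;
}

my %counts = count_logins(shift || '.');
print "Logins for Memphis $counts{Memphis}\t";
print "Logins for Actris $counts{ACTRIS}\t";
print "Logins for SEF $counts{'South East Florida'}\n";
```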
In reply to Re^2: parsing a directory of log files.
by learn2earn
in thread parsing a directory of log files.
by learn2earn