in reply to parsing a directory of log files.

How do you 'expect' it to work? What does it actually do? See How do I post a question effectively?

Re^2: parsing a directory of log files.
by learn2earn (Acolyte) on Feb 25, 2010 at 15:09 UTC
    I now have it working using WWW::Mechanize, but I want to add the following functionality: check the date and check for a login; if both are present, it should increment the count for that match and also store the login time. The next thing it should look for is the logout. It should also count session ends and capture login/logout times, which I am not sure how to do. Anyway, here is my code so far:
    # Include the WWW::Mechanize module
    use strict;
    use warnings;
    use OLE;
    use LWP;
    use WWW::Mechanize;
    use Mail::Sender;

    # Create a new instance of WWW::Mechanize.
    # Enabling autocheck checks each request to ensure it was successful,
    # producing an error if not.
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    # This is the list of logfiles to process (latest.html added to the end)
    my @pagelist = qw{
        aps_2009-11-05_11-07-26-633.html aps_2009-11-10_11-30-44-204.html
        aps_2009-11-17_11-30-53-298.html aps_2009-11-24_11-31-02-401.html
        aps_2009-12-01_11-31-11-606.html aps_2009-12-08_11-31-20-751.html
        aps_2009-12-10_12-51-29-069.html aps_2009-12-10_12-57-31-621.html
        aps_2009-12-10_13-00-06-560.html aps_2009-12-10_13-09-05-531.html
        aps_2009-12-15_03-37-42-906.html aps_2009-12-17_15-50-58-140.html
        aps_2009-12-18_03-05-35-625.html aps_2009-12-19_03-06-39-703.html
        aps_2009-12-20_03-04-17-265.html aps_2009-12-21_03-05-41-125.html
        aps_2009-12-21_12-11-58-078.html aps_2009-12-22_03-07-48-265.html
        aps_2009-12-23_03-07-16-265.html aps_2009-12-23_17-00-05-997.html
        aps_2009-12-24_03-06-38-765.html aps_2009-12-25_03-06-42-734.html
        aps_2009-12-26_03-04-40-546.html aps_2009-12-27_03-05-33-125.html
        aps_2009-12-28_03-06-18-640.html aps_2009-12-29_03-06-48-937.html
        aps_2009-12-30_03-05-58-812.html aps_2009-12-31_03-05-24-000.html
        aps_2010-01-01_03-04-54-031.html aps_2010-01-02_03-05-29-421.html
        aps_2010-01-03_03-06-57-968.html aps_2010-01-04_03-06-08-046.html
        aps_2010-01-05_03-06-25-046.html aps_2010-01-06_03-07-26-953.html
        aps_2010-01-07_03-07-26-750.html aps_2010-01-08_03-08-19-859.html
        aps_2010-01-09_03-07-48-015.html aps_2010-01-10_03-07-46-906.html
        aps_2010-01-11_03-05-28-734.html aps_2010-01-12_03-07-37-265.html
        aps_2010-01-13_03-09-14-609.html aps_2010-01-14_03-07-46-328.html
        aps_2010-01-15_03-07-03-359.html aps_2010-01-16_03-06-19-421.html
        aps_2010-01-16_22-07-56-921.html aps_2010-01-17_03-06-47-812.html
        aps_2010-01-18_03-06-15-156.html aps_2010-01-19_03-06-50-250.html
        aps_2010-01-20_03-07-55-359.html aps_2010-01-21_03-09-13-843.html
        aps_2010-01-22_03-07-09-453.html aps_2010-01-23_03-06-24-343.html
        aps_2010-01-24_03-07-24-578.html aps_2010-01-25_03-08-38-812.html
        aps_2010-01-25_17-05-12-843.html aps_2010-01-26_03-07-15-750.html
        aps_2010-01-27_03-08-56-171.html aps_2010-01-28_03-07-54-078.html
        aps_2010-01-28_10-37-28-218.html aps_2010-01-29_03-04-43-703.html
        aps_2010-01-30_03-03-58-640.html aps_2010-01-31_01-41-31-125.html
        aps_2010-01-31_03-05-49-359.html aps_2010-02-01_03-05-57-890.html
        aps_2010-02-02_03-06-01-046.html aps_2010-02-03_03-06-34-828.html
        aps_2010-02-04_03-06-30-343.html aps_2010-02-05_03-04-44-218.html
        aps_2010-02-06_02-31-31-968.html aps_2010-02-06_03-16-57-750.html
        aps_2010-02-06_12-02-37-792.html aps_2010-02-07_03-06-34-718.html
        aps_2010-02-08_03-08-46-125.html latest.html
    };

    # Running totals across all pages
    my ($match_count, $actris_count, $sef_count) = (0, 0, 0);

    foreach my $paper (@pagelist) {
        # Build the URL from the current list entry. Indexing with the fixed
        # $pagelist[$value] fetched latest.html on every pass.
        my $url = "http://operations/idslogs/$paper";

        # Retrieve the page
        $mechanize->get($url);

        # Assign the page content to $page
        my $page = $mechanize->content;

        # how will I count ip's and also need times
        $match_count  += () = ($page =~ /Memphis/g);
        $actris_count += () = ($page =~ /ACTRIS/g);
        $sef_count    += () = ($page =~ /South East Florida/g);
    }

    # Print after the loop so the counts cover every page
    print "Logins for Memphis $match_count\t";
    print "Logins for Actris $actris_count\t";
    print "Logins for SEF $sef_count\t";
    It's supposed to return the number of matches for each regex, basically just counting the logins for remote users, but it doesn't appear to be doing that at all. I am wondering if it would be easier to read in the files one by one using WWW::Mechanize. I can put them in a web directory easily enough, but then I am not sure how to get all the files in the directory that I want. I was trying to combine this code with some other code to accomplish what I need. Example code 1:
    # Include the WWW::Mechanize module
    use OLE;
    use LWP;
    use WWW::Mechanize;
    use Mail::Sender;

    $url = "http://someservername/TESTLOG.html";

    # Create a new instance of WWW::Mechanize.
    # Enabling autocheck checks each request to ensure it was successful,
    # producing an error if not.
    my $mechanize = WWW::Mechanize->new(autocheck => 1);

    # Retrieve the page
    $mechanize->get($url);

    # Assign the page content to $page
    my $page = $mechanize->content;

    my $ipcount = 0;

    my $match_count = () = ($page =~ /Memphis/g);
    print "Logins for Memphis $match_count\t";

    my $actris_count = () = ($page =~ /ACTRIS/g);
    print "Logins for Austin $actris_count\t";

    my $sef_count = () = ($page =~ /South East Florida/g);
    print "Logins for SEF $sef_count\t";
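
The counting idiom used in example code 1 can be isolated as a small sketch. The site names are the ones from the post; the helper name `count_logins` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Count how many times each name occurs in a string.
# count_logins is a hypothetical helper, not part of the original script.
sub count_logins {
    my ($text, @names) = @_;
    my %count;
    for my $name (@names) {
        # A /g match in list context returns every match. Assigning that
        # list to () and then to a scalar yields the number of matches.
        $count{$name} = () = $text =~ /\Q$name\E/g;
    }
    return \%count;
}

# Made-up sample text, only to exercise the helper.
my $sample = "Memphis logged on. ACTRIS logged on. Memphis logged on.";
my $counts = count_logins($sample, 'Memphis', 'ACTRIS', 'South East Florida');
print "Logins for $_: $counts->{$_}\n" for sort keys %$counts;
```

Keeping the counts in a hash keyed by site name means adding a site is a one-word change to the list rather than a new variable and a new print statement.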
    However, I have an entire directory of these files to process, and they are in HTML format, so I was hoping to change it to use readdir and glob them from there, but I must be doing it wrong. My log files contain the text I am looking for, and there should be plenty of matches, but I am getting zero, so it must not be looking at the actual lines. Code 2:
    # Read in files for IDSLOGS.
    use OLE;
    use IO::File;

    $dir = shift || '.';
    opendir DIR, $dir or die "Can't open directory $dir: $!\n";
    while ($file = readdir DIR) {
        next if $file =~ /^\./;
        print $file;
        open (FH, "$file") or die "Can't open file $file: $!\n";
        my @file_lines = <FH>;
        close FH;
        foreach my $line (@file_lines) {
            # search for a match here like I do above
            # but it doesn't work why not?
        }
    }
    print "processed $count lines";
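
Two things stand out in code 2: readdir returns bare file names, so the open call only finds the files when the script runs from inside that directory, and $count is never incremented. A minimal sketch of one way to handle both, assuming the logs end in .html; the helper name `count_in_dir` is made up for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Count lines matching $pattern across every .html file in $dir.
# Returns (matching lines, total lines read).
# count_in_dir is a hypothetical helper, not part of the original script.
sub count_in_dir {
    my ($dir, $pattern) = @_;
    my ($matches, $lines) = (0, 0);

    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";
    my @files = readdir $dh;
    closedir $dh;

    for my $file (sort @files) {
        next unless $file =~ /\.html\z/;    # also skips . and ..
        # readdir returns bare names, so prepend the directory.
        my $path = File::Spec->catfile($dir, $file);
        open my $fh, '<', $path or die "Can't open file $path: $!\n";
        while (my $line = <$fh>) {
            $lines++;
            $matches++ if $line =~ $pattern;
        }
        close $fh;
    }
    return ($matches, $lines);
}

my $dir = shift || '.';
my ($matches, $lines) = count_in_dir($dir, qr/Memphis/);
print "processed $lines lines, $matches matched\n";
```

Note that this counts matching lines, not total occurrences; a line that mentions Memphis twice is counted once.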
    Sample from log file:

    2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent", and the session was renamed to "Memphis on servernamechangedtoprotecttheinnocent".
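
For the login counts and times asked about earlier in the thread, a line like the sample can be picked apart with one regex. This is only a sketch: "logged on" comes from the sample line, while "logged off" is an assumed wording for the logout entry and needs checking against a real log. Session-end entries are not handled because no sample of them was posted. The helper names `parse_event` and `tally` are made up for illustration.

```perl
use strict;
use warnings;

# Pull the date, time, user and event out of one log line.
# "logged off" is an ASSUMED logout wording; only "logged on" is
# confirmed by the posted sample.
sub parse_event {
    my ($line) = @_;
    return unless $line =~
        /(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})\.\d+ .*?\bUser (.+?) logged (on|off)\b/;
    return +{ date => $1, time => $2, user => $3, event => $4 };
}

# Tally events per user and remember when each one happened.
sub tally {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        my $e = parse_event($line) or next;
        my $slot = $seen{ $e->{user} } ||= { on => 0, off => 0, times => [] };
        $slot->{ $e->{event} }++;
        push @{ $slot->{times} }, "$e->{date} $e->{time} $e->{event}";
    }
    return \%seen;
}

# The sample line from the post, split across two strings for width.
my $sample = '2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) '
           . 'User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent".';

my $seen = tally($sample);
for my $user (sort keys %$seen) {
    print "$user: $seen->{$user}{on} logins, $seen->{$user}{off} logouts\n";
    print "  $_\n" for @{ $seen->{$user}{times} };
}
```

Because the user name is captured from the line itself, new sites show up in the tally without any change to the code.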

      The point I was raising is that we shouldn't have to draw information out of you piecemeal. You've been posting here for years; you aren't a new user, and you should know how this works by now. If you can't be bothered to explain what you think is going wrong (again, see How do I post a question effectively?), why should anyone spend their free time investigating your problem? Generally, the better you describe your problem (a proper description of the problem, example code, sample input data, sample output data, etc.), the better the response you'll get.