I now have it working using WWW::Mechanize, but I want to add the following functionality: check the date and check for a login; if both exist, it should increment the count for that match and also store the login time. The next thing it should look for is a logout. It should also count session ends and capture the login/logout times, which I am not sure how to do. Anyway, here is my code so far:
# Include the WWW::Mechanize module
use OLE;
use LWP;
use WWW::Mechanize;
use Mail::Sender;

# url gets clobbered by pagelist; added latest.html to the end of the array
$url = "http://operations/idslogs/latest.html";

# Create a new instance of WWW::Mechanize.
# Enabling autocheck checks each request to ensure it was successful,
# producing an error if not.
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# This is the list of logfiles to process.
@pagelist = qw{
    aps_2009-11-05_11-07-26-633.html aps_2009-11-10_11-30-44-204.html aps_2009-11-17_11-30-53-298.html
    aps_2009-11-24_11-31-02-401.html aps_2009-12-01_11-31-11-606.html aps_2009-12-08_11-31-20-751.html
    aps_2009-12-10_12-51-29-069.html aps_2009-12-10_12-57-31-621.html aps_2009-12-10_13-00-06-560.html
    aps_2009-12-10_13-09-05-531.html aps_2009-12-15_03-37-42-906.html aps_2009-12-17_15-50-58-140.html
    aps_2009-12-18_03-05-35-625.html aps_2009-12-19_03-06-39-703.html aps_2009-12-20_03-04-17-265.html
    aps_2009-12-21_03-05-41-125.html aps_2009-12-21_12-11-58-078.html aps_2009-12-22_03-07-48-265.html
    aps_2009-12-23_03-07-16-265.html aps_2009-12-23_17-00-05-997.html aps_2009-12-24_03-06-38-765.html
    aps_2009-12-25_03-06-42-734.html aps_2009-12-26_03-04-40-546.html aps_2009-12-27_03-05-33-125.html
    aps_2009-12-28_03-06-18-640.html aps_2009-12-29_03-06-48-937.html aps_2009-12-30_03-05-58-812.html
    aps_2009-12-31_03-05-24-000.html aps_2010-01-01_03-04-54-031.html aps_2010-01-02_03-05-29-421.html
    aps_2010-01-03_03-06-57-968.html aps_2010-01-04_03-06-08-046.html aps_2010-01-05_03-06-25-046.html
    aps_2010-01-06_03-07-26-953.html aps_2010-01-07_03-07-26-750.html aps_2010-01-08_03-08-19-859.html
    aps_2010-01-09_03-07-48-015.html aps_2010-01-10_03-07-46-906.html aps_2010-01-11_03-05-28-734.html
    aps_2010-01-12_03-07-37-265.html aps_2010-01-13_03-09-14-609.html aps_2010-01-14_03-07-46-328.html
    aps_2010-01-15_03-07-03-359.html aps_2010-01-16_03-06-19-421.html aps_2010-01-16_22-07-56-921.html
    aps_2010-01-17_03-06-47-812.html aps_2010-01-18_03-06-15-156.html aps_2010-01-19_03-06-50-250.html
    aps_2010-01-20_03-07-55-359.html aps_2010-01-21_03-09-13-843.html aps_2010-01-22_03-07-09-453.html
    aps_2010-01-23_03-06-24-343.html aps_2010-01-24_03-07-24-578.html aps_2010-01-25_03-08-38-812.html
    aps_2010-01-25_17-05-12-843.html aps_2010-01-26_03-07-15-750.html aps_2010-01-27_03-08-56-171.html
    aps_2010-01-28_03-07-54-078.html aps_2010-01-28_10-37-28-218.html aps_2010-01-29_03-04-43-703.html
    aps_2010-01-30_03-03-58-640.html aps_2010-01-31_01-41-31-125.html aps_2010-01-31_03-05-49-359.html
    aps_2010-02-01_03-05-57-890.html aps_2010-02-02_03-06-01-046.html aps_2010-02-03_03-06-34-828.html
    aps_2010-02-04_03-06-30-343.html aps_2010-02-05_03-04-44-218.html aps_2010-02-06_02-31-31-968.html
    aps_2010-02-06_03-16-57-750.html aps_2010-02-06_12-02-37-792.html aps_2010-02-07_03-06-34-718.html
    aps_2010-02-08_03-08-46-125.html latest.html
};

# Getting the size of pagelist, i.e. the number of elements.
$pagelist = @pagelist;
# Setting value = number of array elements.
$value = $pagelist;
$value = $value - 1;    # taking the last element off of value for the url

foreach $paper (@pagelist) {
    $url = "http://operations/idslogs/$pagelist[$value]";
    #my $mechanize = WWW::Mechanize->new(autocheck => 1);
    $mechanize->get($url);

    # Assign the page content to $page.
    my $page = $mechanize->content;

    # How will I count IPs? I also need the times.
    $match_count  += () = ($page =~ /Memphis/g);
    $actris_count += () = ($page =~ /ACTRIS/g);
    $sef_count    += () = ($page =~ /South East Florida/g);
}   # closing the for loop here so the counts don't get cleared

{
    print "Logins for Memphis $match_count\t";
    print "Logins for Actris $actris_count\t";
    print "Logins for SEF $sef_count\t";
}
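For comparison, here is a stripped-down sketch of how I think the fetch-and-count loop ought to work (only my guess: it fetches each $paper directly instead of indexing with $value, and it keeps the per-site counts in a hash keyed by the strings I match on; the file list is shortened here just for the sketch):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # Same file list as above, shortened for the example.
    my @pagelist = qw{ aps_2009-11-05_11-07-26-633.html latest.html };

    # One running counter per site, keyed by the string I search for.
    my %count = ( 'Memphis' => 0, 'ACTRIS' => 0, 'South East Florida' => 0 );

    foreach my $paper (@pagelist) {
        # Fetch this particular file, not a fixed index into the array.
        $mech->get("http://operations/idslogs/$paper");
        my $page = $mech->content;

        # Add this page's occurrences of each site name to its counter.
        for my $site ( keys %count ) {
            $count{$site} += () = ( $page =~ /\Q$site\E/g );
        }
    }

    print "Logins for $_: $count{$_}\n" for sort keys %count;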
My code is supposed to return the number of matches for each regex, basically just to count the logins for remote users, but it doesn't appear to be doing that at all. I am wondering if it would be easier to read in the files one by one using WWW::Mechanize; I can put them in a web directory easily enough, but then I am not sure how to get all of the files in that directory that I want (see the sketch after example code 1 below). I was trying to combine this code with some other code to accomplish what I need. Example code 1:
# Include the WWW::Mechanize module
use OLE;
use LWP;
use WWW::Mechanize;
use Mail::Sender;

$url = "http://someservername/TESTLOG.html";

# Create a new instance of WWW::Mechanize.
# Enabling autocheck checks each request to ensure it was successful,
# producing an error if not.
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# Retrieve the page.
$mechanize->get($url);

# Assign the page content to $page.
my $page = $mechanize->content;

my $ipcount = 0;

my $match_count = () = ($page =~ /Memphis/g);
{
    print "Logins for Memphis $match_count\t";
}

my $actris_count = () = ($page =~ /ACTRIS/g);
{
    print "Logins for Austin $actris_count\t";
}

my $sef_count = () = ($page =~ /South East Florida/g);
{
    print "Logins for SEF $sef_count\t";
}
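On the question of pulling every log file out of a web directory: as I understand it, WWW::Mechanize can list the links on the directory's index page, so something like the sketch below might work (the index URL and the aps_*.html filename pattern are assumptions on my part):

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new( autocheck => 1 );

    # Fetch the directory's index page (URL assumed).
    $mech->get('http://operations/idslogs/');

    # Keep only the links that look like the aps_*.html log files.
    my @logfiles = grep { $_->url =~ /^aps_.*\.html$/ } $mech->links;

    for my $link (@logfiles) {
        $mech->get( $link->url_abs );    # follow each log file link
        my $page = $mech->content;
        # ... same matching and counting as above ...
    }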
However, I have an entire directory of these files to process, and they are in HTML format, so I was hoping to change it to use readdir and glob them from there, but I must be doing it wrong. My log files contain the text I am looking for, and there should be plenty of matches, but I am getting zero, so it must not be looking at the actual lines. Code 2:
# Read in files for IDSLOGS.
use OLE;
use IO::File;

$dir = shift || '.';

opendir DIR, $dir or die "Can't open directory $dir: $!\n";

while ($file = readdir DIR) {
    next if $file =~ /^\./;
    print $file;

    open (FH, "$file") or die "Can't open file $file: $!\n";
    my @file_lines = <FH>;
    close FH;

    foreach my $line (@file_lines) {
        # search for a match here like I do above,
        # but it doesn't work, why not?
    }
}

print "processed $count lines";
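For reference, here is a sketch of how I understand the directory version ought to look (the "$dir/$file" path join and the per-line matching are my own assumptions; I'm not sure whether that is what I'm missing above):

    use strict;
    use warnings;

    my $dir = shift || '.';
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";

    my %count;
    my $lines = 0;

    while ( my $file = readdir $dh ) {
        next if $file =~ /^\./;            # skip . and .. and hidden files
        next unless $file =~ /\.html$/;    # only look at the log pages

        # readdir returns bare filenames, so prepend the directory name.
        open my $fh, '<', "$dir/$file" or die "Can't open file $dir/$file: $!\n";

        while ( my $line = <$fh> ) {
            $lines++;
            $count{'Memphis'}++            if $line =~ /Memphis/;
            $count{'ACTRIS'}++             if $line =~ /ACTRIS/;
            $count{'South East Florida'}++ if $line =~ /South East Florida/;
        }
        close $fh;
    }
    closedir $dh;

    print "processed $lines lines\n";
    print "$_: $count{$_}\n" for sort keys %count;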
Sample from a log file:

2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent", and the session was renamed to "Memphis on servernamechangedtoprotecttheinnocent".
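For the login/logout times I mentioned at the top, I'm picturing something along these lines run against each log line (a sketch only: the timestamp regex and the "logged off" wording for session ends are guesses based on the one sample line above, and the %sessions structure is a name I made up):

    use strict;
    use warnings;

    my %sessions;    # per-user counts and timestamps; the structure is my own invention

    while ( my $line = <> ) {
        # e.g. "2010-02-14 15:23:37.992 ... User Memphis logged on to session ..."
        if ( $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)\s.*\bUser (\S+) logged on\b/ ) {
            my ( $time, $user ) = ( $1, $2 );
            $sessions{$user}{logins}++;
            push @{ $sessions{$user}{login_times} }, $time;
        }
        # I'm guessing the session-end lines say "logged off"; adjust once I see a real one.
        elsif ( $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)\s.*\bUser (\S+) logged off\b/ ) {
            my ( $time, $user ) = ( $1, $2 );
            $sessions{$user}{logouts}++;
            push @{ $sessions{$user}{logout_times} }, $time;
        }
    }

    for my $user ( sort keys %sessions ) {
        printf "%s: %d logins, %d logouts\n",
            $user,
            $sessions{$user}{logins}  || 0,
            $sessions{$user}{logouts} || 0;
    }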
