learn2earn has asked for the wisdom of the Perl Monks concerning the following question:

Doh, I was referencing in scalar context; not enough coffee yet today, I guess. I tried the following but I am still not getting what I expected, and I'm not sure why. I copied all my logfiles (which are in HTML format) to a web directory, stored the file list in an array, and then used WWW::Mechanize to grab and parse the files. I expected this code to print out the URL, but instead I get a hash address. I removed the rest of the files, since retrieving one would be enough; if I can get that to work properly I can easily add the rest to the array.
use OLE;
use LWP;
use WWW::Mechanize;
use Mail::Sender;

$url = "http://operations/idslogs/latest.html";

# Create a new instance of WWW::Mechanize
# enabling autocheck checks each request to ensure it was successful,
# producing an error if not.
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# Retrieve the page
@pagelist = {'aps_2009-11-10_11-30-44-204.html'};
foreach $page (@pagelist) {
    $url = "http://operations/idslogs/$pagelist[0]";
    #$mechanize->get($url);
    print $url;
    # Assign the page content to $page
    my $page = $mechanize->content;
    my $ipcount = 0;
    my $match_count = () = ($page =~ /Memphis/g);
    { print "Logins for Memphis $match_count\t"; }
    my $actris_count = () = ($page =~ /ACTRIS/g);
    { print "Logins for Austin $actris_count\t"; }
    my $sef_count = () = ($page =~ /South East Florida/g);
    { print "Logins for SEF $sef_count\t"; }
}
Are updates supposed to go at the top or the bottom? Anyway, I tried to break my example down to something easier for me to understand, and I am still having trouble with it. Here is what I have now, which should open the files and print their lines; it's not doing that either. strict bitches about everything and basically annoys the be-jesus out of me. I know I should get used to it and love it and all, but it's frigging annoying, when you make mistakes like I do, to be constantly reminded of them.
use IO::File;

$dir = shift || '.';
opendir DIR, $dir or die "Can't open directory $dir: $!\n";
while ($file = readdir DIR) {
    next if $file =~ /^\./;
    open (FH, ">", "$file") or die "Can't open file $file: $!\n";
    my @file_lines = <FH>;
    foreach my $line (@file_lines) {
        print $line "\n";
    }
}
I have some logfiles I would like to process, but it's not working as expected; I'm probably missing something simple. Here is my attempt.
use:OLE;
use IO::File;

$dir = shift || '.';
opendir DIR, $dir or die "Can't open directory $dir: $!\n";
while ($file = readdir DIR) {
    next if $file =~ /^\./;
    print $file;
    open (FH, "$file") or die "Can't open file $file: $!\n";
    my @file_lines = <FH>;
    #close FH;
    foreach my $line (@file_lines) {
        chomp();
        my $match_count = () = ($line =~ /Memphis/g);
        my $actris_count = () = ($line =~ /ACTRIS/g);
        my $sef_count = () = ($line =~ /South East Florida/g);
        { }
    }
}
print "Logins for Memphis $match_count \t",
      "Logins for Actris $actris_count \t",
      "Logins for SEF $sef_count\t";

Replies are listed 'Best First'.
Re: parsing a directory of log files.
by shmem (Chancellor) on Feb 25, 2010 at 15:09 UTC

    Turn on strict and warnings. You declared $match_count, $actris_count and $sef_count with my inside the foreach block — they are visible only there! The variables in the line

    print "Logins for Memphis $match_count \t",
          "Logins for Actris $actris_count \t",
          "Logins for SEF $sef_count\t";

    are completely unrelated package variables. BTW,

    use:OLE;

    what's this? Bizarrely, that parses (without strict), but it doesn't make any sense whatsoever.
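    A minimal sketch of the scoping fix — counters declared once, before the loop, with strict and warnings on. The sample lines are made up purely to exercise the counters; the patterns are the ones from the post:

```perl
use strict;
use warnings;

# made-up sample lines, just to exercise the counters
my @file_lines = (
    'User Memphis logged on',
    'User ACTRIS logged on',
    'User Memphis logged on',
);

# declare the counters once, outside the loop, so they survive it
my ($match_count, $actris_count, $sef_count) = (0, 0, 0);

foreach my $line (@file_lines) {
    # += accumulates across lines; = () = counts this line's matches
    $match_count  += () = $line =~ /Memphis/g;
    $actris_count += () = $line =~ /ACTRIS/g;
    $sef_count    += () = $line =~ /South East Florida/g;
}

print "Logins for Memphis $match_count\t",
      "Logins for Actris $actris_count\t",
      "Logins for SEF $sef_count\n";
```

    Under strict, printing the inner-scoped variables after the loop would be a compile-time error instead of silently empty output.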

Re: parsing a directory of log files.
by cdarke (Prior) on Feb 25, 2010 at 14:57 UTC
    Your count variables ($match_count, etc.) are declared inside the foreach loop, yet you try to print them outside. I suggest you declare them before the outer while loop and then:
    $match_count  += () = ($line =~ /Memphis/g);
    $actris_count += () = ($line =~ /ACTRIS/g);
    $sef_count    += () = ($line =~ /South East Florida/g);
    Not sure what the trailing block { } is after the last one, bad copy/paste?

    Oh, and the chomp will chomp $_ by default, you probably mean chomp($line) although it does not actually seem to be needed here.

    Update: ...and you are missing a closedir(DIR); after the while loop.
Re: parsing a directory of log files.
by Ratazong (Monsignor) on Feb 25, 2010 at 15:01 UTC
    foreach my $line (@file_lines) {
        chomp();
        my $match_count = () = ($line =~ /Memphis/g);

    two observations:

    • With the code above you declare $match_count as a local variable of the foreach-loop. Therefore it is no longer in scope when you try to print it later
      • => best declare that variable before the while-loop
    • I suppose you want to count the lines containing Memphis, ...; however, you overwrite each intermediate count, so you end up with only the value from the last line of the file
      • => think of using the += operator instead of =
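    The difference between = and += with the list-assignment counting idiom can be seen in a few lines (a standalone sketch, not code from the thread):

```perl
use strict;
use warnings;

my @lines = ('Memphis Memphis', 'Memphis', 'no match here');

my ($overwritten, $accumulated) = (0, 0);
for my $line (@lines) {
    $overwritten  = () = $line =~ /Memphis/g;   # keeps only the last line's count
    $accumulated += () = $line =~ /Memphis/g;   # sums the counts across all lines
}
print "overwritten=$overwritten accumulated=$accumulated\n";
```

    After the loop, $overwritten holds 0 (the last line has no match) while $accumulated holds the total of 3.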

    HTH, Rata
Re: parsing a directory of log files.
by ssandv (Hermit) on Feb 25, 2010 at 21:12 UTC

    Strict "bitches about" crappy programming practices, mostly. If you want help from other people, it's to your benefit to learn to program in a way that quiets strict down. If you can't be bothered to do that, why should we be bothered to help?

    Asking for help but refusing to use strict is like asking how to get to the store without crossing the street when the store is on the other side.

Re: parsing a directory of log files.
by marto (Cardinal) on Feb 25, 2010 at 14:48 UTC
      I now have it working using WWW::Mechanize, but want to add the following functionality: check the date, and check for a login; if both exist it should increment the count for that match and also store the login time. The next thing it should look for is logout. It should also count session ends and capture login/logout times, which I am not sure how to do. Anyway, here is my code so far..
      #Include the WWW::Mechanize module
      use OLE;
      use LWP;
      use WWW::Mechanize;
      use Mail::Sender;

      $url = "http://operations/idslogs/latest.html";
      # url gets clobbered by pagelist; added latest.html to end of array

      # Create a new instance of WWW::Mechanize
      # enabling autocheck checks each request to ensure it was successful,
      # producing an error if not.
      my $mechanize = WWW::Mechanize->new(autocheck => 1);

      # Retrieve the page
      # this is the list of logfiles to process
      @pagelist = qw{
          aps_2009-11-05_11-07-26-633.html aps_2009-11-10_11-30-44-204.html
          aps_2009-11-17_11-30-53-298.html aps_2009-11-24_11-31-02-401.html
          aps_2009-12-01_11-31-11-606.html aps_2009-12-08_11-31-20-751.html
          aps_2009-12-10_12-51-29-069.html aps_2009-12-10_12-57-31-621.html
          aps_2009-12-10_13-00-06-560.html aps_2009-12-10_13-09-05-531.html
          aps_2009-12-15_03-37-42-906.html aps_2009-12-17_15-50-58-140.html
          aps_2009-12-18_03-05-35-625.html aps_2009-12-19_03-06-39-703.html
          aps_2009-12-20_03-04-17-265.html aps_2009-12-21_03-05-41-125.html
          aps_2009-12-21_12-11-58-078.html aps_2009-12-22_03-07-48-265.html
          aps_2009-12-23_03-07-16-265.html aps_2009-12-23_17-00-05-997.html
          aps_2009-12-24_03-06-38-765.html aps_2009-12-25_03-06-42-734.html
          aps_2009-12-26_03-04-40-546.html aps_2009-12-27_03-05-33-125.html
          aps_2009-12-28_03-06-18-640.html aps_2009-12-29_03-06-48-937.html
          aps_2009-12-30_03-05-58-812.html aps_2009-12-31_03-05-24-000.html
          aps_2010-01-01_03-04-54-031.html aps_2010-01-02_03-05-29-421.html
          aps_2010-01-03_03-06-57-968.html aps_2010-01-04_03-06-08-046.html
          aps_2010-01-05_03-06-25-046.html aps_2010-01-06_03-07-26-953.html
          aps_2010-01-07_03-07-26-750.html aps_2010-01-08_03-08-19-859.html
          aps_2010-01-09_03-07-48-015.html aps_2010-01-10_03-07-46-906.html
          aps_2010-01-11_03-05-28-734.html aps_2010-01-12_03-07-37-265.html
          aps_2010-01-13_03-09-14-609.html aps_2010-01-14_03-07-46-328.html
          aps_2010-01-15_03-07-03-359.html aps_2010-01-16_03-06-19-421.html
          aps_2010-01-16_22-07-56-921.html aps_2010-01-17_03-06-47-812.html
          aps_2010-01-18_03-06-15-156.html aps_2010-01-19_03-06-50-250.html
          aps_2010-01-20_03-07-55-359.html aps_2010-01-21_03-09-13-843.html
          aps_2010-01-22_03-07-09-453.html aps_2010-01-23_03-06-24-343.html
          aps_2010-01-24_03-07-24-578.html aps_2010-01-25_03-08-38-812.html
          aps_2010-01-25_17-05-12-843.html aps_2010-01-26_03-07-15-750.html
          aps_2010-01-27_03-08-56-171.html aps_2010-01-28_03-07-54-078.html
          aps_2010-01-28_10-37-28-218.html aps_2010-01-29_03-04-43-703.html
          aps_2010-01-30_03-03-58-640.html aps_2010-01-31_01-41-31-125.html
          aps_2010-01-31_03-05-49-359.html aps_2010-02-01_03-05-57-890.html
          aps_2010-02-02_03-06-01-046.html aps_2010-02-03_03-06-34-828.html
          aps_2010-02-04_03-06-30-343.html aps_2010-02-05_03-04-44-218.html
          aps_2010-02-06_02-31-31-968.html aps_2010-02-06_03-16-57-750.html
          aps_2010-02-06_12-02-37-792.html aps_2010-02-07_03-06-34-718.html
          aps_2010-02-08_03-08-46-125.html latest.html
      };

      #getting size of pagelist ie number of elements
      $pagelist = @pagelist;
      #setting value=number of array elements
      $value = $pagelist;
      $value = $value - 1;  #taking last element off of value for url

      foreach $paper (@pagelist) {
          $url = "http://operations/idslogs/$pagelist[$value]";
          #my $mechanize = WWW::Mechanize->new(autocheck => 1);
          $mechanize->get($url);
          # Assign the page content to $page
          my $page = $mechanize->content;
          # how will I count ip's and also need times
          $match_count  += () = ($page =~ /Memphis/g);
          $actris_count += () = ($page =~ /ACTRIS/g);
          $sef_count    += () = ($page =~ /South East Florida/g);
      }
      #closing for loop so count doesn't get cleared
      {
          print "Logins for Memphis $match_count\t";
          print "Logins for Actris $actris_count\t";
          print "Logins for SEF $sef_count\t";
      }
      It's supposed to return the number of matches for each regex, basically just to count the logins for remote users, but it doesn't appear to be doing that at all. I am wondering if it would be easier to read in the files one by one using WWW::Mechanize; I can put them in a web directory easily enough, but then I am not sure how to get all the files in the directory that I want. I was trying to combine this code with some other code to accomplish something I need. Example code 1:
      #Include the WWW::Mechanize module
      use OLE;
      use LWP;
      use WWW::Mechanize;
      use Mail::Sender;

      $url = "http://someservername/TESTLOG.html";

      # Create a new instance of WWW::Mechanize
      # enabling autocheck checks each request to ensure it was successful,
      # producing an error if not.
      my $mechanize = WWW::Mechanize->new(autocheck => 1);

      # Retrieve the page
      $mechanize->get($url);

      # Assign the page content to $page
      my $page = $mechanize->content;
      my $ipcount = 0;
      my $match_count = () = ($page =~ /Memphis/g);
      { print "Logins for Memphis $match_count\t"; }
      my $actris_count = () = ($page =~ /ACTRIS/g);
      { print "Logins for Austin $actris_count\t"; }
      my $sef_count = () = ($page =~ /South East Florida/g);
      { print "Logins for SEF $sef_count\t"; }
      However, I have an entire directory of these files to process, and they are in HTML format, so I was hoping to change it to use readdir and glob them from there, but I must be doing it wrong. My logfiles contain the text I am looking for and there should be plenty of matches, but I am getting zero, so it must not be looking at the actual lines. Code 2:
      #Read in files for IDSLOGS.
      use:OLE;
      use IO::File;

      $dir = shift || '.';
      opendir DIR, $dir or die "Can't open directory $dir: $!\n";
      while ($file = readdir DIR) {
          next if $file =~ /^\./;
          print $file;
          open (FH, "$file") or die "Can't open file $file: $!\n";
          my @file_lines = <FH>;
          close FH;
          foreach my $line (@file_lines) {
              #search for a match here like I do above
              # but it doesn't work why not?
          }
      }
      print "processed $count lines";
      sample from log file.. 2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, aps (1500) User Memphis logged on to session "Logon372 on servernamechangedtoprotecttheinnocent", and the session was renamed to "Memphis on servernamechangedtoprotecttheinnocent".
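      For the login-time question: given a line shaped like the sample above, the date, time, and user can all be pulled out with one capture. This is a sketch only; the field layout is assumed from that single sample line:

```perl
use strict;
use warnings;

# the sample log line from the post
my $line = '2010-02-14 15:23:37.992 Memphis on servernamechangedtoprotecttheinnocent, '
         . 'aps (1500) User Memphis logged on to session "Logon372 on '
         . 'servernamechangedtoprotecttheinnocent", and the session was renamed to '
         . '"Memphis on servernamechangedtoprotecttheinnocent".';

# capture date, time, and user name from a "logged on" line
my ($date, $time, $user);
if ($line =~ /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3}) .*? User (\S+) logged on/) {
    ($date, $time, $user) = ($1, $2, $3);
    print "$user logged on at $date $time\n";
}
```

      The same pattern with "logged off" (or whatever the logout lines actually say) would capture the logout times; pushing the captured times onto a per-user hash of arrays would keep them paired.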

        The point I was raising is that we shouldn't have to draw information out of you piecemeal. You've been posting here for years; you aren't a new user, and you should know how this works by now. If you can't be bothered to explain what you think is going wrong (again, see How do I post a question effectively?), why should anyone bother to spend their free time investigating your problem? Generally, the better you describe your problem (a proper description, example code, sample input data, sample output data, etc.), the better the response you'll get.

Re: parsing a directory of log files.
by graff (Chancellor) on Feb 26, 2010 at 04:18 UTC
    It seems like there are so many problems with the OP code that the replies so far simply haven't been able to cover them all. There are still a couple whoppers that no one has pointed out yet. The first OP snippet starts out like this:
    $dir = shift || '.';
    opendir DIR, $dir or die "Can't open directory $dir: $!\n";
    while ($file = readdir DIR) {
        next if $file =~ /^\./;
        open (FH, ">", "$file") or die "Can't open file $file: $!\n";
        my @file_lines = <FH>;
    You truncate/open a file for output and then try to read from it? I hope you made a backup of the directory before you ran that script with $dir set to "." because the script would have obliterated all the data files. (Running it with some directory name in @ARGV would have saved your input data from oblivion, but wouldn't have gotten anything done.)

    The second snippet avoids that problem, but still shares another problem with the first version: if $dir is set to something other than "." (i.e. via @ARGV), the open statement would need to be like this in order to do what you want:

    open (FH, "$dir/$file") ...
    Luckily, when you do it that way, it still works when $dir is set to "."

    Put all that together with the other replies, and you should get pretty close to a working script.
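    For what it's worth, a version with those fixes folded in might look like the sketch below. The counting is wrapped in a sub, and the demo at the bottom runs it against a throwaway temp directory with one fake log file (the filename and lines are invented for illustration; point it at the real log directory instead):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# count_logins: the fixes from the thread combined -- read mode (not ">"),
# "$dir/$file" so it works outside ".", counters declared outside the
# loops, and closedir when done
sub count_logins {
    my ($dir) = @_;
    my ($match_count, $actris_count, $sef_count) = (0, 0, 0);
    opendir my $dh, $dir or die "Can't open directory $dir: $!\n";
    while (my $file = readdir $dh) {
        next if $file =~ /^\./;
        open my $fh, '<', "$dir/$file" or die "Can't open file $dir/$file: $!\n";
        while (my $line = <$fh>) {
            $match_count  += () = $line =~ /Memphis/g;
            $actris_count += () = $line =~ /ACTRIS/g;
            $sef_count    += () = $line =~ /South East Florida/g;
        }
        close $fh;
    }
    closedir $dh;
    return ($match_count, $actris_count, $sef_count);
}

# demo on a throwaway directory with one made-up log file
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/sample.html" or die $!;
print $out "User Memphis logged on\nUser ACTRIS logged on\nUser Memphis logged on\n";
close $out;

my ($m, $a, $s) = count_logins($dir);
print "Logins for Memphis $m\tLogins for Actris $a\tLogins for SEF $s\n";
```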