
Thanks. I used the double backslashes because I am using a UNC path, and that is the way I learned to do it about nine years ago in a PERL class.
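For illustration (an editor's sketch reusing the robot-file path from the script below): in a double-quoted Perl string each `\\` yields one backslash, so a UNC path has to be written with doubled backslashes. Forward slashes, which Perl on Windows should also accept, sidestep the escaping; the UNC form then keeps its two leading slashes.

```perl
use strict;
use warnings;

# In double quotes "\\" is an escaped backslash, so "\\\\" gives the UNC prefix \\
my $robot_file_dq = "\\\\dt00mx84\\LogArchive\\Webrobots.txt";

# Single quotes also turn "\\" into one backslash, but leave a lone backslash
# before an ordinary character (such as \L here) untouched
my $robot_file_sq = '\\\\dt00mx84\LogArchive\Webrobots.txt';

# Forward slashes avoid escaping entirely; the UNC prefix keeps two slashes
my $robot_file_fs = '//dt00mx84/LogArchive/Webrobots.txt';

print "$robot_file_dq\n$robot_file_sq\n$robot_file_fs\n";
```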

I got the code to remove the unwanted lines. It is not elegant, but I wanted to post it for more comments and advice on how to improve it.

I did try a switch statement but could not get it to work, so I just built the ugly if statement.

Thanks again for the help and direction.


```perl
#!perl
#use strict;
use Text::CSV_XS;
use warnings;
use FileHandle;

# declarations
my($log_file_path)="\\\\dt00mx84\\LogArchive\\www.ksdot.org\\dt00mh77\\";
my($robot_file)="\\\\dt00mx84\\LogArchive\\Webrobots.txt";
my($second, $minute, $hour, $dayOfMonth, $month, $yearOffset,
   $dayOfWeek, $dayOfYear, $daylightSavings) = localtime();
my $x="ex";
my($extension)=".log";
my($year)=1900+$yearOffset;
my($month_new)=1+$month;
my $day=$dayOfMonth-1;
my @logmessages;

# build filename format
# this is how log files are named using the yy/mm/dd format: ex040101.log
if (length($month_new) < 2) { $month_new="0".$month_new };
if (length($dayOfMonth) < 2) { $dayOfMonth="0".$dayOfMonth };

# build input file name and file path to read from
my($filename)=$x.substr($year,2).$month_new.$day;
my($log_file)=$log_file_path.$filename.$extension;

# build output file name and path to write file
my($file_name)=substr($year,2).$month_new.$day;
my($out_file)=$log_file_path.$file_name.$extension;

# Declare the FileHandles and open the input and output files
my $fh= new FileHandle;
open(LOG, "<$log_file") or die "Could not open file";
@logmessages=<LOG>;
close(LOG);

my $outfile= new FileHandle;
open(OUTFILE, ">$out_file") or die "Could not open file";

foreach $LogLine (@logmessages) {
    if (   ($LogLine !~ /Slurp/)
        && ($LogLine !~ /Jeeves/)
        && ($LogLine !~ /Googlebot/)
        && ($LogLine !~ /FunWebProducts/)
        && ($LogLine !~ /msnbot.htm/)
        && ($LogLine !~ /PeoplePal/)
        && ($LogLine !~ /ventura5/)
        && ($LogLine !~ /Speedy/)
        && ($LogLine !~ /GovDelivery/)
        && ($LogLine !~ /gif/)
        && ($LogLine !~ /jpg/)
        && ($LogLine !~ /ico/)
        && ($LogLine !~ /css/)
        && ($LogLine !~ /js/)
        && ($LogLine !~ /archive/)
        && ($LogLine !~ /CazoodleBot/)
        && ($LogLine !~ /WebTrends/)
        && ($LogLine !~ /ShopWiki/)
        && ($LogLine !~ /Ultraseek/)
        && ($LogLine !~ /msrbot/)
        && ($LogLine !~ /Moskow/)
        && ($LogLine !~ /Gigabot/))
    {
        print OUTFILE $LogLine;
    }
}
```

Re^3: Deleting a matching string in an array
by Cristoforo (Curate) on Oct 30, 2007 at 20:48 UTC
    Why not use grep, as Moritz pointed out? If you load the robot file into an @robots array, you need to remove the newlines with chomp(@robots) before you use it in the grep. Your date string could be built like this:
    ```perl
    my ($day, $month, $year) = (localtime)[3..5];

    # $date is YYMMDD format - you may want $day - 1?
    my $date = sprintf "%02d%02d%02d", $year % 100, $month + 1, $day;

    # forward slashes work on Windows; a UNC path keeps its two leading slashes
    my $log_file_path = '//dt00mx84/LogArchive/www.ksdot.org/dt00mh77/';
    my $log_file = $log_file_path . 'ex' . $date . '.log';
    my $out_file = $log_file_path . $date . '.log';
    ```

    Chris

    Update: Not_a_Number nailed it. I didn't realize that $day - 1 was meant to give yesterday's date.
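    A minimal sketch of the grep approach described above, under two assumptions not confirmed in the thread: Webrobots.txt holds one robot identifier per line, and the identifiers should match as plain substrings. An in-memory filehandle stands in for the UNC file so the sketch runs anywhere, and the log lines are made up. Because it tests with index rather than a regex, characters such as + or * in a robot name need no escaping.

    ```perl
    use strict;
    use warnings;

    # Assumption: one robot identifier per line. This in-memory file stands in
    # for \\dt00mx84\LogArchive\Webrobots.txt so the sketch is self-contained.
    my $robot_text = "Slurp\nJeeves\nGooglebot\n";
    open my $robot_fh, '<', \$robot_text or die "Cannot open robot list: $!";
    my @robots = <$robot_fh>;
    close $robot_fh;
    chomp @robots;    # without this, "Slurp\n" would only match at a line end

    # Hypothetical sample log lines
    my @logmessages = (
        "GET /index.htm Mozilla/5.0\n",
        "GET /index.htm Googlebot/2.1\n",
        "GET /about.htm Yahoo! Slurp\n",
    );

    # Keep a line only if none of the robot strings occurs anywhere in it
    my @kept = grep {
        my $line = $_;
        !grep { index($line, $_) >= 0 } @robots;
    } @logmessages;

    print @kept;
    ```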

      I kept getting an error: Nested quantifiers in regex; marked by <-- HERE in m/#Software: Microsoft Internet Information Services 6.0

      I figured I would first get it to remove one of the filters without the error, then move ahead from there.

      Being new to using PERL for this stuff, I figured I would get the code working first and improve on it from there.

      I am going to use these suggestions and get it working, but the ugly fix I came up with satisfies my supervisor.

        That probably means that one (or more) of your robot identifier strings contains *+ or some other illegal combination of the quantifier characters *, +, ? and {...}.

        If you look at my original posting you'll see that it maps the strings through quotemeta, which escapes those characters, making them safe to use in a regex.
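        A small demonstration of that failure and the quotemeta fix, using a made-up robot string that contains `+*`: compiling it raw dies with the same "Nested quantifiers" error, while the escaped version (or the equivalent \Q...\E form) matches it literally.

        ```perl
        use strict;
        use warnings;

        # Hypothetical robot entry containing the metacharacters "+" then "*"
        my $robot = 'Bot+*Crawler';

        # Compiling the raw string dies with "Nested quantifiers in regex"
        my $raw_ok = eval { my $re = qr/$robot/; 1 };
        print "raw pattern error: $@" unless $raw_ok;

        # quotemeta escapes the metacharacters, so the string matches literally
        my $safe = quotemeta $robot;
        my $line = "GET /robots.txt Bot+*Crawler/1.0";
        print "literal match\n" if $line =~ /$safe/;

        # \Q...\E inside the pattern does the same escaping inline
        print "literal match with \\Q...\\E\n" if $line =~ /\Q$robot\E/;
        ```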

Re^3: Deleting a matching string in an array
by Not_a_Number (Prior) on Oct 30, 2007 at 22:42 UTC
    I wanted to post it for more comments and advice on how to improve ;)

    Well, I can see several areas for improvement, notably the huge if block at the end.
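    One way to collapse that block, sketched here by the editor rather than taken from the thread: put the 22 strings from the original conditions in a list, join them into a single alternation regex, and keep only the lines that don't match. quotemeta makes each entry literal, so the "." in msnbot.htm no longer matches any character as it did in the original. As in the original, short entries such as js or ico still match anywhere in a line. The sample log lines and the keep_line helper are hypothetical.

    ```perl
    use strict;
    use warnings;

    # The strings tested one by one in the original if block
    my @skip = qw(
        Slurp Jeeves Googlebot FunWebProducts msnbot.htm PeoplePal ventura5
        Speedy GovDelivery gif jpg ico css js archive CazoodleBot WebTrends
        ShopWiki Ultraseek msrbot Moskow Gigabot
    );

    # One compiled alternation; quotemeta keeps every entry literal
    my $alternation = join '|', map { quotemeta($_) } @skip;
    my $skip_re     = qr/$alternation/;

    # Hypothetical helper: true if the line contains none of the skip strings
    sub keep_line {
        my ($line) = @_;
        return $line !~ $skip_re;
    }

    # Hypothetical sample lines
    my @logmessages = (
        "GET /default.htm Mozilla/4.0\n",
        "GET /images/logo.gif Mozilla/4.0\n",
        "GET /default.htm Googlebot/2.1\n",
    );

    print grep { keep_line($_) } @logmessages;
    ```

    In the real script the final line would become `print OUTFILE grep { keep_line($_) } @logmessages;`.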

    (BTW, why do you declare a variable $robot_file that you never subsequently use? And why have you commented out use strict; at the beginning of your program?)

    Immediately, however, one thing that springs to mind is this line:

    my $day=$dayOfMonth-1;

    It seems to me, in context, that you're using this to get the date of the previous day. If I'm wrong, please excuse me, but if not, what do you think will happen, for example, on November 1st (not to mention Jan 1st, or March 1st, when the previous day might be either Feb 28 or Feb 29)?
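    To make the problem concrete, here is a sketch that uses only the core POSIX module (an alternative to the DateTime approach below, not something from the thread). It fixes the date at November 1st, 2007: the naive subtraction yields day 0, whereas mktime normalizes the out-of-range day back to October 31st. Building the time at noon keeps a daylight-saving change from shifting the result onto a different day.

    ```perl
    use strict;
    use warnings;
    use POSIX qw(mktime strftime);

    # Fixed example date: 1 Nov 2007 (month is 0-based, year is years since 1900)
    my ($mday, $mon, $year) = (1, 10, 107);

    # Naive approach from the original script: gives 0, not a valid day
    my $naive = $mday - 1;

    # mktime normalizes day 0 to the last day of the previous month (31 Oct);
    # noon avoids landing on a different day across a DST change
    my $t = mktime(0, 0, 12, $mday - 1, $mon, $year);
    my $yesterday = strftime('%y%m%d', localtime $t);

    print "naive day: $naive\n";
    print "log file:  ex$yesterday.log\n";
    ```

    In the real script, ($mday, $mon, $year) would come from (localtime)[3, 4, 5] instead of the fixed values.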

    In fact, 'How do I find yesterday's date?' is a FAQ (see perlfaq4). With Perl (not PERL, by the way), there are modules out there that deal with heaps of similar problems and save you from reinventing the proverbial wheel :).

    Here's one way, based on the answers to the FAQ, to create the sort of string that you seem to require, using the DateTime module (don't hesitate, BTW, to use whitespace in your code to make it more human-legible):

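    ```perl
    use DateTime;

    # time_zone => 'local' gives the server's local date; DateTime defaults to UTC
    my $yesterday = DateTime->today( time_zone => 'local' )->subtract( days => 1 )->ymd( '' );
    my ( $prefix, $extension ) = ( 'ex', '.log' );
    my $log_file = $prefix . substr( $yesterday, 2 ) . $extension;
    ```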