Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks!
I have a program that has generated a very large log file; it is now about 75MB.
What I am trying to do is delete everything prior to a certain date. My simple regular expression finds a date, but once it is found, how can I delete everything before that date and keep the rest of the file?
This small program only shows a sample of what I am trying to do: it tests the regular expression. The actual log file looks like the data in the =comment block below.
#!/perl/bin/perl -w
use strict;
use warnings;
use CGI qw/:standard/;
use CGI::Carp qw(fatalsToBrowser);
print header();
my $text = "TICKETHELP###,########,1234567###,2005,X,Y,356.00###,2006-5-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck";
# no /g needed: we only want to know whether the date occurs
if ($text =~ /(2006-5-8)/)
{
print "found = $1\n";
}else{
print "Nothing\n";
}
=comment
TICKETHELP###,########,1234567###,2004,X,Y,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2006,X,Y,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,0933456###,2005,X,Y,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1984567###,2005,X,Y,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1223698###,2000,X,Y,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1299n67###,2001,X,Y,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
=cut
Thanks for the Help!
Re: Deleting lines prior to a date!
by McDarren (Abbot) on May 09, 2006 at 14:54 UTC
If it's a logfile, then one would assume that the entries are in chronological order (although your sample data isn't).
Anyway, if that's the case, you could simply open the file, then read it line-by-line using the diamond operator (<>). Skip every line until you reach the date you are interested in, and then print the remaining lines to another (new) file.
Something like this (untested):
#!/usr/bin/perl -w
use strict;
my $infile = 'some_logfile';
my $outfile = 'new_logfile';
my $datematch = qr/\b2005-5-3\b/; # or whatever; \b stops it matching 2005-5-30
open IN, "<", $infile or die "Cannot open $infile:$!\n";
open OUT, ">", $outfile or die "Cannot open $outfile:$!\n";
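my $found = 0;   # set once we reach the first line with the date
while (<IN>) {
    $found = 1 if /$datematch/;
    next unless $found;   # skip everything before that line
    print OUT $_;
}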
close IN;
close OUT;
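Since the goal is to remove the old lines from the log itself, the same skip-until-found loop can also be run as an in-place edit, using the local($^I, @ARGV) idiom described in perlfaq5. A minimal sketch, assuming the log is in chronological order and that keeping the original as a .bak backup is acceptable; trim_log_in_place is a hypothetical name:

```perl
use strict;
use warnings;

# Keep only the lines from the first one matching $date_re to the end of
# $file, rewriting $file in place. The original is saved as "$file.bak".
# Assumes the log is in chronological order.
sub trim_log_in_place {
    my ($file, $date_re) = @_;
    local ($^I, @ARGV) = ('.bak', $file);   # enable in-place editing of $file
    my $found = 0;
    while (<>) {
        $found ||= /$date_re/;   # latch once the cut-off line has been seen
        print if $found;         # print goes to the rewritten file
    }
}
```

Called as trim_log_in_place('some_logfile', qr/\b2005-5-3\b/), it rewrites some_logfile and leaves the untouched original in some_logfile.bak. The file is still processed one line at a time, so a 75MB log is not loaded into memory.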
Cheers,
Darren :)
Re: Deleting lines prior to a date!
by SamCG (Hermit) on May 09, 2006 at 14:18 UTC
Look at Date::Calc for date arithmetic. You could print out only the lines you want to a separate file, then delete or back up the original.
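If you only need to compare dates rather than do arithmetic on them, a module may not be necessary. The dates in this log are not zero-padded (2005-5-4), so plain string comparison would put 2005-12-1 before 2005-5-4; padding each part with sprintf gives keys that sort chronologically. A sketch under that assumption (date_key and lines_on_or_after are hypothetical helpers, not part of Date::Calc):

```perl
use strict;
use warnings;

# Turn an unpadded date such as "2005-5-4" into "20050504" so that plain
# string comparison (lt, ge, ...) orders dates chronologically.
sub date_key {
    my ($date) = @_;
    my ($y, $m, $d) = $date =~ /^(\d{4})-(\d{1,2})-(\d{1,2})$/
        or return;    # not a Y-M-D date
    return sprintf '%04d%02d%02d', $y, $m, $d;
}

# Return the lines whose first Y-M-D date is on or after $cutoff.
# Lines without a date are dropped.
sub lines_on_or_after {
    my ($cutoff, @lines) = @_;
    my $cut = date_key($cutoff);
    return grep {
        my ($date) = /(\d{4}-\d{1,2}-\d{1,2})/;
        defined $date && date_key($date) ge $cut;
    } @lines;
}
```

Because each line's own date is compared with the cut-off, this works even when the lines are out of order, as in the sample data. For a 75MB log you would apply the same test inside a while (<IN>) loop rather than passing every line in as a list.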
-----------------
s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;
Re: Deleting lines prior to a date!
by Hue-Bond (Priest) on May 09, 2006 at 14:16 UTC
$ cat foolog
TICKETHELP###,########,1234567###,2004,1,2,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,3,4,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2006,5,6,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,0933456###,2005,7,8,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1984567###,2005,11,12,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,13,14,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1223698###,2000,15,16,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1299n67###,2001,17,18,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
$ perl -ne 'print if /2005,9,10/ .. eof' foolog
TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1984567###,2005,11,12,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,13,14,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1223698###,2000,15,16,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1299n67###,2001,17,18,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
(Note that the years in your dates are not sorted).
Update: Oops, I didn't notice the full date field in each record; that's why I mangled the data (replacing the X,Y columns with unique numbers). With the original data it's almost the same; only the regex changes:
$ perl -ne 'print if /2005-5-4/ .. eof' foolog
TICKETHELP###,########,1234567###,2005,X,Y,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1984567###,2005,X,Y,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1223698###,2000,X,Y,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1299n67###,2001,X,Y,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
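The one-liner relies on the scalar range (flip-flop) operator: it becomes true on the first line matching the left pattern and stays true until eof reports the last line. Inside a script that reads from its own filehandle, the same idea can be written with eof($fh). A sketch, with from_match_to_end as a hypothetical name; it accepts any open filehandle:

```perl
use strict;
use warnings;

# Return every line from the first one matching $re through the last line
# of $fh. The scalar ".." (flip-flop) turns true when the left side matches
# and turns false again after the line on which eof($fh) is true.
sub from_match_to_end {
    my ($fh, $re) = @_;
    my @kept;
    while (my $line = <$fh>) {
        push @kept, $line if ($line =~ $re) .. eof($fh);
    }
    return @kept;
}
```

For example, open my $fh, '<', 'foolog' or die $!; print for from_match_to_end($fh, qr/2005-5-4/); should print the same lines as the one-liner.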
Re: Deleting lines prior to a date!
by SamCG (Hermit) on May 09, 2006 at 17:05 UTC
I think you need something more like:
#! perl -w
use strict;
use Date::Calc qw(Delta_Days);
my $outfile = 'new_logfile.txt';
my ($yr, $mo, $day) = qw/2006 1 1/;
open OUT, ">", $outfile or die "Cannot open $outfile:$!\n";
while (<DATA>) {
my ($line_year, $line_mon, $line_day) = /(\d{4})-(\d{1,2})-(\d{1,2})/;
next unless defined $line_year;   # skip lines without a Y-M-D date
## you need the date, unless you can rely on ordered dates.
## Your sample suggests they're unordered, so I'm going with that.
## if they're ordered, there are easier ways to do this
## Delta_Days is positive when the line's date is before the cut-off
my $delta_days = Delta_Days($line_year,$line_mon,$line_day,$yr,$mo,$day);
next if $delta_days > 0;
print OUT $_;
}
close OUT;
__DATA__
TICKETHELP###,########,1234567###,2004,X,Y,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2006,X,Y,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,0933456###,2005,X,Y,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1984567###,2005,X,Y,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1223698###,2000,X,Y,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1299n67###,2001,X,Y,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
produces:
TICKETHELP###,########,1234567###,2006,X,Y,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
from your data. You could also modify this to do other things like pick a specific date in the past or accept an argument as a date. I'd also suggest you look into zipping your files (it's not hard to do with perl, and can save a lot of space).
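On the suggestion to compress the files: IO::Compress::Gzip has been a core module since Perl 5.10, and its gzip function takes an input and an output file name. A minimal sketch (archive_log is a hypothetical name) that compresses a log and leaves the original in place until you have checked the archive:

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Compress $file to "$file.gz" and return the archive's name. The original
# is not touched, so it can be removed once the archive has been checked.
sub archive_log {
    my ($file) = @_;
    gzip $file => "$file.gz"
        or die "gzip of $file failed: $GzipError\n";
    return "$file.gz";
}
```

One possible workflow is to archive the old log this way before trimming it, so the deleted lines are still recoverable.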
Thanks, I am looking into that!
Re: Deleting lines prior to a date!
by davidj (Priest) on May 10, 2006 at 02:46 UTC
Use the range operator: start at line 1 and stop when you reach the date you are looking for. Like the other answers, this assumes that the lines are in calendar order.
www:davidj test > cat t.txt
TICKETHELP###,########,1234567###,2004,1,2,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,3,4,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2006,5,6,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,0933456###,2005,7,8,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1984567###,2005,11,12,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,13,14,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1223698###,2000,15,16,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1299n67###,2001,17,18,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
www:davidj test > cat t.pl
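#!/usr/bin/perl
use strict;
use warnings;
open(FILE, "<", "t.txt") or die "Cannot open t.txt: $!";
open(OUT, ">", "out.txt") or die "Cannot open out.txt: $!";
while(<FILE>) {
    # true from line 1 through the first line containing the date
    print OUT $_ if 1 .. /2005-5-4/;
}
close(OUT);
close(FILE);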
www:davidj test > perl t.pl
www:davidj test > cat out.txt
TICKETHELP###,########,1234567###,2004,1,2,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,3,4,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2006,5,6,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,0933456###,2005,7,8,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
www:davidj test >
Of course the line with the date you are looking for is included, but I'm sure you can figure out how to remove it.
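One way to drop that line is to use the value the range operator returns: while in range it is a sequence number, and on the line that ends the range the string "E0" is appended to it (see perlop). A sketch, assuming the same one-record-per-line layout as t.txt; split_at_date is a hypothetical name. It also collects the complementary set, the date line and everything after it, which is the part the original question wants to keep:

```perl
use strict;
use warnings;

# Split the lines read from $fh at the first line matching $re.
# In "1 .. EXPR" the constant 1 is compared with $., the current line
# number, so the range starts on line 1 and ends on the matching line,
# where the returned sequence number carries an "E0" suffix.
sub split_at_date {
    my ($fh, $re) = @_;
    my (@before, @from);
    while (my $line = <$fh>) {
        my $seq = 1 .. $line =~ $re;
        if ($seq && $seq !~ /E0\z/) {
            push @before, $line;    # strictly before the date line
        } else {
            push @from, $line;      # the date line and everything after
        }
    }
    return (\@before, \@from);
}
```

Writing @{$from} back out to the log would keep the date line and everything after it, while @{$before} is exactly davidj's output minus the date line.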
davidj