Deleting lines prior to a date!

by Anonymous Monk
on May 09, 2006 at 13:53 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I have a program that generates a very large log file; it is now about 75MB. What I am trying to do is delete everything prior to a certain date. My simple regular expression finds a date, but once the date is found, how can I delete everything before it and keep the rest of the file? This small program only tests the regular expression and shows a sample of what I am trying to do. The actual log file looks like the commented-out data below.

    #!/perl/bin/perl -w
    use strict;
    use warnings;
    use CGI qw/:standard/;
    use CGI::Carp qw(fatalsToBrowser);

    print header();

    my $text = "TICKETHELP###,########,1234567###,2005,X,Y,356.00###,2006-5-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck";

    if($text=~/(2006-5-8)/g) {
      print "found = $1";
    }else{
      print "Nothing";
    }

    =comment
    TICKETHELP###,########,1234567###,2004,X,Y,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1234567###,2005,X,Y,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1234567###,2006,X,Y,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,0933456###,2005,X,Y,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1234567###,2005,X,Y,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1984567###,2005,X,Y,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1223698###,2000,X,Y,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    TICKETHELP###,########,1299n67###,2001,X,Y,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    =cut


Thanks for the Help!

Replies are listed 'Best First'.
Re: Deleting lines prior to a date!
by McDarren (Abbot) on May 09, 2006 at 14:54 UTC
    If it's a logfile, then one would assume that the entries are in chronological order (although your sample data isn't).

    Anyway, if that's the case, you could simply open the file, then read it line-by-line using the diamond operator (<>). Skip every line until you reach the date you are interested in, and then print the remaining lines to another (new) file.

    Something like this (untested):

        #!/usr/bin/perl -w
        use strict;

        my $infile    = 'some_logfile';
        my $outfile   = 'new_logfile';
        my $datematch = qr(2005-5-3);    # or whatever

        open IN,  "<", $infile  or die "Cannot open $infile:$!\n";
        open OUT, ">", $outfile or die "Cannot open $outfile:$!\n";

        my $found = 0;
        while (<IN>) {
            # Skip lines until the first one that matches the date,
            # then print that line and everything after it.
            $found ||= /$datematch/;
            next if !$found;
            print OUT $_;
        }

        close IN;
        close OUT;

    Cheers,
    Darren :)
Re: Deleting lines prior to a date!
by SamCG (Hermit) on May 09, 2006 at 14:18 UTC
    Look at Date::Calc for date arithmetic. You could print out only the lines you want to a separate file, then delete or back up the original.


    -----------------
    s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;
Re: Deleting lines prior to a date!
by Hue-Bond (Priest) on May 09, 2006 at 14:16 UTC

    How about using the range operator?

        $ cat foolog
        TICKETHELP###,########,1234567###,2004,1,2,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,3,4,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2006,5,6,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,0933456###,2005,7,8,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1984567###,2005,11,12,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,13,14,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1223698###,2000,15,16,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1299n67###,2001,17,18,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        $ perl -ne 'print if /2005,9,10/ .. eof' foolog
        TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1984567###,2005,11,12,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,13,14,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1223698###,2000,15,16,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1299n67###,2001,17,18,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck

    (Note that the years in your dates are not sorted).

    Update: Oops, I didn't see the second date, that's why I mangled the data. Using the original data, it's almost the same but changing the regex:

        $ perl -ne 'print if /2005-5-4/ .. eof' foolog
        TICKETHELP###,########,1234567###,2005,X,Y,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1984567###,2005,X,Y,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1223698###,2000,X,Y,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1299n67###,2001,X,Y,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck

    --
    David Serrano
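
The one-liner above prints the kept lines to the terminal. If the goal is to shrink the log file itself, the same flip-flop can be combined with Perl's in-place editing. The following is only a sketch under assumptions: the helper name `trim_log_in_place` is made up for illustration, and the `.bak` backup suffix is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite $file in place, keeping
# only the lines from the first match of $start_re through the end of the
# file. The original is preserved as "$file.bak".
sub trim_log_in_place {
    my ($file, $start_re) = @_;

    # Setting $^I turns on in-place editing for the <> loop below,
    # just as the -i.bak command-line switch would.
    local ($^I, @ARGV) = ('.bak', $file);

    while (<>) {
        # Flip-flop: false until a line matches, then true until end of file.
        print if /$start_re/ .. eof;
    }
    return;
}

# Example call, assuming a log named "foolog" in the current directory:
# trim_log_in_place('foolog', qr/2005-5-4/);
```

Note that if no line matches the pattern, the rewritten file ends up empty, so keeping the backup until the result has been checked seems prudent.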

Re: Deleting lines prior to a date!
by SamCG (Hermit) on May 09, 2006 at 17:05 UTC
    I think you need something more like:
        #! perl -w
        use strict;
        use Date::Calc qw(Delta_Days);

        my $outfile = 'new_logfile.txt';
        my ($yr, $mo, $day) = qw/2006 1 1/;

        open OUT, ">", $outfile or die "Cannot open $outfile:$!\n";

        while (<DATA>) {
          my ($line_year, $line_mon, $line_day) = /(\d{4})-(\d{1,2})-(\d{1,2})/;
          ## you need the date, unless you can rely on ordered dates.
          ## Your sample suggests they're unordered, so I'm going with that.
          ## if they're ordered, there are easier ways to do this
          my $delta_days = Delta_Days($line_year,$line_mon,$line_day,$yr,$mo,$day);
          print $delta_days;
          next if $delta_days > 0;
          print OUT $_;
        }
        close OUT;
        __DATA__
        TICKETHELP###,########,1234567###,2004,X,Y,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,X,Y,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2006,X,Y,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,0933456###,2005,X,Y,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,X,Y,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1984567###,2005,X,Y,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1223698###,2000,X,Y,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1299n67###,2001,X,Y,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    produces:
        TICKETHELP###,########,1234567###,2006,X,Y,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,X,Y,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
    from your data. You could also modify this to do other things like pick a specific date in the past or accept an argument as a date. I'd also suggest you look into zipping your files (it's not hard to do with perl, and can save a lot of space).



    -----------------
    s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;
      Thanks I am looking into that!
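
The "accept an argument as a date" variation mentioned above could be sketched without any non-core module, since deciding whether a line is before or after a cutoff only needs a sortable key. This swaps Date::Calc for a plain numeric comparison and is not SamCG's code; the names `date_key` and `keep_from` are hypothetical, and keeping lines that carry no date is an assumption.

```perl
use strict;
use warnings;

# Turn (2006, 5, 8) into the integer 20060508 so dates compare numerically.
sub date_key {
    my ($y, $m, $d) = @_;
    return $y * 10_000 + $m * 100 + $d;
}

# Return the lines whose first "YYYY-M-D " timestamp is on or after $cutoff.
# Lines without a recognisable date are kept (an assumption; adjust as needed).
sub keep_from {
    my ($cutoff, @lines) = @_;
    my ($cy, $cm, $cd) = $cutoff =~ /^(\d{4})-(\d{1,2})-(\d{1,2})$/
        or die "Cutoff must look like YYYY-M-D\n";
    my $min = date_key($cy, $cm, $cd);

    return grep {
        my @ymd = /(\d{4})-(\d{1,2})-(\d{1,2}) /;
        !@ymd || date_key(@ymd) >= $min;
    } @lines;
}

# Example use, assuming the cutoff is the first command-line argument and
# the log arrives on standard input:
# print keep_from($ARGV[0], <STDIN>);
```

Because every line is judged on its own date, this works for unordered logs like the sample data, at the cost of reading each line's date instead of stopping at the first match.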
Re: Deleting lines prior to a date!
by davidj (Priest) on May 10, 2006 at 02:46 UTC
    Use the range operator: start at line 1 and stop when you reach the date you are looking for.
    Like the other replies, this assumes that the lines are in chronological order.
        www:davidj test > cat t.txt
        TICKETHELP###,########,1234567###,2004,1,2,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,3,4,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2006,5,6,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,0933456###,2005,7,8,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1984567###,2005,11,12,32.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,13,14,67.00###,2006-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1223698###,2000,15,16,79.00###,2000-1-1 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1299n67###,2001,17,18,26.00###,2001-5-5 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        www:davidj test > cat t.pl
        #!/usr/bin/perl
        #
        open(FILE, "<t.txt");
        open(OUT, ">out.txt");
        while(<FILE>) {
            print OUT $_ if 1 .. /2005-5-4/;
        }
        close(OUT);
        close(FILE);
        www:davidj test > perl t.pl
        www:davidj test > cat out.txt
        TICKETHELP###,########,1234567###,2004,1,2,35.00###,2004-5-2 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,3,4,16.00###,2005-3-8 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2006,5,6,23.00###,2006-1-3 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,0933456###,2005,7,8,33.00###,2005-5-7 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        TICKETHELP###,########,1234567###,2005,9,10,87.00###,2005-5-4 17:44:44,TICKETHELP###########1234567###2005XY356.00###,World Cup List,John Marck
        www:davidj test >
    Of course the line with the date you are looking for is included, but I'm sure you can figure out how to remove it.

    davidj
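
One possible way to leave out the matching line, sketched here rather than taken from davidj's post: in scalar context the range operator returns a sequence number, and on the last line of the range that number has "E0" appended, so the endpoint can be filtered out. The helper name `lines_before` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines read from $fh that come before the
# first line matching $stop_re, leaving out the matching line itself.
sub lines_before {
    my ($fh, $stop_re) = @_;
    my @kept;
    while (<$fh>) {
        # The constant 1 is compared against $. (the current line number),
        # so the range starts on line 1 and ends on the first match.
        # While the range is active the operator returns 1, 2, 3, ...;
        # on the final line the value carries an "E0" suffix.
        my $seq = 1 .. /$stop_re/;
        push @kept, $_ if $seq && $seq !~ /E0$/;
    }
    return @kept;
}

# Example call, assuming the log is in "t.txt":
# open my $fh, '<', 't.txt' or die "Cannot open t.txt: $!\n";
# print lines_before($fh, qr/2005-5-4/);
```

The flip-flop keeps its state between calls, so this sketch is meant to be run once per program. As in the script above, it keeps the lines before the date; to keep the later part of the log instead, the range would be turned around to start at the match and end at `eof`, as in Hue-Bond's reply.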
