Sorting a logfile by date to print to html

PerlGoblin has asked for the wisdom of the Perl Monks concerning the following question:

I apologize in advance for the obvious display of ignorance in this question. I have searched for a previously posted solution to this problem but have yet to find one that works.

I have a CGI script that opens a log file and assigns each ~-delimited value to a variable. It then prints each line of the log file in a particular format in HTML. One of the variables in the log file is the date. I'm wondering if there is a way this script can open the file, sort by the $date variable, and print the log line item with the most recent date first. If it has to rewrite the log file, that is fine. I have read several comments about creating a temp file and rewriting it, etc., but would really like to find a way to sort by date prior to printing the HTML.

Below is a line from the log file and a copy of the code that opens the log file and prints:

The Dow Dips to All-Time Low ~ http://www.yahoofinance.com ~ Yahoo Finance ~ 09/15/01 ~ The Dow Jones dipped to an all-time....

$from = "news.dat";

open(FILE, $from) || die "Can't open $from: $!\n";
while (<FILE>) {
    chomp;    # strip the trailing newline (chop would remove any last character)
    ($title,$url,$site,$date,$description) = split(/~/,$_);

    print "<table width=400><tr>";
    print "<td><a href=\"$url\"><font color=blue>$title</font></a><br><font color=black>$date,$site<br>$description</font></td>";
    print "</tr></table><br>";
}
close(FILE);

Replies are listed 'Best First'.
Re: Sorting a logfile by date to print to html by CubicSpline (Friar) on Oct 09, 2001 at 21:59 UTC
Here's my approach to this problem. I'd use a hash to store this information, which makes it easy to sort and quick to get back out when it's time to print. Here's an example of how I'd do what I think you're asking:

$from = "news.dat.txt";
my %h;
open(FILE, "$from") || die "Can't open $from!\n";
while (<FILE>) {
    chomp;
    ($title,$url,$site,$date,$description) = split(/~/,$_);
    # if the date has already been seen, tack this entry on to the hash value
    if( $h{$date} ){
        $h{$date} .= "|$title~$url~$site~$description";
    }
    # date hasn't been seen, so create a new hash entry
    else {
        $h{$date} = "$title~$url~$site~$description";
    }
}

#sort the dates in desc. order
foreach $key (sort {$b cmp $a} keys %h) {
    #do for each item for that date
    @items = split '\|', $h{$key};
    foreach $item (@items) {
        #split back out the values and print the table
        my($title,$url,$site,$description) = split /~/,$item;
        print "<table width=400><tr>";
        print "<td><a href=\"$url\"><font color=blue>$title</font></a><br><font color=black>$key,$site<br>$description</font></td>";
        print "</tr></table><br>";
    }
}
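A possible variation on the hash approach above, offered as a sketch and not a drop-in replacement: keep each date's entries in an array reference instead of a |-joined string, and key the hash on a yymmdd rearrangement of the date. Plain cmp on mm/dd/yy strings only orders correctly within a single year (it would put 12/31/00 ahead of 09/15/01 in a descending sort). The sub name group_by_date and the sample lines are invented for illustration, and the sketch assumes all years fall in the same century.

```perl
use strict;
use warnings;

# Hypothetical helper: group "~"-delimited log lines by date.
# Returns a hash ref keyed on "yymmdd"; each value is an array ref of records.
sub group_by_date {
    my @lines = @_;
    my %by_date;
    for my $line (@lines) {
        chomp $line;
        # split on "~", swallowing the blanks around each separator
        my ($title, $url, $site, $date, $description) = split /\s*~\s*/, $line;
        next unless defined $date && $date =~ m{^(\d\d)/(\d\d)/(\d\d)$};
        my $key = "$3$1$2";    # mm/dd/yy -> yymmdd (same-century assumption)
        push @{ $by_date{$key} }, {
            title => $title, url  => $url, site => $site,
            date  => $date,  description => $description,
        };
    }
    return \%by_date;
}

my $by_date = group_by_date(
    "Older story ~ http://example.com/a ~ Example ~ 12/31/00 ~ First.\n",
    "Newer story ~ http://example.com/b ~ Example ~ 09/15/01 ~ Second.\n",
    "Same-day story ~ http://example.com/c ~ Example ~ 09/15/01 ~ Third.\n",
);

# Newest date first; entries sharing a date stay in file order.
for my $key (sort { $b cmp $a } keys %$by_date) {
    for my $item (@{ $by_date->{$key} }) {
        print "$item->{date}, $item->{site}: $item->{title}\n";
    }
}
```

The inner print would be replaced by the table markup from the original script; storing hash references avoids having to split each entry a second time.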
Re: Re: Sorting a logfile by date to print to html by Anonymous Monk on Oct 10, 2001 at 00:59 UTC
Thanks a million for all the help. I ended up using CubicSpline's suggestion and it worked perfectly. Now I just need to dissect the code so that I have a better understanding of how hashes work. Thanks.
Re: Sorting a logfile by date to print to html by mikeB (Friar) on Oct 09, 2001 at 21:23 UTC
How large is your logfile compared to available memory? If you can keep the whole thing in memory, check out Perl's sort function. I prefer to sort dates by formatting them as yyyymmdd, at which point they can be sorted numerically (or as character strings, too, in ASCII :) There may be better ways, but this one's quick and easy. CPAN has numerous date modules that may be of help.
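A minimal sketch of the in-memory idea above, under stated assumptions: the whole file fits in memory, the date is the fourth ~-separated field in mm/dd/yy form, and every year is 20yy. The sub names sortable_date and newest_first are hypothetical, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "mm/dd/yy" into "yyyymmdd".
# Assumes every year is 20yy; adjust if the log holds older entries.
sub sortable_date {
    my ($date) = @_;
    my ($mm, $dd, $yy) = $date =~ m{(\d\d)/(\d\d)/(\d\d)}
        or return '00000000';    # unparseable dates end up last
    return "20$yy$mm$dd";
}

# Hypothetical helper: return whole log lines ordered newest first.
# Each line is paired with its key once, sorted numerically on the key,
# then unwrapped again.
sub newest_first {
    my @lines = @_;
    return map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] }
           map  { [ sortable_date((split /~/, $_)[3] // ''), $_ ] }
           @lines;
}

my @sorted = newest_first(
    "Old story ~ http://example.com/a ~ Example ~ 12/31/00 ~ ...\n",
    "New story ~ http://example.com/b ~ Example ~ 09/15/01 ~ ...\n",
);
print @sorted;
```

In the CGI script, the lines would come from the open filehandle, and the sorted list would be fed to the existing split-and-print loop.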
Re: Sorting a logfile by date to print to html by cjensen (Sexton) on Oct 09, 2001 at 21:50 UTC
Your date format isn't perfect for sorting, but you can pipe your file through the Unix sort command using ~ as your separator, on the fourth (0, 1, 2, 3) column, ignoring blank spaces:

sort -b -t '~' +3

That's assuming your OS has a compatible 'sort'. (POSIX sort writes the same key as -k 4.) You can reverse it for descending order:

sort -b -r -t '~' +3

It would be better if the date were in a fully sortable format like yyyymmdd, as referenced above, or even yy/mm/dd. How about this mess:

cat <log file> | \
  perl -p -e 's|(\d\d)/(\d\d)/(\d\d)|$3/$1/$2|' | \
  sort -r -b -t '~' +3 | \
  perl -p -e 's|(\d\d)/(\d\d)/(\d\d)|$2/$3/$1|'

Note: You can do a lot on the command line, but it's not always the best way to do things.

If you want to parse a few decades of historical data and aren't worried about your script being around in 50 years or so, you can go way overboard:

cat <log file> | \
  perl -p -e 's|(\d\d)/(\d\d)/(\d\d)|($3 > 50 ? "19$3" : "20$3")."$1$2"|e' | \
  sort -r -b -t '~' +3 | \
  perl -p -e 's|(\d\d)(\d\d)(\d\d)(\d\d)|$3/$4/$2|'
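The two-digit-year rule in that last pipeline can also be written out inside a script. The following is only a sketch of the same windowing idea (years above 50 read as 19yy, everything else as 20yy); the sub names expand_year and date_key and the sample dates are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the one-liner's rule:
# two-digit years above 50 are read as 19yy, the rest as 20yy.
sub expand_year {
    my ($yy) = @_;
    return $yy > 50 ? "19$yy" : "20$yy";
}

# Hypothetical helper: "mm/dd/yy" -> "yyyymmdd" using the window above.
# Returns nothing if the string does not contain a date.
sub date_key {
    my ($date) = @_;
    my ($mm, $dd, $yy) = $date =~ m{(\d\d)/(\d\d)/(\d\d)} or return;
    return expand_year($yy) . $mm . $dd;
}

my @dates = ('09/15/01', '12/31/99', '01/01/50', '06/30/51');

# Newest first: 01/01/50 is read as 2050, 06/30/51 as 1951.
print join(' ', sort { date_key($b) <=> date_key($a) } @dates), "\n";
# prints: 01/01/50 09/15/01 12/31/99 06/30/51
```

The pivot of 50 is taken from the command above; any log whose real years straddle that boundary would need a different window or four-digit years in the file.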