The Squid log entries look like this (yes, these are real entries):

    wdcsun28.usdoj.gov - - [07/Aug/2003:04:58:15 -0700] "GET http://dl.domain.org/MyFoo-file.zip HTTP/1.0" 200 1607158 TCP_MISS:DIRECT
    wdcsun28.usdoj.gov - - [07/Aug/2003:05:03:33 -0700] "GET http://dl.domain.org/MyFoo-file.zip HTTP/1.0" 200 8224380 TCP_MISS:DIRECT

The numeric value right before "TCP_MISS:DIRECT" is the number of bytes transferred. Notice that this generated two hits for what is basically one download. The real final file size for 'file.zip' is 8224380 bytes, just a little over 8 megs.

This is the script I'm currently using:

    use strict;
    use warnings;
    use File::Basename;
    use File::stat;
    use File::Find;
    use Cwd;

    my ($root) = getcwd =~ /(.*)/;
    my $total  = 0;

    find(
        {
            untaint_pattern => '.*',
            no_chdir        => 1,
            wanted          => sub {
                return unless /MyFoo.*\z/;

                my $v_snap_file = $File::Find::name;
                my $basefile    = basename($v_snap_file);

                # I know this is evil, it's a hack.
                my $count = `/bin/grep $basefile /var/log/squid/access.log | /usr/bin/wc -l`;
                $count =~ s/^\s+//g;

                my $v_sb       = stat("$v_snap_file");
                my $v_filesize = $v_sb->size;
                my $v_bprecise = sprintf "%.0f", ($v_filesize);
                my $v_bsize    = insert_commas($v_bprecise);
                my $v_kprecise = sprintf "%.0f", ($v_filesize/1024);
                my $v_ksize    = insert_commas($v_kprecise);
                my $v_filedate = scalar localtime $v_sb->mtime;
                my $basename_v = basename($v_snap_file);

                print "File Name..: $basename_v\n";
                print "File Size..: $v_bsize bytes ($v_ksize kb)\n";
                print "Downloads..: ", insert_commas($count);

                my $tbytes = $v_filesize * $count;
                print "Total bytes: ", insert_commas($tbytes), "\n\n";
                $total += $tbytes;
            }
        },
        $root
    );

    print "\n", "-"x40, "\n";
    print "Final total bytes: ", insert_commas($total), "\n\n";

    sub insert_commas {
        my $text = reverse $_[0];
        $text =~ s/(\d{3})(?=\d)(?!\d*\.)/$1,/g;
        return scalar reverse $text;
    }
When I count these hits in the logs and generate the stats for the number of bytes downloaded, I'd like to ignore the hits that are not "full" file downloads, by comparing the logged byte count against the real file size.
Any ideas how I can do this? The code above works, it just counts ALL hits in the logs, not "completed" hits in the logs. Did that make sense?
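One possible direction, as a rough sketch only: instead of shelling out to grep and wc, read the log in Perl and count a hit only when the logged byte count reaches the size of the file on disk. The helper name count_full_downloads is made up for illustration, and the regex assumes every line has the same layout as the two sample entries (quoted request, then status, then bytes). It also assumes a complete transfer logs at least the file's size, which matches the second sample entry but may not hold for every Squid log format.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): count the hits for
# $basefile whose logged byte count is at least $filesize.
# $fh is an already opened filehandle on the access log.
sub count_full_downloads {
    my ($fh, $basefile, $filesize) = @_;
    my $full = 0;
    while (my $line = <$fh>) {
        # Assumed layout: "GET <url> HTTP/x.y" <status> <bytes> ...
        next unless $line =~ m{"GET \s+ (\S+) \s+ HTTP/[\d.]+" \s+ (\d+) \s+ (\d+)}x;
        my ($url, $status, $bytes) = ($1, $2, $3);
        next unless $status == 200;                 # only successful responses
        next unless $url =~ m{/\Q$basefile\E\z};    # URL must end in the file name
        $full++ if $bytes >= $filesize;             # skip partial transfers
    }
    return $full;
}

# Usage with the two sample entries, read from an in-memory filehandle.
my $sample = <<'LOG';
wdcsun28.usdoj.gov - - [07/Aug/2003:04:58:15 -0700] "GET http://dl.domain.org/MyFoo-file.zip HTTP/1.0" 200 1607158 TCP_MISS:DIRECT
wdcsun28.usdoj.gov - - [07/Aug/2003:05:03:33 -0700] "GET http://dl.domain.org/MyFoo-file.zip HTTP/1.0" 200 8224380 TCP_MISS:DIRECT
LOG

open my $fh, '<', \$sample or die "Cannot open in-memory log: $!";
my $count = count_full_downloads($fh, 'MyFoo-file.zip', 8224380);
close $fh;

print "Full downloads: $count\n";
```

In the script, the backtick line could then be swapped for a call like this, with a filehandle opened on /var/log/squid/access.log and $v_filesize passed as the threshold. Matching the file name with \Q...\E also avoids passing it through the shell.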
In reply to File download statistics parsing by Anonymous Monk