Given that your Anonymous, I doubt this result will ever be found. However, in some of my spare time I put together a script that indexes the apache log file like I suggested in my post to you

I also converted the script to use DateTime instead of Date::Manip as the former is faster.

Note, the script took about 41 minutes per gig of data on my development machine, but then takes an infinitesimal amount of time for any subsequent run given the data is ordered and indexed.

#!/usr/bin/perl use DateTime; use Fcntl qw(:seek); use strict; use warnings; my $infile = 'access_log'; my $indexfile = $infile . '.pst'; my $outfile = 'file.txt'; my %mon = do { my $i = 1; map {$_ => $i++} qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov De +c) }; my $search_date = DateTime->now()->subtract(days => 10)->strftime("%Y% +m%d"); print "Search date is $search_date\n"; # Get Last Index Location my @index_last = ('', 0); my @index_start; my @index_stop; if (-e $indexfile) { open my $fh, $indexfile or die "$indexfile: $!"; while (<$fh>) { chomp; @index_last = split "\t"; @index_stop = @index_last if @index_start && !@index_stop; @index_start = @index_last if $index_last[0] eq $search_date; } } open my $oh, '>', $outfile or die "$outfile: $!"; open my $ih, $infile or die "$infile: $!"; my ($lastday, $index) = @index_start ? @index_start : @index_last; seek $ih, $index, SEEK_SET; while (<$ih>) { my $day; # If w/i indexes, no need to reparse day if (@index_stop) { # End reached last if $index >= $index_stop[1]; $day = $search_date; # Parse Date } elsif (m{\[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+)\s+([+-]\d+)}) { my $dt = DateTime->new( year => $3, month => $mon{$2}, day => $1, hour => $4, minute => $5, second => $6, time_zone => $7, ); $dt->set_time_zone('America/Los_Angeles'); $day = $dt->strftime("%Y%m%d"); } else { warn "Invalid date on: $_"; next; } # New Date if ($day ne $lastday) { # Add to index if necessary if ($day > $index_last[0]) { @index_last = ($day, $index); open my $oh, '>>', $indexfile or die "$indexfile: $!"; print $oh join("\t", @index_last), "\n"; close $oh; } # End if past search date last if $day > $search_date; print "Processing $day, $index on " . scalar(localtime) . "\n" +; $lastday = $day; } # Matches search date if ($day eq $search_date) { # Do whatever print $oh $_; } $index = tell $ih; } close $ih; __END__

In reply to Re: Optimize apache access_log parsing with index file by wind
in thread Optimise the script by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.