This program reads IIS logs and converts them to NCSA format so that they can be processed by a statistics program. Again, I would like this done as efficiently as possible, so any suggestions to improve efficiency are more than welcome.
#!/usr/bin/perl

# Microsoft IIS Log Format
# fields are separated by commas
# a hyphen '-' serves as placeholder if no valid data present
#
my $user_ip_addr;
my $username;             # username of user RFC931 ?
my $date;                 # MM/DD/YY
my $time;                 # H:MM:SS
my $ms_service;           # such as W3SVC1
my $server_hostname;      # such as NTPUB1
my $server_ip_addr;
my $time_elapsed;         # in seconds
my $bytes_received;
my $bytes_sent;
my $service_status_code;  # HTTP code
my $mswin_status_code;    # MS Windows NT status code
my $op_name;              # such as GET, POST, HEAD, PUT
my $op_target;            # such as index.html
my $junk;

my $date_year;
my $date_mon;
my $date_dd;
my @months = qw( NUL Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
my $line;
my $utc_offset = '-0500';
my $century = '20';

while (defined ($line = <STDIN>)) {
    chomp $line;
    $line =~ tr/\000//d;
    $line =~ s# "HTTP/1\.# HTTP/1.#;
    next if $line eq '';

    (
        $user_ip_addr, $username, $date, $time,
        $ms_service, $server_hostname, $server_ip_addr,
        $time_elapsed, $bytes_received, $bytes_sent,
        $service_status_code, $mswin_status_code,
        $op_name, $op_target, $junk
    ) = split(/, /, $line, 15);

    next if $op_target eq '' or ! defined $op_target;

    ($date_mon, $date_dd, $date_year) = split(m#/#, $date, 3);
    $century = ( $date_year < 90 ) ? '20' : '19';
    $date_mon = sprintf("%02s",$date_mon);
    $date_dd = sprintf("%02s",$date_dd);
    $date_mon = $months[$date_mon];

    # $time =~ s/([0-9]:[0-9][0-9]:[0-9][0-9])/0$1/;
    $time = sprintf "%08s", $time;

    #NCSA combined log format:
    #$remote_host $remote_logname $remote_user $time_commonlog "$request" $status $bytes_sent "$http_referer" "$http_user_agent"
    print "$user_ip_addr $username - [$date_dd/$date_mon/$century$date_year:$time $utc_offset] \"$op_name $op_target\" $service_status_code $bytes_sent \"-\" \"-\"\n";

    # print join(' ',
    #     $user_ip_addr,
    #     $username,
    #     '-',
    #     join('', '[', $date_dd, '/', $date_mon, '/', $century,
    #         $date_year, ':', $time, ' ', $utc_offset, ']'),
    #     join('', '"', $op_name, ' ', $op_target, '"'),
    #     $service_status_code,
    #     $bytes_sent,
    #     '"-"',  # HTTP_REFERER
    #     '"-"'   # HTTP_USER_AGENT
    # );
    # print "\n";
}
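To make it easier to check or time a proposed change against the loop above, here is a minimal sketch of the same conversion applied to a single line. The function name `iis_to_ncsa`, the sample log line, and its addresses are made up for illustration and are not taken from a real log. The sketch leaves out the NUL stripping and the `"HTTP/1.` cleanup that the full script performs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @MONTHS = qw( NUL Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );

# Convert one IIS-format line to an NCSA combined-format line.
# Returns nothing (undef in scalar context) when the request target is missing.
# iis_to_ncsa is a hypothetical helper name, not part of the original script.
sub iis_to_ncsa {
    my ($line, $utc_offset) = @_;
    $utc_offset = '-0500' unless defined $utc_offset;

    # Keep only the fields the output needs: IP, user, date, time,
    # HTTP status (field 10), bytes sent (field 9), method, target.
    my ($ip, $user, $date, $time, $status, $bytes_sent, $op, $target)
        = (split /, /, $line, 15)[0, 1, 2, 3, 10, 9, 12, 13];
    return unless defined $target && $target ne '';

    my ($mon, $dd, $yy) = split m{/}, $date, 3;
    my $century = $yy < 90 ? '20' : '19';

    # Pad H:MM:SS to HH:MM:SS.
    $time = "0$time" if length($time) < 8;

    return sprintf '%s %s - [%02d/%s/%s%02d:%s %s] "%s %s" %s %s "-" "-"',
        $ip, $user, $dd, $MONTHS[$mon], $century, $yy, $time, $utc_offset,
        $op, $target, $status, $bytes_sent;
}

# Invented sample line in the comma-separated IIS layout described above.
my $sample = '192.168.1.10, -, 3/7/01, 9:05:12, W3SVC1, NTPUB1, '
           . '192.168.1.1, 15, 230, 4096, 200, 0, GET, /index.html, -,';
print iis_to_ncsa($sample), "\n";
```

Wrapping the conversion in a function like this is only one way to set it up; the point is that two variants can then be fed the same input and compared, for example with the core Benchmark module.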

In reply to Efficiency revisited by tekniko
