alexolivan has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody.... my first post...

Well, having no idea of Perl, I gave it a shot, but I'm obviously missing something, so it is time to ask the community for help.

Actually I'm trying to get AWSTATS to work better with shoutcast w3c logs, so it can squeeze all the juice from them. Somebody in the shoutcast community created a perl script to parse shoutcast w3c log files so they are awstats readable, but it still misses a key piece of info: the player used.

So, I have a w3c file, correctly parsed by perl, but I need to parse it further:
The 7th word/string of each line in the log file is the player.
Problem: the string is mostly an ugly, long, barely usable word.
Target: replace it with a usable one, looking for some popular players, and for all other cases replace the string with simply OTHER or UNKNOWN.
The approach: this 7th word always starts with the key letters (for instance, for VLC player we read vlc%2F1%2E1%2E5, while an iTunes entry yields iTunes%2F10%2E3%2E1%20%28Macintosh%3B%20Intel%20Mac%20OS%20X%2010%2E6%2E7%29%20AppleWebKit%2F533%2E21%2E1...and so on). Since the number of available players is overwhelming, the idea is to create some logic that compares the start of the string against a limited number of patterns, sequentially, replacing as necessary, and if none of them matches, replaces the whole string with a final fallback (unknown, other, or so...).
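
A minimal sketch of the approach described above, assuming the player field has already been isolated as a string. The function name `classify_player`, the pattern table, and the labels are hypothetical; only the `vlc`, `iTunes`, and `MPEG%20OVERRIDE` strings come from this post. The table is checked in order and `OTHER` is the fallback.

```perl
use strict;
use warnings;

# Ordered table of [pattern, label] pairs; the first match wins.
# Patterns and labels are illustrative assumptions, extend as needed.
my @PLAYERS = (
    [ qr/MPEG%20OVERRIDE/, 'FlashPlayer' ],
    [ qr/^vlc/i,           'VLC'         ],
    [ qr/^iTunes/i,        'iTunes'      ],
);

# Return a short label for a raw (still URL-encoded) player string.
sub classify_player {
    my ($raw) = @_;
    for my $entry (@PLAYERS) {
        my ($pattern, $label) = @$entry;
        return $label if $raw =~ $pattern;
    }
    return 'OTHER';    # nothing in the table matched
}

print classify_player('vlc%2F1%2E1%2E5'), "\n";    # prints "VLC"
```

Keeping the patterns in one table means a new player is one extra line, rather than another `elsif` branch.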

Here is a sample of how a raw log line reads:
95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160

Here is how it reads after correct, awstats-usable parsing:
95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160 GET

And here is how it should read (for instance):
/stream?title=Unknown 200 VLC 19261930 1193 129160 GET

And finally the sc_parse.pl script (all credits to its author!!!!) that does the trick, with some of my non-working mods commented out:

#!/usr/bin/perl -w
# -*- cperl -
#
# Parse ShoutCast v1.9.8 W3C log and append "GET" to each log line to pretend it's a web logfile
#
# Usage: perl sc_parse.pl -c /full/path/to/shoutcast/sc_w3c.log
#
# Written by Ryan Gehrig
#
use Getopt::Std;
our $opt_c;
getopts('c:');

# Open the log file
if (-e $opt_c) {
  print "Parsing log '$opt_c' ...\n";
  open FILE, "$opt_c" or die "ERROR: Failed to open log file\n";
  my @lines = <FILE>;
  foreach(@lines) {
    # Ignore comments
    if("$_" !~ /^\#/) {
      # Lose newlines
      $_ =~ s/\n//;

      #### Here I start editing/adding

      # First approach:
      # This works, but I cant add such a line for
      # every known and future player! there should be a way
      # to get the 7th word replaced anyway:

      # Look for flashplayer
      #$_ =~ s/MPEG\%20OVERRIDE/FlashPlayer/;

      # More realistic approach:
      # Obviously it fails for an unexperienced programmer as me:

      # 7th string are players, focus on it
      #my @words = split(' ', $_);
      #if(@words[6] =~ m/MPEG\%20OVERRIDE/){
      #    @words[6] = "FlashPlayer";
      #}
      #elsif(@words[6] =~ m/^vlc/){
      #    @words[6] = "VLC";
      #}
      #I could add more occurences here
      #else {
      #    @words[6] = "other";
      #}

      #### Here ends my editing/adding

      # Write this line to new log file "sc_w3c.log_x"
      open (MYFILE, '>>'.$opt_c.'_x');
      print MYFILE "$_" . ' GET' . "\n";
      close (MYFILE);
    }
  }
  close(FILE);
} else {
  print "ERROR: The specified log file doesnt exist. Exiting.\n";
}
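
One likely reason the commented-out attempt has no visible effect: it changes the copy of the fields held in `@words`, but the script still prints the untouched `$_`. (`@words[6]` is also a one-element slice; `$words[6]` is the usual way to address a single element.) A minimal sketch of the missing step, assuming whitespace-separated fields with the player in the 7th position; `rewrite_line` is a hypothetical helper, not part of sc_parse.pl:

```perl
use strict;
use warnings;

# Replace the 7th field of one log line and append " GET".
# The two patterns mirror the commented-out attempt above.
sub rewrite_line {
    my ($line) = @_;
    chomp $line;
    my @words = split ' ', $line;
    return "$line GET" if @words < 7;    # too few fields: only append GET

    if    ($words[6] =~ /MPEG%20OVERRIDE/) { $words[6] = 'FlashPlayer' }
    elsif ($words[6] =~ /^vlc/)            { $words[6] = 'VLC' }
    else                                   { $words[6] = 'other' }

    # Rebuild the line from the modified fields before returning it.
    return join(' ', @words) . ' GET';
}

print rewrite_line("95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160\n"), "\n";
```

Inside the script's loop this would amount to printing `rewrite_line($_)` followed by a newline, instead of `"$_" . ' GET'`. Note that `join` puts a single space between fields, so runs of whitespace in the input are collapsed.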

If I get it working... and if I manage to get AWSTATS to use the resulting log as intended, this would be incredible!!!

Replies are listed 'Best First'.
Re: a perl, awstats and SHOUTCast history
by Anonymous Monk on Mar 27, 2012 at 11:41 UTC

    code tags are for both code and data, anyway, use CGI

    #!/usr/bin/perl --
    use strict;
    use warnings;

    Main( @ARGV );

    sub Main{
        my $raw = q{95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160};
        my $awmost = q{95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160 GET};
        my $wanted = q{/stream?title=Unknown 200 VLC 19261930 1193 129160 GET};
        print "\n$raw\n$awmost\n$wanted\n";
        w3clogToAwstats( \$raw , \*STDOUT );
        #~ w3clogToAwstats(
        #~     '/full/path/to/shoutcast/sc_w3c.log',
        #~     '/same/for/new/awstats_sc_w3c.log',
        #~ );
    }

    sub w3clogToAwstats {
        my( $infile, $outfile ) = @_;
        open my($in), '<', $infile or die "Cannot open($infile): $!";
        open my($out), '>', $outfile or die "Cannot open($outfile): $!";
        while(<$in>){
            my(@F)=split ' ', $_;
            #~ use DDS; Dump\@F;
            my($ipa,$ipb, $date,$time, $halfurl, $status, $player, $longnum, $shortnum, $mednum ) = @F;
            use CGI::Util q/unescape/;
            $player = unescape($player);
            print $out "$halfurl $status $player $longnum $shortnum $mednum GET\n";
        }
        close $in;
        close $out;
    }
    __END__
    $ perl awjunk

    95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160
    95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160 GET
    /stream?title=Unknown 200 VLC 19261930 1193 129160 GET
      Doh! w3clogToAwstats( \$raw , \*STDOUT ); doesn't print to STDOUT; open treats the glob as a file name, so it prints to a file with a name of the form "GLOB(0x99a46c)"
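
In the three-argument form of `open`, the third argument is taken as a file name unless it is a reference to a scalar, which is why `\*STDOUT` ends up stringified. A minimal sketch of one way around that, assuming the function should accept either a path or an already-open handle; `open_out` is a hypothetical helper, not part of the code above:

```perl
use strict;
use warnings;

# Return a writable handle for $target, which may be a file name,
# a reference to a scalar (in-memory file), or an already-open handle.
sub open_out {
    my ($target) = @_;
    # A glob reference is assumed to be an open handle and is used as-is.
    return $target if ref($target) eq 'GLOB';
    open my $out, '>', $target or die "Cannot open($target): $!";
    return $out;
}

my $fh = open_out(\*STDOUT);
print {$fh} "this line goes to STDOUT\n";
```

Duplicating the handle with the `'>&'` mode of `open` is another option when the caller always passes a handle.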
Re: a perl, awstats and SHOUTCast history
by remiah (Hermit) on Mar 28, 2012 at 02:11 UTC
    Hello alexolivan.

    vlc%2F1%2E1%2E5

    This is a hexadecimal-escaped string: a '%' followed by two hex digits represents one encoded character. If you decode this string with

    perl -e 'my $s="vlc%2F1%2E1%2E5"; $s=~ s/%(..)/chr(hex($1))/eg; print $s;';
    It will show
    vlc/1.1.5
    here, %2F='/', %2E='.'

    So, how about decoding $words[6] with this regex, or using a module like URI::Escape?
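
A minimal sketch of how the regex above could be applied to the 7th field, using core Perl only. The helper name `decode_percent` is hypothetical, the pattern is narrowed to hex digits, and taking the text before the first "/" as the short player name is an assumption based on the two sample strings in this thread.

```perl
use strict;
use warnings;

# Decode %XX escapes; same idea as the one-liner above, limited to hex digits.
sub decode_percent {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

my $line  = '95.61.50.98 95.61.50.98 2011-07-14 16:04:17 /stream?title=Unknown 200 vlc%2F1%2E1%2E5 19261930 1193 129160';
my @words = split ' ', $line;

my $long    = decode_percent($words[6]);    # vlc/1.1.5
my ($short) = $long =~ m{^([^/\s]+)};       # vlc

print "$long\n$short\n";
```

A fully decoded string can contain spaces (every `%20` in the iTunes sample becomes one), which would add extra fields to a space-separated log line, so the short form may be the safer one to write back. `uri_unescape` from URI::Escape should do the same decoding where that module is installed.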

      wow... this community really rocks

      Well, I actually was aware of the hex format in the shoutcast log, but I simply went for the most direct way to get something usable, so simply getting "VLC", "Winamp", "iTunes" and so on would be revolutionary enough ;-)
      Obviously, if at the same time we could reformat the log file so as to pass the full string to AWSTATS, it would be great.
      Nevertheless, in my first attempts, awstats refused to accept the unformatted 7th string as valid data, so I also thought to stick to the documentation and give it a simple string, without any kind of symbols/characters that may lead it to ignore it.
      It should be noted that some players like iTunes appear as an incredibly long (and useless) string that, besides the version, includes operating system info, OSX info and so on...

      I think it would be interesting to have the 2 shots: one sc_parse.pl script that changes player string to simply "vlc" and another one with "vlc/1.1.5" so both could be tested against awstats.

      Anyways I'm still unable to rebuild the script with the changes proposed here, since I don't see the algorithm behind them, and I'm still investigating how I should put all the pieces together in the now working parsing script.

      Thank you all guys!!! :-D

        doh!
        some hours on it and I got no results. I don't know how to tell the script to replace the string with any of the proposed expressions while inside a loop. Gave up again.
        trashing everything, sticking with the original script, and waiting for a little more input from the community before investing more time on this.

        thanks again!