redhotpenguin has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I'm a bit of a klutz with regular expressions, but I've taken a shot at writing this script, which parses log entries from a Pure Ftpd server I run. I'm looking for ways to improve its efficiency and readability. The log entries are separated by carriage returns and the elements within each entry by single spaces.

#!/usr/bin/perl
use strict;
use warnings;

my $data = 'Jul 30 19:39:06 server pure-ftpd: (user@68.158.32.189) [NOTICE] /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac downloaded (71074015 bytes, 60.83KB/sec)
Jul 30 19:39:23 server pure-ftpd: (user@68.17.66.125) [NOTICE] /home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn downloaded (2075533 bytes, 115.25KB/sec)';

my @lines = split "\n", $data;

foreach my $line (@lines) {
    $line =~ m/^(\w+)\s(\d+)\s(\S+)\s\w+\s\D+\s\D(\w+)\x40(\d+\W\d+\W\d+\W\d+)\D\s\S+\s(.*)\s+\w+\s+\S?(\d+)\s\w+\S\s(\d+\W\d+)/;
    print "Month is: $1\n";
    print "Day is: $2\n";
    print "Time is: $3\n";
    print "User is: $4\n";
    print "IP is: $5\n";
    print "File is: $6\n";
    print "Size is : $7\n";
    print "Speed is : $8\n";
}

1;

Re: Pure Ftpd log regex
by Zaxo (Archbishop) on Jul 31, 2004 at 06:08 UTC

    If you're testing with this snippet, you should replace the $data assignment with the exact data - exact carriage returns and all - in the DATA "file" by including it after an __END__ or __DATA__ tag, or else accepting long lines in the assignment. The edited version you assign does not correspond to the data format you describe.
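    A minimal sketch of that idea, assuming the two sample entries sit one per line: the sample is kept in a heredoc and read through an in-memory filehandle (available since Perl 5.8), which yields lines exactly as <DATA> would if the same text followed a __DATA__ marker. The $log and @entries names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample log, one entry per line as the server writes it.
# (In a standalone test script these lines could follow __DATA__ instead.)
my $log = <<'END_LOG';
Jul 30 19:39:06 server pure-ftpd: (user@68.158.32.189) [NOTICE] /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac downloaded (71074015 bytes, 60.83KB/sec)
Jul 30 19:39:23 server pure-ftpd: (user@68.17.66.125) [NOTICE] /home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn downloaded (2075533 bytes, 115.25KB/sec)
END_LOG

# Open the string as if it were a file; <$fh> then reads it line by line.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @entries;
while (my $line = <$fh>) {
    chomp $line;                 # drop the trailing newline
    next unless length $line;    # skip blank lines
    push @entries, $line;
}
close $fh;

printf "%d entries read\n", scalar @entries;
```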

    After Compline,
    Zaxo

Re: Pure Ftpd log regex
by BrowserUk (Patriarch) on Jul 31, 2004 at 07:29 UTC

    Try this. I've assumed that you just wrapped the two lines for posting.

    It's simpler and commented, which usually makes it more efficient and more readable, though efficiency is a matter of trial more than know-how, and readability is in the eye of the beholder.

    my $re = qr|
        ( \S+ ) \s                          # Month $1
        ( \S+ ) \s                          # Day $2
        ( \S+ ) [^\(]+                      # Time $3
        \( ( [^@]+ )                        # User $4
        @ ( [^\)]+ )                        # IP $5
        \) [^/]+ ( .* )\sdownloaded\s\(     # File $6
        ( \d+ ) \D+                         # Size $7
        ( [^\)]+ )                          # Speed $8
    |x;
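    A minimal sketch of how that regex might be used, assuming one log entry per line. The pattern is copied from the reply above; the loop, the @keys list and the per-entry hash are illustrative additions. Capturing in list context means a failed match returns an empty list instead of leaving stale values in $1..$8.

```perl
use strict;
use warnings;

# Same pattern as above; /x allows the layout and the comments.
my $re = qr|
    ( \S+ ) \s                          # month
    ( \S+ ) \s                          # day
    ( \S+ ) [^\(]+                      # time
    \( ( [^@]+ )                        # user
    @ ( [^\)]+ )                        # ip
    \) [^/]+ ( .* )\sdownloaded\s\(     # file
    ( \d+ ) \D+                         # size in bytes
    ( [^\)]+ )                          # speed
|x;

my @lines = (
    'Jul 30 19:39:06 server pure-ftpd: (user@68.158.32.189) [NOTICE] /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac downloaded (71074015 bytes, 60.83KB/sec)',
    'Jul 30 19:39:23 server pure-ftpd: (user@68.17.66.125) [NOTICE] /home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn downloaded (2075533 bytes, 115.25KB/sec)',
);

my @keys = qw(month day time user ip file size speed);
my @parsed;
for my $line (@lines) {
    # List-context match: the eight captures, or () if the line doesn't fit.
    my @fields = $line =~ $re or do { warn "Unparsed: $line\n"; next };
    my %entry;
    @entry{@keys} = @fields;    # hash slice: name each capture
    push @parsed, \%entry;
}

print "$_->{user} fetched $_->{file} ($_->{size} bytes)\n" for @parsed;
```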

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Pure Ftpd log regex
by Dietz (Curate) on Jul 31, 2004 at 09:58 UTC
    Here's my try:

    #!/usr/bin/perl -w
    use strict;

    my $data = 'Jul 30 19:39:06 server pure-ftpd: (user@68.158.32.189) [NOTICE] /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac downloaded (71074015 bytes, 60.83KB/sec)
Jul 30 19:39:23 server pure-ftpd: (user@68.17.66.125) [NOTICE] /home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn downloaded (2075533 bytes, 115.25KB/sec)';

    my @result = $data =~ /
        ^(\w+)\s                    # month ($1)
        (\d+)\s                     # day ($2)
        ([\d:]+)                    # time ($3)
        .*?\((\w+)                  # user ($4)
        \@                          # @
        ([\d.]+?)\)(?:.|\n)*?       # ip ($5)
        \[.+?\].*?                  # [NOTICE] until matching:
        (\/(?:.|\n)+?)              # file ($6) until matching:
        (?:\s downloaded)           # space and downloaded
        (?:.|\n)*?                  # optional dot or \n until matching:
        \((\d+\s bytes)             # size ($7)
        .+?([\d.]+KB\/sec)\)        # speed ($8)
    /mxg;

    foreach (@result) {
        $_ =~ s/\n//;
        print $_, $/;
        print "\n" if $_ =~ /\/sec/;    # separate entries
    }

    __END__
    __OUTPUT__
    Jul
    30
    19:39:06
    user
    68.158.32.189
    /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac
    71074015 bytes
    60.83KB/sec

    Jul
    30
    19:39:23
    user
    68.17.66.125
    /home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn
    2075533 bytes
    115.25KB/sec
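    Because that match runs in list context with /g, @result is one flat list holding eight captures per log entry. A small sketch, assuming exactly that layout, of regrouping the list into one array reference per entry with splice. The sample list is copied from the output above, and the @records name is hypothetical.

```perl
use strict;
use warnings;

# Flat capture list as a list-context //g match would return it:
# eight fields per entry (month, day, time, user, ip, file, size, speed).
my @result = (
    'Jul', '30', '19:39:06', 'user', '68.158.32.189',
    '/home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac',
    '71074015 bytes', '60.83KB/sec',
    'Jul', '30', '19:39:23', 'user', '68.17.66.125',
    '/home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn',
    '2075533 bytes', '115.25KB/sec',
);

my $fields_per_entry = 8;
my @records;

# Remove the first eight elements at a time (this empties @result).
while (@result >= $fields_per_entry) {
    push @records, [ splice @result, 0, $fields_per_entry ];
}

print join(' | ', @$_), "\n" for @records;
```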

Re: Pure Ftpd log regex
by jfroebe (Parson) on Jul 31, 2004 at 06:58 UTC

    Hi

    It's 2am here, so I might get this wrong due to lack of sleep and excessive Lineage2 game play. This would work for your requirement that the elements are separated by spaces. The data you provided in your example doesn't match up with it, though.

    foreach my $my_line (@lines) {
        my ($month, $day, $time, $user, $ip, $file, $size, $speed) = split / /, $my_line;
        # ... print the fields here ...
    }

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

      Well, one problem with that is that his filenames (and paths) seem to be able to include spaces too. So simply splitting on whitespace won't do the trick. Also, his date/time has whitespace, but not where you need it.
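      A quick check makes the problem concrete. The line is the first sample entry from the question, and the variable names mirror the split in the reply above.

```perl
use strict;
use warnings;

my $line = 'Jul 30 19:39:06 server pure-ftpd: (user@68.158.32.189) [NOTICE] /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac downloaded (71074015 bytes, 60.83KB/sec)';

# Splitting on every single space also breaks the path apart.
my @fields = split / /, $line;
printf "%d fields\n", scalar @fields;

# Eight variables keep only the first eight pieces, so "user" gets the
# host name and "file" gets the "(user@ip)" field.
my ($month, $day, $time, $user, $ip, $file, $size, $speed) = @fields;
print "user=$user file=$file\n";
```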

      Dave

        lol! Yup, this is why I shouldn't post anything after midnight or before noon ;-)

        Jason L. Froebe


Re: Pure Ftpd log regex
by Aristotle (Chancellor) on Jul 31, 2004 at 20:01 UTC

    The problematic part of the format is the path, which can contain any characters, unescaped. But the beginning and the end of each line are fixed, easily parsed formats, so we can take advantage of that: split off the leading fields, then strip the known suffix from what remains. Something along these lines should give you the desired results:

    my ($month, $day, $time, $host, $process, $user, $level, $path, $bytes, $speed);
    ($month, $day, $time, $host, $process, $user, $level, $path) = split / /, $_, 8;
    # braces as delimiters, so the "/" in KB/sec does not end the pattern
    $path =~ s{ downloaded \((\d+) bytes, ([\d.]+)KB/sec\)\z}{}
        and ($bytes = $1, $speed = $2);

    Beware that this will not deal gracefully with variations in the data format; depending on how far out of its way Pure-FTPd goes to make its logfiles hard to parse, it may require more or less extensive adjustments.
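    Wrapped in a loop over the two sample entries from the question, the approach might look like the sketch below. The @log and @records names, the per-entry hash, and the step that splits "(user@ip)" into user and IP are illustrative additions, not part of the reply.

```perl
use strict;
use warnings;

my @log = (
    'Jul 30 19:39:06 server pure-ftpd: (user@68.158.32.189) [NOTICE] /home/shows//Genesis/Genesis 1978-06-24 FLAC/genesis1978-06-24-prefm-d1t04.flac downloaded (71074015 bytes, 60.83KB/sec)',
    'Jul 30 19:39:23 server pure-ftpd: (user@68.17.66.125) [NOTICE] /home/shows//Simon and Garfunkel/Simon and Garfunkel 2003-10-16/SG2003-10-16-d2t03.shn downloaded (2075533 bytes, 115.25KB/sec)',
);

my @records;
for (@log) {
    # The first seven fields contain no spaces; with a limit of 8 the
    # last field keeps the rest of the line, spaces in the path included.
    my ($month, $day, $time, $host, $process, $who, $level, $path)
        = split / /, $_, 8;

    # Strip the fixed-format tail, capturing size and speed from it.
    my ($bytes, $speed);
    $path =~ s{ downloaded \((\d+) bytes, ([\d.]+)KB/sec\)\z}{}
        and ($bytes, $speed) = ($1, $2);

    # Hypothetical extra step: "(user@ip)" -> user and ip.
    my ($user, $ip) = $who =~ /^\((.*)\@([^)]+)\)\z/;

    push @records, {
        month => $month, day => $day, time => $time,
        user  => $user,  ip  => $ip,  path => $path,
        bytes => $bytes, speed => $speed,
    };
}

print "$_->{user}\@$_->{ip}: $_->{path} ($_->{bytes} bytes at $_->{speed} KB/sec)\n"
    for @records;
```

    Splitting with a limit and then anchoring the suffix with \z keeps the regex work confined to the two fixed-format ends of the line, so spaces in the path never need to be matched explicitly.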

    Makeshifts last the longest.