nysus has asked for the wisdom of the Perl Monks concerning the following question:

Monks, let me start with the code:
sub sort_by_ip {
    {
        no warnings;
        @sorted_by_ip_date_time = sort {
            my @a = $a->[0] =~ split/./;    #split IP address on periods
            my @b = $b->[0] =~ split/./;    #split IP address on periods
            my @c = $a->[3] =~ split/\//;   #split date on forward slashes
            my @d = $b->[3] =~ split/\//;   #format is '27/Jun/2001'
            my @e = $a->[4] =~ split/:/;    #split time on colons
            my @f = $b->[4] =~ split/:/;    #format is '02:23:15'

            if ($c[1] == 'Jun') {   ##convert the months into numbers (only two months in log)##
                $c[1] = 6;          #6 if 'Jun'
            } else { $c[1] = 7; }   #7 if 'Jul'

            if ($d[1] == 'Jun') {
                $d[1] = 6;          #6 if 'Jun'
            } else { $d[1] = 7; }   #7 if 'Jul'

            $a[0] <=> $b[0] ||  #sort by 1st part of IP else if same...
            $a[1] <=> $b[1] ||  #sort by 2nd part of IP else if same...
            $a[2] <=> $b[2] ||  #sort by 3rd part of IP else if same...
            $a[3] <=> $b[3] ||  #sort by 4th part of IP else if same...
            $c[2] <=> $d[2] ||  #sort by year else if same...
            $c[1] <=> $d[1] ||  #sort by month, 6 (June) or 7 (July) only else if same...
            $c[0] <=> $d[0] ||  #sort by day else if same...
            $e[0] <=> $f[0] ||  #sort by hour else if same...
            $e[1] <=> $f[1] ||  #sort by minute else if same...
            $e[2] <=> $f[2]     #sort by second (phew!)
        } @logentry;
    }
}
I'm trying to sort an array (@logentry) of anonymous arrays on 3 elements of each anonymous array, in this order: IP (element 0), Date (element 3), and Time (element 4). Hopefully my comments make everything else abundantly clear.

I don't get any errors and it runs quickly. However, when I print out the results, the @sorted_by_ip_date_time array is not sorted in any recognizable order. It looks almost completely random. It's doing something, because the order is different from the original log. I've spent an hour and a half on this. I've never attempted such a complex sort; please tell me the error of my ways.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop";
$nysus = $PM . $MCF;

Replies are listed 'Best First'.
Re: Hella sort
by Abigail (Deacon) on Jul 03, 2001 at 04:55 UTC
    You use
    $a -> [0] =~ split /./;
    which is wrong in various ways. First of all, =~ binds variables to regular expressions, substitutions or transliterations. split however splits its second parameter, or $_ if that parameter is not present.

    Second, split /./ doesn't split on periods. It splits on anything that isn't a newline. You want split /\./ here, that splits on a period.
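
A minimal sketch of the difference between the two patterns, assuming a made-up sample address (the variable names here are illustrative only, not from the original post):

```perl
use strict;
use warnings;

my $ip = '192.168.1.10';    # hypothetical sample address

# /./ matches every character, so every field is empty;
# split drops trailing empty fields, leaving nothing at all.
my @wrong = split /./, $ip;

# /\./ matches only a literal period.
my @right = split /\./, $ip;

printf "split /./  gives %d fields\n", scalar @wrong;
printf "split /\\./ gives %d fields: %s\n", scalar @right, join(', ', @right);
```

Note that the string to split is passed as the second argument to split, rather than being bound with =~.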

    -- Abigail

(ar0n) Re: Hella sort
by ar0n (Priest) on Jul 03, 2001 at 03:29 UTC

    You have warnings off; no wonder it runs without any messages :)
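
As a rough illustration of what warnings would have reported, here is a small sketch built around the month test from the original code. The variable names and the warning handler are assumptions added for the demonstration. With == both strings are converted to the number 0, so the test is true for 'Jul' as well; eq compares the strings themselves.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be counted.
my @seen;
$SIG{__WARN__} = sub { push @seen, $_[0] };

my $month = 'Jul';    # hypothetical month field from a log line

# Numeric comparison: 'Jul' and 'Jun' both numify to 0.
my $numeric_match = ($month == 'Jun') ? 1 : 0;

# String comparison: what the month test actually needs.
my $string_match = ($month eq 'Jun') ? 1 : 0;

print "== says match: $numeric_match\n";
print "eq says match: $string_match\n";
print "warnings raised: ", scalar(@seen), "\n";
```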

    You're doing way too much work. If you simply make the IP address one number, it'll be a lot easier to sort on. I do that below by zero-padding and joining the four numbers. You don't need to do anything with the time, since it's zero-padded, so you can sort it easily. For the date you can do the same: left-pad the numbers with zeros.

    sub by_ip {
        my (%months, $i);
        $months{$_} = $i++ for qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

        # zero-pad each octet to three digits and join them into one number
        my $a_ip = join "", map { sprintf("%.3d", $_) } split /[.]/, $a->[0];
        my $b_ip = join "", map { sprintf("%.3d", $_) } split /[.]/, $b->[0];

        my @a_date = split '/', $a->[3];
        my @b_date = split '/', $b->[3];

        # year-month-day order, so a string comparison sorts chronologically
        my $a_date = sprintf "%.4d-%.2d-%.2d", $a_date[2], $months{$a_date[1]}, $a_date[0];
        my $b_date = sprintf "%.4d-%.2d-%.2d", $b_date[2], $months{$b_date[1]}, $b_date[0];

        # dates and times are zero-padded strings, so compare them with cmp
        ($a_ip <=> $b_ip) || ($a_date cmp $b_date) || ($a->[4] cmp $b->[4]);
    }

    @sorted_by_ip_date_time = sort by_ip @logentry;

    I haven't tested it, since I do not know what your logfiles look like (post!).

    Anyway, hope this helps.


    ar0n

Re: Hella sort
by Aighearach (Initiate) on Jul 03, 2001 at 03:23 UTC
    Your split is using a regex-like construct instead of split PATTERN, EXPR, and the result (I think) is that you're splitting $_, which isn't being set in the sub, so who knows what it holds. The period also needs escaping, since an unescaped . matches any character. Try:
    my @a = split /\./, $a->[0]; #split IP address on periods ...
    You didn't give any sample data so I can't say for certain if this is the problem, but it might be it.
    --
    Snazzy tagline here
Re: Hella sort
by I0 (Priest) on Jul 04, 2001 at 09:14 UTC
    sub ST(&@){
        my $metric=shift;
        map {$_->[0]}
        sort {$a->[1] cmp $b->[1]}
        map {[$_,&{$metric}]} @_
    }

    @sorted_by_ip_date_time = ST{
        my @d=split'/',$_->[3];
        $d[1] = ${{'Jun'=>6,'Jul'=>7}}{$d[1]};
        sprintf"%4d"x10, (split/\./,$_->[0]), @d[2,1,0], (split/:/,$_->[4])
    }@logentry;
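
The code above computes one fixed-width string key per log entry and then sorts on that key with a plain string comparison (the approach often called a Schwartzian Transform). A spelled-out sketch of the same idea follows. The sample rows, the sort_key helper, and the zero-padded key layout are assumptions made for illustration, since the thread never shows the real log data.

```perl
use strict;
use warnings;

# Hypothetical sample rows; only elements 0 (IP), 3 (date), 4 (time) matter here.
my @logentry = (
    ['10.0.0.2',  '-', '-', '01/Jul/2001', '00:00:01'],
    ['10.0.0.2',  '-', '-', '27/Jun/2001', '02:23:15'],
    ['9.255.0.1', '-', '-', '27/Jun/2001', '23:59:59'],
    ['10.0.0.2',  '-', '-', '27/Jun/2001', '02:23:05'],
);

my %month = (Jun => 6, Jul => 7);    # only two months appear in the log

# Build a fixed-width key: 4 octets, then year/month/day, then h/m/s.
sub sort_key {
    my ($row) = @_;
    my ($day, $mon, $year) = split m{/}, $row->[3];
    my $format = ('%03d' x 4) . '%04d%02d%02d' . ('%02d' x 3);
    return sprintf $format,
        split(/\./, $row->[0]),
        $year, $month{$mon}, $day,
        split(/:/, $row->[4]);
}

my @sorted =
    map  { $_->[1] }                  # 3. throw the key away again
    sort { $a->[0] cmp $b->[0] }      # 2. compare the precomputed keys
    map  { [ sort_key($_), $_ ] }     # 1. compute each key exactly once
    @logentry;

print join(' ', @{$_}[0, 3, 4]), "\n" for @sorted;
```

Because every field is padded to a fixed width, a single cmp orders the entries by IP, then date, then time, and each key is computed once per entry rather than once per comparison.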