citi2015 has asked for the wisdom of the Perl Monks concerning the following question:

I want to sort the output of du -sh from the largest size to the smallest. Here is an example:

33G /data/A
37M /data/B
44G /data/C
46G /data/D
68G /data/E
114G /data/F
148G /data/G
169M /data/H

Here is my test code:

use Data::Dumper;

my $file_dist = "../data/disk_usage.dat.new.bkp";

## open the usage file
open(my $fh, $file_dist) or die "can not open file";
my @arr = ();
my @sorted = ();
while (my $line = <$fh>) {
    chomp $line;
    push(@arr, $line);
}
@sorted = sort sort_arr @arr;
print Dumper \@sorted;

sub sort_arr {
    my ($size_a, $type_a);
    if ($a =~ /(\d*?\.*\d*)([M|G|K])/) {
        $size_a = $1;
        $type_a = $2;
    }
    my ($type_b, $size_b);
    if ($b =~ /(\d*?\.*\d*)([M|G|K])/) {
        $size_b = $1;
        $type_b = $2;
    }
    my $a1 = return_base($type_a) * $size_a;
    my $b1 = return_base($type_b) * $size_b;
    $b1 <=> $a1;
}

sub return_base {
    my $type = shift;
    return 1024 if ($type eq "M");
    return 1024*1024 if ($type eq "G");
    return 1 if ($type eq "K");
}

sub trim {
    my $s = shift;
    $s =~ s/\s+$//;
    $s =~ s/^\s+//;
    return $s;
}

My question is: why don't I need to declare $a and $b (with my $a, my $b) in sub sort_arr?

Replies are listed 'Best First'.
Re: sort file with your own logic
by Discipulus (Canon) on Mar 27, 2015 at 08:46 UTC
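    Hello, you are passing sort a named (bareword) comparison function, which is discussed in the What to avoid section of Modern Perl. The docs for sort explain that a custom comparison function must return a value less than 0, equal to 0, or greater than 0. sort itself places the two terms being compared in the package variables $a and $b before each call, so you do not need to declare them with my; only a function with a ($$) prototype receives them in @_ instead.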

    Anyway, if you skip the --human-readable switch of du and let Perl humanize the sizes, you will have less work to do, as in the following pipe of commands:
    du -b PATH | sort -n | perl -ane '++$n and $F[0] /= 1024 until $F[0] < 1024; printf "%.2f %s %s", $F[0], ( qw[ bytes KB MB GB ] )[ $n ], $F[1]; print qq(\n); $n = 0'
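
    The humanizing step of that one-liner can also be written as a small function, which may be easier to reuse. This is only a sketch: the name humanize and the sample byte counts are made up for illustration, and it assumes units that are powers of 1024.

```perl
use strict;
use warnings;

# Hypothetical helper: divide a byte count by 1024 until it fits its unit.
sub humanize {
    my ($bytes) = @_;
    my @units = qw(bytes KB MB GB TB);
    my $n = 0;
    # Stop at the last unit so the index never runs past the list.
    while ($bytes >= 1024 && $n < $#units) {
        $bytes /= 1024;
        $n++;
    }
    return sprintf '%.2f %s', $bytes, $units[$n];
}

# Made-up sample values, already in ascending numeric order.
print humanize($_), "\n" for 512, 1536, 1024**3;
```

    Because the sorting happens on plain byte counts before anything is humanized, no custom comparison function is needed at all.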

    HtH
    L*
    Update: see also chapter 15.4. Advanced Sorting in Learning Perl
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: sort file with your own logic
by hdb (Monsignor) on Mar 27, 2015 at 09:11 UTC

    The roles of $a and $b are explained in detail in the documentation with many examples.
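
    As a short illustration of what the documentation describes, here is a minimal sketch (the sub names and sample numbers are invented for the example). With a plain named sub, sort sets the package variables $a and $b before each call. With a ($$) prototype, the two elements arrive in @_ instead.

```perl
use strict;
use warnings;

# Plain named comparison sub: sort sets the package variables $a and $b
# before each call, so they are not declared with "my".
sub by_number_desc { $b <=> $a }

# Alternative: with a ($$) prototype, sort passes the two elements in @_.
sub by_number_asc($$) {
    my ($left, $right) = @_;
    return $left <=> $right;
}

my @sizes = (33, 148, 44);    # made-up sample values
my @desc  = sort by_number_desc @sizes;
my @asc   = sort by_number_asc @sizes;

print "@desc\n";    # 148 44 33
print "@asc\n";     # 33 44 148
```

    Declaring my $a or my $b inside the comparison sub would hide the variables that sort fills in, which is why the original code works without any declaration.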

Re: sort file with your own logic
by johngg (Canon) on Mar 27, 2015 at 10:26 UTC

    Note that there is also a -h option for sort on some systems. For those that lack it I have been using this code, piping the du -h output to it. The %so look-up hash gives the sort order for the scale suffix and allows for systems that use no suffix for bytes as well as those that use "B" instead.

    use strict;
    use warnings;

    use Getopt::Std;

    my %opts = ( a => 0 );
    getopts( q{a}, \ %opts ) or die qq{\n};

    # Rank of each scale suffix, smallest to largest (peta ranks below exa).
    my %so = (
        q{} => 0, B => 0, K => 1, M => 2, G => 3,
        T   => 4, P => 5, E => 6, Z => 7, Y => 8,
    );

    # Compare by suffix rank first, then by the number in front of it.
    my $rcSortAscending = sub {
        my( $raA, $raB ) = @_;
        return $so{ $raA->[ 2 ] } <=> $so{ $raB->[ 2 ] }
            || $raA->[ 1 ] <=> $raB->[ 1 ]
    };

    print
        map {
            my( $line, $path ) = @$_[ 0, 3 ];
            # Append a slash to directories that are not symlinks.
            if ( $path !~ m{/$} and -d $path and not -l $path ) {
                $line =~ s{\n}{/\n};
            }
            $line;
        }
        sort {
            $opts{ a }
                ? $rcSortAscending->( $a, $b )
                : $rcSortAscending->( $b, $a )
        }
        map { [ $_, m{^\s*([\d.]+)([BKMGTEPZY]?)\s+(.*)} ] }
        <>;

    I hope this is helpful.

    Cheers,

    JohnGG

      How will that sort "4321K" vs "2M"? Sorting by normalized value would be my preference.

      sub val { $_[0] =~ m/([\d.]*)([BKMGTEPZY]?)/ and $1 * $scale{$2} }
      ...
      sort { val($a) <=> val($b) } ...
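
      To make the "4321K" vs "2M" case concrete, here is a runnable sketch of sorting by normalized value. The contents of %scale are an assumption (powers of 1024, up to T only), since the snippet above does not define it, and the sample sizes are made up.

```perl
use strict;
use warnings;

# Assumed multipliers (powers of 1024); not defined in the snippet above.
my %scale = (
    '' => 1, B => 1, K => 1024, M => 1024**2, G => 1024**3, T => 1024**4,
);

# Normalize a size such as "4321K" or "1.5G" to a plain number of bytes.
sub val {
    my ($size) = @_;
    return $size =~ m/^\s*([\d.]+)([BKMGT]?)/ ? $1 * $scale{$2} : 0;
}

my @sorted = sort { val($a) <=> val($b) } qw(2M 4321K 1.5G 900K);
print "@sorted\n";    # 900K 2M 4321K 1.5G
```

      A sort on suffix rank first would place 4321K before 2M, although 4321K is a little over 4M. Comparing the normalized values avoids that.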

Re: sort file with your own logic
by Laurent_R (Canon) on Mar 27, 2015 at 23:16 UTC
    I agree with Anonymous Monk in post Re^2: sort file with your own logic that the best is to start by normalizing the values. This can easily be done in a Schwartzian Transform:
    use strict;
    use warnings;

    my %scale = (K => 1024, M => 1024**2, G => 1024 ** 3);
    print map {$_->[1]}
          sort {$a->[0] <=> $b->[0]}
          map {/^(\d+)(\w)/; [$1 * $scale{$2}, $_];} <DATA>;

    __DATA__
    33G /data/A
    37M /data/B
    44G /data/C
    46G /data/D
    68G /data/E
    114G /data/F
    148G /data/G
    169M /data/H
    17K /data/I
    This produces the following output:
    $ perl sort_du.pl
    17K /data/I
    37M /data/B
    169M /data/H
    33G /data/A
    44G /data/C
    46G /data/D
    68G /data/E
    114G /data/F
    148G /data/G
    Note that I have used powers of two for the K, M and G unit prefixes. It is easy enough to change them to powers of 10 if one wants to comply with newer standards (it is even easier, because you can use a regex to append groups of three 0's), but I strongly suspect that the du utility still computes data volumes in power-of-two units.
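
    For what it is worth, the powers-of-10 variant with a regex could look roughly like the sketch below. The helper name to_decimal_bytes and the %zeros table are invented for the example, and it only handles whole numbers such as "37M", not fractional sizes such as "1.5G".

```perl
use strict;
use warnings;

# Zeros that each decimal prefix stands for (K = 10**3, M = 10**6, G = 10**9).
my %zeros = ( K => '000', M => '000000', G => '000000000' );

# Hypothetical helper: "37M" becomes "37000000"; whole numbers only.
sub to_decimal_bytes {
    my ($size) = @_;
    (my $plain = $size) =~ s/^(\d+)([KMG])$/$1$zeros{$2}/;
    return $plain;
}

my @sorted = sort { to_decimal_bytes($a) <=> to_decimal_bytes($b) }
             qw(33G 37M 17K 169M);
print "@sorted\n";    # 17K 37M 169M 33G
```

    The resulting order is the same as with powers of two for data like this, so the choice of base only matters if the normalized numbers themselves are reported.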

    Also note that I am not using the Schwartzian Transform for performance reasons, but only because I think it is very efficient "code-wise", just two instructions and 4 lines of actual code to solve the problem.

    Je suis Charlie.