sort file with your own logic

citi2015 has asked for the wisdom of the Perl Monks concerning the following question:

I want to sort the output of du -sh from the largest size to the smallest. Here is an example:

33G     /data/A
37M     /data/B
44G     /data/C
46G     /data/D
68G     /data/E
114G    /data/F
148G    /data/G
169M    /data/H

Here is my test code:

use Data::Dumper;    # provides Dumper, used below

my $file_dist="../data/disk_usage.dat.new.bkp";
##open the usage file
open(my $fh, '<', $file_dist) or die "can not open file $file_dist: $!";

my @arr = ();
my @sorted = ();

while(my $line=<$fh>){
    chomp $line;
    push(@arr, $line);
}

@sorted = sort sort_arr @arr;
print Dumper \@sorted;

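sub sort_arr {
    # extract the number and the unit letter from each line
    my ($size_a, $type_a);
    if ($a =~ /(\d+(?:\.\d+)?)([MGK])/) {
        $size_a = $1;
        $type_a = $2;
    }

    my ($size_b, $type_b);
    if ($b =~ /(\d+(?:\.\d+)?)([MGK])/) {
        $size_b = $1;
        $type_b = $2;
    }

    # convert both sizes to kilobytes before comparing
    my $a1 = return_base($type_a) * $size_a;
    my $b1 = return_base($type_b) * $size_b;

    $b1 <=> $a1;    # descending: larger sizes first
}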

sub return_base{
    my $type = shift;    
    return 1024 if ($type eq "M");
    return 1024*1024 if($type  eq "G");
    return 1 if($type  eq "K");
}

sub trim{
    my $s=shift;
    $s=~s/\s+$//;
    $s=~s/^\s+//;
    return $s;
}

My question is: why, in sub sort_arr, don't I need to declare $a and $b, e.g. with my $a, my $b?


Replies are listed 'Best First'.
Re: sort file with your own logic by Discipulus (Canon) on Mar 27, 2015 at 08:46 UTC
Hello, you are using a bareword sort function (`sort sort_arr @arr`), which is described in the What to Avoid section of Modern Perl. The docs for sort explain that a custom comparison function must return a value less than 0, equal to 0, or greater than 0. As for your question: sort itself places the two items being compared in the package variables `$a` and `$b`, so you do not declare them with `my`. They arrive in `@_` instead only if the sub has a `($$)` prototype.

Anyway, if you avoid the `--human-readable` switch of du and let Perl humanize the sizes, you will have less work to do, as in the following pipeline:

`du -b PATH | sort -n | perl -ane '++$n and $F[0] /= 1024 until $F[0] < 1024; printf "%.2f %s %s",$F[0], ( qw[ bytes KB MB GB ] )[ $n ], $F[1];print qq(\n); $n = 0'`

HtH
L*

Update: see also chapter 15.4, Advanced Sorting, in Learning Perl.

There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
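
The same idea, sorting on raw byte counts and humanizing only for display, can also be written as a standalone script. This is a minimal sketch and not Discipulus's code: the helper name `humanize` and the sample lines are made up for illustration, and it assumes input lines of the form "bytes path" such as GNU `du -b` prints.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a byte count into a short human-readable string.
sub humanize {
    my ($bytes) = @_;
    my @units = qw(bytes KB MB GB TB);
    my $i = 0;
    while ( $bytes >= 1024 and $i < $#units ) {
        $bytes /= 1024;
        $i++;
    }
    return sprintf '%.2f %s', $bytes, $units[$i];
}

# Made-up sample lines in "bytes path" form.
my @lines = ( "38797312 /data/B", "35433480192 /data/A", "17408 /data/I" );

# Sort numerically on the leading byte count, largest first,
# and humanize only when printing.
for my $line ( sort { ( split ' ', $b )[0] <=> ( split ' ', $a )[0] } @lines ) {
    my ( $bytes, $path ) = split ' ', $line, 2;
    printf "%-12s %s\n", humanize($bytes), $path;
}
```

Because the comparison works on plain numbers, no unit table is needed for sorting at all; the unit list only matters for output.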
Re: sort file with your own logic by hdb (Monsignor) on Mar 27, 2015 at 09:11 UTC
The roles of `$a` and `$b` are explained in detail in the documentation with many examples.
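
As a small illustration of why no `my $a` or `my $b` is needed: `sort` sets the package variables `$a` and `$b` to the two elements being compared before each call of the comparison code. A minimal sketch, with a made-up sub name `by_number_desc` and made-up sample data:

```perl
use strict;
use warnings;

# $a and $b are package globals that sort() fills in for each comparison.
# They are exempt from "use strict", so no declaration is needed.
# A lexical "my $a" here would shadow the global and break the comparison.
sub by_number_desc {
    return $b <=> $a;    # negative, zero or positive, as sort expects
}

my @sizes  = ( 33, 148, 37, 114 );
my @sorted = sort by_number_desc @sizes;

print "@sorted\n";    # prints "148 114 37 33"
```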
Re: sort file with your own logic by johngg (Canon) on Mar 27, 2015 at 10:26 UTC
Note that there is also a `-h` option for `sort` on some systems. For those that lack it I have been using this code, piping the `du -h` output to it. The `%so` look-up hash gives the sort order for the scale suffix and allows for systems that use no suffix for bytes as well as those that use "B" instead.

use strict;
use warnings;

use Getopt::Std;

# -a selects ascending order; the default is descending
my %opts = ( a => 0 );
getopts( q{a}, \ %opts ) or die qq{\n};

# rank of each scale suffix, smallest unit first
my %so = (
    q{} => 0, B => 0, K => 1, M => 2, G => 3,
    T => 4, P => 5, E => 6, Z => 7, Y => 8,
);

# compare by suffix rank first, then by the number in front of it
my $rcSortAscending = sub {
    my( $raA, $raB ) = @_;
    return $so{ $raA->[ 2 ] } <=> $so{ $raB->[ 2 ] }
        || $raA->[ 1 ] <=> $raB->[ 1 ]
};

print
    map {
        my( $line, $path ) = @$_[ 0, 3 ];
        # mark real directories with a trailing slash
        if ( $path !~ m{/$} and -d $path and not -l $path ) {
            $line =~ s{\n}{/\n};
        }
        $line;
    }
    sort {
        $opts{ a }
            ? $rcSortAscending->( $a, $b )
            : $rcSortAscending->( $b, $a )
    }
    map { [ $_, m{^\s*([\d.]+)([BKMGTEPZY]?)\s+(.*)} ] }
    <>;

I hope this is helpful.

Cheers,
JohnGG
Re^2: sort file with your own logic by Anonymous Monk on Mar 27, 2015 at 19:57 UTC
How will that sort "4321K" vs "2M"? Sorting by normalized value would be my preference.

`sub val { $_[0] =~ m/([\d.]*)([BKMGTEPZY]?)/ and $1 * $scale{$2} } ... sort { val($a) <=> val($b) } ...`
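
The fragment above leaves `%scale` undefined. A minimal runnable sketch of the same normalize-then-compare idea follows; the multipliers are an assumption (binary, 1024-based units, with an empty suffix or "B" meaning bytes) and the sample lines are made up:

```perl
use strict;
use warnings;

# Assumed multipliers: 1024-based units; empty suffix or "B" means bytes.
my %scale = (
    '' => 1,       B => 1,
    K  => 1024,    M => 1024**2,
    G  => 1024**3, T => 1024**4,
);

# Convert a leading size such as "4321K" or "1.5G" to a plain byte count.
sub val {
    my ($line) = @_;
    return 0 unless $line =~ /^\s*([\d.]+)([BKMGT]?)/;
    return $1 * $scale{$2};
}

my @lines  = ( "2M /data/X", "4321K /data/Y", "1.5G /data/Z" );
my @sorted = sort { val($a) <=> val($b) } @lines;

print "$_\n" for @sorted;
```

Here "2M" (2097152 bytes) sorts before "4321K" (4424704 bytes), which a comparison by suffix first and number second would get the wrong way round.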
Re: sort file with your own logic by Laurent_R (Canon) on Mar 27, 2015 at 23:16 UTC
I agree with Anonymous Monk in post Re^2: sort file with your own logic that the best approach is to start by normalizing the values. This can easily be done in a Schwartzian Transform:

use strict;
use warnings;

my %scale = (K => 1024, M => 1024 ** 2, G => 1024 ** 3);

print map {$_->[1]}
      sort {$a->[0] <=> $b->[0]}
      map {/^(\d+)(\w)/; [$1 * $scale{$2}, $_];} <DATA>;

__DATA__
33G     /data/A
37M     /data/B
44G     /data/C
46G     /data/D
68G     /data/E
114G    /data/F
148G    /data/G
169M    /data/H
17K     /data/I

This produces the following output:

$ perl sort_du.pl
17K     /data/I
37M     /data/B
169M    /data/H
33G     /data/A
44G     /data/C
46G     /data/D
68G     /data/E
114G    /data/F
148G    /data/G

Note that I have used powers of two for the K, M and G unit prefixes. It is easy enough to change them to powers of 10 if one wants to comply with newer standards (it is even easier, because you can use a regex to add groups of three 0's), but I strongly suspect that the `du` utility still computes data volumes in power-of-two units.

Also note that I am not using the Schwartzian Transform for performance reasons, but only because I think it is very efficient "code-wise": just two instructions and 4 lines of actual code to solve the problem.

Je suis Charlie.
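
The original question asked for largest first, and `du -sh` can also print fractional sizes such as "1.5G". A variant of the transform covering both might look like this. It is a sketch only: the "T" entry, the sample lines and the widened regex are assumptions added for illustration, not part of the code above.

```perl
use strict;
use warnings;

my %scale = ( K => 1024, M => 1024**2, G => 1024**3, T => 1024**4 );

# Made-up sample input, including a fractional size.
my @lines = ( "33G\t/data/A", "37M\t/data/B", "1.5G\t/data/J", "17K\t/data/I" );

my @sorted =
    map  { $_->[1] }                # 3. unwrap the original line
    sort { $b->[0] <=> $a->[0] }    # 2. compare cached byte values, largest first
    map  {                          # 1. pair each line with its size in bytes
        my ( $num, $unit ) = /^([\d.]+)([KMGT])/;
        [ $num * $scale{$unit}, $_ ]
    } @lines;

print "$_\n" for @sorted;
```

Swapping `$a` and `$b` in the sort block is all it takes to reverse the order; the byte value of each line is still computed only once.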