PerlMonks  

sort du -sH output to find that disk hog

by Plankton (Vicar)
on Aug 04, 2004 at 20:47 UTC ( [id://380115]=CUFP )

Hi all, here's a script I have had to write a few times. I figured I'd post it in case I lose it again. Basically, it sorts the output of "du -sH". I use it when I want to find out which directories are hogging disk space.
#!/usr/bin/perl -w
use strict;

my %byFile;

# Relative magnitude of each unit suffix; "k" and "kB" are equivalent.
# Sizes with no suffix, or an unrecognised one, rank lowest.
my %unitRank = ( k => 1, kB => 1, MB => 2, GB => 3 );

# Split a size such as "1.5MB" into its numeric value and unit rank.
sub parseSize {
    my ($size) = @_;
    my ( $value, $units ) = $size =~ /^(\d+(?:\.\d+)?)(k|kB|MB|GB)?$/
        or return ( 0, 0 );
    return ( $value, $unitRank{ $units || '' } || 0 );
}

# Compare by unit first, then by numeric value within the same unit.
sub sortBySize {
    my ( $a_value, $a_rank ) = parseSize( $byFile{$a} );
    my ( $b_value, $b_rank ) = parseSize( $byFile{$b} );
    return $a_rank <=> $b_rank || $a_value <=> $b_value;
}

# Note: -H printed human-readable sizes on the du this was written for;
# on other versions of du the flag may mean something else.
open( DUSH, "du -sH * |" )
    or die __FILE__ . "[" . __LINE__ . "] Can't execute du -sH *: $!\n";

while (<DUSH>) {
    chomp;
    # Limit the split to two fields so names containing spaces survive.
    my ( $size, $name ) = split /\s+/, $_, 2;
    $byFile{$name} = $size;
}
close DUSH;

for ( reverse sort sortBySize keys %byFile ) {
    print "$_\t=>$byFile{$_}\n";
}
Please post any improvements. Thanks!

Plankton: 1% Evil, 99% Hot Gas.

Replies are listed 'Best First'.
Re: sort du -sH output to find that disk hog
by Aristotle (Chancellor) on Aug 04, 2004 at 21:57 UTC

    Why jump through so many hoops to infer the relative order of two sizes with a unit each when it would be so much easier to postpone the humanreadablification?

    $ du -sk | sort -rn | perl -ple'sub humanreadable { my ($s, $u) = ($_[0], 0); $s /= 1024, $u++ while $s >= 1024 and $u < 3; return sprintf "%.3g%s", $s, ("K", "M", "G", "T")[$u] } s/^(\d+)\s+/sprintf "%10s ", humanreadable($1)/e;'

    Ok, you'll probably want to stick that code in a script at this point.

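    #!/usr/bin/perl -pl
    use strict;
    use warnings;

    # Expects sizes in kilobytes in the first column, as printed by du -sk.
    sub humanreadable {
        my ( $s, $u ) = ( $_[0], 0 );
        # Divide by 1024 once per unit step, stopping at the largest unit.
        $s /= 1024, $u++ while $s >= 1024 and $u < 3;
        return sprintf "%.3g%s", $s, ( "K", "M", "G", "T" )[$u];
    }

    s/^(\d+)\s+/sprintf "%10s ", humanreadable($1)/e;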

    This could probably stand a little parametrization à la sort(1) so it can be told which field in its input to humanreadablify.
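
    A minimal sketch of that parametrization, assuming whitespace-separated fields numbered from 1 as with sort -k, sizes in kilobytes, and no leading whitespace on the line. The helper name humanize_field and the sample lines are hypothetical, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a size in kilobytes (as printed by du -sk) to a short string.
sub humanreadable {
    my ($size) = @_;
    my @units = ( "K", "M", "G", "T" );
    my $u = 0;
    $size /= 1024, $u++ while $size >= 1024 and $u < $#units;
    return sprintf "%.3g%s", $size, $units[$u];
}

# Rewrite one field of a line. Lines with too few fields, or with a
# non-numeric value in the chosen field, are returned unchanged.
sub humanize_field {
    my ( $line, $field ) = @_;
    my @parts = split /(\s+)/, $line;    # capture keeps the separators
    my $i = 2 * ( $field - 1 );          # fields sit at even indexes
    return $line
        unless $field >= 1 && $i <= $#parts && $parts[$i] =~ /^\d+$/;
    $parts[$i] = humanreadable( $parts[$i] );
    return join "", @parts;
}

# Hypothetical sample input with the size in the second column.
print humanize_field( $_, 2 ), "\n"
    for "src 2048", "tmp 512", "home 3145728";
```

    Keeping the separators in the split means the rest of the line, including tabs, passes through untouched. Wiring the field number to a command-line switch is left out here.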

    On an unrelated note, I'm surprised that there doesn't seem to be a module on CPAN to format byte sizes in human readable form.

    Makeshifts last the longest.

Re: sort du -sH output to find that disk hog
by eserte (Deacon) on Aug 04, 2004 at 22:25 UTC
    I often use du -a | xdu to get a graphical and optionally sorted output of du.
Re: sort du -sH output to find that disk hog
by etcshadow (Priest) on Aug 05, 2004 at 01:05 UTC
    What I do:
    du -sk | sort -rn | head

    Actually, I usually do:

    du -sk > du
    sort -rn du | head
    since I'm likely to want to look at it over and over, and du -sk on /home takes me about a half-hour (/home is huge and network-mounted).
    ------------ :Wq Not an editor command: Wq
Re: sort du -sH output to find that disk hog
by Anonymous Monk on Aug 04, 2004 at 21:09 UTC
    I use: du -am / | sort -nr | head -500 > /hogdirs.txt &
      How is that an improvement? Did you try running that from, say, /home as root? I use du -s so that I don't recurse down the filesystem. I don't really care about the size of
       ./dogfacemonkey/.openoffice/user/config/registry/instance/org/openoffice/Office

      I also use -H because I don't want to have to figure out whether 13474481621 is 1347 megabytes or 13 gigabytes. But that's just me.
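
      One way to keep the readable sizes and still sort numerically is to convert each size to an approximate byte count and use that only as the sort key. The sketch below assumes the suffixes seen in this thread mean powers of 1024, which may not hold for every du. The names to_bytes and sort_by_size and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multipliers for the unit suffixes. Powers of 1024 are an assumption;
# some du versions use powers of 1000 for these suffixes.
my %multiplier = (
    ''  => 1,
    k   => 1024,
    K   => 1024,
    kB  => 1024,
    M   => 1024**2,
    MB  => 1024**2,
    G   => 1024**3,
    GB  => 1024**3,
);

# Turn "1.5MB" into an approximate byte count. Sizes that cannot be
# parsed, or that carry an unknown suffix, become 0.
sub to_bytes {
    my ($size) = @_;
    my ( $value, $unit ) = $size =~ /^(\d+(?:\.\d+)?)([A-Za-z]*)$/
        or return 0;
    return $value * ( $multiplier{$unit} || 0 );
}

# Sort "size name" lines largest first. The byte count is computed once
# per line and used only as the key, so the readable size is kept.
sub sort_by_size {
    return map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] }
           map  { [ to_bytes( ( split /\s+/, $_, 2 )[0] ), $_ ] } @_;
}

# Hypothetical sample input standing in for real du output.
print "$_\n" for sort_by_size( "900k\tlogs", "1.5GB\thome", "20MB\tsrc" );
```

      With the sample lines, home sorts first and logs last, and each line still shows the size exactly as du printed it.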

      Plankton: 1% Evil, 99% Hot Gas.
