rendler has asked for the wisdom of the Perl Monks concerning the following question:

Someone asked a question yesterday about how he could go about replacing du(1) with Perl. The answer I gave him was perl -MFile::Find -le 'find( sub { $size += -s }, "."); print $size'. He went away happy, but half an hour later he came back and said that the result it returned was not the same as that of du(1). After a little tweaking and testing I came to the very clear conclusion that I really didn't understand why it was doing what it was.

I came on here and asked in the chatterbox, and got some very good responses, including one from tye, but the problem was that it all went over my head and I basically didn't understand any of what he was explaining :(

Along with the explanation tye came up with this little bit of code perl -MFile::Find -le 'find( sub { $size += int( (511 + -s $_)/512 ) }, "."); print 512*$size' which is meant to produce results closer to du(1).
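The arithmetic in that one-liner may be easier to follow in isolation. The following is a minimal sketch of the rounding step only: it rounds a byte count up to the next whole multiple of a block size. The helper name round_up_to_block is made up for illustration, and the 512-byte block size is simply the value used in the one-liner, not necessarily what any given filesystem uses.

```perl
use strict;
use warnings;

# Hypothetical helper: round $bytes up to the next multiple of $block.
# Adding ($block - 1) before the integer division is what makes a
# partly filled block count as a whole one.
sub round_up_to_block {
    my ($bytes, $block) = @_;
    return $block * int( ($bytes + $block - 1) / $block );
}

# 512 is the block size assumed by the one-liner above.
for my $bytes (0, 1, 512, 513, 1500) {
    printf "%5d bytes -> %5d bytes in 512-byte blocks\n",
        $bytes, round_up_to_block($bytes, 512);
}
```

With 512-byte blocks, a 1-byte file counts as 512 bytes and a 513-byte file as 1024, which is why the summed result is larger than the plain sum of -s values.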

When I also tested it on individual files with du -b *, the file sizes it returned did not match those returned by -s or by ls -l either.

So if anyone could please explain it in newbie terms I would be very grateful. Thanks.

Re: du and -s
by IlyaM (Parson) on Jul 24, 2002 at 11:30 UTC
    Most filesystems store files as a set of blocks. So even if your file has only one byte, it physically takes up one whole block on the disk.

    To reflect this fact some programs report file sizes in blocks. For example:

    # creating two bytes long file
    bash-2.05b$ echo 1 > test
    # ls -l reports logical size of file
    bash-2.05b$ ls -l test
    -rw-r--r-- 1 ilya ilya 2 Jul 24 15:25 test
    # du -b reports physical size of file
    # (4096 - size of block on my filesystem)
    bash-2.05b$ du -b test
    4096 test

    Perl's -s reports the logical file size, like ls -l. This is why its results don't match du's results.

    Update: BTW the block size depends on the filesystem, so in general tye's solution is incorrect. There is a module, Filesys::Statvfs, which lets you query the filesystem for its block size.
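
    One way to avoid guessing the block size might be to ask the system how many blocks each file actually has allocated. The sketch below sums the blocks field (index 12) of Perl's built-in lstat. It assumes that field is counted in 512-byte units, which is common but, per Perl's own documentation, system-specific, and on some platforms the field is empty. The sub name allocated_bytes is made up for this example.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: total allocated space under $dir, in bytes.
# ASSUMPTION: lstat's blocks field is in 512-byte units on this system.
sub allocated_bytes {
    my ($dir) = @_;
    my $blocks = 0;
    find( sub {
        my @st = lstat $_ or return;   # skip entries we cannot stat
        $blocks += $st[12] || 0;       # field is empty where unsupported
    }, $dir );
    return 512 * $blocks;
}

# Directory to scan: first argument, or the current directory.
print allocated_bytes( @ARGV ? $ARGV[0] : '.' ), "\n";
```

    This counts every directory entry it visits, so files with several hard links are still counted more than once.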

    --
    Ilya Martynov (http://martynov.org/)

Re: du and -s
by Abigail-II (Bishop) on Jul 24, 2002 at 11:36 UTC
    Note that du is much smarter than a simple File::Find routine. du knows it has seen a file before if you have multiple links to a file. The File::Find solution would include the size multiple times.

    Abigail

      Well that would be pretty easy to fix, wouldn't it? You just have to stat each file and cache the inode, to determine whether you have seen the file or not already before accumulating.

      The only question, depending on the number of files and the number of inodes on the filesystem, is which would use less memory: a hash or a bitvec?
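
      The hash variant of that idea could look roughly like the sketch below. It keys on device number plus inode number, since inode numbers are only unique within one filesystem, and it only consults the hash for entries whose link count is above one. The sub name logical_bytes_unique is made up here, and it sums logical sizes as the original one-liner does, not allocated blocks.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: sum logical sizes under $dir, counting each
# multiply linked file only once.
sub logical_bytes_unique {
    my ($dir) = @_;
    my %seen;
    my $size = 0;
    find( sub {
        my ($dev, $ino, $nlink, $bytes) = (lstat $_)[0, 1, 3, 7];
        return unless defined $ino;    # lstat failed, skip this entry
        # Only entries with more than one link can turn up twice.
        return if $nlink > 1 && $seen{"$dev:$ino"}++;
        $size += $bytes;
    }, $dir );
    return $size;
}

# Directory to scan: first argument, or the current directory.
print logical_bytes_unique( @ARGV ? $ARGV[0] : '.' ), "\n";
```

      Memory use grows with the number of multiply linked entries rather than with the total number of files, which may make the hash cheap enough in practice.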


      print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
Re: du and -s
by jmcnamara (Monsignor) on Jul 24, 2002 at 11:46 UTC

    There are a couple of things to look out for. The first is that files occupy a whole number of blocks, and the block size is probably 4096 bytes. Therefore, a file that ls shows as having 1 byte will be seen by du as having 4096 bytes.

    Also, du counts the number of blocks allocated to . and .. and any hidden file. Therefore, it would be better to compare du -b with ls -al.

    However, this is based on the file-systems that I'm used to. YMMV.

    --
    John.

Re: du and -s
by rendler (Pilgrim) on Jul 25, 2002 at 08:32 UTC
    Thanks for the great explanation guys, much appreciated :)

    (PS. On here (Linux/ext3) the blocksize is 1024).
      PS. On here (Linux/ext3) the blocksize is 1024

      I think it depends on the options passed to mkfs when the partition was created. Ext2/3 should also support 2048- and 4096-byte block sizes.

      --
      Ilya Martynov (http://martynov.org/)