du and -s

rendler has asked for the wisdom of the Perl Monks concerning the following question:

Someone asked me yesterday how he could go about replacing du(1) with Perl. The answer I gave him was `perl -MFile::Find -le 'find( sub { $size += -s }, "."); print $size'`. He went away happy, but half an hour later he came back and said that the result was not the same as that of du(1). After a little tweaking and testing I came to the very clear conclusion that I really didn't understand why it was doing what it was doing.

I came on here and asked in the chatterbox, and got some very good responses, including one from tye, but it all went over my head and I basically didn't understand any of what he was explaining :(

Along with the explanation, tye came up with this little bit of code, `perl -MFile::Find -le 'find( sub { $size += int( (511 + -s $_)/512 ) }, "."); print 512*$size'`, which is meant to produce results closer to du(1).

When I also tested it on individual files with `du -b *`, the file sizes it returned were not the same as those returned by `-s` or `ls -l` either.

So if anyone could please explain this in newbie terms I would greatly appreciate it. Thanks.


Replies are listed 'Best First'.
Re: du and -s by IlyaM (Parson) on Jul 24, 2002 at 11:30 UTC
Most filesystems store files as a set of blocks, so even a file that holds only one byte physically takes up a whole block on the disk. To reflect this, some programs report file sizes in blocks. For example:

    # create a two-byte file
    bash-2.05b$ echo 1 > test
    # ls -l reports the logical size of the file
    bash-2.05b$ ls -l test
    -rw-r--r-- 1 ilya ilya 2 Jul 24 15:25 test
    # du -b reports the physical size of the file
    # (4096 is the block size on my filesystem)
    bash-2.05b$ du -b test
    4096 test

Perl's `-s` reports the logical file size, like `ls -l`. This is why its results don't match those of `du`.

Update: BTW, the block size depends on the filesystem, so in general tye's solution is incorrect. There is a module, Filesys::Statvfs, which allows you to query a filesystem for its block size.

-- Ilya Martynov (http://martynov.org/)
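To make the logical-versus-physical distinction concrete, here is a minimal sketch that totals both figures for a directory tree. It assumes a Unix-like system where field 12 of Perl's `stat`/`lstat` list (the allocated block count) is populated and counted in 512-byte units, which is common but not guaranteed; the name `tree_sizes` is made up for this example. Also note that in current GNU coreutils `du -b` is shorthand for `--apparent-size --block-size=1`, so on a modern system it prints the logical size and `du -B1` is the one to compare against the allocated total.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return (logical bytes, allocated bytes) for a tree.
sub tree_sizes {
    my ($dir) = @_;
    my ($logical, $allocated) = (0, 0);
    find(sub {
        my @st = lstat($_) or return;    # lstat: do not follow symlinks
        $logical += $st[7];              # st_size, the same value -s gives
        my $blocks = $st[12];            # st_blocks; empty on some platforms
        # Assumption: one block in this field is 512 bytes.
        $allocated += 512 * $blocks if defined $blocks && length $blocks;
    }, $dir);
    return ($logical, $allocated);
}

my ($logical, $allocated) = tree_sizes(@ARGV ? $ARGV[0] : '.');
print "logical:   $logical bytes\n";
print "allocated: $allocated bytes\n";
```

Like the one-liners in the question, this counts directories as well as files, and it still counts hard-linked files once per name.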
Re: du and -s by Abigail-II (Bishop) on Jul 24, 2002 at 11:36 UTC
Note that `du` is much smarter than a simple `File::Find` routine. `du` knows it has seen a file before if you have multiple links to a file. The `File::Find` solution would include the size multiple times. Abigail
Re:x2 du and -s by grinder (Bishop) on Jul 24, 2002 at 13:14 UTC
Well that would be pretty easy to fix, wouldn't it? You just have to stat each file and cache the inode, to determine whether you have already seen the file before accumulating. The only question, depending on the number of files and the number of inodes on the filesystem, is what would use less memory, a hash or a bitvec?

print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
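The inode-caching idea above can be sketched roughly as follows, using a hash rather than a bit vector. This is only an illustration under stated assumptions: `du_bytes` is a made-up name, blocks are assumed to be 512 bytes each, and the key combines device and inode number because inode numbers are only unique within one filesystem.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: allocated bytes under $dir, counting each
# hard-linked file only once.
sub du_bytes {
    my ($dir) = @_;
    my %seen;
    my $total = 0;
    find(sub {
        my ($dev, $ino, $nlink, $blocks) = (lstat $_)[0, 1, 3, 12];
        return unless defined $dev;      # lstat failed
        # Only non-directories with more than one link can be reached
        # under two names, so only those need to be remembered.
        if ($nlink > 1 && !-d _) {
            return if $seen{"$dev:$ino"}++;
        }
        # Assumption: st_blocks counts 512-byte units.
        $total += 512 * ($blocks || 0);
    }, $dir);
    return $total;
}

print du_bytes(@ARGV ? $ARGV[0] : '.'), "\n";
```

Remembering only entries whose link count is greater than one keeps the hash small on most trees, which takes some of the pressure off the hash-or-bitvec question.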
Re: du and -s by jmcnamara (Monsignor) on Jul 24, 2002 at 11:46 UTC
There are a couple of things to look out for. The first is that files occupy a whole number of blocks, and the block size is probably 4096 bytes. Therefore, a file that `ls` shows as having 1 byte will be seen by `du` as having 4096 bytes. Also, `du` counts the number of blocks allocated to `.` and `..` and to any hidden files. Therefore, it would be better to compare `du -b` with `ls -al`. However, this is based on the filesystems that I'm used to. YMMV.

-- John.
Re: du and -s by rendler (Pilgrim) on Jul 25, 2002 at 08:32 UTC
Thanks for the great explanations guys, much appreciated :) (PS. On here (Linux/ext3) the blocksize is 1024.)
Re: Re: du and -s by IlyaM (Parson) on Jul 25, 2002 at 10:22 UTC
"PS. On here (Linux/ext3) the blocksize is 1024"

I think it depends on the options passed to mkfs when the partition was created. Ext2/3 should also support block sizes of 2048 and 4096.

-- Ilya Martynov (http://martynov.org/)
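Since the block size varies like this, the rounding in tye's one-liner can be written with the block size as a parameter instead of a hard-coded 512. The following is a small sketch of that arithmetic; `round_to_blocks` is a made-up name, and the result is only an estimate, because sparse files and filesystem-specific storage tricks can make the real allocation differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: round a logical size up to a whole number of blocks.
# This is tye's int((511 + size)/512) idea with the block size left open.
sub round_to_blocks {
    my ($size, $block_size) = @_;
    return $block_size * int( ($size + $block_size - 1) / $block_size );
}

# Show how the same two files look under three common block sizes.
for my $bs (512, 1024, 4096) {
    printf "%4d-byte blocks: 2 bytes -> %d, 5000 bytes -> %d\n",
        $bs, round_to_blocks(2, $bs), round_to_blocks(5000, $bs);
}
```

With 4096-byte blocks the two-byte file from the earlier example rounds up to 4096, matching what `du -b` printed there, while with 1024-byte blocks the same file rounds up to only 1024.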