in reply to Total size of each Directory

Whatever you do, be careful about interpreting the results. In general, a file takes up more space on disk than its size alone would indicate. Files are stored on disk in clusters, and each cluster is a whole multiple of the sector size (one sector is typically 512 bytes). The cluster size is related to the disk size: in general, the larger the disk, the larger the cluster. A cluster size of 16 to 32 KB is quite common.

Thus, even if your file is only 1 byte long, it will take up a whole cluster. In a similar way, if your file is just 1 byte too big to fit into one cluster, it will take up two.

In plain terms: round each file's size up to the next multiple of the cluster size, and add up only those rounded values. Only that total tells you how much disk space a directory actually takes up.
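A minimal Python sketch of that rounding rule follows. It assumes a 32 KB cluster (the `cluster` default is only an illustrative value), treats an empty file as using no clusters, and ignores directory entries and other filesystem metadata; `rounded_size` and `space_used` are hypothetical helper names.

```python
def rounded_size(size: int, cluster: int) -> int:
    """Round a file size up to the next whole multiple of the cluster size."""
    # -(-a // b) is ceiling division for non-negative integers
    return -(-size // cluster) * cluster

def space_used(sizes, cluster: int = 32 * 1024) -> int:
    """Sum the per-file sizes after rounding each one up to whole clusters."""
    return sum(rounded_size(s, cluster) for s in sizes)

sizes = [1, 32768, 32769]   # 1 byte, exactly one cluster, one byte over
print(sum(sizes))           # 65538: the plain sum of file sizes
print(space_used(sizes))    # 131072: four whole 32 KB clusters
```

The 1-byte file costs a full cluster and the 32769-byte file costs two, which is why the rounded total is about twice the plain sum here.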

Re: Re: Total size of each Directory
by liz (Monsignor) on Jul 28, 2003 at 14:31 UTC
    ...Only that result will actually tell you how much space is taken up by a directory...

    On the other hand, if you're interested in how much space the directory would use when written to, e.g., a tar file, then simply adding up the file sizes will give you a rough idea of the final size of the tar file.
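    For that rough estimate, a plain sum of file sizes is enough. The Python sketch below walks a directory tree and adds up the sizes of regular files without any rounding; `apparent_size` is a hypothetical name, and skipping symlinks is an assumption made so linked files are not counted twice.

```python
import os

def apparent_size(top: str) -> int:
    """Add up the byte sizes of all files under `top`, with no block rounding."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so the files they point to are not counted twice
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

    Calling `apparent_size(".")` gives the combined size of everything below the current directory.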

    And of course, if you're using a filesystem like ReiserFS, which can pack the tails of small files together into shared blocks, who knows how much space you are actually using in a directory...

    Liz

Re: Re: Total size of each Directory
by revdiablo (Prior) on Jul 28, 2003 at 21:11 UTC

    As other monks have mentioned, there is always the venerable Unix du utility. It tells you how much disk space the files actually occupy, as opposed to their combined size. Pretty handy. (Though, as another monk also noted, sometimes it's useful simply to know the total combined size of some files, e.g. for burning to a CD or creating a tar file.)
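    Roughly what du measures can be sketched in Python from `os.lstat`: on Unix, `st_blocks` counts allocated 512-byte units regardless of the filesystem's own block size. This is an approximation under stated assumptions, not a reimplementation of du: `disk_usage` is a hypothetical name, it is Unix-only, it counts hard-linked files once (as du does), and it leaves out the blocks used by the directories themselves.

```python
import os

def disk_usage(top: str) -> int:
    """Approximate du: bytes actually allocated to files under `top` (Unix only)."""
    seen = set()  # (device, inode) pairs, so hard-linked files are counted once
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            # st_blocks is in 512-byte units, independent of the cluster size
            total += st.st_blocks * 512
    return total
```

    Comparing `disk_usage(d)` with a plain sum of file sizes for the same directory shows the rounding overhead described earlier in the thread.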

      Both CD-ROMs and tar files use sector blocks.

      For tar files, the block size is 512 bytes.
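      As a sketch of what that means for an uncompressed tar archive: each member gets a 512-byte header followed by its data padded to whole 512-byte blocks, the archive ends with two zero-filled blocks, and the whole thing is padded to a 10240-byte record (GNU tar's default blocking factor of 20). The sketch assumes short file names (no long-name or pax extension headers) and ignores directory entries; `estimate_tar_size` is a hypothetical name.

```python
BLOCK = 512            # tar block size
RECORD = 20 * BLOCK    # 10240 bytes: default blocking factor of 20 (assumption)

def round_up(n: int, unit: int) -> int:
    """Round n up to the next whole multiple of unit."""
    return -(-n // unit) * unit

def estimate_tar_size(sizes) -> int:
    """Estimate an uncompressed ustar archive size for a list of file sizes."""
    total = 0
    for size in sizes:
        total += BLOCK + round_up(size, BLOCK)  # header + data padded to 512
    total += 2 * BLOCK                          # two zero blocks mark the end
    return round_up(total, RECORD)              # pad to a whole record

print(estimate_tar_size([1]))  # 10240
```

      Even a 1-byte file yields a 10240-byte archive, because of the header, the end marker and the record padding.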

      The raw block size on a CD is somewhat over 2300 bytes (2352 bytes). For data, the surplus is used mostly for error detection and correction, so the block size for actual data is 2048 bytes. For details, take a look here or here.
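      The same arithmetic works for a data CD, where every file starts on a fresh 2048-byte sector. The Python sketch below counts data sectors only and deliberately ignores ISO 9660 filesystem overhead (volume descriptors, directory records, path tables); `cd_sectors` is a hypothetical name, and the 74-minute capacity follows from 75 sectors per second.

```python
DATA_PER_SECTOR = 2048          # user data in a Mode 1 CD-ROM sector
RAW_PER_SECTOR = 2352           # raw sector, including sync, header and EDC/ECC
SECTORS_74_MIN = 74 * 60 * 75   # 75 sectors per second -> 333000 sectors

def cd_sectors(sizes) -> int:
    """Sectors needed for the file data alone, each file on whole sectors."""
    return sum(-(-s // DATA_PER_SECTOR) for s in sizes)

sizes = [1, 2048, 2049]
n = cd_sectors(sizes)
print(n)                     # 4
print(n * DATA_PER_SECTOR)   # 8192 bytes of user data area
print(n * RAW_PER_SECTOR)    # 9408 raw bytes on the disc
print(n <= SECTORS_74_MIN)   # True: fits on a 74-minute CD
```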

      Thus, in summary, for both CD-ROMs and tar files, to get the actual space used you need to apply the same calculation, but with different block sizes than those that apply to the hard disk.

        I see. So the reason a sum of all file sizes seems more accurate for CD filesystems and tar files is not that they don't use blocks, but that they use much smaller blocks than most hard disk filesystems? I knew the numbers never worked out perfectly, but never bothered to investigate why. Thanks for this info. bart++
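        A quick way to check that intuition is to compute the slack (bytes wasted at the end of each partially filled block) for one set of files at several block sizes. The file sizes below are made up for illustration, 32768 stands in for a large hard-disk cluster, and `slack` is a hypothetical name; the sketch looks only at block rounding, not at per-file headers or filesystem metadata.

```python
def slack(sizes, block: int) -> int:
    """Bytes wasted at the ends of partially filled blocks."""
    return sum(-(-s // block) * block - s for s in sizes)

sizes = [100, 1000, 5000, 70000]   # hypothetical files, 76100 bytes in total
for block in (512, 2048, 32768):   # tar block, CD data sector, large disk cluster
    print(block, slack(sizes, block))
# 512 700
# 2048 5820
# 32768 120508
```

        With 512-byte blocks the waste is under 1% of the data, so the plain sum looks almost exact; with 32 KB clusters the waste exceeds the data itself.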