wilsond has asked for the wisdom of the Perl Monks concerning the following question:

EDIT: I rebooted my WinXP box and now everything works fine. It doesn't make any sense to me that it would be magically fixed. I truly dislike this OS. Sorry for the inconvenience of reading/helping with this. I appreciate the help I've received.


Here's some code:

my $size = -s "myfile.pl";

This returns 24576 bytes, but WinXP reports 26755 bytes (difference of 2179 bytes).

This is being used in Amazon::S3. The problem causes the uploaded file to be about 2 KB short, which means I'm missing data. It happens with every non-binary file I try, though the size difference varies from file to file. The size seems to be correct for all the binary files I've tried.

(stat($filename))[7] returns the same (incorrect) number.
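For reference, a minimal sketch of why the two should agree: -s and element 7 of stat are both answered by the filesystem, as is -s on an open handle. The file name size_demo.txt is made up for the illustration, and the file is written in raw mode so that no line-ending translation is involved.

```perl
use strict;
use warnings;

# Hypothetical scratch file, created here so the sketch is self-contained.
my $file = 'size_demo.txt';
open my $out, '>:raw', $file or die "Cannot create $file: $!";
print {$out} "x" x 1000;                    # 1000 bytes, no newlines
close $out or die "Cannot close $file: $!";

my $by_dash_s = -s $file;                   # file test operator
my $by_stat   = (stat $file)[7];            # size field of stat

open my $in, '<', $file or die "Cannot open $file: $!";
my $by_handle = -s $in;                     # same query on an open handle
close $in;

print "-s: $by_dash_s, stat: $by_stat, -s on handle: $by_handle\n";
unlink $file;
```

All three numbers come from the same place, so a mismatch between them would point at something other than Perl's own counting.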

Does anyone have any ideas?

EDIT: After some tests, I'm not sure if there's something wrong with the file itself or what, but writing to a file in Perl and then testing the filesize comes out as one would expect. I've yet to figure out what's going on. In the meantime, I'm just slurping the file and doing what I need to in Perl without -s or (stat())[7].

Re: Filesize (-s) is consistenly reporting too small of size in Win32
by BrowserUk (Patriarch) on Jan 17, 2009 at 10:51 UTC

    The problem is not a discrepancy between the size reported by WinXP (via say dir or explorer) and that returned by -s, they will be exactly the same.

    The problem is the length of the file as loaded into your code if you do not use binmode.

    Without binmode, carriage return/linefeed pairs within the file on disk are translated into single linefeeds on input (and the reverse is done on output).

    Hence:

    open I1, '<:raw', 'zz.pl';;
    $d1 = do{ local $/; <I1> };;
    print length $d1;;
    6712

    open I2, '<', 'zz.pl';;
    $d2 = do{ local $/; <I2> };;
    print length $d2;;
    6571

    !wc -l zz.pl;;
    141 zz.pl

    print unpack 'H*', substr $d1, 0, 100;;
    23 21 20 70 65 72 6c 20 2d 73 6c 77 0d 0a 75 73 ...

    print unpack 'H*', substr $d2, 0, 100;;
    23 21 20 70 65 72 6c 20 2d 73 6c 77 0a 75 73 ...

    In your example, myfile.pl has 2179 lines.

    The solution is to use binmode (or '<:raw' & '>:raw' etc.).
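A rough sketch of the same comparison as a standalone script, in case it helps. The helper slurp_with and the file name crlf_demo.txt are invented for the illustration. The ':crlf' layer is named explicitly so the translation also happens on platforms where a plain '<' does not apply it; on Win32 a plain '<' should behave the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file through the given PerlIO layer.
sub slurp_with {
    my ($layer, $path) = @_;
    open my $fh, "<$layer", $path or die "Cannot open $path: $!";
    local $/;                               # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Build a five-line sample with CRLF line endings, written byte for byte.
my $file = 'crlf_demo.txt';
open my $out, '>:raw', $file or die "Cannot create $file: $!";
print {$out} "line $_\r\n" for 1 .. 5;
close $out or die "Cannot close $file: $!";

my $on_disk = -s $file;                     # what the filesystem reports
my $raw     = slurp_with(':raw',  $file);   # no translation
my $text    = slurp_with(':crlf', $file);   # CRLF collapsed to LF

printf "on disk: %d, raw read: %d, text read: %d, difference: %d\n",
    $on_disk, length $raw, length $text, length($raw) - length($text);
unlink $file;
```

The raw read matches -s, and the text read is short by one byte per line, which is the pattern to look for when an upload built from a text-mode read comes up short.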


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      My file has 788 lines, including the final blank line in the file.

Re: Filesize (-s) is consistenly reporting too small of size in Win32
by Anonymous Monk on Jan 17, 2009 at 10:33 UTC
    I think your filesystem may be corrupt; try chkdsk. Also try this test:

    Q:\>perl -e"open F, q!>!, q!test! or die $!; warn print F q,x, x 26755 ; warn close F;die -s q!test!"
    1 at -e line 1.
    1 at -e line 1.
    26755 at -e line 1.

    Q:\>dir test
    01/17/2009  02:27 AM            26,755 test
                   1 File(s)         26,755 bytes

      With this test, both the OS and Perl agree that the file is 26755 bytes.

      Writing "x\r\n" produces 107020 bytes, which the OS and Perl agree on.

        heh, You mean writing "x\r\r\n" (by printing "x\r\n"). 107020 / 26755 = 4.
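The arithmetic above can be checked with a small sketch. The helper written_size and the scratch file name write_demo.tmp are made up for the illustration, and ':crlf' is spelled out so the output translation is applied regardless of platform (it is the default for text handles on Win32).

```perl
use strict;
use warnings;

# Hypothetical helper: write $count copies of $record through $layer
# and return the resulting size on disk.
sub written_size {
    my ($layer, $record, $count) = @_;
    my $file = 'write_demo.tmp';
    open my $fh, ">$layer", $file or die "Cannot create $file: $!";
    print {$fh} $record x $count;
    close $fh or die "Cannot close $file: $!";
    my $size = -s $file;
    unlink $file;
    return $size;
}

# In text mode each "\n" becomes "\r\n", so "x\r\n" lands on disk as "x\r\r\n".
my $text_size = written_size(':crlf', "x\r\n", 26755);
# In raw mode the three bytes are written as they are.
my $raw_size  = written_size(':raw',  "x\r\n", 26755);

print "text mode: $text_size bytes, raw mode: $raw_size bytes\n";
```

Text mode gives 4 bytes per record (107020 in total) and raw mode gives 3 (80265), and in both cases -s agrees with what was actually written.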
Re: Filesize (-s) is consistenly reporting too small of size in Win32
by igelkott (Priest) on Jan 17, 2009 at 10:32 UTC

    Just a guess but it might be counting the \r\n as a single line-ending. Is your test file 2179 lines long?

    Seems like you could treat all files as binary and skip the automatic line-ending conversion (see binmode).

      No, not possible. -s/stat doesn't do any counting, it consults the filesystem.
Re: Filesize (-s) is consistenly reporting too small of size in Win32
by cdarke (Prior) on Jan 17, 2009 at 18:18 UTC
    I'm not clear what you mean by "WinXP" when you report the filesize. Is it possible that the file has one or more Alternate Data Streams? Notepad can add an ADS, for example. These are not detected in the normal way, but it does depend on how you are measuring it. Perl -s, like most utilities, only reports the size of the primary $DATA stream. You mention dir in one of the posts; that does not show ADS files either. See Win32::StreamNames.

      This is very unlikely to be anything to do with streams.

      I don't think there is any way of obtaining a filesize that conflates the sizes of the different streams into a single number. For the most part streams have to be treated as if they were entirely different files, whether you are reading them or querying information about them.

      The only APIs that treat them as compound entities are the backup APIs, and they are all but inaccessible to most user code.

      Despite all the OP's invective about hating Windows, this is far more likely to be incompatibilities, or latency in caching, between the underlying Linux OS, the VMWare virtualisation and the hosted Windows code, than any inherent problem in Windows itself.


      "WinXP" meaning the "dir" command in Windows.

      Interesting. That's an idea, ain't it. The weirdness is that after I rebooted, the file sizes came out "normal". The file being tested was just a simple text file generated in Komodo on an Ubuntu box and stored via SMB on the Windows XP box. I doubt what happened is as you describe, but that's something else I will test for if it ever happens again. Thanks for that input.


      If you want to do evil, science provides the most powerful weapons to do evil; but equally, if you want to do good, science puts into your hands the most powerful tools to do so.
      - Richard Dawkins