wrkrbeee has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I downloaded a file from the SEC.GOV website, then used Perl's test operator "-s" to determine the size of the file, and received a result of 6,324,458 bytes. The same file is posted on the SEC's website with a size of 6,255,650 bytes. Why is my file size slightly larger? Tags? Other formatting? You can see the SEC's file at: http://www.sec.gov/Archives/edgar/data/6201/000000620109000009/0000006201-09-000009-index.htm Thanks!! Rick

Re: File size discrepancy
by Corion (Patriarch) on Nov 25, 2014 at 15:56 UTC

    How did you transfer the file to your local machine?

    What do other local tools report as the file size? For example:

    dir 000000620109000009/0000006201-09-000009-index.htm

    Most likely, the file has been saved with Windows newlines, that is, \r\n, whereas the size the SEC posts is for the file with "Unix newlines", that is, \n. Every line then takes up one extra byte on your machine.
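One way to check the newline explanation: the two sizes differ by 6,324,458 - 6,255,650 = 68,808 bytes, which is what you would expect if about 68,808 lines each gained one \r. Below is a minimal sketch that counts the \r\n pairs in a local file and reports what the size would be without the extra \r bytes. It assumes the file fits in memory, and crlf_report is a hypothetical helper name, not something from this thread.

```perl
use strict;
use warnings;

# Return (size on disk, number of CR-LF pairs, size if every CR-LF were a bare LF).
sub crlf_report {
    my ($path) = @_;
    # :raw disables any newline translation so the bytes are read as stored.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                          # slurp the whole file in one read
    my $data = <$fh>;
    close $fh;
    my $crlf = () = $data =~ /\r\n/g;  # count matches via list assignment
    my $size = -s $path;
    return ($size, $crlf, $size - $crlf);
}

# Only run the report when a file name is given on the command line.
if (@ARGV) {
    my ($size, $crlf, $unix_size) = crlf_report($ARGV[0]);
    print "Size on disk:        $size bytes\n";
    print "CR-LF pairs found:   $crlf\n";
    print "Size with LF only:   $unix_size bytes\n";
}
```

If the "Size with LF only" figure matches the size shown on the SEC page, the discrepancy is down to line endings alone.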

      Thanks for the insight! I used a simple FTP to download the file. The newlines idea makes sense. Any thoughts on downloading files with Unix newlines rather than Windows newlines? I am grateful for your help!! Rick
        FTP has two transfer modes: ASCII and binary. Binary mode transfers the file exactly as it is; ASCII (text) mode converts end-of-line characters between the Unix and DOS conventions. Having said that, if you transferred it in ASCII mode and want to get it back to Unix format, you don't need to download it again; you can just use this command:
        perl -pi.bak -e 's/\r//g' file
        to convert it back to Unix format (the original is kept as file.bak). Or your system may very well have a dos2unix utility that does just the same. On the AIX system where I do most of my work, dos2unix did not exist, so I created an alias for it using the Perl one-liner above. On most Linux systems, however, I would think that dos2unix should exist.
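To avoid the conversion in the first place, the transfer can be forced into binary mode. Below is a rough sketch using the Net::FTP module that ships with Perl. The host, remote path, and local file name are placeholders taken from the command line, fetch_binary is a hypothetical helper name, and the anonymous login is an assumption about the server.

```perl
use strict;
use warnings;
use Net::FTP;

# Download one file in binary mode and return its size on disk.
sub fetch_binary {
    my ($host, $remote, $local) = @_;
    my $ftp = Net::FTP->new($host)
        or die "Cannot connect to $host: $@";
    # Anonymous login is an assumption; supply real credentials if needed.
    $ftp->login('anonymous', 'anonymous@')
        or die "Login failed: ", $ftp->message;
    $ftp->binary;                      # transfer bytes as-is, no newline conversion
    $ftp->get($remote, $local)
        or die "Download failed: ", $ftp->message;
    $ftp->quit;
    return -s $local;
}

# Usage: perl fetch.pl HOST REMOTE_PATH LOCAL_FILE
if (@ARGV == 3) {
    my $size = fetch_binary(@ARGV);
    print "Saved $ARGV[2]: $size bytes\n";
}
```

With a command-line ftp client, the equivalent is typing the binary (or bin) command before get.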
Re: File size discrepancy
by Laurent_R (Canon) on Nov 25, 2014 at 19:55 UTC
    This is not really a Perl question, but since I gave you a Perl solution to solve your problem, I thought I could still approve your question on this forum.
      I am grateful for your consideration. The responses are tremendously helpful! :-)