
Must be a Windows vs. Unix issue then. The numbers work out on my system. Also, with your previous code, the numbers were all 560 on my system.

Re^8: Extracting Data from a second line
by ikegami (Patriarch) on Feb 16, 2006 at 19:41 UTC

    Of course it is. In text mode on Windows, \x0D\x0A is two bytes, but a single character.

    But it's not just a Windows issue; it's also an encoding issue. If you set the encoding of DATA or of your FILE to a multi-byte encoding such as UTF-8, then you'll have a discrepancy between the number of characters and the number of bytes (even on Unix) if there are any multi-byte characters in your DATA or in your FILE.
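    A minimal sketch of that discrepancy, assuming an in-memory filehandle stands in for the file. The sample string and the helper name count_read are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# The UTF-8 encoding of "café" plus a newline: 6 bytes, 5 characters.
my $bytes = "caf\xC3\xA9\n";

# Hypothetical helper: open the same bytes with the given layer and
# return what read() reports when asked for as many units as there are bytes.
sub count_read {
    my ($layer) = @_;
    open(my $fh, "<$layer", \$bytes) or die "open failed: $!";
    my $n = read($fh, my $buf, length $bytes);
    close $fh;
    return $n;
}

my $raw_count  = count_read(':raw');              # counts bytes
my $utf8_count = count_read(':encoding(UTF-8)');  # counts characters

print "raw: $raw_count, utf8: $utf8_count\n";
```

    The same six bytes come back as 6 units in :raw mode and 5 under the UTF-8 layer, even though nothing Windows-specific is involved.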

    The whole point is that there is no way of knowing the number of characters in a file without reading it, since there's no fixed relation between the size of the file and the number of characters in it.

    It is therefore useless to "optimize" the value passed as read's third parameter, which is a number of characters (it is a number of bytes only in :raw mode). Any value at least as large as the number of bytes in the file will do nicely.

      So, what is the answer then? When reading DATA, do we continue to manually count the characters and use that? Or do we use a number that we know will be sufficient (1000000 or something similar) but is guaranteed too big? These are more rhetorical questions than anything. I would just like a way to find the size of DATA, programmatically.

        If you wish to know the size of DATA, in characters, use the following. It will work with all line terminators and with all encodings.

        $size_of_DATA = read(DATA, $buf='', -s DATA);

        It's safe to use -s FILE because the number of characters in a file should never be bigger than the number of bytes in a file.
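        The same idea applied to an ordinary file can be sketched as follows. The temporary file and its contents are assumptions made only so the example is self-contained; the point is that -s gives an upper bound in bytes and read returns the actual count in characters.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file as raw bytes: three lines of UTF-8 "naïve\n",
# each 7 bytes long but only 6 characters.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':raw');
print {$tmp_fh} "na\xC3\xAFve\n" x 3;
close $tmp_fh or die "close failed: $!";

# Reopen it as UTF-8 text and ask read() for as many characters as the
# file has bytes. That is always enough, and read() stops at end of file.
open(my $fh, '<:encoding(UTF-8)', $path) or die "open failed: $!";
my $byte_size  = -s $fh;
my $chars_read = read($fh, my $content, $byte_size);
close $fh;

print "bytes on disk: $byte_size, characters read: $chars_read\n";
```

        Here -s reports 21 bytes while read returns 18, the number of characters actually decoded.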